Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Data duplication on multiple fields

Data duplication on multiple fields

Scheduled Pinned Locked Moved Database
4 Posts 4 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • V Offline
    V Offline
    VK19
    wrote on last edited by
    #1

    Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!

    J M J 3 Replies Last reply
    0
    • V VK19

      Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!

      J Offline
      J Offline
      Jorgen Andersson
      wrote on last edited by
      #2

      In which way do you need to compare the the rows? Are you looking for duplicates? Partial or fully. Are there any text fields that needs fuzzy matching?

      Wrong is evil and must be defeated. - Jeff Ello

      1 Reply Last reply
      0
      • V VK19

        Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!

        M Offline
        M Offline
        Mycroft Holmes
        wrote on last edited by
        #3

        add and identity column to the table concatenate the 7 fields into a key field (can use a view) group by the key field having a count > 1 You now have a list of the duplicated key values. select identity field using row_number() and partition over the key field delete anything with a row_number > 1

        Never underestimate the power of human stupidity RAH

        1 Reply Last reply
        0
        • V VK19

          Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!

          J Offline
          J Offline
          jonyj
          wrote on last edited by
          #4

          What RDBMS are you using? You could do something like

          SELECT DISTINCT (your 7 columns) FROM table

          to get the unique values if that's what you're looking for.

          1 Reply Last reply
          0
          Reply
          • Reply as topic
          Log in to reply
          • Oldest to Newest
          • Newest to Oldest
          • Most Votes


          • Login

          • Don't have an account? Register

          • Login or register to search.
          • First post
            Last post
          0
          • Categories
          • Recent
          • Tags
          • Popular
          • World
          • Users
          • Groups