Data duplication on multiple fields

VK19

Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!

Jorgen Andersson

In which way do you need to compare the the rows? Are you looking for duplicates? Partial or fully. Are there any text fields that needs fuzzy matching?

Wrong is evil and must be defeated. - Jeff Ello

Mycroft Holmes

add and identity column to the table concatenate the 7 fields into a key field (can use a view) group by the key field having a count > 1 You now have a list of the duplicated key values. select identity field using row_number() and partition over the key field delete anything with a row_number > 1

Never underestimate the power of human stupidity RAH

jonyj

What RDBMS are you using? You could do something like

SELECT DISTINCT (your 7 columns) FROM table

to get the unique values if that's what you're looking for.