Data duplication on multiple fields
-
Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!
-
Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!
In which way do you need to compare the the rows? Are you looking for duplicates? Partial or fully. Are there any text fields that needs fuzzy matching?
Wrong is evil and must be defeated. - Jeff Ello
-
Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!
add and identity column to the table concatenate the 7 fields into a key field (can use a view) group by the key field having a count > 1 You now have a list of the duplicated key values. select identity field using row_number() and partition over the key field delete anything with a row_number > 1
Never underestimate the power of human stupidity RAH
-
Hello All, We have a table which has merged data from two different flat files. The total number of rows in table is about 2 million. We would like to now compare the data in the table, since it now has data from two different file sources. We need to compare the data on 6-7 fields within the same table. What would be the an accurate option to go about this? We also have a query which self joins itself and compares the fields in concern. Thank you for your time!