Compare 160 million records
-
Hello everybody. I need some help. Can anyone please suggest what I can do? I want to compare a value against a database field that contains about 160 million records. For example, a table contains a field called Email with about 160 million addresses. When someone enters a new email address, I want to compare it against the existing records and remove it if a duplicate entry exists.
-
AhmedNisar82 wrote:
I want to compare a value against a database field that contains about 160 million records
Index it first.
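For example, something along these lines (just a sketch; MyEmailTable and the Email column are assumed names, not taken from your schema):

-- Index the Email column so a lookup does not have to scan all 160 million rows.
CREATE INDEX IX_MyEmailTable_Email
    ON MyEmailTable (Email);

With that index in place, an equality search on Email becomes an index seek instead of a full table scan.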
AhmedNisar82 wrote:
I want to compare it against the existing records and remove it if a duplicate entry exists.
Cache this information in your application memory and do the comparison against the in-memory records.
-
Manas Bhardwaj wrote:
Cache this information in your application memory and do the comparison against the in-memory records
That is a seriously screwed-up idea! You are crazy if you think that's a good idea. Let SQL Server do what SQL Server is best at and have it do the comparison, rather than saturating your network with 160 million rows of data and then bringing the client PC to its knees because you've just dumped a shed-load of data into it.

What do you think the memory requirements are for 160 million rows? Assume an integer primary key: 4 bytes. Assume each address averages, say, 32 characters (using a western alphabet): 32 bytes. 36 bytes * 160 million = 5,760,000,000 bytes. In total that is just shy of 6 GB you want to dump into RAM! On top of that you have overhead for the operating system, other applications and so on.

[UPDATE] Okay, I misread what sort of address. If we assume 20 bytes for an email address, the memory requirement is less, but it is still significant: 24 bytes * 160 million = 3.84 GB.
-
AhmedNisar82 wrote:
I want to compare it against the existing records and remove it if a duplicate entry exists.
If you do this, you won't have to remove anything, because you can prevent the duplicate from being inserted in the first place.
IF EXISTS (SELECT * FROM MyTable WHERE Email = @newEmail)
BEGIN
-- Perform action based on the email address already existing.
END
ELSE
BEGIN
-- Perform action based on email address not existing.
END
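The check and the insert can also be rolled into a single statement, so that a duplicate never gets stored at all. A sketch, untested and using the same assumed names (MyTable, Email, @newEmail):

-- Insert the new address only if it is not already present.
INSERT INTO MyTable (Email)
SELECT @newEmail
WHERE NOT EXISTS (SELECT * FROM MyTable WHERE Email = @newEmail)

Combined with an index on Email, both the existence check and the insert stay cheap even at 160 million rows.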
-
Thanks for your comments. My application runs on the server, not on the client side. So what should I do?
AhmedNisar82 wrote:
my application runs on the server, not on the client side
Be aware that "client" in the context I used it means any process outside of SQL Server, i.e. any client of SQL Server. The "client" could be on the same machine as SQL Server, or on another machine. Either way the effect on memory consumption is the same. If the process is on the same machine as SQL Server then you don't have the network bandwidth issues.
-
Thanks Colin, that's nice and helpful. But what can I do about the old records already saved on the server? Any suggestions?
You can find duplicates using something like this:
SELECT MAX(PrimaryKey), COUNT(*)
FROM MyEmailTable
GROUP BY EmailAddress
HAVING COUNT(*) >= 2

The above gives you a list of the duplicate rows. You can then use that to delete the duplicates, e.g.:
DELETE FROM MyEmailTable
WHERE PrimaryKey IN
(
SELECT MAX(PrimaryKey)
FROM MyEmailTable
GROUP BY EmailAddress
HAVING COUNT(*) >= 2
)

Also note that it will only remove one duplicate per email address; if there are multiple duplicates of the same address, you have to run it multiple times. NOTE: The above SQL hasn't been tried and may have some syntax errors. Always test on a development database. Do not run this on a live database until you are sure it works the way you want.
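If the server is SQL Server 2005 or later, all of the duplicates could instead be removed in a single pass with ROW_NUMBER(). A sketch, with the same caveats and the same assumed names (untested; keeps the lowest PrimaryKey for each address):

;WITH Ranked AS
(
    -- Number the rows within each email address, lowest key first.
    SELECT PrimaryKey,
           ROW_NUMBER() OVER (PARTITION BY EmailAddress ORDER BY PrimaryKey) AS RowNo
    FROM MyEmailTable
)
-- Delete every row except the first one for each address.
DELETE FROM Ranked
WHERE RowNo > 1

Deleting through the CTE removes the rows from the underlying MyEmailTable.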