Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Web Development
  3. ASP.NET
  4. Compare 160 million records

Compare 160 million records

Scheduled Pinned Locked Moved ASP.NET
databasehelptutorial
8 Posts 3 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • R Offline
    R Offline
    RajaAhmed
    wrote on last edited by
    #1

    Hello EveryBody.... I want help. can any one plz suggest me that what can i do becoz i want to compare the value from a database field that contains about 160 millions records. for Example table contains a field called Email taht contains about 160 millions Adresses. Now when some one enters new email Address. Now i want to compare with existing records and remove them if a duplicate Entry Exists.

    M C 2 Replies Last reply
    0
    • R RajaAhmed

      Hello EveryBody.... I want help. can any one plz suggest me that what can i do becoz i want to compare the value from a database field that contains about 160 millions records. for Example table contains a field called Email taht contains about 160 millions Adresses. Now when some one enters new email Address. Now i want to compare with existing records and remove them if a duplicate Entry Exists.

      M Offline
      M Offline
      Manas Bhardwaj
      wrote on last edited by
      #2

      AhmedNisar82 wrote:

      what can i do becoz i want to compare the value from a database field that contains about 160 millions records

      Index it first.

      AhmedNisar82 wrote:

      Now i want to compare with existing records and remove them if a duplicate Entry Exists.

      Cache this information in your application memory and od the comparision from the in memory records.

      Please remember to rate helpful or unhelpful answers, it lets us and people reading the forums know if our answers are any good.

      C 1 Reply Last reply
      0
      • M Manas Bhardwaj

        AhmedNisar82 wrote:

        what can i do becoz i want to compare the value from a database field that contains about 160 millions records

        Index it first.

        AhmedNisar82 wrote:

        Now i want to compare with existing records and remove them if a duplicate Entry Exists.

        Cache this information in your application memory and od the comparision from the in memory records.

        Please remember to rate helpful or unhelpful answers, it lets us and people reading the forums know if our answers are any good.

        C Offline
        C Offline
        Colin Angus Mackay
        wrote on last edited by
        #3

        Manas Bhardwaj wrote:

        Cache this information in your application memory and od the comparision from the in memory records

        That is a seriously screwed up idea! You are crazy if you think that's a good idea. Let SQL Server do what SQL Server is best at and have it do the comparison rather than saturate your network with 160 million rows of data and then bring the client PC to its knees because you've just dumped a shed-load of data in it. What do you think the memory requirements are for 160 million rows? Assume an integer primary key: 4 bytes Assume each address is an average of say 32 characters (using a western alphabet) 32 bytes. 36 bytes * 160 million = 5,760,000,000 In total that is just shy of 6Gb you want to dump in RAM! On top of that you have overhead for the operating system, other applications and so on. [UPDATE] Okay, I misready what sort of address. If we assume 20 bytes for an email address, the memory requirement is less, but it is still significant. 24 * 160 million = 3.8Gb

        *Developer Day Scotland - Free community conference *Colin Angus Mackay's Blog *Latest Scottish Developers Newsletter


        Vogon Building and Loan advise that your planet is at risk if you do not keep up repayments on any mortgage secured upon it. Remember that the force of gravity can go up as well as down.

        modified on Monday, February 2, 2009 11:45 AM

        R 1 Reply Last reply
        0
        • R RajaAhmed

          Hello EveryBody.... I want help. can any one plz suggest me that what can i do becoz i want to compare the value from a database field that contains about 160 millions records. for Example table contains a field called Email taht contains about 160 millions Adresses. Now when some one enters new email Address. Now i want to compare with existing records and remove them if a duplicate Entry Exists.

          C Offline
          C Offline
          Colin Angus Mackay
          wrote on last edited by
          #4

          AhmedNisar82 wrote:

          Now i want to compare with existing records and remove them if a duplicate Entry Exists.

          If you do this you won't have to remove stuff because you can prevent the duplicate being inserted in the first place.

          IF EXISTS (SELECT * FROM MyTable WHERE Email = @newEmail)
          BEGIN
          -- Perform action based on email address already existsin
          END
          ELSE
          BEGIN
          -- Perform action based on email address not existing.
          END

          *Developer Day Scotland - Free community conference *Colin Angus Mackay's Blog *Latest Scottish Developers Newsletter


          Vogon Building and Loan advise that your planet is at risk if you do not keep up repayments on any mortgage secured upon it. Remember that the force of gravity can go up as well as down.

          R 1 Reply Last reply
          0
          • C Colin Angus Mackay

            Manas Bhardwaj wrote:

            Cache this information in your application memory and od the comparision from the in memory records

            That is a seriously screwed up idea! You are crazy if you think that's a good idea. Let SQL Server do what SQL Server is best at and have it do the comparison rather than saturate your network with 160 million rows of data and then bring the client PC to its knees because you've just dumped a shed-load of data in it. What do you think the memory requirements are for 160 million rows? Assume an integer primary key: 4 bytes Assume each address is an average of say 32 characters (using a western alphabet) 32 bytes. 36 bytes * 160 million = 5,760,000,000 In total that is just shy of 6Gb you want to dump in RAM! On top of that you have overhead for the operating system, other applications and so on. [UPDATE] Okay, I misready what sort of address. If we assume 20 bytes for an email address, the memory requirement is less, but it is still significant. 24 * 160 million = 3.8Gb

            *Developer Day Scotland - Free community conference *Colin Angus Mackay's Blog *Latest Scottish Developers Newsletter


            Vogon Building and Loan advise that your planet is at risk if you do not keep up repayments on any mortgage secured upon it. Remember that the force of gravity can go up as well as down.

            modified on Monday, February 2, 2009 11:45 AM

            R Offline
            R Offline
            RajaAhmed
            wrote on last edited by
            #5

            thnx 4 ur comments.... my application runs on server not on the client side...... So.?????????

            C 1 Reply Last reply
            0
            • C Colin Angus Mackay

              AhmedNisar82 wrote:

              Now i want to compare with existing records and remove them if a duplicate Entry Exists.

              If you do this you won't have to remove stuff because you can prevent the duplicate being inserted in the first place.

              IF EXISTS (SELECT * FROM MyTable WHERE Email = @newEmail)
              BEGIN
              -- Perform action based on email address already existsin
              END
              ELSE
              BEGIN
              -- Perform action based on email address not existing.
              END

              *Developer Day Scotland - Free community conference *Colin Angus Mackay's Blog *Latest Scottish Developers Newsletter


              Vogon Building and Loan advise that your planet is at risk if you do not keep up repayments on any mortgage secured upon it. Remember that the force of gravity can go up as well as down.

              R Offline
              R Offline
              RajaAhmed
              wrote on last edited by
              #6

              thnx Colin. thats nice and helpfull. but what can i do with the old records saved on the server. Any suggestion.

              C 1 Reply Last reply
              0
              • R RajaAhmed

                thnx 4 ur comments.... my application runs on server not on the client side...... So.?????????

                C Offline
                C Offline
                Colin Angus Mackay
                wrote on last edited by
                #7

                AhmedNisar82 wrote:

                my application runs on server not on the client side

                Be aware that "client" in the context that I used it meant any process outside of SQL Server i.e. Any client of SQL Server. The "client" could be on the same machine as SQL server, or another machine. Either way the effect of memory consumption is the same. If the process is on the same machine as SQL Server then you don't have the network bandwidth issues.

                *Developer Day Scotland - Free community conference *Colin Angus Mackay's Blog *Latest Scottish Developers Newsletter


                Vogon Building and Loan advise that your planet is at risk if you do not keep up repayments on any mortgage secured upon it. Remember that the force of gravity can go up as well as down.

                1 Reply Last reply
                0
                • R RajaAhmed

                  thnx Colin. thats nice and helpfull. but what can i do with the old records saved on the server. Any suggestion.

                  C Offline
                  C Offline
                  Colin Angus Mackay
                  wrote on last edited by
                  #8

                  You can find duplicates using something like this:

                  SELECT MAX(PrimaryKey), COUNT(*)
                  FROM MyEmailTable
                  GROUP BY EmailAddress
                  HAVING COUNT(*) >= 2

                  The above gives you a list of duplicate rows. You can then use that to delete the duplicates. e.g.

                  DELETE FROM MyEmailTable
                  WHERE PrimaryKey IN
                  (
                  SELECT MAX(PrimaryKey)
                  FROM MyEmailTable
                  GROUP BY EmailAddress
                  HAVING COUNT(*) >= 2
                  )

                  Also note that it will only remove one duplicate per email address, if there are multiple duplicates then you have to run it multiple times. NOTE: The above SQL hasn't been tried and may have some syntax errors. Always test on a development database. Do not run this on a live database until you are sure it works they way you want.

                  *Developer Day Scotland - Free community conference *Colin Angus Mackay's Blog *Latest Scottish Developers Newsletter


                  Vogon Building and Loan advise that your planet is at risk if you do not keep up repayments on any mortgage secured upon it. Remember that the force of gravity can go up as well as down.

                  1 Reply Last reply
                  0
                  Reply
                  • Reply as topic
                  Log in to reply
                  • Oldest to Newest
                  • Newest to Oldest
                  • Most Votes


                  • Login

                  • Don't have an account? Register

                  • Login or register to search.
                  • First post
                    Last post
                  0
                  • Categories
                  • Recent
                  • Tags
                  • Popular
                  • World
                  • Users
                  • Groups