Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Will these two queries perform the same?

Will these two queries perform the same?

Scheduled Pinned Locked Moved Database
databasequestion
14 Posts 5 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J Joe Smith IX

    Hi all, Let say I have a table with 100+ columns and millions of rows, and I only need to gather all data in column A and B. Will the second query below perform faster, or no difference whatsoever? 1: SELECT * FROM MyTable 2: SELECT A,B From MyTable Thanks for any pointers.

    C Offline
    C Offline
    Colin Angus Mackay
    wrote on last edited by
    #2

    The second query will perform faster because it will use less bandwidth to get the results to the client. The first query will return a large amount of data that is not needed. Let's say you have one million rows and each of your columns are integers (or 4 bytes on average) that means each row is 400 bytes (if you have 100 colums), which requires 400,000,000 bytes (+ some overhead for the protocol - Call it 400Mb) to get that data to the client. If you get only the columns you want that is 8 bytes per row which is 8,000,000 for all the million rows (+ some overhead for the protocall - Call it 8Mb). So, even if you are on a super-duper network 8Mb will transfer faster than 400Mb. And if you have lots of clients that is a lot of network bandwidth being saved.


    Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

    J 1 Reply Last reply
    0
    • J Joe Smith IX

      Hi all, Let say I have a table with 100+ columns and millions of rows, and I only need to gather all data in column A and B. Will the second query below perform faster, or no difference whatsoever? 1: SELECT * FROM MyTable 2: SELECT A,B From MyTable Thanks for any pointers.

      M Offline
      M Offline
      Mike Dimmick
      wrote on last edited by
      #3

      Colin's answer is correct. I will also add that if you have an index on columns A and B (or containing columns A and B in the index), SQL Server will consider that a covering index for the query and only scan the index, rather than scanning the table itself. Because the index occupies less space than the table itself, this will require less I/O and return the results faster.

      Stability. What an interesting concept. -- Chris Maunder

      C 1 Reply Last reply
      0
      • M Mike Dimmick

        Colin's answer is correct. I will also add that if you have an index on columns A and B (or containing columns A and B in the index), SQL Server will consider that a covering index for the query and only scan the index, rather than scanning the table itself. Because the index occupies less space than the table itself, this will require less I/O and return the results faster.

        Stability. What an interesting concept. -- Chris Maunder

        C Offline
        C Offline
        Colin Angus Mackay
        wrote on last edited by
        #4

        Good point - I'd forgotten about that.


        Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

        B 1 Reply Last reply
        0
        • J Joe Smith IX

          Hi all, Let say I have a table with 100+ columns and millions of rows, and I only need to gather all data in column A and B. Will the second query below perform faster, or no difference whatsoever? 1: SELECT * FROM MyTable 2: SELECT A,B From MyTable Thanks for any pointers.

          P Offline
          P Offline
          Paul Conrad
          wrote on last edited by
          #5

          Like Colin said about the network bandwidth, it will take less to only pull the fields that you need at the moment, and it is better for design and maintenance of the code to only pull what you need at that moment in the program. Another thing, do you need to pull the millions of rows at once?

          "Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon

          J 1 Reply Last reply
          0
          • C Colin Angus Mackay

            Good point - I'd forgotten about that.


            Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

            B Offline
            B Offline
            blakey404
            wrote on last edited by
            #6

            thats the main point - if columns a and b are indexed you will notice a huge % difference in performance - the number of I/O's will reduce even more.

            C 1 Reply Last reply
            0
            • B blakey404

              thats the main point - if columns a and b are indexed you will notice a huge % difference in performance - the number of I/O's will reduce even more.

              C Offline
              C Offline
              Colin Angus Mackay
              wrote on last edited by
              #7

              blakey404 wrote:

              thats the main point

              It's a point. I don't see how it is the main point. The OP asked about performance for a query. One aspect is indexes, another is returning just the information that is actually needed. Both equally valid, especially as there was no filtering (WHERE clause) involved.


              Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

              B 1 Reply Last reply
              0
              • C Colin Angus Mackay

                blakey404 wrote:

                thats the main point

                It's a point. I don't see how it is the main point. The OP asked about performance for a query. One aspect is indexes, another is returning just the information that is actually needed. Both equally valid, especially as there was no filtering (WHERE clause) involved.


                Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

                B Offline
                B Offline
                blakey404
                wrote on last edited by
                #8

                i would say its the main point because of how the indexes are stored. if the columns specified to be returned are both in a non-clustered index the data returned can be retrieved directly from that index, therefore not having to do x number of disk reads to return all the columns, as with a select *. try both in query analyzer and see the difference. returning index columns only from a non-clustered index is the fastest query possible, as it reduces logical reads.

                C 1 Reply Last reply
                0
                • B blakey404

                  i would say its the main point because of how the indexes are stored. if the columns specified to be returned are both in a non-clustered index the data returned can be retrieved directly from that index, therefore not having to do x number of disk reads to return all the columns, as with a select *. try both in query analyzer and see the difference. returning index columns only from a non-clustered index is the fastest query possible, as it reduces logical reads.

                  C Offline
                  C Offline
                  Colin Angus Mackay
                  wrote on last edited by
                  #9

                  blakey404 wrote:

                  returning index columns only from a non-clustered index is the fastest query possible, as it reduces logical reads.

                  I don't disagree with that. I'm just saying there are other ways that can help improve performance. Indexing is not the be all and end all of performance improvement.


                  Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

                  B 1 Reply Last reply
                  0
                  • C Colin Angus Mackay

                    blakey404 wrote:

                    returning index columns only from a non-clustered index is the fastest query possible, as it reduces logical reads.

                    I don't disagree with that. I'm just saying there are other ways that can help improve performance. Indexing is not the be all and end all of performance improvement.


                    Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

                    B Offline
                    B Offline
                    blakey404
                    wrote on last edited by
                    #10

                    oh i completely agree, i was just thinking about in this particular example - where it is very possible the 2 columns required may be indexed. :o)

                    1 Reply Last reply
                    0
                    • C Colin Angus Mackay

                      The second query will perform faster because it will use less bandwidth to get the results to the client. The first query will return a large amount of data that is not needed. Let's say you have one million rows and each of your columns are integers (or 4 bytes on average) that means each row is 400 bytes (if you have 100 colums), which requires 400,000,000 bytes (+ some overhead for the protocol - Call it 400Mb) to get that data to the client. If you get only the columns you want that is 8 bytes per row which is 8,000,000 for all the million rows (+ some overhead for the protocall - Call it 8Mb). So, even if you are on a super-duper network 8Mb will transfer faster than 400Mb. And if you have lots of clients that is a lot of network bandwidth being saved.


                      Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

                      J Offline
                      J Offline
                      Joe Smith IX
                      wrote on last edited by
                      #11

                      I can see that the second query is definitely faster if I run the queries in SQL Management Studio (I am using SQL Express 2005). Now, the question is: if I run the query from the code (VC++) like the following pseudo-code:

                      CADORecordset pRs = CADORecordset(&theApp.m_pDb);
                      if(pRs.Open(sz, CADORecordset::openQuery))
                      {
                      while(!pRs.IsEOF())
                      {
                      pRs.GetFieldValue("A", szA);
                      pRs.GetFieldValue("B", szB);
                      ... // do something with szA and szB

                      pRs.MoveNext();
                      

                      }
                      }

                      Will the first query still use less bandwith? Isn't the data transferred during the GetFieldValue function, so the two queries perform exactly the same?

                      C 1 Reply Last reply
                      0
                      • P Paul Conrad

                        Like Colin said about the network bandwidth, it will take less to only pull the fields that you need at the moment, and it is better for design and maintenance of the code to only pull what you need at that moment in the program. Another thing, do you need to pull the millions of rows at once?

                        "Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon

                        J Offline
                        J Offline
                        Joe Smith IX
                        wrote on last edited by
                        #12

                        No, I don't, as I process the data programmatically from VC++. See my reply to Colin above. I am not sure the bandwith usage is more for query #2 if queried as above, right?

                        P 1 Reply Last reply
                        0
                        • J Joe Smith IX

                          I can see that the second query is definitely faster if I run the queries in SQL Management Studio (I am using SQL Express 2005). Now, the question is: if I run the query from the code (VC++) like the following pseudo-code:

                          CADORecordset pRs = CADORecordset(&theApp.m_pDb);
                          if(pRs.Open(sz, CADORecordset::openQuery))
                          {
                          while(!pRs.IsEOF())
                          {
                          pRs.GetFieldValue("A", szA);
                          pRs.GetFieldValue("B", szB);
                          ... // do something with szA and szB

                          pRs.MoveNext();
                          

                          }
                          }

                          Will the first query still use less bandwith? Isn't the data transferred during the GetFieldValue function, so the two queries perform exactly the same?

                          C Offline
                          C Offline
                          Colin Angus Mackay
                          wrote on last edited by
                          #13

                          Joe Smith IX wrote:

                          Will the first query still use less bandwith? Isn't the data transferred during the GetFieldValue function, so the two queries perform exactly the same?

                          No, they wont. The data will still be transferred (because that's what you asked the database for) your application will just ignore the rest.


                          Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

                          1 Reply Last reply
                          0
                          • J Joe Smith IX

                            No, I don't, as I process the data programmatically from VC++. See my reply to Colin above. I am not sure the bandwith usage is more for query #2 if queried as above, right?

                            P Offline
                            P Offline
                            Paul Conrad
                            wrote on last edited by
                            #14

                            Query two shouldn't take up as much bandwidth as query one. The extra bandwidth depends on how many fields are being fetched from the table.

                            "Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon

                            1 Reply Last reply
                            0
                            Reply
                            • Reply as topic
                            Log in to reply
                            • Oldest to Newest
                            • Newest to Oldest
                            • Most Votes


                            • Login

                            • Don't have an account? Register

                            • Login or register to search.
                            • First post
                              Last post
                            0
                            • Categories
                            • Recent
                            • Tags
                            • Popular
                            • World
                            • Users
                            • Groups