Will these two queries perform the same?

Colin Angus Mackay

The second query will perform faster because it will use less bandwidth to get the results to the client. The first query will return a large amount of data that is not needed. Let's say you have one million rows and each of your columns are integers (or 4 bytes on average) that means each row is 400 bytes (if you have 100 colums), which requires 400,000,000 bytes (+ some overhead for the protocol - Call it 400Mb) to get that data to the client. If you get only the columns you want that is 8 bytes per row which is 8,000,000 for all the million rows (+ some overhead for the protocall - Call it 8Mb). So, even if you are on a super-duper network 8Mb will transfer faster than 400Mb. And if you have lots of clients that is a lot of network bandwidth being saved.

Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

Mike Dimmick

Colin's answer is correct. I will also add that if you have an index on columns A and B (or containing columns A and B in the index), SQL Server will consider that a covering index for the query and only scan the index, rather than scanning the table itself. Because the index occupies less space than the table itself, this will require less I/O and return the results faster.

Stability. What an interesting concept. -- Chris Maunder

Colin Angus Mackay

Good point - I'd forgotten about that.

Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

Paul Conrad

Like Colin said about the network bandwidth, it will take less to only pull the fields that you need at the moment, and it is better for design and maintenance of the code to only pull what you need at that moment in the program. Another thing, do you need to pull the millions of rows at once?

"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon

blakey404

thats the main point - if columns a and b are indexed you will notice a huge % difference in performance - the number of I/O's will reduce even more.

Colin Angus Mackay

blakey404 wrote:

thats the main point

It's a point. I don't see how it is the main point. The OP asked about performance for a query. One aspect is indexes, another is returning just the information that is actually needed. Both equally valid, especially as there was no filtering (WHERE clause) involved.

Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

blakey404

i would say its the main point because of how the indexes are stored. if the columns specified to be returned are both in a non-clustered index the data returned can be retrieved directly from that index, therefore not having to do x number of disk reads to return all the columns, as with a select *. try both in query analyzer and see the difference. returning index columns only from a non-clustered index is the fastest query possible, as it reduces logical reads.

Colin Angus Mackay

blakey404 wrote:

returning index columns only from a non-clustered index is the fastest query possible, as it reduces logical reads.

I don't disagree with that. I'm just saying there are other ways that can help improve performance. Indexing is not the be all and end all of performance improvement.

Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

blakey404

oh i completely agree, i was just thinking about in this particular example - where it is very possible the 2 columns required may be indexed. :o)

Joe Smith IX

I can see that the second query is definitely faster if I run the queries in SQL Management Studio (I am using SQL Express 2005). Now, the question is: if I run the query from the code (VC++) like the following pseudo-code:

CADORecordset pRs = CADORecordset(&theApp.m_pDb);
if(pRs.Open(sz, CADORecordset::openQuery))
{
while(!pRs.IsEOF())
{
pRs.GetFieldValue("A", szA);
pRs.GetFieldValue("B", szB);
... // do something with szA and szB

pRs.MoveNext();

}
}

Will the first query still use less bandwith? Isn't the data transferred during the GetFieldValue function, so the two queries perform exactly the same?

Joe Smith IX

No, I don't, as I process the data programmatically from VC++. See my reply to Colin above. I am not sure the bandwith usage is more for query #2 if queried as above, right?

Colin Angus Mackay

Joe Smith IX wrote:

Will the first query still use less bandwith? Isn't the data transferred during the GetFieldValue function, so the two queries perform exactly the same?

No, they wont. The data will still be transferred (because that's what you asked the database for) your application will just ignore the rest.

Upcoming events: * Glasgow: Mock Objects, SQL Server CLR Integration, Reporting Services, db4o, Dependency Injection with Spring ... "I wouldn't say boo to a goose. I'm not a coward, I just realise that it would be largely pointless." My website

Paul Conrad

Query two shouldn't take up as much bandwidth as query one. The extra bandwidth depends on how many fields are being fetched from the table.

"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon