SELECT * versus SELECT field1, field2, ..., fieldn
-
I've never been a SQL genius, nor have I ever been tutored in it. I'm working on an app that needs every bit of data in the DB to be put into datasets. Easy, I thought, just use SELECT * ... it keeps your SQL queries clean and simple, and there's no excess because we need every field anyway. One of my co-programmers called me a retard and told me that SELECT * is a lot slower than the alternative of listing the fields explicitly. I put this to the test, though I'm not sure how representative it is: I mocked up a simple console app that read from a table with 100+ columns and 5000+ rows, filled a dataset, added the dataset to a list, and then printed out the count of the list (that's the 25s below, which is also the number of times each test is repeated; I print the count because I learned that when code is optimized, variables that are never accessed can be discarded). Clearly the version without the * takes longer than just starring. What are some of your thoughts, findings, comments or experiences with this?
With *    : 25  00:00:10.8395254
Without * : 25  00:00:10.3142815
With *    : 25  00:00:10.1382555
Without * : 25  00:00:09.9356686

(25 is the number of times the test was executed; the timestamp is the total time taken.)
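Conceptually, the harness was something like this minimal sketch (the connection string and table name here are placeholders, not the real ones):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;

class SelectBenchmark
{
    // Placeholder connection string; the real DB name is not shown here.
    const string ConnStr = "Server=.;Database=TestDb;Integrated Security=true";

    static void TimeQuery(string label, string sql, int repetitions)
    {
        List<DataSet> results = new List<DataSet>();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
        {
            DataSet ds = new DataSet();
            using (SqlDataAdapter adapter = new SqlDataAdapter(sql, ConnStr))
            {
                adapter.Fill(ds); // pulls the whole result set into memory
            }
            results.Add(ds);
        }
        watch.Stop();
        // Print the count so the list is actually used and can't be discarded.
        Console.WriteLine("{0} : {1}  {2}", label, results.Count, watch.Elapsed);
    }

    static void Main()
    {
        // "BigTable" and the field names stand in for the real 100+ column table.
        TimeQuery("With *   ", "SELECT * FROM BigTable", 25);
        TimeQuery("Without *", "SELECT Field1, Field2 /* ... all columns ... */ FROM BigTable", 25);
    }
}
```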
Sorry if this should've gone in the SQL forum, but it's not really a request for help; I just hope to get a better understanding of selects.
Visual Studio can't evaluate this, can you?
public object moo { get { return moo; } set { moo = value; } }
-
I actually can imagine that "Select *" can be a lot faster than "Select field...", simply because there is no checking involved. I've never tested that, though, and we seldom use tables with more than 20 fields. However, there is one major advantage to the "Select field..." approach: if anyone adds or deletes columns in the table definition, "Select *" will return a different result for the same query, while "Select field..." will give you an error or warning if a field was deleted, or the same result as before if a field was added.
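A quick sketch of what I mean, assuming SQL Server and ADO.NET (the table, column names and connection string are all invented for the example):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class SchemaChangeDemo
{
    const string ConnStr = "Server=.;Database=TestDb;Integrated Security=true"; // placeholder

    static DataTable Load(string sql)
    {
        DataTable table = new DataTable();
        using (SqlDataAdapter adapter = new SqlDataAdapter(sql, ConnStr))
        {
            adapter.Fill(table);
        }
        return table;
    }

    static void Main()
    {
        // Suppose someone has just dropped the LegacyCode column from Customers.
        DataTable star = Load("SELECT * FROM Customers");
        Console.WriteLine(star.Columns.Count); // silently one column fewer than before

        try
        {
            Load("SELECT Id, Name, LegacyCode FROM Customers");
        }
        catch (SqlException ex)
        {
            // Fails loudly: "Invalid column name 'LegacyCode'."
            Console.WriteLine(ex.Message);
        }
    }
}
```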
Cheers, Sebastian -- Ceterum censeo, borlandem esse delendam.
-
I always thought that both approaches would generate the same result through the same process. But obviously that's not the case, unless replacing * with [field1],[field2],... takes 2 seconds on a SQL server :p
Visual Studio can't evaluate this, can you?
public object moo { get { return moo; } set { moo = value; } }
-
joon vh. wrote:
One of my co-programmers called me a retard and told me that SELECT * is a lot slower than the alternative of directly accessing the fields.
There is a historical basis to this, and it is quite closely tied to older 4GL languages. Many years ago, I wrote a paper on optimizing Informix 4GL and looked into SELECT * versus SELECT field... Essentially, the SELECT * approach involved a lot of round-tripping in the code to identify the columns that were present in the table. SELECT field... was different because it allowed the compiler to generate the underlying E-SQL far more efficiently. A lot of database systems and languages used this approach, hence the speed comment with regard to the SELECT. (BTW, changing the order of columns in Informix, or adding new columns, required recompiling the code because the underlying bindings had changed.)

However, there is another, more relevant issue here, and it relates to programmer laziness. Suppose that I want to retrieve 2 columns out of one of your 100+ column tables; I could write SELECT * and get the lot, or I could write SELECT field1, field2 and get just the fields that I want. Clearly the first approach is more convenient when I'm feeling lazy, but the second is actually better. Quite simply, if I only need 2 columns, it is better to retrieve those 2 columns explicitly, because I will be moving less traffic over the network (see the sketch below). With large tables, this becomes much more important. While it is tempting to use a dataset to hold all the columns, you should really resist this: only retrieve the information that you actually need.

BTW, this is also a problem with code generators like CodeSmith. Even though you can avoid the SELECT * trap, you still end up with classes that retrieve all of the columns even though you only need 2.
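To make the traffic point concrete, here is a hedged sketch (all names are invented) of fetching just the two needed columns with a data reader, instead of dragging all 100+ into a dataset:

```csharp
using System;
using System.Data.SqlClient;

class TwoColumnFetch
{
    static void Main()
    {
        // Only the two columns we actually need cross the network,
        // instead of the 100+ that SELECT * would drag along.
        using (SqlConnection conn = new SqlConnection(
            "Server=.;Database=TestDb;Integrated Security=true")) // placeholder
        using (SqlCommand cmd = new SqlCommand(
            "SELECT CustomerId, Name FROM Customers", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}: {1}",
                        reader.GetInt32(0), reader.GetString(1));
                }
            }
        }
    }
}
```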
Deja View - the feeling that you've seen this post before.
-
Thanks for your detailed historical reply :) I love CS history. I knew the part about laziness; that's why I explicitly said in my original post that I actually need all the values, or if I didn't, it was surely my intention. I had a program that generated 16 MB of traffic at each boot, and when I removed/optimized all of the SELECT *'s, it was reduced by 50%. Do you think it's safe to say that even today, MSSQL still works by the same principle, and it would generally take longer to use the stars?
Visual Studio can't evaluate this, can you?
public object moo { get { return moo; } set { moo = value; } }
-
My understanding is that if your tables are properly indexed and proper constraints have been set, then fetching only some columns rather than all the columns of the table can be much faster; for example, when an index contains exactly the columns you select, the query can be answered from the index alone (see the sketch below). Also, if you have a grouping clause, selecting everything just adds more overhead...
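For example (a sketch with made-up table, column and index names), if a nonclustered index covers the selected columns, SQL Server can answer the query from the index alone, which SELECT * almost never allows:

```csharp
using System;
using System.Data.SqlClient;

class CoveredQuery
{
    static void Main()
    {
        // Assumed schema: a nonclustered index that covers the query, e.g.
        //   CREATE INDEX IX_Orders_Customer_Total ON Orders (CustomerId, Total);
        // This SELECT can then be satisfied from the index alone, whereas
        // SELECT * would force lookups into the full table rows.
        using (SqlConnection conn = new SqlConnection(
            "Server=.;Database=TestDb;Integrated Security=true")) // placeholder
        using (SqlCommand cmd = new SqlCommand(
            "SELECT CustomerId, Total FROM Orders WHERE CustomerId = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", 42);
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine(reader.GetDecimal(1));
            }
        }
    }
}
```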
Regards, Jaiprakash M Bankolli jaiprakash.bankolli@gmail.com