Selective queries?

User 4483848

With SQL people often say you shouldn't do 'SELECT *'. I tend to write highly optimized and selective queries. Do others do selective queries or do you think this is not necessary? The benefit will obviously vary depending on the size and usage of the table. I'm considering simplifying my architecture by doing all of the SELECTing on the web server. There will always be special cases eg. massive tables, but everything is fairly small in this particular application. There are two main benefits I can see from selective queries, less data is transferred and covered indexes can be much smaller. I'm not sure that the amount of data makes much difference when we'll only be getting one screen (eg. 10-50 records) of data at a time. It would be great to hear what others do and think.

Lost User

You are doing it the right way.

PIEBALDconsult

If you want all the columns, then I see no problem with using *. The primary issue is when you use * even when you want only a few columns, and some of the unneeded columns contain large data. This can also happen when a new column is added. Additionally, there may be times when a column is removed or renamed -- this will likely cause a problem, but do you want the problem to be reported when the data is queried or farther downstream? Early detection is probably better. Someone here (other than me) wrote a good rant against * some years back, but I'm having trouble finding it. Edit: SQL Server DO's and DONT's[^] SQL Wizardry Part 2 - Select, beyond the basics[^]

You'll never get very far if all you do is follow instructions.

Tim Carmichael

To add to the other comments, using 'select *' can also cause issues in columns are added or the order rearranged. If your application is expecting data in a particular column, then it may not be there; if your application is not expecting the columns that have been added, why bother spending the effort to retrieve the data and parse out the unwanted column?

Lost User

Member 4487083 wrote:

With SQL people often say you shouldn't do 'SELECT *'.

I also tell people not to run their stupid queries without starting a transaction that can be safely rolled back. Again, you DON'T do a SELECT *. It's not that you save a lot by omitting a DateTime column - but it would prevent that blob-field of 2Gb each that was added last month to be pulled over the network with each and every friggin' request, killing the network and the database-server. Or a nice calculated field that cripples the DB-server. It doesn't take much time, and makes the application a bit more robust. Makes it easier for me to debug when I get thrown in your team as a maintenance-programmer. Yes, it takes extra time, but it has a good ROI. It's not a religious thing - I won't go medievel if you do a simple "SELECT * FROM". Still, if you do it in a query that contains several joins you'll get this lecture, as each extra table means another chance at pulling columns you don't need.

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^]

jschell

Member 4487083 wrote:

There will always be special cases eg. massive tables, but everything is fairly small in this particular application.

If there were in fact many columns and you only want a couple then you should do that regardless of any other consideration. However I do it because there is no guarantee to ordering with '*' and that can matter in a variety of ways such as when using external APIs, dumping data, etc.

Jorgen Andersson

To add to the other comments. If you use named columns the possibility of using a covering index increases. Here[^] is some recommended reading on the subject.

Wrong is evil and must be defeated. - Jeff Ello[^]

User 4483848

Eddy Vluggen wrote:

Again, you DON'T do a SELECT *. It's not that you save a lot by omitting a DateTime column - but it would prevent that blob-field of 2Gb each that was added last month to be pulled over the network with each and every friggin' request

That wouldn't be an issue in my case. I would never store that kind of stuff in this particular database as it's in Azure and would cost me a a lot of money. Something large would go in a blob or somewhere else.

User 4483848

jschell wrote:

However I do it because there is no guarantee to ordering with '*' and that can matter in a variety of ways such as when using external APIs, dumping data, etc.

I actually use an ORM so it doesn't use * but lists every column (same thing from a coding point of view).

Tim Carmichael

Just remembered why else I don't like 'Select *'. I had to debug a stored procedure that was failing on a union because the writer had a 'Select *' on a table from a database unioned with a 'Select *' on an archival version of the table from another database. When the vendor the system was purchased from updated the production table to add a column, the union blew up because the tables didn't match, and, the added column was not necessary for the archival copy.

Mycroft Holmes

Like you I use and ORM (I think all of us do), however I never use *, always explicitly list the columns, just because it is good discipline :-O . I also work with small datasets and almost never store blob/binary.

Never underestimate the power of human stupidity RAH

User 4483848

The comments have been useful, and pretty much match my thoughts. I was starting to think that maybe my ideas were a little out of date. NOSQL is moving more towards denormalization, which goes against a lot of what we do with relational databases. At the time of writing the post, I hadn't put much thought into joins which will have the biggest problems. I'm looking at some other ways to simplify my architecture without sacrificing best practices and performance.

User 4483848

I've been looking at repository pattern implementations (using .Net and EF). Many of them aren't selective. One of the most common methods is GetByID. This gets the whole record. Is this really bad? It will return one record, and in most systems it'll use the primary key. One of the most shocking things I have seen is List GetAll(). This returns ALL records in a table. This is clearly going to become an issue in large systems. Another way some systems work is by using DDD and having aggregates return all related data. This tends to make the code simple, but I have doubts about scalability. Has anybody got experience with a system like this?