Search Query: My Boss is a Genius
-
We manage an application that searches an internal database. It's slow... a search takes three seconds or more. The reason is that there are tons of records and our search query is basically this:
SELECT * FROM SomeTable WHERE Field LIKE '%Something%'
That "LIKE" (with the initial "%") prevents SQL from using indexes for that field (which leads to a full table scan, meaning hundreds of thousands of records must be loaded). To fix that, I have the brilliant idea to create a new table and split the field across 72 columns (the current max length of the field). Each column would store the field with one less leading character than the previous column. The new query would look something like this:
SELECT * FROM SomeNewTable WHERE Field1 LIKE 'Something%' OR Field2 LIKE 'Something%' ... OR Field72 LIKE 'Something%'
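To make that concrete, the new table would look something like this (SomeNewTable and SourceId are just illustrative names here, and each FieldN column would get its own nonclustered index):

CREATE TABLE SomeNewTable (
    SourceId INT NOT NULL,        -- key back to the row in SomeTable
    Field1   VARCHAR(72) NULL,    -- the full value, e.g. 'ABC'
    Field2   VARCHAR(71) NULL,    -- minus one leading character, e.g. 'BC'
    Field3   VARCHAR(70) NULL,    -- e.g. 'C'
    -- ... one column per starting position, all the way up to Field72
);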
That's a lot of data, but SQL Server 2005 allows up to 249 nonclustered indexes per table, so I figure it's a fair trade. However, after I show this to my boss, he comes up with the idea of storing a column per word of the field instead of per character. So we might only have 8 columns instead of 72:
SELECT * FROM SomeNewTable WHERE Field1 LIKE 'Something%' OR Field2 LIKE 'Something%' ... OR Field8 LIKE 'Something%'
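If I'm reading his idea right, each column would hold the field from word N onward rather than just word N, so multi-word searches still prefix-match. For a value like 'quick brown fox', say:

-- Field1 = 'quick brown fox'
-- Field2 = 'brown fox'
-- Field3 = 'fox'
-- so LIKE 'brown%' (or even 'brown fox%') matches via Field2.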
That sounded pretty good, except I worried auto-complete wouldn't work until a full word was typed. Thinking about it some more, though, auto-complete will still work, since people typically type a word from its first letter (e.g., they'll generally type "something" starting with "some" rather than "thing"). My boss just prevented me from creating a horrific table. He's a genius. Now I just need to weigh a few niggling options: separate tables so each can have its own clustered index, a single column holding each word rather than split-up columns, combinations of sequential words, and SQL Server full-text indexing (not even sure that applies to this situation, but I'll soon see). Seems I have some design and experimenting ahead of me. :)
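For instance, the single-column-of-words option might look like this minimal sketch (SomeWordTable and the index name are made up):

CREATE TABLE SomeWordTable (
    SourceId INT NOT NULL,          -- key back to the original row
    Word     VARCHAR(72) NOT NULL   -- one word of the field per row
);
CREATE INDEX IX_SomeWordTable_Word ON SomeWordTable (Word);

-- The prefix search becomes an index seek instead of a table scan:
SELECT DISTINCT SourceId FROM SomeWordTable WHERE Word LIKE 'Some%';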
-
Thinking about this some more, I'm going to go with a combination of the ideas above. So the search query will be:
SELECT * FROM SomeNewTable WHERE Field LIKE 'Something%'
However, instead of containing one record for "ABC", the new table will contain three records, one per suffix: "ABC", "BC", and "C". That's potentially a lot of data, but since the max length is known to be 72 characters, it's not really a big deal. Fast and simple. :)
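As a rough sketch of how I might populate and query it (names are placeholders, and it assumes SomeTable has an Id key plus a Numbers helper table holding 1 through 72):

CREATE TABLE SomeNewTable (
    SourceId INT NOT NULL,          -- key back to SomeTable
    Field    VARCHAR(72) NOT NULL   -- one suffix of the original value per row
);

-- Generate every suffix of every value:
INSERT INTO SomeNewTable (SourceId, Field)
SELECT t.Id, SUBSTRING(t.Field, n.n, 72)
FROM SomeTable AS t
JOIN Numbers AS n ON n.n <= LEN(t.Field);

CREATE INDEX IX_SomeNewTable_Field ON SomeNewTable (Field);

-- The old '%Something%' scan becomes an index-friendly prefix search:
SELECT DISTINCT t.*
FROM SomeTable AS t
JOIN SomeNewTable AS s ON s.SourceId = t.Id
WHERE s.Field LIKE 'Something%';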
-
Assuming you're talking about real words in that field, rather than some codification, I'd think full-text indexing might be worth a shot.
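Something along these lines, for example (the catalog name and key index are illustrative; full-text requires a unique single-column key index on the table):

CREATE FULLTEXT CATALOG SearchCatalog AS DEFAULT;

CREATE FULLTEXT INDEX ON SomeTable (Field)
    KEY INDEX PK_SomeTable ON SearchCatalog;

-- Prefix term: finds words starting with 'Some' anywhere in the field:
SELECT * FROM SomeTable WHERE CONTAINS(Field, '"Some*"');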
-
It was a bit of a simplification. Some values are real words, some are codes (without spaces), and people will often type "word1 word2" expecting a result that contains those words in that order. There are also some special characters that they may or may not type, so I'll probably store alternate versions without those characters that can be searched too. I'll probably take the custom approach rather than full-text indexing, as I'm not sure it could handle the fringe scenarios we want to support. But yeah. Sure am glad I didn't resort to a table with 72 columns (or worse, 72 tables). :)
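For what it's worth, the suffix table handles the "word1 word2 in order" case for free, and the stripped alternates could be loaded alongside the originals. A rough sketch, with the same made-up names as before and a couple of REPLACE calls standing in for the real special-character list:

-- Every suffix starts at some character position, so an in-order
-- two-word search is just a prefix match on the suffix column:
SELECT DISTINCT SourceId FROM SomeNewTable WHERE Field LIKE 'word1 word2%';

-- Alternate versions with special characters stripped out:
INSERT INTO SomeNewTable (SourceId, Field)
SELECT SourceId, REPLACE(REPLACE(Field, '-', ''), '/', '')
FROM SomeNewTable
WHERE Field LIKE '%[-/]%';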