Query Optimization - MCAD question.

CillyMe

Here's a MCAD question.. See if any of you wizards can help me get thru this: You need to tune the following query: SELECT P.PolicyNumber, P.IssueState, AP.Agent FROM Policy AS P JOIN AgentPolicy AS AP ON (P.PolicyNumber = AP.PolicyNumber) WHERE IssueState='NY' AND PolicyDate BETWEEN '2/1/2001' AND '3/4/2002' AND FaceAmount > 10000 They also presented to you the "Show Execution Plan" from Query Analyzer: Cluster Index Scan - 95% cost --> This tell me that the index scan is the blood sucker - ie. Should work on index --> This points to option (d) and (e) The objective is to tune the query. You have the following options: a. Rewrite the query to eliminate BETWEEN keyword and rewrite the query without it (Okay, this is wrong answer, i guess no one have trouble with this.) b. Add a HASH join hint (Reference: http://www.sql-server-performance.com/hints\_join.asp) c. Add a WITH(INDEX(0)) table hint to "Policy" table (QUESTION 3: How do you do a index hint by the way??) d. Update Statistics on "Policy" table. e. Execute DBCC DBREINDEX on "Policy" table. Choose one. My shot at it would be to eliminate (a), (b) and (c) because of my faith in "Query Optimizer" that it will do a better job than me. Now, (d) and (e) can be equally correct.And I can't pick one over the other. The model answer is (d) - but it doesn't tell you why (e) is eliminated. The question itself didn't state whether the index is fragmented, but neither did it state that the statistics is up-to-date. QUESTION 2: In addition to the question itself, is there any example in which we need to "hint" whether the query should be executed using HASH/MERGE or LOOP join? Doesn't the query optimizer do this for you and that Query Optimizer does a better job? QUESTION 3: For option (c), their explanation for rejecting the option is as follows: "The estimated execution plan shows that a clustered index scan is going to be used, not a seek. It'd be better if the query used an inedx seek instead. We can try to force an index seek with a table hint. However, we shold use the hint inedx(1), not the hint index(0)" -- What??? How do we write an index hint in a SQL query? Anyway, i thought this option should be eliminated because normally we rely on Query Optimizer and not to hardcode query hints. Thanks in advance.

Jeff Varszegi

I'm not an expert DBA, just a programmer with some decent database experience, mostly from the design side, but here's my take: Question 1: Can a clustered index be fragmented? I always thought that a clustered index was special because the physical ordering of the rows in the data pages IS the index order. It can be expensive for inserts in the middle (think of inserting in the middle of an array-- gotta shift stuff) but it means that you have fewer levels of index to wade through to reach a record, and also that the index doesn't get fragmented. Now, if the server didn't realize that the pages containing certain values were being accessed very frequently, it wouldn't always cache the correct things in memory AND it wouldn't be able to figure out the best way to go about fetching certain rows. You get the idea. Question 2: That's some freaky sh*t there. I've never used a LOOP join (and I don't even remember studying it, either), and I use SQL Server every day now to good effect. Two other SQL Server DBAs I know have always said that it's best to let the query optimizer do its thing, and I've always done that. I am studying Oracle now, and it has much richer hint support. A colleague who has Oracle experience says that hint knowledge is essential for an Oracle DBA. In SQL Server, I've always just found it most valuable to keep a clear head when constructing queries, construct indexes correctly, keep statistics up to date, etc. About the freakiest I've gotten in my quest for performance has been to use DBCC PINTABLE in a few cases, but I think that all such tricks are rendered of negligible value in a database that's well-constructed and set up on a database that has the disks set up correctly, enough RAM, etc. Question 3: You got me there. I agree with you-- I've never learned how to use index hints in SQL Server, and I don't even remember seeing anything about 'em in the docs. Regards, Jeff Varszegi

CillyMe

Apparently, there's a lot you can do tuning queries without resorting to table/query/index hints, but, there're all those exceptions and rules. But it seems like there's no one place where they list all you need to know and the necessary information is scatter all over internet. Examples of fragmented discussion: 1: http://www.sql-server-performance.com/hints\_join.asp 2: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlmag01/html/TuningofaDifferentSort.asp Example of a more complete treatment on performance tuning in general: http://www.sql-server-performance.com/ (I think these guys do a mch better job than BOL, but then again, not complaining - BOL is already a lot better than MSDN/MFC) I still needs answer to my questions though, and no, for QUESTION 1, I didn't ask if a cluster can be fragmented. Thanks though, for taking the time.

Jeff Varszegi

CillyMe wrote: for QUESTION 1, I didn't ask if a cluster can be fragmented You were asking why they eliminated option 'e'-- I was just trying to answer! One of the main reasons you'd use DBCC DBREINDEX is to rebuild a fragmented index, but in this case that probably wouldn't help because the index can't be fragmented. I got my MCSD recently, and then since I'd used the SQL Server test as my elective, I followed up with my MCDBA. There wasn't a single question on any sort of hints on any SQL Server test that I took, for what it's worth. I wouldn't spend too much time on it unless you're just really interested. Regards, Jeff Varszegi

CillyMe

You got MCSD? Master!! Thats what I'm working at right now. Anyway, why did you say Jeff Varszegi wrote: index can't be fragmented. I mean, after a bunch of inserts/deletes, any index can get fragmented.

Jeff Varszegi

Well, I never actually got down into actually testing for myself the physical file's structural changes when inserting and deleting, but that's not my understanding. Here's what I found in Books Online when I searched for 'clustered index key': After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if an application frequently executes a query to retrieve records between a range of dates, a clustered index can quickly locate the row containing the beginning date, and then retrieve all adjacent rows in the table until the last date is reached. Also, this: Clustered indexes are not a good choice for: - Columns that undergo frequent changes This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile. Like I said, I freely admit that I am not a complete expert on SQL Server. Anything I say should be taken with a grain of salt. :) I'm sure you'll do really well on all your tests, because I don't think that the average schmuck who takes these things is so concerned with getting everything really straight. It's nice to see. Regards, Jeff Varszegi

Jeff Varszegi

Oh, one more thing: you know about Transcender, right? I don't view their test-prep software so much as a cramming aid as an extra source of valuable information. They write paragraphs-long explanations for every right and wrong choice on their practice tests, and it's happened more than once that their explanations sparked extra explorations on my part. I think their stuff is really worth the money. Regards, Jeff Varszegi

CillyMe

Yes, I'm taking the exams now. Find it pretty useful in filling the loopholes. The most annoying part is when you bought the wrong prep package that gives wrong answers or lame explanations - lame vendors. MeasuXXXXX for instance. They offer online tutoriung along with prep packages. But you can only ask questions regarding specific question on their prep. Depending on the person you talk to, if you ask anything slightly deviate from the question itself, they'll tell you it's outside their jurisdiction. There're some good tutors who'd take you all the way through. But for the most part, their online help does not better than forum like this.

Mike Dimmick

Going through the options: a) has no effect, since that's purely syntactic convenience: SQL Server translates that to PolicyDate >= '2/1/2001' AND PolicyDate <= '3/4/2002' anyway. b) A hash might work better, but it's clear that the index isn't being used, as you say. A hash join constructs a hash table of one side of the join, which the other looks up in, whereas a loop join simply loops over each side of the result set looking for matches. c) According to http://www.sql-server-performance.com/hints_table.asp[^], WITH(INDEX(0)) forces a table scan - precisely what we're trying to avoid. That same page recommends using index names rather than index IDs, because the IDs can change if the indexes are dropped and recreated - in other words, they're fragile. d) Updates any index statistics that are out of date. SQL Server uses the statistics to determine how good an index might be for finding the requested information. e) Rebuilding the index will incidentally update the statistics, but it's a lot of work that might not be necessary (and locks everyone out of the table while it does it). Assuming that appropriate indexes already exist (and I don't know which table contains which fields), then updating statistics might work. If there are no appropriate indexes, it won't help - at least one of the tables will have to be scanned to find the rows matching the WHERE clause, then those rows joined onto the other table. The problem description doesn't say whether the index scan was caused by the join operation or by the original lookup. Answering your question 3, I looked up 'hash join' in my copy of Inside SQL Server 2000 (which I thoroughly recommend). If there isn't a good index for the join (in this case, no index on AgentPolicy.PolicyNumber), a hash join might be cheaper. However, it can only be done for an equijoin (i.e. the join condition uses '=') and can take a lot of memory. The query optimiser won't use it if it thinks it will need to do more I/O to store and retrieve the temporary hash table than it will to repeatedly retrieve the original data. This is one reason to keep hints out of your queries - they tend to be very system specific. What helps one system might hurt another.