Reporting against a volatile, high-traffic, high-availability system
-
I've never found a decent solution to this problem, and as it's coming up again I thought I'd throw it out to the wider community. I'm dealing with a database (SQL Server 2005) which is: 1) quite large and growing rapidly, and 2) high traffic, 24/7.

As is often the case, one of the requirements is reporting on various aspects of the data. However, as we know, any reads (especially for reporting purposes) will adversely affect the performance of the live activity going on. In the past I have solved this mainly one way: on a scheduled basis I take the delta changes in the tables I'm interested in, write that data to another source, and report only against this "archived" data. The problem with this is that you don't get "live" reports; you get reports on data older than x hours (where x is your scheduled interval).

I am now in a situation where reporting on live data is pretty essential, but keeping the database as performant as possible is just as essential. To me these two requirements seem at loggerheads with each other. I obviously can't report against the live tables, as the overall application would slow to a halt, but I also can't archive the data on a schedule, because then the reports are not live.

Any thoughts? Is there some new-fangled SQL tool that solves this problem? Have you faced a similar problem and found a suitable solution? Is there a gap in my SQL knowledge on this subject?
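For what it's worth, the scheduled delta copy I describe above is roughly the following shape. This is a simplified sketch only; the Orders table, LastModified column and ReportingDb names are invented for illustration, and it assumes rows are effectively insert-only (handling updates would need an extra matching step):

-- Copy rows changed since the last run into the reporting copy,
-- then record the new high-water mark.
DECLARE @LastRun datetime, @Now datetime;
SET @Now = GETDATE();

SELECT @LastRun = LastRunTime
FROM ReportingDb.dbo.EtlWatermark
WHERE TableName = 'Orders';

INSERT INTO ReportingDb.dbo.Orders (OrderId, CustomerId, Amount, LastModified)
SELECT o.OrderId, o.CustomerId, o.Amount, o.LastModified
FROM dbo.Orders o
WHERE o.LastModified > @LastRun
  AND o.LastModified <= @Now;

UPDATE ReportingDb.dbo.EtlWatermark
SET LastRunTime = @Now
WHERE TableName = 'Orders';

The point is that however small the interval, the reports are always at least that far behind the live tables.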
-
There's no single tool to solve your problem that I'm aware of, but in large, high-traffic databases everything comes into play: indexes (clustered and otherwise), foreign keys, extra joins or operations, and conditionals placed in the WHERE clause instead of as part of the join can all affect the performance of a query or stored procedure. Even the physical layout of the data on the hard drive(s) comes into play.

I recently had to insert 400 million records into a database that wasn't sized large enough, and it took forever waiting for the database to insert, grow, insert more, grow... Recreating the DB at the correct size (not usually an option) greatly improved the performance of the inserts, and of any query run against them since, because the data was all contiguous on disk.

With large numbers of inserts and updates to a table, it's also possible an index has become badly fragmented and the DB is having to reorganise it each time, or that you have the wrong kind of indexes, or too many of them, on a table. You could run your query (or parts of it) in Query Analyzer and look at the execution plan or the estimated execution plan to see what is taking up your time, and try to optimise that section.

After a few weeks of trying and some changes, we can now find one record in that 400 million and tell you various things about it in less than a second (no, it's not just one big table with everything; it joins to several other tables with 1M+ records in them using a 7 or 10 character string). By the way, we're using SQL Server 2000, and it worked almost as fast on our old 2-processor server; now we have dual quad cores. You may have to upgrade your hardware; it may not solve the problem, but it may make life a little better.

Without knowing your database or the data you are trying to report on, I doubt anyone can give you more than general pointers, but if you have specific questions about queries or how to optimise tables, feel free to ask.
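To put some code to the pre-sizing point: if you know roughly how big the data will get, size the files up front so the engine isn't auto-growing in the middle of your load. Sizes and paths below are purely illustrative, so adjust them to your own growth projections:

-- Create the database at (roughly) its expected full size up front,
-- so SQL Server doesn't repeatedly pause to auto-grow during big inserts.
CREATE DATABASE BigDataDb
ON PRIMARY
(
    NAME = BigDataDb_data,
    FILENAME = 'D:\Data\BigDataDb.mdf',
    SIZE = 200GB,
    FILEGROWTH = 10GB
)
LOG ON
(
    NAME = BigDataDb_log,
    FILENAME = 'E:\Logs\BigDataDb.ldf',
    SIZE = 20GB,
    FILEGROWTH = 5GB
);
GO

-- And to see where a query spends its time, look at the plan text
-- (or use the graphical estimated execution plan in Management Studio):
SET SHOWPLAN_TEXT ON;
GO
-- your query goes here
GO
SET SHOWPLAN_TEXT OFF;
GO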
Please remember to rate helpful or unhelpful answers, it lets us and people reading the forums know if our answers are any good.
-
A common solution is to use replication to maintain a reporting copy of the database. Obviously it's not quite real time, but it's usually near enough. It also allows for some re-indexing/denormalisation on the reporting side to aid report performance. Hope this is of some interest.
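If it's useful, the broad shape of a transactional publication is something like the sketch below. This is heavily simplified: the system procedures shown are the real replication procedures, but in practice they take many more parameters, and you also need a distributor and the snapshot/log reader agents configured. LiveDb, LivePub, Orders, REPORTSRV and ReportDb are all placeholder names:

-- On the publisher: enable the live database for transactional publishing.
EXEC sp_replicationdboption
    @dbname  = N'LiveDb',
    @optname = N'publish',
    @value   = N'true';

-- Create a publication and add the table(s) you want to report on.
EXEC sp_addpublication
    @publication = N'LivePub',
    @status      = N'active';

EXEC sp_addarticle
    @publication   = N'LivePub',
    @article       = N'Orders',
    @source_object = N'Orders';

-- Point the reporting server's database at the publication.
EXEC sp_addsubscription
    @publication    = N'LivePub',
    @subscriber     = N'REPORTSRV',
    @destination_db = N'ReportDb';

Once the subscription is flowing you can index the reporting copy however suits the reports, without touching the live database.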
Bob Ashfield Consultants Ltd
-
Thanks Bob, yes it very much is of interest. I know SQL Server very well indeed, except for replication, which I've honestly never come into contact with. I had a feeling it was relevant to this situation, so I started reading up last night. You have just confirmed that my hunch was at least along the right lines. Thanks again.
-
Bob, any idea what a reasonable lag time is for the replicated system? (Presumably it's configurable, but are there guidelines?)
Never underestimate the power of human stupidity RAH
-
Naturally, it depends on transaction volumes and how important it is, but for a reporting database I would normally expect between 1 and 5 minutes of latency, as opposed to a hot standby, where it should be as close to real time as possible. This MSDN article gives a good overview of the overheads etc.
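If you want to measure the lag rather than guess at it, SQL Server 2005 transactional replication has tracer tokens: you post a token on the publisher and the replication metadata records how long it took to reach the distributor and the subscriber. Roughly as below; the publication name is a placeholder, the procedures accept more options than shown, and Replication Monitor shows the same numbers graphically:

-- On the publisher, in the published database: post a tracer token.
EXEC sys.sp_posttracertoken @publication = N'LivePub';

-- Later, list the posted tokens, then look one up to see its measured
-- latency to the distributor and to the subscriber.
EXEC sys.sp_helptracertokens @publication = N'LivePub';
EXEC sys.sp_helptracertokenhistory
    @publication = N'LivePub',
    @tracer_id   = 1;  -- use a tracer_id returned by the call above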
Bob Ashfield Consultants Ltd
-
As Bob said, replication could be one solution, especially transactional replication in your case. Another option could be log shipping. With log shipping you may have to redesign your backup strategy, but on the other hand you gain a standby database, which can be used for reporting.
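To make the log shipping idea concrete: it is really just a scheduled backup/restore of the transaction log, and restoring WITH STANDBY keeps the copy readable between restores, which is what makes it usable for reports. One cycle looks roughly like this (database names, file paths and the undo file are placeholders; normally the built-in log shipping jobs, or your own agent jobs, run this on a schedule):

-- On the primary server: back up the transaction log to a share.
BACKUP LOG LiveDb
TO DISK = N'\\ReportSrv\LogShip\LiveDb_1200.trn';

-- On the reporting server: restore that log onto the standby copy.
-- WITH STANDBY leaves the database read-only but queryable between restores.
-- Note that each restore needs exclusive access, so any running report
-- queries are disconnected briefly at every cycle.
RESTORE LOG LiveDb_Report
FROM DISK = N'\\ReportSrv\LogShip\LiveDb_1200.trn'
WITH STANDBY = N'D:\LogShip\LiveDb_undo.dat';

So the trade-off is the same as with replication: the reporting copy is only as fresh as the restore interval.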
The need to optimize rises from a bad design