One large table vs. several smaller tables

Joe Smith IX

Hi all, I have a question on designing the database for my project. Here is the story. I am using VC++ connecting to SQL Express 2005. I have several tables that will have between 1000-5000 new entries a day. Will the "AddNew()", "Edit(), and "Find()" functions get slower on those tables as the entries grow beyond, let say, one million? Should I create one table per year (but then I need to run queries on each table, which I prefer not to do)? One note, these tables has indexes, of course. Other questions that I have: - which one is faster: FIND/FINDFIRST or SEEK? - (VC++) Is querying "SELECT" as SQL command much faster than running Find/Seek thru _RecordsetPtr? Afik, both are done by DB engine on the server side, not by the client side, correct? Could anyone please give me some light on these? thanks.

Jim Conigliaro

If the tables are indexed, then as your table grows, the insert (and sometimes the edit) operations tend to slow down - this is because the database needs to manage the indexes during the insert. However, a million rows (on a properly indexed table) should not be a problem for a select statement. The referse holds true on a non-indexed table - inserts tend to remain fast but select operations slow down. There are a couple of design patterns that are applicable. One is to create multiple tables, one per month, one per year, whatever is necessary. This keeps the indivual table size small. You would then create a VIEW to represents a union of all your tables so your select statements always pull from a single entity. Another design pattern is to keep all of your data in a single, indexed, table but have a separate, non indexed, table that acts as an insert queue. New data gets inserted into the queue and that data is moved into the permenant table on a nightly basis (inerted into permenant table, delete from queue). You would still need a VIEW to provided a union between your queue and the permenant table, but you don't need to update the view as new tables are added.

Jim Conigliaro jconigliaro@ieee.org
http://www.jimconigliaro.com

Joe Smith IX

Thanks a lot for your input. I think I prefer your second option (using a queue), although I have no idea what a VIEW is. I am pretty newbie in DB. What is the quickest/simplest way to copy data from one table (queue) to another (master)? Do I need to use stored procedure (which I don't know yet)? Since my program will be run by multi-users in a 24/7 warehouse, I cannot do nightly transfer. So this i what I think: have an idle counter, if it hits (let's say) one hour, then that workstation start transferring one entry at a time (checking if it's still idle after each transfer). Of course I need a "setting" table in database indicating that "transferring in process" so that any other 1-hour-idle workstations won't try doing the same thing. Will this work? Thanks again for any help.

Jim Conigliaro

This isn't a trivial process. Though simple on the surface, there is a lot you need to think about. First, transfering one entry at a time will hurt performance. You would be better off ignoring the queuing concept entirely and just inserting directly to your main table. The general logic you want to follow is this: Start the transaction insert from queue into master delete from queue commit transaction syntax for inserting from one table to another is: INSERT INTO TABLE2 (COL1, COL2, COL3) SELECT COL1, COL4, COL7 FROM TABLE1 Use a lock hint to lock your tables until the transaction is completed, so for example you might have the following code: BEGIN TRANSACTION INSERT INTO TABLE2 (COL1, COL2, COL3) SELECT COL1, COL4, COL7 FROM WITH (TABLOCKX) DELETE FROM TABLE2 COMMIT For performance purposes, you may also want to drop the indexes in your main table prior to the insert and then recreate them after the insert.

Jim Conigliaro jconigliaro@ieee.org
http://www.jimconigliaro.com

Joe Smith IX

Won't it take a long time to re-create index after deleting it? After re-thinking about this, I started to wonder: how beneficial is it to use a temp queue? How slow is it to insert an entry to one-million-entry table? Do you have any approximate number? If it's still acceptable (maybe less than 10 seconds), than I'll just drop this queue idea. I will still create one table per year, though. Thanks for all your help.

Jim Conigliaro

Typically, anything greater than a few milliseconds is considered poor performance. You can easily achieve sub-10 second performance on an insert in a million entry table - assuming that your server is appropriately scaled for the job. For a million record table, you should still be able to achieve sub-second inserts. Make sure you do the math on your performance analysis. For example, if you shoot for 10 seconds in your insert time: 10 seconds * 5000 inserts/day = 50000 seconds/day spent inserting = 13.8 hours/day spent inserting records Create your table, inject a few million records into the table and then do some performance analysis. If the performance is acceptable, then you have your answer!

Jim Conigliaro jconigliaro@ieee.org
http://www.jimconigliaro.com