Best practice in large tables
-
Hello all :) This may seem a simple question, but I'm not looking for a solution, I'm looking for a best practice. I have a SQL table that contains 293 columns and may eventually contain thousands of rows. Such a massive table could be a source of performance issues. My question is: is it acceptable to keep the table this way, or would it be better to divide it into several tables and link them using foreign key constraints? For example, each table containing 50 columns or so? Waiting eagerly for your expertise on this issue.
Regards:rose:
-
Nader Elshehabi wrote:
I have a SQL table that contains 293 columns and may eventually contain thousands of rows. Such a massive table could be a source of performance issues.
Absolutely - but it really depends on how you want to use the table.
Nader Elshehabi wrote:
Is it acceptable to keep the table this way, or would it be better to divide it into several tables and link them using foreign key constraints?
To me that suggests the table is not normalised. All data in the table must depend on the primary key. If you have repeating data in a table, that suggests a separate table needs to be created. For example, a table Family with the columns MotherName, FatherName, Child1Name, Child2Name, Child3Name, Child4Name has some limitations: first, a maximum of four children is permitted; second, it assumes everyone lives in a perfect home where all siblings share the same parents. So, if you have repeating columns (often characterised by the need to include a digit in the name), they need to be separated out into a new table, as in the sketch below.
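A minimal sketch of that separation, using hypothetical table and column names (SQL Server syntax):

-- Parents stay in the Family table; the repeating ChildN columns move out.
CREATE TABLE Family (
    FamilyId   INT IDENTITY(1,1) PRIMARY KEY,
    MotherName NVARCHAR(100) NULL,
    FatherName NVARCHAR(100) NULL
);

-- One row per child, however many there are, linked back by a foreign key.
CREATE TABLE Child (
    ChildId   INT IDENTITY(1,1) PRIMARY KEY,
    FamilyId  INT NOT NULL REFERENCES Family(FamilyId),
    ChildName NVARCHAR(100) NOT NULL
);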
Nader Elshehabi wrote:
For example, each table containing 50 columns or so?
You should search for useful relationships rather than splitting at an arbitrary number of columns. By splitting at every 50th column you run the risk of putting related data (or data that is often used together) into separate tables, which will slow down any query. If there is no way to normalise the data, then consider which groups of columns tend to be used together and split it that way, ensuring that each table maintains a one-to-one relationship with the others. This will cause some issues during an insert, where some rows won't exist yet, so design it to accept a zero-to-one relationship - that way missing rows in the other tables don't affect the whole. You might also find that there are groups of columns where, if one column isn't filled, neither are the others in the group. You can then use that knowledge to eliminate the need to store a row where no information is present. There is a rough sketch of this kind of split below.
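A minimal sketch of a zero-to-one split, with hypothetical table and column names (the groupings are illustrative, not from the original post):

-- Core table: columns that are always present.
CREATE TABLE MainRecord (
    RecordId  INT IDENTITY(1,1) PRIMARY KEY,
    CreatedOn DATETIME NOT NULL
);

-- Optional group of related columns: a row exists only when the group is used.
CREATE TABLE OptionalDetailGroup (
    RecordId INT PRIMARY KEY REFERENCES MainRecord(RecordId),
    DetailA  NVARCHAR(50) NULL,
    DetailB  INT NULL
);

-- An outer join returns core rows even when the optional group has no row.
SELECT m.RecordId, m.CreatedOn, d.DetailA, d.DetailB
FROM MainRecord AS m
LEFT JOIN OptionalDetailGroup AS d ON d.RecordId = m.RecordId;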
-
Thank you, Colin, for your prompt reply; I really appreciate it. The table describes a medical procedure which contains 293 parameters to be filled in the report. Some of them, of course, can be null, but none of them is a repetition. Each procedure - i.e. each row - is given a GUID unique to that procedure, which is used as the primary key. My idea about dividing the table wasn't at exactly the 50th column - maybe I wasn't clear on this point, sorry. The procedure can be roughly divided into several steps, each containing about 50 parameters, something like the sketch below. As you said, this will give me a great headache in maintaining the relations between the tables, using the procedure's GUID as a foreign key. Also in the queries this is going to be very troublesome. Currently the table may work fine, but my concern is when it's filled with thousands of rows. Is it worth the effort to divide the table the way you suggested, or will it work fine in the future as a single table containing 293 columns?
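For illustration, a split per step keyed on the procedure's GUID might look like this (table and column names are hypothetical, not from the actual schema):

-- One row per procedure; the GUID is the primary key.
CREATE TABLE MedicalProcedure (
    ProcedureId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    PerformedOn DATETIME NOT NULL
);

-- One table per step, sharing the procedure's GUID as primary and foreign key.
CREATE TABLE ProcedureStep1 (
    ProcedureId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY
        REFERENCES MedicalProcedure(ProcedureId),
    Parameter1  NVARCHAR(50) NULL,
    Parameter2  DECIMAL(9,2) NULL
    -- ... roughly 50 parameters for this step ...
);

-- Rebuilding the full report needs one join per step table.
SELECT p.ProcedureId, p.PerformedOn, s1.Parameter1, s1.Parameter2
FROM MedicalProcedure AS p
LEFT JOIN ProcedureStep1 AS s1 ON s1.ProcedureId = p.ProcedureId;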
Regards:rose:
-
I don't think you need to worry about thousands of rows as much as about the total number of bytes across your 293 columns. If you are using SQL Server 2005, rows are stored on 8 KB pages and the maximum row size is 8,060 bytes. You really don't want a single row to go over that limit; that can cause a lot of performance issues. If you can say that your total maximum row size would be about 6,000 bytes, then I wouldn't worry about performance. If your maximum row size is over 8,060 bytes, then you really should break up the columns. A quick way to check is sketched below.

If you only have thousands of rows - say 10,000 - in a table, there is a pretty good chance that a query wouldn't even use an index. Which brings up the next point: performance has more to do with indexes being set up properly than with the number of columns. SQL Server always reads full pages, so the smaller the row size, the more rows are read with a single page read.

Anyway, that is probably a lot more than you wanted. Here's a link to potential problems you might have: http://www.sqlservercentral.com/columnists/awarren/2862.asp Hope that helps. Ben
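A minimal sketch of that check against the catalog views, assuming SQL Server 2005 or later and a hypothetical table name dbo.MedicalProcedure:

-- Sum of the declared maximum lengths (in bytes) of the table's columns.
-- Columns declared as (n)varchar(max)/varbinary(max) report max_length = -1
-- and are excluded here; they can be stored off-row and don't count against
-- the 8,060-byte in-row limit in the same way.
SELECT SUM(c.max_length) AS max_defined_row_bytes
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.MedicalProcedure')
  AND c.max_length > 0;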
-
Thank you so much, kubben. Your reply has been very helpful and informative.
Kindest Regards:rose: