When to NOT normalize?
-
Hi all, A bit of quick info before I come to my question: My company uses SQL Server 2008, two databases for one application and almost no schemas. Our biggest customer now has a DB of 40 gig (took them a few years to get that 'big'). One of the databases is for storing production data and the other is for some data our application needs and user settings. Now I recently had a discussion with my boss about one particular table in one of our databases. It stores settings on stock, sales, client servers, user rights, user settings and general settings (no joke... a topic about it would even shame the hall of shame). I told my boss that normalization might be very necessary for this table, but he told me that normalizing is not always a good thing. In the case of this particular table I know it is, but I am more interested in when we should not normalize? One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table. I know there are some rules for normalizing and denormalizing, but rules are meant to be broken ;) So I was wondering if my boss' argument about having lesser tables is valid at all? And I would like to see some real-life examples of people who chose not to normalize and for what reason.
-
Hi all, A bit of quick info before I come to my question: My company uses SQL Server 2008, two databases for one application and almost no schemas. Our biggest customer now has a DB of 40 gig (took them a few years to get that 'big'). One of the databases is for storing production data and the other is for some data our application needs and user settings. Now I recently had a discussion with my boss about one particular table in one of our databases. It stores settings on stock, sales, client servers, user rights, user settings and general settings (no joke... a topic about it would even shame the hall of shame). I told my boss that normalization might be very necessary for this table, but he told me that normalizing is not always a good thing. In the case of this particular table I know it is, but I am more interested in when we should not normalize? One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table. I know there are some rules for normalizing and denormalizing, but rules are meant to be broken ;) So I was wondering if my boss' argument about having lesser tables is valid at all? And I would like to see some real-life examples of people who chose not to normalize and for what reason.
This is a very good question - in fact, there's a lengthy discussion on the subject in my favorite SQL tuning book[^]. To me, denormalization is always an exercise in query optimization. I start off with my "ideal" (i.e. normalized) schema, populate its key parts with random but realistic data, quickly prototype my queries, and then look for particularly ugly execution plans. What your boss had in mind is probably minimizing the number of joins: if you always look for all (or a large subset of all) settings at once, and you keep doing it over and over again, using a single table for all settings will reduce the number of joins, and therefore improve your speed. However, I would not make a decision one way or the other without running a query analyzer: optimizing without a profiler is like driving in the dark without a map.
-
Hi all, A bit of quick info before I come to my question: My company uses SQL Server 2008, two databases for one application and almost no schemas. Our biggest customer now has a DB of 40 gig (took them a few years to get that 'big'). One of the databases is for storing production data and the other is for some data our application needs and user settings. Now I recently had a discussion with my boss about one particular table in one of our databases. It stores settings on stock, sales, client servers, user rights, user settings and general settings (no joke... a topic about it would even shame the hall of shame). I told my boss that normalization might be very necessary for this table, but he told me that normalizing is not always a good thing. In the case of this particular table I know it is, but I am more interested in when we should not normalize? One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table. I know there are some rules for normalizing and denormalizing, but rules are meant to be broken ;) So I was wondering if my boss' argument about having lesser tables is valid at all? And I would like to see some real-life examples of people who chose not to normalize and for what reason.
Your boss is right and you are right, there is no magic rule that can be applied, it all comes down to "depends on requirements". If you have a single table with an expanding number of columns to hold the settings then I would change the design. I often use a SettingType, SettingValue table and store the lot in there. Hoewever you cannot use FKs in that scenario so it is less than ideal. I do have 1 general rule and it is based on the database usage, if it is a transactional database for supporting applications I normalise as much as is reasonable. If it is a reporting database, this includes BI support DBs then normalisation goes out the window in favour of query optimisation. Try servicing Oracle BI, it will drive a relational database person nuts.
Never underestimate the power of human stupidity RAH
-
Hi all, A bit of quick info before I come to my question: My company uses SQL Server 2008, two databases for one application and almost no schemas. Our biggest customer now has a DB of 40 gig (took them a few years to get that 'big'). One of the databases is for storing production data and the other is for some data our application needs and user settings. Now I recently had a discussion with my boss about one particular table in one of our databases. It stores settings on stock, sales, client servers, user rights, user settings and general settings (no joke... a topic about it would even shame the hall of shame). I told my boss that normalization might be very necessary for this table, but he told me that normalizing is not always a good thing. In the case of this particular table I know it is, but I am more interested in when we should not normalize? One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table. I know there are some rules for normalizing and denormalizing, but rules are meant to be broken ;) So I was wondering if my boss' argument about having lesser tables is valid at all? And I would like to see some real-life examples of people who chose not to normalize and for what reason.
There is no specific rule as to when to normalize/denormalize. The answer is "it depends" In general it is a trade off between Query response vs Cost of data inconsistency. Normalized data keep the data consistent at a cost of query time. See Database normalization[^] and Denormalization[^] for some descriptions in wikipedia.
Yusuf May I help you?
-
Hi all, A bit of quick info before I come to my question: My company uses SQL Server 2008, two databases for one application and almost no schemas. Our biggest customer now has a DB of 40 gig (took them a few years to get that 'big'). One of the databases is for storing production data and the other is for some data our application needs and user settings. Now I recently had a discussion with my boss about one particular table in one of our databases. It stores settings on stock, sales, client servers, user rights, user settings and general settings (no joke... a topic about it would even shame the hall of shame). I told my boss that normalization might be very necessary for this table, but he told me that normalizing is not always a good thing. In the case of this particular table I know it is, but I am more interested in when we should not normalize? One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table. I know there are some rules for normalizing and denormalizing, but rules are meant to be broken ;) So I was wondering if my boss' argument about having lesser tables is valid at all? And I would like to see some real-life examples of people who chose not to normalize and for what reason.
Yep, it depends on the situation. I try to normalize as much as I can and then deal with troublesome processes. One technique I have used is when it comes time to perform a bunch of queries against complex data, I make a denormalized copy of the rows I need (I hesitate to say "temporary table"). This allows me to perform the queries without tying up the actual tables -- a little like a mini reporting database that gets created on the fly I guess. This has been useful in systems with a lot of updates that require frequent display to multiple users.
-
Yep, it depends on the situation. I try to normalize as much as I can and then deal with troublesome processes. One technique I have used is when it comes time to perform a bunch of queries against complex data, I make a denormalized copy of the rows I need (I hesitate to say "temporary table"). This allows me to perform the queries without tying up the actual tables -- a little like a mini reporting database that gets created on the fly I guess. This has been useful in systems with a lot of updates that require frequent display to multiple users.
Thanks for the answers :D So basically just normalise as much as possible until performance issues occur? Since we use a relational database with lots of read AND write (orders, stock, etc.) it would be best to normalize where possible? One thing I didn't get from the denormalizing DB article on wikipedia was the usage of views. Is making a view a way of denormalizing? Because in a view you use joins too, but in the end you can just select from one table(view) :)
-
Hi all, A bit of quick info before I come to my question: My company uses SQL Server 2008, two databases for one application and almost no schemas. Our biggest customer now has a DB of 40 gig (took them a few years to get that 'big'). One of the databases is for storing production data and the other is for some data our application needs and user settings. Now I recently had a discussion with my boss about one particular table in one of our databases. It stores settings on stock, sales, client servers, user rights, user settings and general settings (no joke... a topic about it would even shame the hall of shame). I told my boss that normalization might be very necessary for this table, but he told me that normalizing is not always a good thing. In the case of this particular table I know it is, but I am more interested in when we should not normalize? One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table. I know there are some rules for normalizing and denormalizing, but rules are meant to be broken ;) So I was wondering if my boss' argument about having lesser tables is valid at all? And I would like to see some real-life examples of people who chose not to normalize and for what reason.
Naerling wrote:
One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table.
That's one argument. Where's the rest?
Naerling wrote:
I know there are some rules for normalizing and denormalizing, but rules are meant to be broken
That in itself isn't a reason to break the rules in a production environment. You can always test it on a VM, if required. First up, I'd collect a list of arguments on why it shouldn't be normalized. You bosses' argument isn't really an argument to skip the normalization-process, a database doesn't perform better if the table-count is lower. Quite the opposite is true; if the datacollection within that table has several fields that different parts of the system uses, then updating a setting could become a costly operation since it would also have to update several indexes - possibly blocking other updates. I tend to normalize every entity in the database, so help me Codd, to BNF. The only reason to denormalize that I stumbled across was partitioning :)
I are Troll :suss:
-
Naerling wrote:
One of my boss' arguments was that one table to store 'all' settings is rather handy because it keeps the amount of tables low and you can always find a setting (for whatever) in one table.
That's one argument. Where's the rest?
Naerling wrote:
I know there are some rules for normalizing and denormalizing, but rules are meant to be broken
That in itself isn't a reason to break the rules in a production environment. You can always test it on a VM, if required. First up, I'd collect a list of arguments on why it shouldn't be normalized. You bosses' argument isn't really an argument to skip the normalization-process, a database doesn't perform better if the table-count is lower. Quite the opposite is true; if the datacollection within that table has several fields that different parts of the system uses, then updating a setting could become a costly operation since it would also have to update several indexes - possibly blocking other updates. I tend to normalize every entity in the database, so help me Codd, to BNF. The only reason to denormalize that I stumbled across was partitioning :)
I are Troll :suss:
'The rest' of his arguments are: "it's easy that all settings are in one table", "you always know where to find a setting", "normalizing isn't always a good thing" and so on ;P In practice this isn't even true, because the 'type of setting' is stored as a varchar which isn't really documented anywhere. So user settings can be found "where [category] = 'usersetting'" or "where [category] = 'setting'" and maybe "where [category] = options". And then there's a column called sub which specifies what setting it is. Don't use a 'like' in this table, because 'setting' could also give settings on stock or sales. But if you do use '=' for 'sales' you won't find 'sales_statusses' etc... :doh: Then there's the employee column which does not hold a FK to the employee table because in most cases there is no employee for a particular setting. So I recently found lots of settings for non-existant employee's... It's really great :) Actually there is no key or constraint at this table at all! There is good news though. It's a very old table which was created with the best of intents at a time that my company did not know very much about SQL Server (migrated from DBF). We only keep it because redesigning this part of our DB (and software) would cost to much time at this moment. My boss does not think the table is good, it's just 'not that bad' :)
-
Thanks for the answers :D So basically just normalise as much as possible until performance issues occur? Since we use a relational database with lots of read AND write (orders, stock, etc.) it would be best to normalize where possible? One thing I didn't get from the denormalizing DB article on wikipedia was the usage of views. Is making a view a way of denormalizing? Because in a view you use joins too, but in the end you can just select from one table(view) :)
Naerling wrote:
the usage of views
Temporary denormalization, kinda like I described, but more temporary.
-
'The rest' of his arguments are: "it's easy that all settings are in one table", "you always know where to find a setting", "normalizing isn't always a good thing" and so on ;P In practice this isn't even true, because the 'type of setting' is stored as a varchar which isn't really documented anywhere. So user settings can be found "where [category] = 'usersetting'" or "where [category] = 'setting'" and maybe "where [category] = options". And then there's a column called sub which specifies what setting it is. Don't use a 'like' in this table, because 'setting' could also give settings on stock or sales. But if you do use '=' for 'sales' you won't find 'sales_statusses' etc... :doh: Then there's the employee column which does not hold a FK to the employee table because in most cases there is no employee for a particular setting. So I recently found lots of settings for non-existant employee's... It's really great :) Actually there is no key or constraint at this table at all! There is good news though. It's a very old table which was created with the best of intents at a time that my company did not know very much about SQL Server (migrated from DBF). We only keep it because redesigning this part of our DB (and software) would cost to much time at this moment. My boss does not think the table is good, it's just 'not that bad' :)
Naerling wrote:
"it's easy that all settings are in one table"
For whom? The computer doesn't care.
Naerling wrote:
"you always know where to find a setting"
He wouldn't be able to find them if they were divided over two tables? <religious rant> By that same logic, let's put everything in a *single* table; that way you always know in what table your data resides. Let's put everything under a single schema, for a single user and use a single password to access that single table. How about using a single VARCHAR-column in that table? That way you know exactly what column to search on and what column to retrieve from that table. </religious rant>
Naerling wrote:
"normalizing isn't always a good thing"
That's not even an argument, it's an opinion.
Naerling wrote:
Actually there is no key or constraint at this table at all!
..then why use a database at all?
Naerling wrote:
There is good news though. It's a very old table which was created with the best of intents at a time that my company did not know very much about SQL Server (migrated from DBF). We only keep it because redesigning this part of our DB (and software) would cost to much time at this moment. My boss does not think the table is good, it's just 'not that bad'
On Error Resume Next
:laugh:I are Troll :suss:
-
Naerling wrote:
"it's easy that all settings are in one table"
For whom? The computer doesn't care.
Naerling wrote:
"you always know where to find a setting"
He wouldn't be able to find them if they were divided over two tables? <religious rant> By that same logic, let's put everything in a *single* table; that way you always know in what table your data resides. Let's put everything under a single schema, for a single user and use a single password to access that single table. How about using a single VARCHAR-column in that table? That way you know exactly what column to search on and what column to retrieve from that table. </religious rant>
Naerling wrote:
"normalizing isn't always a good thing"
That's not even an argument, it's an opinion.
Naerling wrote:
Actually there is no key or constraint at this table at all!
..then why use a database at all?
Naerling wrote:
There is good news though. It's a very old table which was created with the best of intents at a time that my company did not know very much about SQL Server (migrated from DBF). We only keep it because redesigning this part of our DB (and software) would cost to much time at this moment. My boss does not think the table is good, it's just 'not that bad'
On Error Resume Next
:laugh:I are Troll :suss:
I couldn't agree with you more! Luckily not every table in our DB is like that. It's really the worst example I could think of by far ;) Yesterday I made some DB design with 9 tables and 2-3 columns per table. I had expected my boss to say I should use less tables. Instead he said I should've made 1 extra (maybe he reads this forum too?) :laugh: Don't start about on On Error Resume Next... I know some very nice .NET examples of that. But that's another topic ;P
-
I couldn't agree with you more! Luckily not every table in our DB is like that. It's really the worst example I could think of by far ;) Yesterday I made some DB design with 9 tables and 2-3 columns per table. I had expected my boss to say I should use less tables. Instead he said I should've made 1 extra (maybe he reads this forum too?) :laugh: Don't start about on On Error Resume Next... I know some very nice .NET examples of that. But that's another topic ;P