Database design suggestions/feedback/pointers wanted

Alsvha

Hiya all. I'm currently trying to find some sources/feedback upon various methods of designing a database. Up to now, the database design I've been working with have been relatively simple and thus not so difficult in modeling, but now I think I'll be faced with a more difficult project and thus am trying to find some inspiration. If I have to map a large amount of products to a database, and each product can have a (large) amount of various attributes across multiple languages. The attributes for one product might not all be used for any other product and vice versa - so I'm ponding what would be the best way of doing this. To simplify: Supposed we have product X and Y and Z. Product X have attributes A, B, C Product Y have attributes D, E, F Product Z have attributes A, B, F Now multiply this up with a factor 50 per attribute and thousands of products over various languages and markets and we get the scope. Now, how I usually handle attributes is to normalise them out into a seperate table and use a connection table for the value, so I have a table for product, attribute and attribute value entity with a key from the product and attribute. Pretty standard I think.... However, with a scope like this, we'd be looking at a big table of attributes, products and then subsequently a very big table of values. This would/could seriously hamper the lookup times from the database. An alternative method I could think off would be to make some sort of dynamic table creation, so each product when created, got a table assigned to itself with the attributes. So I'd have a product table, and for each of these a currentProduct_attribute table which would contain all the values. This would provide a faster lookup, from what I can see at the expense of a very large number of tables and various administrative problems if a product is to be move/deleted etc. Another alternative could be to integrate the attributes into the product table with additional fields there. It would however cause huge product table with many unused fields in each row. Alternative one could sort of "bitmap" the attributes in the product table. Does anybody have any feedback or pointers - or possible references (please, if possible) I could take with me in my considerations? I've tried googeling my problem, but it is difficult finding something concrete and some relevant keywords to search for. Thanks in advance.

--------------------------- 127.0.0.1 - Sweet 127.0.0.1

123 0

It seems you've got a handle on the three basic alternatives: (1) A single table with all the attributes, many of the columns containing nulls in many of the rows. (2) A single table with the common attributes and additional, separate tables for attributes specific to each product (or product type). (3) A single table with the common attributes and an additional table keyed by product and attribute type with attribute value in a third non-key column. The fact that you're having trouble deciding between these means that your application would probably benefit from a carefully crafted combination of the three. So, assuming you can't break the whole application into separate and consistent "subsystems", I would recommend that you: Push as much as you can into option (1) - even if some nulls result. For many of the products, this may be all that is necessary, resulting in a simple and easy-to-query database of "basic" products; then Add some option (2) tables for those products that are unique enough to warrant them - products with well-defined attributes that are likely to be queried. To minimize the number of addtional tables, allow yourself some nulls in the new tables as well. Finally, Use option (3) for descriptive attributes that are less likely to be queried but are necessary primarily for display purposes. Don't fall into the typical designer's trap that says a single storage paradigm is appropriate in all cases; this simply isn't true. A typical product catalog will contain items with descriptions only, some with descriptions and a table of measurements, some with descriptions, measurements, and special shipping instructions, etc. Think of how you would present the information on paper, and your thinking will clear up. A good carpenter will use a hammer here, a screwdriver there, and a chisel somewhere else - all on the same job.

Alsvha

Thanks for your feedback - it is very much valued from me :) So basically your suggestion to the problem is that I attempt to analyze which data is most used, and optimize the read/query of these while using more "traditionally/school example normalization" for the remainder, less used data?

--------------------------- 127.0.0.1 - Sweet 127.0.0.1

Alsvha

Much appreciated. E-mail should be on the way.

123 0

Alsvha wrote:

So basically your suggestion to the problem is that I attempt to analyze which data is most used, and optimize the read/query of these while using more "traditionally/school example normalization" for the remainder, less used data?

Basically, yes, if I understand you correctly (and you've understood me). But... The important thing is to think like your users and their customers, not like a developer. These people are simply trying to get something done: the customer wants to place an order, the user wants to fill it. Your job is to make that easy for them. They don't care about abstract database design considerations, normalization, etc. If you say to them, "this is handled differently than that because it is different" they won't object; but if you make them handle similar things in different ways, they will not be happy. Likewise, if you unnaturally force different things into a single mold they will think you are making things unnecessarily difficult (and they'd be right); only when things are essentially the same should they be treated the same. I recommend, again, that you think in terms of a printed catalog with an order form inside. Get some catalogs - not websites, but real printed catalogs that someone has put some thought into - and look them over. Here in the States I would get a hardware catalog (Home Depot or Lowes), a media catalog (Circuit City or Media Play), a general-purpose catalog (Sears or Penny's), and perhaps an auto-parts catalog (Pep Boys). If my client already had a printed catalog, I would use that too - but I wouldn't limit myself to that. I would study the different kinds of items in the different catalogs and how they are presented. Then I would make up a similar catalog and order form for my application. By the time I was done, the form that the database should take would be obvious.

Alsvha

Yeah - I know the "thinking like..." and it is funny you should mention thinking from the perspective of a printing catalogue, as that is excatly what the scope of the project is, creating a new catalogue. So that is very much my approach to the situation/problem/project :) The "problem" is however as mentioned, that the amount of data is very large with the variable attributes, so instead of attempting to fit it into some standard design (what I would label as "school/textboox solutions"), which I upfront would know are very sluggish and slow performing for extraction of this amount of data (however, would be easier to update and manage) - I am attempting to draw inspiration as to the various other methods of designing the database; such as pro and cons with dynamic table creation, many nullables in each row of a table and so on. I'm attempting to keep it on a somewhat abstract level to better get the whole picture. But as always - thanks for your feedback.

123 0

Alsvha wrote:

...it is funny you should mention thinking from the perspective of a printing catalogue, as that is excatly what the scope of the project is, creating a new catalogue... I am attempting to draw inspiration as to the various other methods of designing the database; such as pro and cons with dynamic table creation, many nullables in each row of a table and so on... I'm attempting to keep it on a somewhat abstract level to better get the whole picture.

Our company, years ago, developed a program specifically designed to handle large amounts of printed information of all kinds in the simplest possible way - we call it a wysiwyg "pagebase" and since the pages themselves are the database, it is fast, efficient, easy to understand and use, etc - there is no "translation" from database format to viewing format and back, and the user's idea of the thing is identical to the developer's. We've been using it for all of our corporate data for almost ten years now. I can send you a PDF of the documentation (about 100 pages) if you're interested in this kind of "inspiration", but you'll have to email me directly so I have a place to send it (you can use the email button below).