Using SQL as categorized tagging backend
-
Background: I've been asked to build an app that keeps track of incoming complaint files. These are non-standard and arrive as PDF, doc, txt and xls. The things that need to be tracked are PO, Employee, Customer, etc., and each of these can have multiple values or none at all.

Problem: Since I don't want to try to read the contents of all these files, I figure a categorized tagging system backed by SQL Server might work. I'm somewhat reluctant to build such a system, however, because of performance concerns.

Possible solutions: The first solution I have in mind uses three tables. One table holds the file information, another holds the tag names, and a third relates files to tag names and holds the value of each tag associated with a file. I was thinking I could simply index the column containing the tag values and query against it. The second solution uses a single table containing the file name, a delimited string of tag names, and a delimited string of tag values.

Issues: I'm not sure how costly the first solution would be on the server. I don't want full-text indexing, just an index on one varchar(30-ish) column. With the second solution it would be simple to use regular expressions in my front end to search through the values, but I wouldn't want to query against that column, since string comparisons are costly enough when they're a straight one-to-one match. Which of these would be the better way to go? Would the first solution hammer my server if I implemented it? Any input is appreciated.
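For reference, here is a rough T-SQL sketch of the first solution; the table, column and index names (and the sample PO value) are just placeholders for illustration:

-- File information.
CREATE TABLE dbo.ComplaintFile
(
    FileId   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FileName nvarchar(260)     NOT NULL,
    FilePath nvarchar(400)     NOT NULL
);

-- Tag names.
CREATE TABLE dbo.Tag
(
    TagId   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TagName varchar(30)       NOT NULL UNIQUE  -- e.g. 'PO', 'Employee', 'Customer'
);

-- One row per (file, tag, value); a file can have many values for a tag or none at all.
CREATE TABLE dbo.FileTag
(
    FileId   int         NOT NULL REFERENCES dbo.ComplaintFile (FileId),
    TagId    int         NOT NULL REFERENCES dbo.Tag (TagId),
    TagValue varchar(30) NOT NULL,
    CONSTRAINT PK_FileTag PRIMARY KEY (FileId, TagId, TagValue)
);

-- The index I was thinking of querying through.
CREATE NONCLUSTERED INDEX IX_FileTag_TagValue ON dbo.FileTag (TagValue);

-- Example search: all files tagged with PO number '12345'.
SELECT f.FileId, f.FileName
FROM dbo.ComplaintFile AS f
JOIN dbo.FileTag AS ft ON ft.FileId = f.FileId
JOIN dbo.Tag     AS t  ON t.TagId  = ft.TagId
WHERE t.TagName = 'PO'
  AND ft.TagValue = '12345';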
-
The first one is definitely better. Once the database is modelled as it should be, you have far more options for tuning the statements if you run into bottlenecks. The second one leads to several problems: slow queries, poor maintainability, inefficient indexing, etc. Just to throw in another possibility, you could also consider an XML column on the same table as the document names. It is easy to update, it is typically straightforward to use on the client side, and it can also be indexed. Hope this helps, Mika
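As a rough illustration of the XML idea (the table, column and element names here are only placeholders, not a recommended schema):

CREATE TABLE dbo.ComplaintFileXml
(
    FileId   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FileName nvarchar(260)     NOT NULL,
    Tags     xml               NULL  -- e.g. <tags><tag name="PO">12345</tag></tags>
);

-- A primary XML index can speed up queries against the Tags column.
CREATE PRIMARY XML INDEX PXI_ComplaintFileXml_Tags
    ON dbo.ComplaintFileXml (Tags);

-- Search files by tag name and value with the xml exist() method.
SELECT FileId, FileName
FROM dbo.ComplaintFileXml
WHERE Tags.exist('/tags/tag[@name="PO" and text()="12345"]') = 1;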
The need to optimize rises from a bad design. My articles[^]
-
Most definitely agree with Mika, option 1 is the way to go. As for performance, it shouldn't be a big problem unless you expect hundreds of millions of records, in which case you may want to look at partitioned tables.
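If the volumes ever do get that large, a date-partitioned file table could look something like the sketch below; the names and boundary values are purely illustrative, and keep in mind that in SQL Server 2005/2008 table partitioning is an Enterprise Edition feature:

-- Partition by the year the complaint was received (illustrative boundaries).
CREATE PARTITION FUNCTION pfReceivedYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01');

CREATE PARTITION SCHEME psReceivedYear
    AS PARTITION pfReceivedYear ALL TO ([PRIMARY]);

-- The partitioning column must be part of the aligned primary key.
CREATE TABLE dbo.ComplaintFile_Partitioned
(
    FileId     int IDENTITY(1,1) NOT NULL,
    FileName   nvarchar(260)     NOT NULL,
    ReceivedOn datetime          NOT NULL,
    CONSTRAINT PK_ComplaintFile_Partitioned
        PRIMARY KEY (FileId, ReceivedOn)
) ON psReceivedYear (ReceivedOn);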
Bob Ashfield Consultants Ltd