CODE PROJECT For Those Who Code

Relational databases, XML, or OLAP?

Forum: Database
Tags: database, performance, xml, question, sql-server
James Shao
#1

Hi guys, I'm relatively new to databases and have been playing with SQL Server 2008 Express for some time. I want to build a database of stock prices and various other stock-related indicators across time. So basically it will have three dimensions: stock name, stock attributes, and time. The user will be able to query a given stock's performance on a particular attribute over a designated time period. Normally this would be a perfect job for OLAP (a multidimensional database). However, due to financial constraints and other reasons :( I am not considering OLAP at the moment. With my limited knowledge of databases, I think I could achieve the above by using:

1) A relational database: if I need to track 1000 stocks, I construct 1000 tables, with time and the stock attributes in each stock's table.

2) XML: I am relatively new to XML too, but it seems that with some clever XPath/XQuery coding, dumping all the data into one huge XML database is not a bad idea (or is it?), as long as the data can be retrieved effectively.

In terms of speed/performance, maintenance convenience, and server-memory efficiency, which of the two is the better choice? Or do I really need to migrate to OLAP? Sorry if this is a simple question, I am a total newbie regarding databases. :) Thanks in advance!!

T2102
#2

    I strongly advise against 1,000 tables. This is not a scalable solution and many of your queries will take a very long time as you will need to join many tables.

Luc Pattyn
#3

I wouldn't store similar information about different stocks in different tables; just one table, with a field added to identify the stock, would be fine. It provides more functionality with less code, since you can easily search and list across many stocks, and you never need to enumerate tables. :)
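As a minimal sketch of that single-table layout (table and column names here are placeholders, not anything from the thread):

```sql
-- One table for all stocks; StockName (or a numeric StockId) identifies each row's stock.
CREATE TABLE StockDaily (
    StockName  varchar(20)   NOT NULL,  -- could instead be an int StockId referencing a Stocks table
    TradeDate  date          NOT NULL,
    [Open]     decimal(10,2) NULL,      -- Open/Close are reserved words, hence the brackets
    [High]     decimal(10,2) NULL,
    [Low]      decimal(10,2) NULL,
    [Close]    decimal(10,2) NULL,
    CONSTRAINT PK_StockDaily PRIMARY KEY (StockName, TradeDate)
);

-- Querying one stock over a period needs no table enumeration:
SELECT TradeDate, [Close]
FROM StockDaily
WHERE StockName = 'MSFT'
  AND TradeDate BETWEEN '2007-01-02' AND '2009-11-25';
```

Adding a stock is then just inserting rows, not creating a new table.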

      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]


      I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages



James Shao
#4

Thanks for the replies, guys. But if I add a field to identify the stocks, it would be one huge table, and it won't look as elegant. :( By the way, are you suggesting the following structure?

Time      StockName  Open  High  Low  Close  EPS  Dividend  Return%  ...
1/2/07    Citi       20    23    19   20.5   5m   5          4%      ...
....
11/25/09  Citi       4     5     3    4.5    2m   2        -10%      ...
1/2/07    BofA       35    37    32   36.4   7m   10         7%      ...
....
11/25/09  BofA       12    13    11   10     3m   1         -7%      ...
1/2/07    MSFT       45    47    41   42.5   28m  3          7%      ...
....
11/25/09  MSFT       4     5     3    4.5    2m   2        -10%      ...

If so, assuming the table will contain 1000 stocks with 50 attributes over a 3-year period, it will need to hold 1000 x 50 x 260 (business days) x 3 = 39,000,000 values. Is that a little too large for one table? (If it is, as an alternative perhaps I could break the 1000 stocks down into several smaller tables based on the market in which they are traded, so one table per market.) What do you think? Thanks!


Luc Pattyn
#5

          James Shao wrote:

          are you suggesting...

Yes I am. I haven't done this with millions of records (my DB apps aren't large), but that is what I would do. I avoid tables with identical structure: I only use one Persons table, one Vehicles table, etc. :)





James Shao
#6

Thanks a lot, I'll give this a try and come back if I encounter errors. :)


T2102
#7

Breaking it down by market is not a good idea, but you do need to know what market something traded on; a single stock may trade on multiple markets in the US, even ignoring after-hours trading and dark pools. I've personally worked with global financial databases with billions of records. The DBA set the appropriate indexes and partitions, and performance was fine. What you do not want is a fragmented design where it is hard to change the structure later without losing your data. As for memory usage, you can use SELECT statements and limit the amount of memory that SQL Server will use; if you have a 10 GB table, you can tell SQL Server not to use more than 500 MB of RAM, for instance.
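To sketch what that looks like in practice (the index/table names are made up for illustration): a composite index turns "one stock over a date range" into a single index seek, and the memory cap is set with `sp_configure` — note it applies to the whole instance, not to one table.

```sql
-- Composite index so per-stock date-range queries are a single index seek
-- (dbo.Prices, StockId, TradeDate etc. are placeholder names):
CREATE NONCLUSTERED INDEX IX_Prices_Stock_Date
    ON dbo.Prices (StockId, TradeDate)
    INCLUDE ([Close], Volume);

-- Cap SQL Server's memory at 500 MB (instance-wide setting):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 500;
RECONFIGURE;
```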


James Shao
#8

Hi Ted, thank you for the suggestions. I also feel that fragmenting the data is not a good idea. But if I dump it all into one table, that creates a lot of redundancy in the date column (my first column), since I'll need to repeat it for every stock in my database. Is that okay? Thanks! :)


T2102
#9

Yes, you should store the date in your table (which is 4 bytes). If you created another table holding the dates to try to reduce the size, you would just add a bottleneck. Your primary key will be composed of multiple columns, including ID, Date, and possibly source/exchange. If you really needed to save space, you could use a smallint (2 bytes) for your date and map the minimum smallint to Jan 1, 1970, or wherever your database starts. Then it is quick to add/subtract an offset to convert back and forth from Excel's date format.
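The offset arithmetic is just DATEDIFF/DATEADD against whatever epoch you pick (1970-01-01 here is only an example epoch; a smallint day offset from it covers dates up to roughly 2059):

```sql
-- Convert a date to a smallint day offset from a chosen epoch, and back:
DECLARE @epoch  date = '1970-01-01';
DECLARE @offset smallint = DATEDIFF(day, @epoch, '2009-11-25');  -- 14573

SELECT DATEADD(day, @offset, @epoch);  -- returns 2009-11-25
```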


Mycroft Holmes
#10

My databases ARE in the millions of records; Luc is correct, have one structure with all the information. Do not store any data twice (database design 101): e.g. if you have industry or sector for an equity, then have an equity table with the static data and a related table with the tick information. If table size becomes an issue, there are many better options than splitting by stock name. Partitioning by time is a better solution (partition by year would do). Doesn't Express include Analysis Services? (Have not checked.)
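A sketch of partitioning by year (names are placeholders; note that table partitioning is an Enterprise-edition feature in SQL Server 2008, so it is not available in Express):

```sql
-- Rows before 2008-01-01 go to partition 1, 2008 to partition 2, 2009 onward to partition 3:
CREATE PARTITION FUNCTION pfByYear (date)
    AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01');

CREATE PARTITION SCHEME psByYear
    AS PARTITION pfByYear ALL TO ([PRIMARY]);

-- The price table is then created ON psByYear(TradeDate)
-- instead of on a plain filegroup.
```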

                    Never underestimate the power of human stupidity RAH


James Shao
#11

                      Thank you Ted, I'll give it a try and see how it goes. :)
