CODE PROJECT For Those Who Code

Relational databases, XML, or OLAP?

Forum: Database
Tags: database, performance, xml, question, sql-server
James Shao
#1

Hi guys, I'm relatively new to databases and have been playing with SQL Server 2008 Express for some time. I want to build a database of stock prices and various other stock-related indicators across time. So basically it will have three dimensions: stock name, stock attributes, and time. The user will be able to query a given stock's performance on a particular attribute over a designated time period. Normally this would be a perfect job for OLAP (a multidimensional database). However, due to financial constraints and other reasons :( I am not considering OLAP at the moment. With my limited knowledge of databases, I think I could achieve the above by using:

1) A relational database: if I need to track 1000 stocks, I construct 1000 tables, with time and the stock attributes in each stock's table.

2) XML: I am relatively new to XML too, but it seems that with some clever XPath/XQuery coding, dumping all the data into one huge XML database is not a bad idea (or is it?), as long as the data can be retrieved effectively.

In terms of speed/performance, maintenance convenience, and server-memory efficiency, which of the two is the better choice? Or do I really need to migrate to OLAP? Sorry if this is a simple question, I am a total newbie regarding databases. :) Thanks in advance!!

T2102
#2

    I strongly advise against 1,000 tables. This is not a scalable solution and many of your queries will take a very long time as you will need to join many tables.

Luc Pattyn
#3

I wouldn't store similar information about different stocks in different tables; just one table, with a field added to identify the stock, would be fine. It provides more functionality with less code, since you can easily search and list across many stocks, and you never need to enumerate tables. :)
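As a minimal sketch of that single-table layout (table and column names here are placeholders, not anything from the thread):

```sql
-- One table for all stocks; StockName (or a numeric StockId) identifies each row's stock.
CREATE TABLE StockDaily (
    StockName  varchar(20)   NOT NULL,  -- could instead be an int StockId referencing a Stocks table
    TradeDate  date          NOT NULL,
    [Open]     decimal(10,2) NULL,      -- Open/Close are reserved words, hence the brackets
    [High]     decimal(10,2) NULL,
    [Low]      decimal(10,2) NULL,
    [Close]    decimal(10,2) NULL,
    CONSTRAINT PK_StockDaily PRIMARY KEY (StockName, TradeDate)
);

-- Querying one stock over a period needs no table enumeration:
SELECT TradeDate, [Close]
FROM StockDaily
WHERE StockName = 'MSFT'
  AND TradeDate BETWEEN '2007-01-02' AND '2009-11-25';
```

Adding a stock is then just inserting rows, not creating a new table.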

      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]


      I only read code that is properly indented, and rendered in a non-proportional font; hint: use PRE tags in forum messages



James Shao
#4

Thanks for the replies, guys. But if I add a field to identify the stocks, it would be one huge table, and it won't look as elegant. :( By the way, are you suggesting the following structure?

Time      StockName  Open  High  Low  Close  EPS  Dividend  Return%  ...
1/2/07    Citi       20    23    19   20.5   5m   5          4%      ...
....
11/25/09  Citi       4     5     3    4.5    2m   2        -10%      ...
1/2/07    BofA       35    37    32   36.4   7m   10         7%      ...
....
11/25/09  BofA       12    13    11   10     3m   1         -7%      ...
1/2/07    MSFT       45    47    41   42.5   28m  3          7%      ...
....
11/25/09  MSFT       4     5     3    4.5    2m   2        -10%      ...

If so, assuming the table will contain 1000 stocks with 50 attributes over a 3-year period, it will need to hold 1000 x 50 x 260 (business days) x 3 = 39,000,000 values. Is that a little too large for one table? (If it is, as an alternative perhaps I could break the 1000 stocks down into several smaller tables based on the market in which they are traded, so one table per market.) What do you think? Thanks!


Luc Pattyn
#5

          James Shao wrote:

          are you suggesting...

Yes I am. I haven't done this with millions of records (my DB apps aren't large), but that is what I would do. I avoid tables with identical structure: I only use one Persons table, one Vehicles table, etc. :)





James Shao
#6

Thanks a lot, I'll give this a try and come back if I encounter errors. :)


T2102
#7

Breaking it down by market is not a good idea, but you do need to know what market something traded on; a single stock may trade on multiple markets in the US, even ignoring after-hours trading and dark pools. I've personally worked with global financial databases with billions of records. The DBA set the appropriate indexes and partitions, and performance was fine. What you do not want is a fragmented design where it is hard to change the structure later without losing your data. As for memory usage, you can use SELECT statements and limit the amount of memory that SQL Server will use; if you have a 10 GB table, you can tell SQL Server not to use more than 500 MB of RAM, for instance.
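To sketch what that looks like in practice (the index/table names are made up for illustration): a composite index turns "one stock over a date range" into a single index seek, and the memory cap is set with `sp_configure` — note it applies to the whole instance, not to one table.

```sql
-- Composite index so per-stock date-range queries are a single index seek
-- (dbo.Prices, StockId, TradeDate etc. are placeholder names):
CREATE NONCLUSTERED INDEX IX_Prices_Stock_Date
    ON dbo.Prices (StockId, TradeDate)
    INCLUDE ([Close], Volume);

-- Cap SQL Server's memory at 500 MB (instance-wide setting):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 500;
RECONFIGURE;
```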


James Shao
#8

Hi Ted, thank you for the suggestions. I also feel that fragmenting the data is not a good idea. But if I dump it all into one table, that creates a lot of redundancy in the date column (my first column), since I'll need to repeat it for every stock in my database. Is that okay? Thanks! :)


T2102
#9

Yes, you should store the date in your table (which is 4 bytes). If you created another table holding the dates to try to reduce the size, you would just add a bottleneck. Your primary key will be composed of multiple columns, including ID, Date, and possibly source/exchange. If you really needed to save space, you could use a smallint (2 bytes) for your date and map the minimum smallint to Jan 1, 1970, or wherever your database starts. Then it is quick to add/subtract an offset to convert back and forth from Excel's date format.
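The offset arithmetic is just DATEDIFF/DATEADD against whatever epoch you pick (1970-01-01 here is only an example epoch; a smallint day offset from it covers dates up to roughly 2059):

```sql
-- Convert a date to a smallint day offset from a chosen epoch, and back:
DECLARE @epoch  date = '1970-01-01';
DECLARE @offset smallint = DATEDIFF(day, @epoch, '2009-11-25');  -- 14573

SELECT DATEADD(day, @offset, @epoch);  -- returns 2009-11-25
```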


Mycroft Holmes
#10

My databases ARE in the millions of records; Luc is correct, have one structure with all the information. Do not store any data twice (database design 101): e.g. if you have industry or sector for an equity, then have an equity table with the static data and a related table with the tick information. If table size becomes an issue, there are many better options than splitting by stock name. Partitioning by time is a better solution (partition by year would do). Doesn't Express include Analysis Services? (Have not checked.)
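A sketch of partitioning by year (names are placeholders; note that table partitioning is an Enterprise-edition feature in SQL Server 2008, so it is not available in Express):

```sql
-- Rows before 2008-01-01 go to partition 1, 2008 to partition 2, 2009 onward to partition 3:
CREATE PARTITION FUNCTION pfByYear (date)
    AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01');

CREATE PARTITION SCHEME psByYear
    AS PARTITION pfByYear ALL TO ([PRIMARY]);

-- The price table is then created ON psByYear(TradeDate)
-- instead of on a plain filegroup.
```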

                    Never underestimate the power of human stupidity RAH


James Shao
#11

                      Thank you Ted, I'll give it a try and see how it goes. :)
