Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Database best practice - large tables.

Database best practice - large tables.

Scheduled Pinned Locked Moved Database
questiondatabasesysadminhelptutorial
13 Posts 4 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • C Colin Angus Mackay

    Bad Robot wrote:

    Would it be better to split the table into smaller tables?

    Does the data suggest you should do that?

    Bad Robot wrote:

    Consider a factory. The factory pollutes. In order to keep track of the pollution, the factory owner has to fill in some 20 values of different kinds. Normally you would place these values on the factory table, but since there are so many of them, you might be able to put the pollution data in another table called PollutionData. But somehow I can't get this to work properbly. I have the factory referencing the PollutionId in the PollutionData table.

    Is your polution data of the kind like: COPollution1 CO2Pollution1 SO2Pollution1 COPollution2 CO2Pollution2 SO2Pollution2 etc. If so then that is an obvious candidate for a one-to-many join as there are repeating fields (with an index marker in the name) If you have 20 non-repeating columns, are they (as a block) optional? In otherwords: Can a factory exist without these values? From what you've said already, regardless of the details of how the data relates, it is obvious that the Factory is the parent. A factory pollutes, without the factory you don't need the pollution data. Therefore, the PollutionData should reference the Factory table. The PollutionData table should have a FactoryId column.

    Bad Robot wrote:

    A cascade delete will only work the other way around!

    Well, put it the other way around. From your description, it seems that is the correct way to go.

    Bad Robot wrote:

    And I want the database server to produce an error if someone tries to delete pollution data referenced by a factory.

    Difficult one this. If you forbid this, then you cannot delete a factory either. I'd say you enforce this by using only stored procedures to access your data. The stored procs can enforce that rule.

    Bad Robot wrote:

    As I see it - the only way to solve this is to put the pollution columns into the Factories table, but I have this kind "oh noes - this is wrong"-feeling - and that's just because there are so many columns.

    If there are no repeating columns, and the data must always exist with a factory then it is not wrong. You can have upto 1024 columns in a table

    M Offline
    M Offline
    MY1201
    wrote on last edited by
    #4

    Colin Angus Mackay wrote:

    If you have 20 non-repeating columns, are they (as a block) optional? In otherwords: Can a factory exist without these values?

    There are no repeating columns in pollution data. Factories must come up with values for some of the columns. Some factories needs to fill in all the columns and others do not. It's Depending on the type of pollution and size of the factory.

    Colin Angus Mackay wrote:

    From what you've said already, regardless of the details of how the data relates, it is obvious that the Factory is the parent. A factory pollutes, without the factory you don't need the pollution data. Therefore, the PollutionData should reference the Factory table. The PollutionData table should have a FactoryId column.

    If I give the PollutionData table a FactoryId this would give me the opportunity to create more than one row of PollutionData referencing the same factory. This is not good since it could give the clients some trouble finding out which one to use. Best regards Soeren

    C 1 Reply Last reply
    0
    • E Eric Dahlvang

      Bad Robot wrote:

      I have the factory referencing the PollutionId in the PollutionData table.

      You should have the polution table referencing the factoryid in the factory table.

      CREATE TABLE [Factory] (
          [FactoryID] [int] IDENTITY (1, 1) NOT NULL ,
          [FactoryName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
          CONSTRAINT [PK_Factory] PRIMARY KEY  CLUSTERED
          (
              [FactoryID]
          )  ON [PRIMARY]
      ) ON [PRIMARY]
      GO

      CREATE TABLE [Pollution] (
          [PollutionID] [int] IDENTITY (1, 1) NOT NULL ,
          [FactoryID] [int] NULL ,
          [PollutionDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
          CONSTRAINT [PK_Pollution] PRIMARY KEY  CLUSTERED
          (
              [PollutionID]
          )  ON [PRIMARY] ,
          CONSTRAINT [FK_Pollution_Factory] FOREIGN KEY
          (
              [FactoryID]
          ) REFERENCES [Factory] (
              [FactoryID]
          ) ON DELETE CASCADE
      ) ON [PRIMARY]
      GO

      insert into factory (FactoryName) values ('Foo1')
      insert into Pollution (FactoryID,PollutionDesc) values (SCOPE_IDENTITY(),'Garbage')

      insert into factory (FactoryName) values ('Foo2')
      declare @nF1ID int
      select @nF1ID = SCOPE_IDENTITY( )
      insert into Pollution (FactoryID,PollutionDesc) values (@nF1ID,'Toxic Waste')
      insert into Pollution (FactoryID,PollutionDesc) values (@nF1ID,'Sewage')

      insert into factory (FactoryName) values ('Foo3')
      insert into Pollution (FactoryID,PollutionDesc) values (SCOPE_IDENTITY( ),'Tar')

      delete from factory where FactoryID = @nF1ID
      Go

      Cascading works fine in the above example.

      --EricDV Sig--------- Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peters

      M Offline
      M Offline
      MY1201
      wrote on last edited by
      #5

      Thanks for the fine example. Except this gives the factory the possibility to have more than one row of pollutiondata. Unfortunately this is not allowed in the datamodel! :(( I created this at first, but it kept getting the system design into trouble. Best regards Soeren

      E 1 Reply Last reply
      0
      • M MY1201

        Thanks for the fine example. Except this gives the factory the possibility to have more than one row of pollutiondata. Unfortunately this is not allowed in the datamodel! :(( I created this at first, but it kept getting the system design into trouble. Best regards Soeren

        E Offline
        E Offline
        Eric Dahlvang
        wrote on last edited by
        #6

        That's pretty easy to fix:

        drop table pollution
        drop table factory

        CREATE TABLE [Factory] (
            [FactoryID] [int] IDENTITY (1, 1) NOT NULL ,
            [FactoryName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
            CONSTRAINT [PK_Factory] PRIMARY KEY  CLUSTERED
            (
                [FactoryID]
            )  ON [PRIMARY]
        ) ON [PRIMARY]
        GO
        CREATE TABLE [Pollution] (
            [PollutionID] [int] IDENTITY (1, 1) NOT NULL ,
            [FactoryID] [int] NOT NULL ,
            [PollutionDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
            CONSTRAINT [PK_Pollution] PRIMARY KEY  CLUSTERED
            (
                [PollutionID]
            )  ON [PRIMARY] ,
            CONSTRAINT [FK_Pollution_Factory] FOREIGN KEY
            (
                [FactoryID]
            ) REFERENCES [Factory] (
                [FactoryID]
            ) ON DELETE CASCADE
        ) ON [PRIMARY]
        GO

        insert into factory (FactoryName) values ('Foo1')
        insert into Pollution (FactoryID,PollutionDesc) values (SCOPE_IDENTITY(),'Garbage')

        insert into factory (FactoryName) values ('Foo2')
        declare @nF1ID int
        select @nF1ID = SCOPE_IDENTITY( )
        insert into Pollution (FactoryID,PollutionDesc) values (@nF1ID,'Toxic Waste')

        insert into factory (FactoryName) values ('Foo3')
        insert into Pollution (FactoryID,PollutionDesc) values (SCOPE_IDENTITY( ),'Tar')

        delete from factory where FactoryID = @nF1ID
        Go

        Last modified: 1hr 14mins after originally posted --

        WTF was this all about??? (See post below)

        --EricDV Sig--------- Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peters

        -- modified at 14:31 Thursday 12th October, 2006

        C 1 Reply Last reply
        0
        • M MY1201

          Colin Angus Mackay wrote:

          If you have 20 non-repeating columns, are they (as a block) optional? In otherwords: Can a factory exist without these values?

          There are no repeating columns in pollution data. Factories must come up with values for some of the columns. Some factories needs to fill in all the columns and others do not. It's Depending on the type of pollution and size of the factory.

          Colin Angus Mackay wrote:

          From what you've said already, regardless of the details of how the data relates, it is obvious that the Factory is the parent. A factory pollutes, without the factory you don't need the pollution data. Therefore, the PollutionData should reference the Factory table. The PollutionData table should have a FactoryId column.

          If I give the PollutionData table a FactoryId this would give me the opportunity to create more than one row of PollutionData referencing the same factory. This is not good since it could give the clients some trouble finding out which one to use. Best regards Soeren

          C Offline
          C Offline
          Colin Angus Mackay
          wrote on last edited by
          #7

          Bad Robot wrote:

          If I give the PollutionData table a FactoryId this would give me the opportunity to create more than one row of PollutionData referencing the same factory.

          Not if you also make FactoryId in the PollutionData table the primary key. Primary keys must be unique - therefore it will only permit you to insert one PollutionData row per Factory.


          Upcoming Scottish Developers events: * UK Security Evangelists On Tour (2nd November, Edinburgh) * Developer Day Scotland: are you interested in speaking or attending? My: Website | Blog

          1 Reply Last reply
          0
          • E Eric Dahlvang

            That's pretty easy to fix:

            drop table pollution
            drop table factory

            CREATE TABLE [Factory] (
                [FactoryID] [int] IDENTITY (1, 1) NOT NULL ,
                [FactoryName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
                CONSTRAINT [PK_Factory] PRIMARY KEY  CLUSTERED
                (
                    [FactoryID]
                )  ON [PRIMARY]
            ) ON [PRIMARY]
            GO
            CREATE TABLE [Pollution] (
                [PollutionID] [int] IDENTITY (1, 1) NOT NULL ,
                [FactoryID] [int] NOT NULL ,
                [PollutionDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
                CONSTRAINT [PK_Pollution] PRIMARY KEY  CLUSTERED
                (
                    [PollutionID]
                )  ON [PRIMARY] ,
                CONSTRAINT [FK_Pollution_Factory] FOREIGN KEY
                (
                    [FactoryID]
                ) REFERENCES [Factory] (
                    [FactoryID]
                ) ON DELETE CASCADE
            ) ON [PRIMARY]
            GO

            insert into factory (FactoryName) values ('Foo1')
            insert into Pollution (FactoryID,PollutionDesc) values (SCOPE_IDENTITY(),'Garbage')

            insert into factory (FactoryName) values ('Foo2')
            declare @nF1ID int
            select @nF1ID = SCOPE_IDENTITY( )
            insert into Pollution (FactoryID,PollutionDesc) values (@nF1ID,'Toxic Waste')

            insert into factory (FactoryName) values ('Foo3')
            insert into Pollution (FactoryID,PollutionDesc) values (SCOPE_IDENTITY( ),'Tar')

            delete from factory where FactoryID = @nF1ID
            Go

            Last modified: 1hr 14mins after originally posted --

            WTF was this all about??? (See post below)

            --EricDV Sig--------- Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peters

            -- modified at 14:31 Thursday 12th October, 2006

            C Offline
            C Offline
            Colin Angus Mackay
            wrote on last edited by
            #8

            That won't fix that part of the problem. Making FactoryId the primary key or putting a unique constraint on it will.


            Upcoming Scottish Developers events: * UK Security Evangelists On Tour (2nd November, Edinburgh) * Developer Day Scotland: are you interested in speaking or attending? My: Website | Blog

            M E 2 Replies Last reply
            0
            • C Colin Angus Mackay

              That won't fix that part of the problem. Making FactoryId the primary key or putting a unique constraint on it will.


              Upcoming Scottish Developers events: * UK Security Evangelists On Tour (2nd November, Edinburgh) * Developer Day Scotland: are you interested in speaking or attending? My: Website | Blog

              M Offline
              M Offline
              MY1201
              wrote on last edited by
              #9

              Of course! I could do that. But what would be the best way to go? I kind of find the idea of making factoryId the primary key a bad idea. I don't know exactly why, but something tells me... Primary Key in one table, but also primary key in another table - well - I don't know. :) UNIQUE constraint could be a way to go. Please correct me if I'm wrong. Best Regards Soeren

              C 1 Reply Last reply
              0
              • C Colin Angus Mackay

                That won't fix that part of the problem. Making FactoryId the primary key or putting a unique constraint on it will.


                Upcoming Scottish Developers events: * UK Security Evangelists On Tour (2nd November, Edinburgh) * Developer Day Scotland: are you interested in speaking or attending? My: Website | Blog

                E Offline
                E Offline
                Eric Dahlvang
                wrote on last edited by
                #10

                Colin Angus Mackay wrote:

                That won't fix that part of the problem. Making FactoryId the primary key or putting a unique constraint on it will.

                Wow, I can really be stupid sometimes.  What was I thinking?  Thanks for pointing that out.

                CREATE TABLE [Pollution] (
                    [PollutionID] [int] IDENTITY (1, 1) NOT NULL ,
                    [FactoryID] [int] NOT NULL ,
                    [PollutionDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
                    CONSTRAINT [PK_Pollution] PRIMARY KEY  CLUSTERED
                    (
                        [PollutionID]
                    )  ON [PRIMARY] ,
                    CONSTRAINT [IX_FactoryID] UNIQUE  NONCLUSTERED
                    (
                        [FactoryID]
                    )  ON [PRIMARY]
                ,
                    CONSTRAINT [FK_Pollution_Factory] FOREIGN KEY
                    (
                        [FactoryID]
                    ) REFERENCES [Factory] (
                        [FactoryID]
                    ) ON DELETE CASCADE
                ) ON [PRIMARY]
                GO

                --EricDV Sig--------- Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peters

                M 1 Reply Last reply
                0
                • E Eric Dahlvang

                  Colin Angus Mackay wrote:

                  That won't fix that part of the problem. Making FactoryId the primary key or putting a unique constraint on it will.

                  Wow, I can really be stupid sometimes.  What was I thinking?  Thanks for pointing that out.

                  CREATE TABLE [Pollution] (
                      [PollutionID] [int] IDENTITY (1, 1) NOT NULL ,
                      [FactoryID] [int] NOT NULL ,
                      [PollutionDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
                      CONSTRAINT [PK_Pollution] PRIMARY KEY  CLUSTERED
                      (
                          [PollutionID]
                      )  ON [PRIMARY] ,
                      CONSTRAINT [IX_FactoryID] UNIQUE  NONCLUSTERED
                      (
                          [FactoryID]
                      )  ON [PRIMARY]
                  ,
                      CONSTRAINT [FK_Pollution_Factory] FOREIGN KEY
                      (
                          [FactoryID]
                      ) REFERENCES [Factory] (
                          [FactoryID]
                      ) ON DELETE CASCADE
                  ) ON [PRIMARY]
                  GO

                  --EricDV Sig--------- Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peters

                  M Offline
                  M Offline
                  MY1201
                  wrote on last edited by
                  #11

                  Thank you very much guys! I think this would actually be the solution to my problem! :-D Best regards Soeren

                  1 Reply Last reply
                  0
                  • M MY1201

                    Of course! I could do that. But what would be the best way to go? I kind of find the idea of making factoryId the primary key a bad idea. I don't know exactly why, but something tells me... Primary Key in one table, but also primary key in another table - well - I don't know. :) UNIQUE constraint could be a way to go. Please correct me if I'm wrong. Best Regards Soeren

                    C Offline
                    C Offline
                    Colin Angus Mackay
                    wrote on last edited by
                    #12

                    Bad Robot wrote:

                    Primary Key in one table, but also primary key in another table

                    Is valid for one-to-one or one-to-zero joins. If you are always going to have one factory row with a single corresponding pollution row then you have a one-to-one relationship. One-to-One relationships are a bit dodgy (I think) because it really means that the data belongs in the parent table. However, most one-to-one relationships are really one-to-zero relationships. e.g. If there was only ever going to be one pollution row for a corresponding factory row, but the pollution row was optional. So, it exists zero or one times for every factory row.

                    Bad Robot wrote:

                    UNIQUE constraint could be a way to go

                    That just creates redundant data because the primary key is always unique anyway. Better to eliminate the redundant data and share the same values for the primary key in both tables.


                    Upcoming Scottish Developers events: * UK Security Evangelists On Tour (2nd November, Edinburgh) * Developer Day Scotland: are you interested in speaking or attending? My: Website | Blog

                    1 Reply Last reply
                    0
                    • M MY1201

                      Hi there! I have a question regarding the structure of a database. I'm currently designing af table that will end up with around 50 columns of different kinds. Now my question is: Would it be better to split the table into smaller tables? I have tried this but I keep running into integrity problems, because the database server will have to delete rows in other tables as well. To clear things up I will give an example: Consider a factory. The factory pollutes. In order to keep track of the pollution, the factory owner has to fill in some 20 values of different kinds. Normally you would place these values on the factory table, but since there are so many of them, you might be able to put the pollution data in another table called PollutionData. But somehow I can't get this to work properbly. I have the factory referencing the PollutionId in the PollutionData table. A cascade delete will only work the other way around! Whenever someone deletes the pollution data, the factory is deleted. Which of course is wrong. I want the pollution data deleted if the factory is deleted. And I want the database server to produce an error if someone tries to delete pollution data referenced by a factory. As I see it - the only way to solve this is to put the pollution columns into the Factories table, but I have this kind "oh noes - this is wrong"-feeling - and that's just because there are so many columns. Any suggestions would be greatly appreciated! :) Best regards Soeren

                      A Offline
                      A Offline
                      Akhilesh Yadav
                      wrote on last edited by
                      #13

                      Create cluster. Now it is upto you whether u want to go for one big table or few small tables. Hope this helps u...

                      1 Reply Last reply
                      0
                      Reply
                      • Reply as topic
                      Log in to reply
                      • Oldest to Newest
                      • Newest to Oldest
                      • Most Votes


                      • Login

                      • Don't have an account? Register

                      • Login or register to search.
                      • First post
                        Last post
                      0
                      • Categories
                      • Recent
                      • Tags
                      • Popular
                      • World
                      • Users
                      • Groups