Database best practice - large tables.

MY1201

Colin Angus Mackay wrote:

If you have 20 non-repeating columns, are they (as a block) optional? In other words: can a factory exist without these values?

There are no repeating columns in the pollution data. Factories must come up with values for some of the columns. Some factories need to fill in all the columns and others do not; it depends on the type of pollution and the size of the factory.

Colin Angus Mackay wrote:

From what you've said already, regardless of the details of how the data relates, it is obvious that the Factory is the parent. A factory pollutes; without the factory you don't need the pollution data. Therefore, the PollutionData table should reference the Factory table. The PollutionData table should have a FactoryId column.

If I give the PollutionData table a FactoryId this would give me the opportunity to create more than one row of PollutionData referencing the same factory. This is not good, since clients could have trouble working out which one to use. Best regards Soeren

MY1201

Thanks for the fine example. However, it allows a factory to have more than one row of pollution data, which unfortunately is not allowed in the data model! :(( I created this at first, but it kept getting the system design into trouble. Best regards Soeren

Eric Dahlvang

That's pretty easy to fix:

-- Drop the tables only if they already exist (child table first, because of the foreign key)
IF OBJECT_ID('Pollution', 'U') IS NOT NULL
    DROP TABLE Pollution
IF OBJECT_ID('Factory', 'U') IS NOT NULL
    DROP TABLE Factory
GO

CREATE TABLE [Factory] (
    [FactoryID] [int] IDENTITY (1, 1) NOT NULL ,
    [FactoryName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    CONSTRAINT [PK_Factory] PRIMARY KEY CLUSTERED
    (
        [FactoryID]
    ) ON [PRIMARY]
) ON [PRIMARY]
GO
-- Each Pollution row references its parent Factory;
-- deleting a factory also deletes its pollution rows (ON DELETE CASCADE)
CREATE TABLE [Pollution] (
    [PollutionID] [int] IDENTITY (1, 1) NOT NULL ,
    [FactoryID] [int] NOT NULL ,
    [PollutionDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    CONSTRAINT [PK_Pollution] PRIMARY KEY CLUSTERED
    (
        [PollutionID]
    ) ON [PRIMARY] ,
    CONSTRAINT [FK_Pollution_Factory] FOREIGN KEY
    (
        [FactoryID]
    ) REFERENCES [Factory] (
        [FactoryID]
    ) ON DELETE CASCADE
) ON [PRIMARY]
GO

-- SCOPE_IDENTITY() returns the FactoryID generated by the preceding insert
insert into factory (FactoryName) values ('Foo1')
insert into Pollution (FactoryID, PollutionDesc) values (SCOPE_IDENTITY(), 'Garbage')

-- Keep Foo2's FactoryID in a variable so it can be reused in the delete below
insert into factory (FactoryName) values ('Foo2')
declare @nF1ID int
select @nF1ID = SCOPE_IDENTITY()
insert into Pollution (FactoryID, PollutionDesc) values (@nF1ID, 'Toxic Waste')

insert into factory (FactoryName) values ('Foo3')
insert into Pollution (FactoryID, PollutionDesc) values (SCOPE_IDENTITY(), 'Tar')

-- Deleting Foo2 cascades to its 'Toxic Waste' row in Pollution
delete from factory where FactoryID = @nF1ID
GO

Last modified: 1hr 14mins after originally posted --

WTF was this all about??? (See post below)

--EricDV Sig--------- Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peters

-- modified at 14:31 Thursday 12th October, 2006

Colin Angus Mackay

Bad Robot wrote:

If I give the PollutionData table a FactoryId this would give me the opportunity to create more than one row of PollutionData referencing the same factory.

Not if you also make FactoryId in the PollutionData table the primary key. Primary keys must be unique - therefore it will only permit you to insert one PollutionData row per Factory.
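A minimal T-SQL sketch of that suggestion, written in the style of Eric's script and assuming the [Factory] table from it already exists. Here FactoryID is both the primary key of [PollutionData] and the foreign key to [Factory], so there is no separate PollutionID column; the constraint names are hypothetical. Inserting a second row with the same FactoryID should then fail with a PRIMARY KEY violation.

```sql
-- Sketch only: assumes the [Factory] table from the earlier script exists.
-- Constraint names are hypothetical.
IF OBJECT_ID('PollutionData', 'U') IS NOT NULL
    DROP TABLE PollutionData
GO
CREATE TABLE [PollutionData] (
    -- Same value as Factory.FactoryID: primary key here, foreign key to Factory
    [FactoryID] [int] NOT NULL ,
    [PollutionDesc] [varchar] (50) NULL ,
    CONSTRAINT [PK_PollutionData] PRIMARY KEY CLUSTERED ( [FactoryID] ) ,
    CONSTRAINT [FK_PollutionData_Factory] FOREIGN KEY ( [FactoryID] )
        REFERENCES [Factory] ( [FactoryID] ) ON DELETE CASCADE
)
GO
```

Because a primary key must be unique, the table can hold at most one row per factory; because nothing forces a row to exist, a factory may also have none.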

Upcoming Scottish Developers events: * UK Security Evangelists On Tour (2nd November, Edinburgh) * Developer Day Scotland: are you interested in speaking or attending? My: Website | Blog

Colin Angus Mackay

That won't fix that part of the problem. Making FactoryId the primary key or putting a unique constraint on it will.


MY1201

Of course! I could do that. But what would be the best way to go? I have a feeling that making FactoryId the primary key is a bad idea. I don't know exactly why, but something tells me... Primary Key in one table, but also primary key in another table - well - I don't know. :) UNIQUE constraint could be a way to go. Please correct me if I'm wrong. Best Regards Soeren

Eric Dahlvang

Colin Angus Mackay wrote:

That won't fix that part of the problem. Making FactoryId the primary key or putting a unique constraint on it will.

Wow, I can really be stupid sometimes. What was I thinking? Thanks for pointing that out.

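-- Drop the earlier version of the table before re-creating it
IF OBJECT_ID('Pollution', 'U') IS NOT NULL
    DROP TABLE Pollution
GO
CREATE TABLE [Pollution] (
    [PollutionID] [int] IDENTITY (1, 1) NOT NULL ,
    [FactoryID] [int] NOT NULL ,
    [PollutionDesc] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    CONSTRAINT [PK_Pollution] PRIMARY KEY CLUSTERED
    (
        [PollutionID]
    ) ON [PRIMARY] ,
    -- Allows at most one Pollution row per factory
    CONSTRAINT [IX_FactoryID] UNIQUE NONCLUSTERED
    (
        [FactoryID]
    ) ON [PRIMARY] ,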
    CONSTRAINT [FK_Pollution_Factory] FOREIGN KEY
    (
        [FactoryID]
    ) REFERENCES [Factory] (
        [FactoryID]
    ) ON DELETE CASCADE
) ON [PRIMARY]
GO
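A short sketch of how the revised table behaves, assuming [Factory] and the new [Pollution] definition above are in place; 'Foo4', 'Smoke' and 'Soot' are made-up sample values. The second insert for the same factory should be rejected by the IX_FactoryID unique constraint (SQL Server reports this as error 2627). TRY...CATCH (SQL Server 2005 and later) is used only so the script carries on and prints the error instead of stopping.

```sql
-- Sketch only: assumes [Factory] and the revised [Pollution] table exist.
insert into factory (FactoryName) values ('Foo4')
declare @nFactoryID int
select @nFactoryID = SCOPE_IDENTITY()

-- First pollution row for this factory: accepted
insert into Pollution (FactoryID, PollutionDesc) values (@nFactoryID, 'Smoke')

BEGIN TRY
    -- Second row for the same factory: violates IX_FactoryID
    insert into Pollution (FactoryID, PollutionDesc) values (@nFactoryID, 'Soot')
END TRY
BEGIN CATCH
    PRINT 'Rejected: ' + ERROR_MESSAGE()
END CATCH

-- Still exactly one pollution row for this factory
select COUNT(*) AS PollutionRows from Pollution where FactoryID = @nFactoryID
GO
```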


MY1201

Thank you very much guys! I think this would actually be the solution to my problem! :-D Best regards Soeren

Colin Angus Mackay

Bad Robot wrote:

Primary Key in one table, but also primary key in another table

That is valid for one-to-one or one-to-zero joins. If you are always going to have one factory row with a single corresponding pollution row, then you have a one-to-one relationship. One-to-one relationships are a bit dodgy (I think), because they really mean that the data belongs in the parent table. However, most one-to-one relationships are really one-to-zero relationships: for example, there is only ever going to be one pollution row for a given factory row, but the pollution row is optional. So it exists zero or one times for every factory row.
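To illustrate the one-to-zero case, a hedged sketch of a query against the [Factory] and [Pollution] tables from this thread, assuming FactoryID is unique in [Pollution] (by either of the approaches discussed). Each factory should come back exactly once, and a factory without a pollution row shows NULL in PollutionDesc.

```sql
-- Sketch only: assumes FactoryID is unique in [Pollution].
-- LEFT OUTER JOIN keeps factories that have no pollution row (PollutionDesc is NULL).
select f.FactoryID,
       f.FactoryName,
       p.PollutionDesc
from Factory AS f
LEFT OUTER JOIN Pollution AS p
    ON p.FactoryID = f.FactoryID
order by f.FactoryID
```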

Bad Robot wrote:

UNIQUE constraint could be a way to go

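That just creates redundant data: with a UNIQUE constraint on FactoryId, the table carries two columns that are each unique per row (PollutionID and FactoryId), so the separate PollutionID adds nothing. Better to eliminate the redundant data and share the same values for the primary key in both tables.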


Akhilesh Yadav

Create a cluster. Then it is up to you whether you want to go for one big table or a few small tables. Hope this helps you...