DAL Design Question
-
Hi gurus, I'm creating a web application for the internet with scalability in mind. In my application, I'll have forums used by different groups of people (each group will have their own forums). I will start from 1000 or 2000 groups with the potential to grow until 5000 groups (it's guaranteed that I won't exceed that number of groups, 5000, based on the nature of my application). I was thinking that having all the forum posts in one table will cause many problems like having a very large index and slow search among other problems (as I want to have many indexes on the table like the PostID, ForumID, GroupID and PostDate), so, I was thinking that it might be better to have a separate table for each group's forum posts in order to have small indexes so that I have faster searches and inserts take less time (esp. that some groups are expected to have a large number of posts per day) and also to be easier to move the tables to other database servers in case the application grows and so the web farm. Now I'm really confused how to design my DAL. Assuming that the structure of the forum posts table is like that (this is just a simplified structure not the real one): CREATE TABLE ForumPosts_x ( PostID INT IDENTITY(1,1) PRIMARY KEY, ForumID INT NOT NULL, -- which refers to the ForumID in another table named Forums which includes all forums for all groups ParentPostID INT NULL, PostSubject NVARCHAR(200) NOT NULL, PostText NVARCHAR(5000) NOT NULL, PostDate DATETIME NOT NULL DEFAULT GETUTCDATE() ) Note that there's no group ID as x in the table name will be the group id e.g. ForumPosts_19 for group id 19 Now as to designing my DAL, should I: 1. Create stored procedures with dynamic SQL and pass the group id to the procedure i.e. use EXEC and sp_executesql (there's an interesting article on the subject here: http://www.sommarskog.se/dynamic\_sql.html) For example: CREATE PROCEDURE InsertForumPost @GroupID INT, @ForumID INT, @ParentPostID INT, @PostSubject NVARCHAR(200), @PostText NVARCHAR(5000) AS DECLARE @tablename NVARCHAR(50), @sql NVARCHAR(4000) SET @tablename = N'ForumPosts_' + @GroupID SET @sql = N'INSERT INTO dbo.' + quotename(@tblname) + ' (ForumID, ParentPostID, PostSubject, PostText) VALUES (' + '@ForumID, @ParentPostID, @PostSubject, @PostText)' EXEC sp_executesql @sql, N'@ForumID INT, @ParentPostID INT, @PostSubject NVARCHAR(200), @PostText NVARCHAR(5000)', @ForumID, @ParentPostID, @PostSubject, @PostText 2. Create the procedures with static SQL for each group.
-
Hi gurus, I'm creating a web application for the internet with scalability in mind. In my application, I'll have forums used by different groups of people (each group will have their own forums). I will start from 1000 or 2000 groups with the potential to grow until 5000 groups (it's guaranteed that I won't exceed that number of groups, 5000, based on the nature of my application). I was thinking that having all the forum posts in one table will cause many problems like having a very large index and slow search among other problems (as I want to have many indexes on the table like the PostID, ForumID, GroupID and PostDate), so, I was thinking that it might be better to have a separate table for each group's forum posts in order to have small indexes so that I have faster searches and inserts take less time (esp. that some groups are expected to have a large number of posts per day) and also to be easier to move the tables to other database servers in case the application grows and so the web farm. Now I'm really confused how to design my DAL. Assuming that the structure of the forum posts table is like that (this is just a simplified structure not the real one): CREATE TABLE ForumPosts_x ( PostID INT IDENTITY(1,1) PRIMARY KEY, ForumID INT NOT NULL, -- which refers to the ForumID in another table named Forums which includes all forums for all groups ParentPostID INT NULL, PostSubject NVARCHAR(200) NOT NULL, PostText NVARCHAR(5000) NOT NULL, PostDate DATETIME NOT NULL DEFAULT GETUTCDATE() ) Note that there's no group ID as x in the table name will be the group id e.g. ForumPosts_19 for group id 19 Now as to designing my DAL, should I: 1. Create stored procedures with dynamic SQL and pass the group id to the procedure i.e. use EXEC and sp_executesql (there's an interesting article on the subject here: http://www.sommarskog.se/dynamic\_sql.html) For example: CREATE PROCEDURE InsertForumPost @GroupID INT, @ForumID INT, @ParentPostID INT, @PostSubject NVARCHAR(200), @PostText NVARCHAR(5000) AS DECLARE @tablename NVARCHAR(50), @sql NVARCHAR(4000) SET @tablename = N'ForumPosts_' + @GroupID SET @sql = N'INSERT INTO dbo.' + quotename(@tblname) + ' (ForumID, ParentPostID, PostSubject, PostText) VALUES (' + '@ForumID, @ParentPostID, @PostSubject, @PostText)' EXEC sp_executesql @sql, N'@ForumID INT, @ParentPostID INT, @PostSubject NVARCHAR(200), @PostText NVARCHAR(5000)', @ForumID, @ParentPostID, @PostSubject, @PostText 2. Create the procedures with static SQL for each group.
Trust your database server. In general having loads of indexes will slow your inserts, not your selects. You won't need to do any partitioning unless you have a _lot_ of data. You can send multiple queries and get multiple responses in one round trip. The stored procedures aren't giving you any benefit here - its just creating noise. If I was designing this app i'd probably go with a Thread table inheriting Post, and giving Post a reference to the parent Thread (and possibly also the Post that was replied to - if you wanted to track that). Add your Thread table referencing Group as well for your groups. That lets you pull whole threads out off one index in one select, and Threads for each Group. Then I'd use Diamond Binding to handle my DAL...
Mark Churchill Director Dunn & Churchill Free Download:
Diamond Binding: The simple, powerful, reliable, and effective data layer toolkit for Visual Studio. -
Trust your database server. In general having loads of indexes will slow your inserts, not your selects. You won't need to do any partitioning unless you have a _lot_ of data. You can send multiple queries and get multiple responses in one round trip. The stored procedures aren't giving you any benefit here - its just creating noise. If I was designing this app i'd probably go with a Thread table inheriting Post, and giving Post a reference to the parent Thread (and possibly also the Post that was replied to - if you wanted to track that). Add your Thread table referencing Group as well for your groups. That lets you pull whole threads out off one index in one select, and Threads for each Group. Then I'd use Diamond Binding to handle my DAL...
Mark Churchill Director Dunn & Churchill Free Download:
Diamond Binding: The simple, powerful, reliable, and effective data layer toolkit for Visual Studio. -
Mark Churchill wrote:
Then I'd use Diamond Binding to handle my DAL
ROTFLMAO You so had me until that part :laugh:
led mike
Well *I* don't have to weigh up cost/benefit because I can click a button and get a license ;)
Mark Churchill Director Dunn & Churchill Free Download:
Diamond Binding: The simple, powerful, reliable, and effective data layer toolkit for Visual Studio. -
Trust your database server. In general having loads of indexes will slow your inserts, not your selects. You won't need to do any partitioning unless you have a _lot_ of data. You can send multiple queries and get multiple responses in one round trip. The stored procedures aren't giving you any benefit here - its just creating noise. If I was designing this app i'd probably go with a Thread table inheriting Post, and giving Post a reference to the parent Thread (and possibly also the Post that was replied to - if you wanted to track that). Add your Thread table referencing Group as well for your groups. That lets you pull whole threads out off one index in one select, and Threads for each Group. Then I'd use Diamond Binding to handle my DAL...
Mark Churchill Director Dunn & Churchill Free Download:
Diamond Binding: The simple, powerful, reliable, and effective data layer toolkit for Visual Studio.Hi Mark, thanks a lot for your help, but would you mind explaining in more details?
Mark Churchill wrote:
You can send multiple queries and get multiple responses in one round trip.
Well, actually I doubt this could be of much benefit in my case as I'm creating a web application so my queries will always be based on user actions so there's no way to send multiple queries at the same time in my case.
Mark Churchill wrote:
The stored procedures aren't giving you any benefit here - its just creating noise.
Could you explain more please? I understand that you want me to go with one table so why not use sprocs in that case? We won't have the problem of caching query plans for every table as we're going to use only one table.
Mark Churchill wrote:
If I was designing this app i'd probably go with a Thread table inheriting Post, and giving Post a reference to the parent Thread (and possibly also the Post that was replied to - if you wanted to track that). Add your Thread table referencing Group as well for your groups. That lets you pull whole threads out off one index in one select, and Threads for each Group. Then I'd use Diamond Binding to handle my DAL...
I'm a little lost here, what exactly do you mean by inheritance here? I was going to use a ParentPostID INT NULL field in the ForumPosts table (see the CREATE TABLE section in my original post) which will refer to the thread, is that what you mean? I was also going to index that field so that I can find threads and replies fast if this what you mean by pulling out the threads. So I was exactly going to index the PostID, GroupID, ParentPostID and PostDate fields (this is why I thought about separating the data into tables, one table for each group, as I thought I would have too many indexes, 4 indexes as you can see, and this could affect the inserts performance tremendously) By the way, if one table will be used, the PostID will be by group not an identity field. I was only going to use identity fields if had a table for each group but not with a single table (this is done for scalability sake, to be easy to move data to other databases or even tables with the same structure in the same database), I'll have another table with the last id used for every group e.g. CREATE TABLE LastUsedID ( GroupID INT NOT NULL, LastUsedID INT NOT NULL ) Thanks again for all your h
-
Hi Mark, thanks a lot for your help, but would you mind explaining in more details?
Mark Churchill wrote:
You can send multiple queries and get multiple responses in one round trip.
Well, actually I doubt this could be of much benefit in my case as I'm creating a web application so my queries will always be based on user actions so there's no way to send multiple queries at the same time in my case.
Mark Churchill wrote:
The stored procedures aren't giving you any benefit here - its just creating noise.
Could you explain more please? I understand that you want me to go with one table so why not use sprocs in that case? We won't have the problem of caching query plans for every table as we're going to use only one table.
Mark Churchill wrote:
If I was designing this app i'd probably go with a Thread table inheriting Post, and giving Post a reference to the parent Thread (and possibly also the Post that was replied to - if you wanted to track that). Add your Thread table referencing Group as well for your groups. That lets you pull whole threads out off one index in one select, and Threads for each Group. Then I'd use Diamond Binding to handle my DAL...
I'm a little lost here, what exactly do you mean by inheritance here? I was going to use a ParentPostID INT NULL field in the ForumPosts table (see the CREATE TABLE section in my original post) which will refer to the thread, is that what you mean? I was also going to index that field so that I can find threads and replies fast if this what you mean by pulling out the threads. So I was exactly going to index the PostID, GroupID, ParentPostID and PostDate fields (this is why I thought about separating the data into tables, one table for each group, as I thought I would have too many indexes, 4 indexes as you can see, and this could affect the inserts performance tremendously) By the way, if one table will be used, the PostID will be by group not an identity field. I was only going to use identity fields if had a table for each group but not with a single table (this is done for scalability sake, to be easy to move data to other databases or even tables with the same structure in the same database), I'll have another table with the last id used for every group e.g. CREATE TABLE LastUsedID ( GroupID INT NOT NULL, LastUsedID INT NOT NULL ) Thanks again for all your h
I wrote:
You can send multiple queries and get multiple responses in one round trip.
This was in reference to "...which I'm highly considering but a little concerned about how to execute multiple SQL statements..." - this isn't anything to worry about. A query isn't restricted to multiple statements - so you can send "select foo; select baz" in one ExecuteDataSet() and get back a DataSet containing multiple DataTables.
I wrote:
The stored procedures aren't giving you any benefit here - its just creating noise.
The stored procedures were just performing basic CRUD operations. SQL server will cache the execution plan for your ad-hoc queries anyway. For a simplistic view, stored procedures are for providing abstraction/code reuse rather than performance (some would also say they help with security).
Waleed Eissa wrote:
I'm a little lost here, what exactly do you mean by inheritance here?
A thread is basically a post that also has some extra information, like a title, a group it belongs in, etc. Say your database structure has a table, Product (Id, Description) and ServiceProduct(Id, CostPerHour) with ServiceProduct.Id being a foreign key to Product.Id. This defines that for every set of ServiceProduct data there is Product data, meaning ServiceProduct inherits Product, which is pretty analogous to how inheritance relationships work in code. This can be handy. Using a ParentPostId field makes it difficult to pull a whole thread out the database. Given a parent post I would have to do an index scan to get the 2nd post, then again to get the 3rd, etc. Say you have this setup: Post (Id, Author, BodyText, TimePosted, ParentThreadId) and Thread (Id, Title, GroupId) Thread.Id is a fk to Post.Id (inheritance) Post.ParentThreadId is a fk to Thread.Id (reference) This means that I can easily select threads in a group (Thread by GroupId), Posts in a Thread (Post by ParentThread). If you are feeling uncomfortable with the inheritance relationship, then you could just have a Thread table that acts as a bit of a stub to group posts. It might be worth having a look at how forums like phpbb handle their database structure (considering I'm coming up with this on the fly). I'm not comfortable with the LastUsedId. It seems incredibly unlikely you would approach the 4 billion odd posts that just an int would provide. SQL Server co
-
Hi gurus, I'm creating a web application for the internet with scalability in mind. In my application, I'll have forums used by different groups of people (each group will have their own forums). I will start from 1000 or 2000 groups with the potential to grow until 5000 groups (it's guaranteed that I won't exceed that number of groups, 5000, based on the nature of my application). I was thinking that having all the forum posts in one table will cause many problems like having a very large index and slow search among other problems (as I want to have many indexes on the table like the PostID, ForumID, GroupID and PostDate), so, I was thinking that it might be better to have a separate table for each group's forum posts in order to have small indexes so that I have faster searches and inserts take less time (esp. that some groups are expected to have a large number of posts per day) and also to be easier to move the tables to other database servers in case the application grows and so the web farm. Now I'm really confused how to design my DAL. Assuming that the structure of the forum posts table is like that (this is just a simplified structure not the real one): CREATE TABLE ForumPosts_x ( PostID INT IDENTITY(1,1) PRIMARY KEY, ForumID INT NOT NULL, -- which refers to the ForumID in another table named Forums which includes all forums for all groups ParentPostID INT NULL, PostSubject NVARCHAR(200) NOT NULL, PostText NVARCHAR(5000) NOT NULL, PostDate DATETIME NOT NULL DEFAULT GETUTCDATE() ) Note that there's no group ID as x in the table name will be the group id e.g. ForumPosts_19 for group id 19 Now as to designing my DAL, should I: 1. Create stored procedures with dynamic SQL and pass the group id to the procedure i.e. use EXEC and sp_executesql (there's an interesting article on the subject here: http://www.sommarskog.se/dynamic\_sql.html) For example: CREATE PROCEDURE InsertForumPost @GroupID INT, @ForumID INT, @ParentPostID INT, @PostSubject NVARCHAR(200), @PostText NVARCHAR(5000) AS DECLARE @tablename NVARCHAR(50), @sql NVARCHAR(4000) SET @tablename = N'ForumPosts_' + @GroupID SET @sql = N'INSERT INTO dbo.' + quotename(@tblname) + ' (ForumID, ParentPostID, PostSubject, PostText) VALUES (' + '@ForumID, @ParentPostID, @PostSubject, @PostText)' EXEC sp_executesql @sql, N'@ForumID INT, @ParentPostID INT, @PostSubject NVARCHAR(200), @PostText NVARCHAR(5000)', @ForumID, @ParentPostID, @PostSubject, @PostText 2. Create the procedures with static SQL for each group.
If you are using SQL Server 2005 have a look at partitioned tables, partitioning by group and date - this gives very good performance.These are single tables, but physically split on the columns you define, allowing the underlying files to be located on different physical disks. If you are using SQL Server 2000 take a look at partitioned views - not quite so friendly as partitioned tables, but good never the less. This is really a view over multiple tables, so table structure changes are a bit of a pain, but done properly you can insert into the view and it will add the record to the correct underlying table. In either case you will be able to have a single stored proc to insert, and, although SQL Server will cache ad-hoc sql execution plans I would be very reluctant to use anything other than stored procs for data access. They provide a good degree of security against sql injection and are a single source of data, so any changes are abstracted from your code. Hope some of this makes sense and helps.
Bob Ashfield Consultants Ltd
-
I wrote:
You can send multiple queries and get multiple responses in one round trip.
This was in reference to "...which I'm highly considering but a little concerned about how to execute multiple SQL statements..." - this isn't anything to worry about. A query isn't restricted to multiple statements - so you can send "select foo; select baz" in one ExecuteDataSet() and get back a DataSet containing multiple DataTables.
I wrote:
The stored procedures aren't giving you any benefit here - its just creating noise.
The stored procedures were just performing basic CRUD operations. SQL server will cache the execution plan for your ad-hoc queries anyway. For a simplistic view, stored procedures are for providing abstraction/code reuse rather than performance (some would also say they help with security).
Waleed Eissa wrote:
I'm a little lost here, what exactly do you mean by inheritance here?
A thread is basically a post that also has some extra information, like a title, a group it belongs in, etc. Say your database structure has a table, Product (Id, Description) and ServiceProduct(Id, CostPerHour) with ServiceProduct.Id being a foreign key to Product.Id. This defines that for every set of ServiceProduct data there is Product data, meaning ServiceProduct inherits Product, which is pretty analogous to how inheritance relationships work in code. This can be handy. Using a ParentPostId field makes it difficult to pull a whole thread out the database. Given a parent post I would have to do an index scan to get the 2nd post, then again to get the 3rd, etc. Say you have this setup: Post (Id, Author, BodyText, TimePosted, ParentThreadId) and Thread (Id, Title, GroupId) Thread.Id is a fk to Post.Id (inheritance) Post.ParentThreadId is a fk to Thread.Id (reference) This means that I can easily select threads in a group (Thread by GroupId), Posts in a Thread (Post by ParentThread). If you are feeling uncomfortable with the inheritance relationship, then you could just have a Thread table that acts as a bit of a stub to group posts. It might be worth having a look at how forums like phpbb handle their database structure (considering I'm coming up with this on the fly). I'm not comfortable with the LastUsedId. It seems incredibly unlikely you would approach the 4 billion odd posts that just an int would provide. SQL Server co
Mark Churchill wrote:
I wrote: You can send multiple queries and get multiple responses in one round trip. This was in reference to "...which I'm highly considering but a little concerned about how to execute multiple SQL statements..." - this isn't anything to worry about. A query isn't restricted to multiple statements - so you can send "select foo; select baz" in one ExecuteDataSet() and get back a DataSet containing multiple DataTables.
Well, I'm sorry, I probably should've explained it more clearly, actually what I meant here is that when you use stored procedures you can easily use many sql statements in the same procedure for example: begin transaction insert into foo update foo2 set .. .. etc This is very easy with stored procedures but I guess not so easy with ad-hoc sql statements, this is what I meant to say
Mark Churchill wrote:
A thread is basically a post that also has some extra information, like a title, a group it belongs in, etc. Say your database structure has a table, Product (Id, Description) and ServiceProduct(Id, CostPerHour) with ServiceProduct.Id being a foreign key to Product.Id. This defines that for every set of ServiceProduct data there is Product data, meaning ServiceProduct inherits Product, which is pretty analogous to how inheritance relationships work in code. This can be handy. Using a ParentPostId field makes it difficult to pull a whole thread out the database. Given a parent post I would have to do an index scan to get the 2nd post, then again to get the 3rd, etc. Say you have this setup: Post (Id, Author, BodyText, TimePosted, ParentThreadId) and Thread (Id, Title, GroupId) Thread.Id is a fk to Post.Id (inheritance) Post.ParentThreadId is a fk to Thread.Id (reference) This means that I can easily select threads in a group (Thread by GroupId), Posts in a Thread (Post by ParentThread). If you are feeling uncomfortable with the inheritance relationship, then you could just have a Thread table that acts as a bit of a stub to group posts. It might be worth having a look at how forums like phpbb handle their database structure (considering I'm coming up with this on the fly).
I like your idea about having a separate table for threads, I think this can speed things up as we can have less indexes on the same table, it might just be harder to maintain though as you have the data in two tables but I still like the idea