LINQ (un)optimization

Tags: database, performance, csharp, sharepoint, linq
Mike Marynowski
#1

I just finished using LINQ for a small web project, and I was wondering if any of you have solutions or suggestions:

a) LINQ doesn't appear to understand what primary keys are for. Here is the designer-generated property for the unique ID of a "File" table:

[Column(Storage="_FileID", AutoSync=AutoSync.OnInsert,
DbType="Int NOT NULL IDENTITY", IsPrimaryKey=true, IsDbGenerated=true)]
public int FileID
{
[...]
}

so it is defined properly. Yet, when I delete (by using DataContext.Files.DeleteOnSubmit()), I noticed using SQL Profiler that the delete command contains ALL the columns:

exec sp_executesql N'DELETE FROM [dbo].[File]
WHERE ([FileID] = @p0) AND ([Filename] = @p1) AND ([Date] = @p2)
AND ([Type] = @p3)',N'@p0 int,@p1 varchar(17),@p2 datetime,
@p3 varchar(11)',@p0=61,@p1='Autumn Leaves.jpg',@p2='2008-04-02 13:01:01:970',
@p3='image/pjpeg'

WHY?!?!? There is a primary key, for god's sake! The same goes for updates - the WHERE clause contains the current value of every column.

b) LINQ needs to select and load data before updating or deleting. To delete something, you need to select it first. That causes it to load everything you want to delete into memory and then iterate through the results one by one, sending off a SQL delete statement for each row. Why didn't they just make another keyword called "delete" that you could use instead of "select"? That way, if I want to delete everything based on a condition, it could formulate one SQL command and send it off the same way it does when selecting. If I want to delete an entire table, it will actually load the entire table into memory and then send off SQL delete commands row by row! Call me crazy, but it doesn't seem like that much more work to implement a delete/update LINQ operator once you already have the select down pat.

c) Why is the database name a hard-coded attribute? By default, the connection string includes the database name anyway...so why hard-code it as an attribute? What if I want to deploy my application to a location that already has a different database with that name? Or what if I want my application to create multiple databases with the same schema? As far as I can tell, there is no way for the DataContext.CreateDatabase() method to create a database with a different name. Please correct me if I'm wrong.

-------------------------------------------------------

I was really really excited about LINQ when I first
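
If the row-by-row behaviour in (b) is a deal-breaker, one workaround is to drop down to DataContext.ExecuteCommand and issue the set-based DELETE yourself. A minimal sketch, assuming a designer-generated context class named MyDataContext and a connectionString variable (both names are assumptions):

// Sketch: one set-based DELETE instead of select-then-delete per row.
// Requires: using System.Data.Linq; (the generated context derives from DataContext)
using (var db = new MyDataContext(connectionString))
{
    // ExecuteCommand sends parameterized SQL straight to the server;
    // {0} is turned into a parameter (@p0) on the wire.
    int rows = db.ExecuteCommand(
        "DELETE FROM [dbo].[File] WHERE [Type] = {0}",
        "image/pjpeg");
}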

Colin Angus Mackay
#2

mmGoDLiKe wrote:

WHY?!?!? There is a primary key, for god's sake!

As I understand it, the reason is so that if the row has been updated by another process it does not get deleted. It is a safety mechanism to ensure that you only delete what you think you are deleting. I don't know the answer to the other two questions.
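
For what it's worth, that safety mechanism surfaces in code as a ChangeConflictException: if another process touched the row first, the all-columns WHERE clause matches zero rows and SubmitChanges reports a conflict. A rough sketch (MyDataContext and connectionString are assumed names):

// Sketch: observing the optimistic-concurrency check from code.
// Requires: using System.Data.Linq; using System.Linq;
using (var db = new MyDataContext(connectionString))
{
    var file = db.Files.Single(f => f.FileID == 61);
    db.Files.DeleteOnSubmit(file);
    try
    {
        db.SubmitChanges(ConflictMode.ContinueOnConflict);
    }
    catch (ChangeConflictException)
    {
        // Zero rows matched the WHERE clause: another process updated
        // or deleted the row. Refresh and decide whether to retry.
        db.ChangeConflicts.ResolveAll(RefreshMode.OverwriteCurrentValues);
    }
}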


Mike Marynowski
#3

Well...that safety mechanism creates way too much overhead to be a default. If anything, it should be an optional setting you can set through a property somewhere. In 99% of applications, I don't care if it was updated or not...I just want it deleted if a user wants it deleted. Now that you mention it...has anyone played with the "Time Stamp" property you can assign to a column using the LINQ designer? The description says "Specifies whether the database column is a time stamp type used for concurrency control" - perhaps if you create a timestamp column in each table, it will go by the PK and the timestamp for deletes/updates?
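
For reference, the designer's "Time Stamp" property maps to IsVersion=true in the column attribute, backed by a SQL Server rowversion column. Roughly what the generated mapping looks like (the storage field name is assumed; the property body is elided, as above):

// Sketch: a rowversion column mapped for concurrency control.
// With IsVersion=true, LINQ to SQL uses only the primary key plus
// this column in the WHERE clause of generated UPDATEs/DELETEs.
[Column(Storage="_Version", AutoSync=AutoSync.Always,
DbType="rowversion NOT NULL", CanBeNull=false,
IsDbGenerated=true, IsVersion=true)]
public System.Data.Linq.Binary Version
{
[...]
}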

Mike Marynowski
#4

OK, so I've confirmed - that is definitely one way to do it. The other, if you don't care whether data has been updated or not and you want to disable the check altogether, is to set the "Update Check" property on every column to "Never"!
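
In the generated code, that designer setting becomes UpdateCheck=UpdateCheck.Never on each column attribute - roughly like this (the column type here is an assumption):

// Sketch: disabling the per-column optimistic-concurrency check.
// With UpdateCheck.Never on every non-key column, the generated
// WHERE clause shrinks to just ([FileID] = @p0).
[Column(Storage="_Filename", DbType="VarChar(255) NOT NULL",
UpdateCheck=UpdateCheck.Never)]
public string Filename
{
[...]
}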

Pete OHanlon
#5

mmGoDLiKe wrote:

Now that you mention it...has anyone played with the "Time Stamp" property you can assign to a column using the LINQ designer?

Yes. This is the way I manage concurrency in all of our tables. Set all of the other columns to never check, and set this property to read-only. It reduces the burden in all updates/deletes. BTW - you do avoid actually performing deletes, don't you? It's much better to mark a record as having been deleted rather than actually deleting it - it's so much better from the point of view of referential integrity.
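
For the avoidance of doubt, the mark-as-deleted approach looks something like this (the IsDeleted column is an assumption - the thread's File table doesn't have one):

// Sketch: soft delete - set a flag instead of issuing a DELETE.
// Assumes an added 'bit IsDeleted' column mapped on the File entity.
using (var db = new MyDataContext(connectionString))
{
    var file = db.Files.Single(f => f.FileID == 61);
    file.IsDeleted = true;      // the row stays; references stay valid
    db.SubmitChanges();         // generates an UPDATE, not a DELETE
}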


Mike Marynowski
#6

              Why wouldn't I actually delete the record? All FK's are set to cascade on delete, and if the relationship goes the other way, I ensure I delete all referenced values as well. Why would I want a ton of deleted records sitting in my database? Not to mention, I'd have to add a "WHERE IsDeleted=0" to every query.

Pete OHanlon
#7

                Argggh. Cascading delete - or (to put it another way) unintentionally taking out records you had no intention of taking out because of a foreign key relationship. Believe me, I've seen this one go bad many many times.

                mmGoDLiKe wrote:

                I'd have to add a "WHERE IsDeleted=0" to every query

Well, that's the beauty of Linq. I've blogged about refining queries here[^], and you can get an idea of how deferred execution helps you here.
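
To illustrate the deferred-execution point (the entity and flag names are assumptions, not taken from the blog post): the base query carrying the IsDeleted filter is defined once, and every refinement composes onto it before any SQL runs.

// Sketch: define the IsDeleted filter once; nothing hits the database
// until enumeration, so both Where clauses fold into one SQL statement.
IQueryable<File> liveFiles = db.Files.Where(f => !f.IsDeleted);

var images = liveFiles.Where(f => f.Type.StartsWith("image/"));

foreach (var f in images)       // SQL executes here, with both filters
    Console.WriteLine(f.Filename);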



Mike Marynowski
#8

Well...yes...I suppose this can happen if you screw something up and then don't properly test an application...but that's what testing is for, no? Leaving large amounts of "deleted" data in the database isn't the solution, imho. It would be pretty obvious, pretty quickly during testing (assuming you are doing that properly) if your queries weren't set up properly and you were deleting records unintentionally. Masking poorly tested queries by leaving data in the database is, in my opinion, an odd solution to suggest. I may be new to LINQ, but I'm not new to SQL. I read through your blog entry, and I certainly wouldn't do that in a non-LINQ SQL-based application, and I certainly won't with LINQ either. Just my opinion, of course :)

EDIT: This is, of course, assuming you don't need a "recycle bin" facility for your data. This is great if you want to implement a recycle bin, but I think maintaining referential integrity should be solved by testing your application, not by leaving unwanted data in your database.

Pete OHanlon
#9

                    mmGoDLiKe wrote:

                    Well...yes...I suppose this can happen if you screw something up and then don't properly test an application...but that's what testing is for, no?

                    You trust testing for this? Well, good luck then.

                    mmGoDLiKe wrote:

                    Masking poorly tested queries by leaving data in the database is, in my opinion, an odd solution to suggest.

It's not masking poorly tested queries - it's ensuring that you don't end up screwing your clients over. One case I remember involved a person doing a one-off data cleanse and accidentally deleting about 100K rows of data because they had a cascading delete on a lookup field. Thank god the client had a backup.

                    mmGoDLiKe wrote:

                    I think maintaining referential integrity should be solved by testing your application, not by leaving unwanted data in your database.

In a perfect world, yes. But we both know that you are putting too much onus on the testing team to catch all of these cases, and we can't ignore the idiot factor. On this one, I think we are both going to have to agree to disagree (and yes, I'm an old Sql developer - 20+ years, so I've seen lots of really bad Sql).

BTW - there are also technical reasons to avoid cascading deletes - large scale deletes can end up involving large scale index rebalancing and there is an overhead in checking entries when deleting. If you must delete, don't rely on cascading deletes - manage it yourself and make sure that you don't accidentally delete vital information.
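
A sketch of the "manage it yourself" alternative - the children are deleted explicitly, everything inside one transaction (the FileTags child table is hypothetical):

// Sketch: explicit, ordered deletes instead of ON DELETE CASCADE.
// Requires a reference to System.Transactions.
using (var scope = new System.Transactions.TransactionScope())
using (var db = new MyDataContext(connectionString))
{
    var file = db.Files.Single(f => f.FileID == 61);

    // Delete exactly the children you intend to, then the parent.
    db.FileTags.DeleteAllOnSubmit(
        db.FileTags.Where(t => t.FileID == file.FileID));
    db.Files.DeleteOnSubmit(file);

    db.SubmitChanges();
    scope.Complete();           // commit both deletes atomically
}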



Mike Marynowski
#10

                      Pete O'Hanlon wrote:

                      You trust testing for this? Well, good luck then.

                      Pete O'Hanlon wrote:

                      It's not masking poorly tested queries - it's ensuring that you don't end up screwing your clients over.

                      Proper testing *should* catch anything that could occur easily or often. If there is a mistake somewhere that causes catastrophic data loss at some point, then one of the nightly backups that are enforced at every location running one of my applications can be used to recover from it.

                      Pete O'Hanlon wrote:

                      BTW - there are also technical reasons to avoid cascading deletes - large scale deletes can end up involving large scale index rebalancing and there is an overhead in checking entries when deleting.

Yes, of course. I won't argue that there aren't situations where it is appropriate. But depending on the application's turnover of database records, that can very quickly grow out of control. It is just overkill at times.

FKs with cascading deletes obviously need to be analyzed very closely to ensure unwanted data isn't being deleted. One can argue that it is just as easy, if not easier, to accidentally delete records "manually" via more complicated SQL commands (or LINQ commands). Setting a cascade option on a FK basically guarantees the proper associated records will get deleted, instead of having to test a custom query to ensure it isn't running around deleting unrelated records. It means that as long as my design is sound (and documented for the developers), I don't need to worry about the cascading delete being properly implemented later on "manually." I'll have problems either way if my design is incorrect, but it eliminates a possible point of error in the implementation.

Either way, proper documentation and design all the way through will help avoid any possible data loss nightmares. I can see the merit of both our views, and if I had someone else develop an application, perhaps I would be more worried. I DO trust my testing, however, and I am willing to bet my company that any development projects I undertake will not suffer from accidental data deletes. I have very strict processes and policies in place to prevent something like that from happening. So, I think we will indeed need to agree to disagree.

Pete OHanlon
#11

                        mmGoDLiKe wrote:

                        I DO trust my testing, however, and I am willing to bet my company that any development projects I undertake will not suffer from accidental data deletes. I have very strict processes and policies in place to prevent something like that from happening.

                        I wish my clients were more like you :-D Unfortunately, in a lot of cases - I don't trust THEIR testing.

