nULL OR nOT?

Lost User

_Josh_ wrote:

Because the DB is (logically) storing two values

Well, I suppose you could say that it is logically but that's just describing what it achieves. In practice there is a specific property of a column (NULL) that describes its contents as having no value. I neither know, nor care, how it handles this internally. All I need to know is that, if a column is NULL, then that represents the fact that we do not know the value of that column. You seem to be saying that using a sentinal value may be preferable because the database might have to use some sort of boolean behind the scenes so that it knows the value is null? But surely better to let the Db do that ( after all, it is optimised specifically for that functionality) than it is for your application to continually test for some magic number. i.e. isn't

Select columns from table where column1 is not null

just plain better than

Select columns from table join specialValuesTable on table.column1 = specialValuesTable.Id
where specialValuesTable.RepresentsANullValue = 1

Or even

Select columns from table where column1 <> -1

? And I agree - sending an additional bool is always an overhead, although depending on the method of communication, you may have to use some sentinel value - which may end up being larger than the overhead of sending a bit to represent null!

MVVM # - I did it My Way ___________________________________________ Man, you're a god. - walterhevedeich 26/05/2011 .\\axxx (That's an 'M')

Mark_Wallace

_Maxxx_ wrote:

The argument I get is that "the customer does have an address, but we just don't know what it is yet."

Which is actually correct, and quite a nice way of stating a technical point in plain language. The empty string represents an unset address, whereas null indicates that there is no address. Unsurprisingly, most people don't think "does a field have a value or not?", they think "do we know the address?". Only developers are brought up to detach data from reality, and DBAs don't have to be developers, but they do have to know their data.

I wanna be a eunuchs developer! Pass me a bread knife!

Lost User

Silly people said:

the customer does have an address, but we just don't know what it is yet.

That is the worst reason ever, unless you really want to distinguish between "customers with an address that you don't know" and "customers without any address". If you don't (and you probably don't), then (apart from any technical reasons there may be) there's absolutely no difference between using a real null or a "pseudo null" like an empty string, because you're using both to mean "there's no thing here". Any additional meanings such as "customer is purely virtual and has no location in the universe" are purely imagined.

BobJanova

The answer to that is to fix the VM code, though. A view model which crashes when given a null has a serious bug, imo.

BobJanova

The null value is there for a good reason: to indicate that the record has no known value for that field. Using something which is a valid value for the column type in question is doing exactly the same thing as using a null, except that (i) it's confusing to someone who doesn't know that '', -1 etc is a 'fake null', and (ii) it can screw up aggregation (e.g. if you use 0 instead of NULL in a numeric column and then do an average over it). If it's valid for a piece of information to be unknown, it should be represented by a null value. There are people who are scared of nulls in the code-side development, as well. For a primitive type it might be fair enough for data size and convenience reasons (boxing primitive types into Integer or equivalent in Java, or using Nullable<int> in .Net, does have a cost, albeit small). But returning string.Empty when you mean 'unknown' is incorrect (that would mean it does have a value, and that value is empty), and results in hackery every bit as nasty as null checking when you want to check for it. The one place I do avoid nulls is with floating point types, where I slightly abuse NaN to mean 'missing value'. It has all the right semantics in arithmetic operations, it's native in the type and it's easy to check for in aggregations.

_Maxxx_ wrote:

The argument I get is that "the customer does have an address, but we just don't know what it is yet."

That's where a null is appropriate. When it's 'the customer has no address' then 'no address', which could reasonably be represented by empty if you want to maintain a difference between 'unknown' and 'none', should be represented by an actual value which means 'none'. Missing information is what null is for.

Rob Grainger

Once again, this is all a failure of most databases to implement the relational model correctly. Codd's original model had two null values: A-Marks and I-Marks. An A-Mark (Missing and Applicable) indicates a value is unknown for the current object, but applicable to that object (e.g. age for a person). An I-Mark (Missing and In-applicable) indicates a value is not applicable to the current object, (e.g. commission for someone who does no sales activity). Missing Values in RDB[^]

"If you don't fail at least 90 percent of the time, you're not aiming high enough." Alan Kay.

cjb110

Was about to say if you could have a valid customer without an address then...but it still doesn't make sense to not just use NULL. If you can actually have a customer without an address then this would be better indicated with a separate 'NoAddress' bool/bit field, as its a specific bit of data. As for the price thing thats just confusing as hell...-1 would be slightly better (as at least a reader would be forced to think 'what the hell?' for a sec), but 0 in amount field that doesn't mean '0 amount of' is just nuts.

Lost User

What annoys me about the customer thing is that, rather than creating an address record with empty columns (which is daft but forgivable) they created a single record with a zero Id and refer all customers with unknown addresses to that single record - so the data tells you "All of these customers live at the same address - which happens to have empty strings as the details" Which is just poor modelling.

MVVM # - I did it My Way ___________________________________________ Man, you're a god. - walterhevedeich 26/05/2011 .\\axxx (That's an 'M')

Lost User

_Maxxx_ wrote:

I've also come across the situation where, rather than having a null value in a 'sold price' column, the value of zero represents an unsold item.

I'd like to address this example. If '0' represents that the item hasn't been sold then the database design problem isn't a NULL problem. The problem at that point is assigning more than one meaning to a database field - something the NULL designation wouldn't remedy. In a normalized database that record would have a corresponding record in another table that would contain the details of the sale. The presence of that record would indicate the item was sold whereas the lack of a sale record would indicate that it wasn't sold. Overloading the sale price field with more than one meaning is bad design.

_Maxxx_ wrote:

What's wrong with a null value which tells you explicitly "The customer has no address"

Questions: 1: How does the address appear to the end user in the UI? Do you change the UI based on whether or not the address is NULL or merely empty strings? 2: What happens in the UI if the address is partially filled in? If the first line is supplied but the second line is NULL does the second line show up as a different color or something? On a line per line basis how do I tell, in the UI, if the line is NULL or merely empty? 3: What if the customer had an address but lost it? Can the program user set the address fields back to NULL to indicate that the customer no longer has an address? 4: What if a user accidently enters an address but then removes it? Does that mean, for database purposes, that the account had an address but now doesn't have one or does it mean that the account never had an address? What would NULL mean for those values? As per my initial example, I think giving a field additional meaning is a rat's nest of problems. The NULL thing may have some uses, but it's an academic curiosity as far as I'm concerned.

Lost User

MehGerbil wrote:

The problem at that point is assigning more than one meaning to a database field - something the NULL designation wouldn't remedy.

It would remedy it perfectly. In your example I think you are suggesting that the database wold have a table containing just the sale price, while another table would hold details of the sold item (or, actually, the item Id) and the sale to which it belongs, etc. As the sale price is an intrinsic part of the line item in most cases I'd suggest that the 'invoice line' table would include the sale price as a column, and would not hold a reference to another table.

MehGerbil wrote:

: How does the address appear to the end user in the UI? Do you change the UI based on whether or not the address is NULL or merely empty strings?

That shouldn't make any difference to the Data design. The UI is a representation that should be able to change independently. If the users want to see blanks on the screen, or big red starts, or the words "address unknown" , or whatever, that should not affect our DB design in any way.

MehGerbil wrote:

: What happens in the UI if the address is partially filled in? If the first line is supplied but the second line is NULL does the second line show up as a different color or something? On a line per line basis how do I tell, in the UI, if the line is NULL or merely empty?

Again, you're asking UI questions - this shouldn't change our data design.

MehGerbil wrote:

3: What if the customer had an address but lost it? Can the program user set the address fields back to NULL to indicate that the customer no longer has an address?

That's a business decision - can an address be deleted once it's added? Again, no effect on data design

MehGerbil wrote:

4: What if a user accidently enters an address but then removes it? Does that mean, for database purposes, that the account had an address but now doesn't have one or does it mean that the account never had an address? What would NULL mean for those values?

It depends whether the requirement for history tracking is there or not. If the system needs to know that there used to be an address and now there isn't then we need to keep history. If we need to keep history then ab address hist

Ennis Ray Lynch Jr

The databases in question used VB6 for the front-end? (Zero is common for extremely low values of null in VB6). Primary reason that I don't like to use 0 for a key value and start from one, don't want to break anything. Now, as far as not null goes, I prefer the fourth normal form in cases you have described. I will leave the understanding as an exercise for the reader.

Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. "And they, since they Were not the one dead, turned to their affairs" -- Robert Frost "All users always want Excel" --Ennis Lynch

Lost User

_Maxxx_ wrote:

It would remedy it perfectly. In your example I think you are suggesting that the database wold have a table containing just the sale price, while another table would hold details of the sold item (or, actually, the item Id) and the sale to which it belongs, etc.

As the sale price is an intrinsic part of the line item in most cases I'd suggest that the 'invoice line' table would include the sale price as a column, and would not hold a reference to another table.

Actually, I was taking a larger view of the issue than what you've stated here. An item could be sold, but it could also be donated, destroyed, lost, transferred, or disassembled into parts. My other table would be expressing how the item was 'released', means by which it was 'released' and other details. The item could be sold for a negative value, perhaps part of a package deal. NULL indicating 'not sold' seems rather limiting. Additionally, 'price' in my sphere may include two different amounts, two different payment methods, and perhaps two different payees which are summed to get the final total price. I'd have a table of 'transactions' that make up the 'sold price' and so 'sold price' wouldn't be on that item table. A fundamental rule of database design is that fields aren't overloaded with multiple meanings. When NULL for sale price means there is no sale price that is fine. When NULL for sale price means the item hasn't been sold that is a bad thing.

_Maxxx_ wrote:

Again, you're asking UI questions - this shouldn't change our data design.

I'm using the UI to illustrate a point about the data design. If NULL is going to mean something then that needs to be expressed to the users via the UI. If the users don't care about the academic difference between 'never had a value' and 'currently has no value' then I have to wonder if the database design is being made to conform to a textbook or if it's being designed to meet the user's needs.

_Maxxx_ wrote:

That's a business decision - can an address be deleted once it's added? Again, no effect on data design

I pointed out the issues that NULL can raise in the UI to illustrate that decisions made in the database design impact the application throughout and if NULL is going to mean a specific thing on a field then the developer has to have a way to communicate that to the use

jschell

_Maxxx_ wrote:

I've come across a practice in more than one company where one or more develpers insit on not using null values in the database.

So at best they are ignorant.

_Maxxx_ wrote:

The argument I get is that "the customer does have an address, but we just don't know what it is yet."

The database is a persistent store - not a god. It "models" the data objects. And obviously if they want "We don't know what it is" then that is exactly what they should put in the database.

_Maxxx_ wrote:

but am willing to change my view if anyone can give a sound reason for any of this garbage.

One reason of course is that they do not in fact understand what null is. It confuses them. So a "sound" reason would be that you don't want to confuse them.

jschell

s_mon wrote:

I normally try to avoid null values to prevent any exceptions

The fallacy in that argument though is that there is in fact some value there which must be dealt with in a special way regardless. So you avoid a crash to exchange it for a harder to figure out data bug.

jschell

CDP1802 wrote:

- The importer could not import junk anymore

You explanation fails to explain how this was prevented. Certainly one can import 5 spaces or 5 X's into a string field or a date of June 1, 1099 and that wouldn't cause problems in the application? Conversely if one write a import tool/process that validates the data, all of the data, then this of course insures that all sorts of invalid data cannot be imported.

jschell

_Josh_ wrote:

In some systems using a well chosen sentinel value that is within the range of the data type used can save a lot of space because it doesnt require another bool to indicate null/non null.

Perhaps, but it would require there are no other impacts on the system. For example that non-null fields do not always require space in the paged record and that indexes are not impacted in the same way. And it is only applicable for a table where there is a LOT of data. And thus it would be an optimization for one table requiring it for all tables is pointless.

Lost User

MehGerbil wrote:

An item could be sold, but it could also be donated, destroyed, lost, transferred, or disassembled into parts.

I see what you're getting at - and I completely agree - it depends on requirements.

MehGerbil wrote:

When NULL for sale price means there is no sale price that is fine.
When NULL for sale price means the item hasn't been sold that is a bad thing.

In the example I was thinking of, which was auction based, there is a 'reason not sold' which could be passed-in, destroyed etc. But there is no status for 'not sold because it hasn't been offered yet' - and a record was created in the Db with a sale price of zero and (effectively) a status of 'sold' and, in fact, a BuyerId of 0 (and a buyer record pre-exists with an Id of zero). So, as you can gather, the design sucks out of the box!

MehGerbil wrote:

I'm using the UI to illustrate a point about the data design.
If NULL is going to mean something then that needs to be expressed to the users via the UI.

Again - I see what you mean. And agree. In the case above, having a Null buyer Id is more meaningful (IMHO) to show there is no buyer than having a buyer record with a description of something like 'No Buyer'. And, assuming there's no buyer, there is no sale price so it should be null and not zero. So Null isn't really representing anything other than the lack of a sale price - while the Null buyer shows there is no buyer. I guess my gripe is that people seem to sometimes avoid using Nulls because of their laziness (they can't be bothered to differentiate client-side so add a record representing no buyer, or whatever, to simplify their SQL - but really they end up with code that tests for magic Ids instead of Null.

MVVM # - I did it My Way ___________________________________________ Man, you're a god. - walterhevedeich 26/05/2011 .\\axxx (That's an 'M')

Lost User

Ennis Ray Lynch, Jr. wrote:

The databases in question used VB6 for the front-end?

Interestingly in this case the OLD platform was VB6 - and teh new Db was designed by those old VB programmers - so you're right, that could well be the reason.

Ennis Ray Lynch, Jr. wrote:

I prefer the fourth normal form in cases you have described

our database is, unfortunately, not normal in any way ;) It falls short of 3NF in many areas already!

MVVM # - I did it My Way ___________________________________________ Man, you're a god. - walterhevedeich 26/05/2011 .\\axxx (That's an 'M')

Lost User

jschell wrote:

One reason of course is that they do not in fact understand what null is. It confuses them. So a "sound" reason would be that you don't want to confuse them.

NOOOOOooooo! That is certainly not a sound reason for using bad practices - it is sound reason for education!

MVVM # - I did it My Way ___________________________________________ Man, you're a god. - walterhevedeich 26/05/2011 .\\axxx (That's an 'M')

PIEBALDconsult

I like nulls; I don't like "magic values". Disallowing nulls is not a solution to the problem.