Perhaps I should write an article
-
CDP1802, Wout, and Espen, I found it interesting, in observing my reaction to the original post, that I had a kind of "visceral" reaction to the idea of initially filling data objects with a "certainly invalid" value. That seemed very strange to me: a kind of violation of "parsimony." But, perhaps I have not had enough dances with "null" ? This is the kind of discussion, on the Lounge, that I wonder if it might be of "more enduring" value to CP if it were held on (or copied over to?) a specific technical forum; in this case, the "Database & SysAdmin/Database" forum would seem a logical place. But, as Bryan Ferry sang: "Don't Stop the Dance." yours, Bill
“Thus on many occasions man divides himself into two persons, one who tries to fool the other, while a third, who in fact is the same as the other two, is filled with wonder at this confusion. Thinking becomes dramatic, and acts out the most complicated plots within itself, and, spectator, again, and again, becomes: actor.” From a book by the Danish writer, Paul Moller, which was a favorite of Niels Bohr.
BillWoodruff wrote:
This is the kind of discussion, on the Lounge, that I wonder if it might be of "more enduring" value to CP if it were held on (or copied over to?) a specific technical forum; in this case, the "Database & SysAdmin/Database" forum would seem a logical place.
I thought about that, but it seemed a little out of place because I did not really have a specific question to ask. It's more that it strikes me as strange that my answer to some common problems appears to be the opposite of what is generally seen as the right practice.
BillWoodruff wrote:
I found it interesting, in observing my reaction to the original post, that I had a kind of "visceral" reaction to the idea of initially filling data objects with a "certainly invalid" value. That seemed very strange to me: a kind of violation of "parsimony." But, perhaps I have not had enough dances with "null" ?
Why? That's one of the oldest tricks in the book. :) Accidentally overlooking a property of the data object and not filling it can happen, especially when it has more than just a handful of them. If the default value were valid, it would pass validation and the bug could go unnoticed for a longer time. By deliberately setting it to an always-invalid value, I make sure that validation will certainly fail and reveal that this value was not filled at all. The same thing already happened with a web service. The client program used an outdated service description and one property of the data object was not filled when the data object was constructed on the server side. There was no exception or any other kind of error. It just was left as it was. Well, it did not get far before being noticed.
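To make this concrete, here is a minimal sketch of the idea; the struct, its fields and the chosen sentinel values are invented for illustration:

#include <cmath>
#include <iostream>
#include <limits>
#include <string>

// Hypothetical data object: every property starts out "certainly invalid".
struct CustomerDto
{
    double price = std::numeric_limits<double>::quiet_NaN(); // NaN never passes a range check
    int id = -1;                                              // assuming real ids are always positive
    std::string mail;                                         // empty string fails the "required" check

    bool Validate() const
    {
        if (std::isnan(price) || price < 0.0) return false;
        if (id <= 0) return false;
        if (mail.empty()) return false;
        return true;
    }
};

int main()
{
    CustomerDto dto;
    dto.price = 9.99;
    dto.id = 42;
    // mail was forgotten: validation fails instead of a "valid" default slipping through
    std::cout << (dto.Validate() ? "ok" : "not completely filled") << '\n';
}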
-
CDP1802 wrote:
I think each property should be initialized to a certainly invalid value
In my opinion you should use a default 'correct' value, where 'correct' needs to be defined ... It's possible that you could use the Boost Property Map Library[^]
CDP1802 wrote:
Am I overlooking something?
Almost certainly - what you're describing would perhaps look like this:
#include <atomic>
#include <mutex>
#include <string>
#include <map>

class PropertyData
{
    std::atomic<long> referenceCount;
    std::string name;
public:
    PropertyData(const std::string& theName)
        : referenceCount(1L),
          name(theName)
    {}

    long Addref() { long result = referenceCount.fetch_add(1L); return result; }
    // fetch_sub returns the value before the decrement, so subtract one to report the new count
    long Release() { long result = referenceCount.fetch_sub(1L) - 1L; return result; }
    std::string Name() { return name; }
    const std::string& Name() const { return name; }
};

template<typename T>
class PropertyDataT : public PropertyData
{
    T value;
public:
    typedef T value_type;

    PropertyDataT(const std::string& theName) : PropertyData(theName) {}
    T Value() { return value; }
    const T& Value() const { return value; }
    PropertyDataT& Value(const T& newValue) { value = newValue; return *this; }
    operator T() { return value; }
    PropertyDataT& operator = (const T& newValue) { return Value(newValue); }
};

class Property
{
    PropertyData* data;
public:
    Property()
        : data(nullptr)
    {}

    Property(const Property& other) : data(other.data) { if(data) { data->Addref(); } }
    Property(Property&& other) : data(other.data) { if(data) { other.data = nullptr; } }
    virtual ~Property() { if(data) { if(data->Release() == 0) { delete data; } } }
    Property& o
-
Your first example already has something in its included libraries that I would not like to see: #include <mutex>. On a server, different requests may be processed in separate threads. The data objects themselves should not be problematic, since there should be no shared access to them that needs to be synchronized. However, if the objects that do the validation are separated out and shared by all data objects, then they must be thread safe. Even then I would not try to achieve this through synchronization. Instead, I would try to initialize their state (which would be the values in the validation properties) early and then leave it unchanged during the entire runtime of the application. I would even ensure this by not accepting any new values for the properties once the validator has been inserted into the collection. Except for the short time when they are initialized, the properties of the validators will be something like constants and will not require any synchronization.
Your last example already looks very similar to the base class of my data objects, down to using std::vector as container for the data fields. I have not separated the property definition from the properties yet, but it seems to be a good idea to share it between all data objects of the same type. It would be wasteful to let each data object have its own set of identical property definitions / validators. How about initializing them in a static constructor?
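A rough sketch of how that freeze-after-setup idea might look; the class, its single max-length property and the Freeze() hook are invented for illustration:

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical validator: configurable only until it is registered in the
// collection, constant afterwards. Assuming setup finishes before the worker
// threads are started, readers never need a lock.
class StringValidator
{
    std::size_t maxLength;
    bool frozen;
public:
    StringValidator() : maxLength(0), frozen(false) {}

    void SetMaxLength(std::size_t value)
    {
        if (frozen)
            throw std::logic_error("validator is already in use");
        maxLength = value;
    }

    void Freeze() { frozen = true; }   // called when inserted into the collection

    bool Validate(const std::string& value) const
    {
        return !value.empty() && value.size() <= maxLength;
    }
};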
Espen Harlinn wrote:
In my opinion you should use a default 'correct' value, where 'correct' needs to be defined ...
This leaves the possibility that we actually mean the same thing. I generally don't trust other layers or even other applications to do everything the right way. If I initialize the properties with values that can never be valid, the data object will certainly fail validation, and if I see this value I know that this property was not touched. In most cases this is just a simple bug that may otherwise have gone unnoticed for a while, but when web services are involved, this may also be a security issue. Hackers try all kinds of things to make your server fall on its nose, hoping to get a foot into the door when they succeed.
-
So, you don't want to use a mutex. In my mind that means you are implementing a specialized set of classes - which is OK; I was just thinking about a general, relatively easy-to-consume interface. That also means that the data pointer should have been atomic too.
CDP1802 wrote:
This leaves the possibility that we actually mean the same thing.
Most likely ...
CDP1802 wrote:
then they must be thread safe.
Right, and what exactly does threadsafe mean? As long as you ensure that the definition data does not change, it will be threadsafe without any kind of lock - which it seems you understand well enough. :-D
CDP1802 wrote:
Your last example already looks very similar to the base class of my data objects,
Which is hardly surprising given that anybody who has dabbled with meta data sooner or later ends up with something similar, or fails.
CDP1802 wrote:
when web services are involved, this may also be a security issue.
I prefer protocols consisting of fixed sized messages - when that's possible; which so far has turned out to be surprisingly often.
CDP1802 wrote:
Hackers try all kinds of things to make your server fall on its nose, hoping to get a foot into the door when they succeed.
That's usually easier done attacking known bugs in widely used applications/servers. SCADA systems are usually managed by automation engineers - so it doesn't matter if Siemens figures out how to patch their solutions, because the patches will only be applied to a small fraction of systems running their software. I can also think of a number of DBAs that don't patch their systems either. Just google "<product known to be running on a server> vulnerability", and you usually end up with more than a few hits - quite a few of them include descriptions of how said vulnerability can be used to execute the code of your choice on a remote server. Once you have stuff running on the computer, you could perhaps try to exploit Vulnerabilities in Windows Kernel Could Allow Elevation of Privilege[^], which impacted Windows XP through Windows 2008 R2 Se
-
I'm sitting here rewriting my former C# libraries in C++, and have come to a subject which I obviously see very differently than the rest of the world. I'm talking about data objects, those objects which are passed between all layers of an application from the UI down to the database. Wherever you look, you are told that the data objects should be simple containers.

That's where I start to see things differently. I think each property should be initialized to a certainly invalid value, not just left to whatever defaults the properties may have in a freshly created data object. Picking such values may not be so easy. Just think of an integer database column that allows NULL. The definition of invalid values should also be done in a non-redundant way, not in the constructor of some data object. Anyway, the initially invalid values help in detecting bugs when properties of the data objects are accidentally not filled.

That assumes, of course, that the values of data objects are validated at all. How should the validation be done? The application logic must validate the data objects before doing anything with them. That's its job. It can't simply assume that validation has already been done in the UI. Who guarantees that the validation in the UI was complete and correct or was done at all? How do we guarantee that the UI and the application logic validate exactly in the same manner?

My answer: A smarter data object, not just a simple struct. To begin with, the data objects get a collection to hold data field objects which now represent the properties. The data fields define invalid and (where needed) maximum and minimum values for all basic data types. They form a small class hierarchy and allow you to create more project specific types by inheritance.

Let's take a string as an example. In the database a column may be declared as VARCHAR(some length). The corresponding field in the database should then make sure that the string never exceeds the size of the column. Exceptions or truncation may otherwise be the result, both not wanted. Now let's say that not just any string of up to this length will do. Let's say it's supposed to hold a mail address and has to be checked against a regex. It's just a matter of deriving a regex data field from the string data field and overriding its Validate() method. In the constructor of the data object this field and all others that are needed. In this case the maximum length and the regex to check against would have to be set. Now we have the constructor of the data
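For illustration only, a stripped-down sketch of the field hierarchy described above could look roughly like this; the names, the regex and the container layout are assumptions, not the actual library code:

#include <cstddef>
#include <memory>
#include <regex>
#include <string>
#include <vector>

// Base class for all data fields; concrete fields override Validate().
class DataField
{
public:
    virtual ~DataField() {}
    virtual bool Validate() const = 0;
};

// String field that enforces the VARCHAR(n) length of the database column.
class StringField : public DataField
{
protected:
    std::string value;
    std::size_t maxLength;
public:
    explicit StringField(std::size_t theMaxLength) : maxLength(theMaxLength) {}
    void Set(const std::string& newValue) { value = newValue; }
    virtual bool Validate() const { return !value.empty() && value.size() <= maxLength; }
};

// Derived field: additionally checks the value against a (very rough) mail regex.
class MailField : public StringField
{
    std::regex pattern;
public:
    explicit MailField(std::size_t theMaxLength)
        : StringField(theMaxLength), pattern("[^@]+@[^@]+\\.[^@]+") {}
    virtual bool Validate() const
    {
        return StringField::Validate() && std::regex_match(value, pattern);
    }
};

// The data object owns the collection of its fields and validates them all.
class DataObject
{
protected:
    std::vector<std::unique_ptr<DataField>> fields;
public:
    bool Validate() const
    {
        for (const auto& field : fields)
            if (!field->Validate())
                return false;
        return true;
    }
};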
I think you need to complicate things even more. How can you use just the factory method? Don't you need an Abstract Factory too? There are plenty of MIPS (millions of instructions per second) in today's computers. We need to employ those MIPS. Otherwise, they are just wasted as time goes by.
-
You are right. A collection of properties also is too ordinary. How about using the decorator pattern? Every property then will be added with its own decorator. And then we will use the ... So you think I'm overdoing it? Distributing all those things over other layers, perhaps even redundantly or inconsistently, is better?
-
CDP1802 wrote:
I think each property should be initialized to a certainly invalid value,
There are two problems with that. First it assumes that default valid values do not exist. Second it assumes that all data types will always have an invalid 'value'. That of course is a false presumption.
CDP1802 wrote:
Anyway, the initially invalid values help in detecting bugs when properties of the data objects are accidentally not filled.
So will unit tests and system tests. Which you must have anyways.
CDP1802 wrote:
In the constructor of the data object this field and all others that are needed
This idiom is not always suitable for object initialization. For example, if there are several methods that need to fill in different data for one object, and construction is the only way to set the data, then one would need to come up with a different container for each method to collect the data first.
CDP1802 wrote:
The code to implement the base class of the data objects
Best I can suppose is that you are suggesting using inheritance for convenience and nothing else. And that is a bad idea. You should be using helper classes and composition.
CDP1802 wrote:
They make preparing new data objects that pass validation much easier.
I doubt that assertion. Validation can encompass many aspects including but not limited to pattern checks, range checks, multiple field checks, cross entity checks, duplication checks, context specific checks, etc. There is no single catch-all strategy that allows one to solve all of those.
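For example, a cross-field rule like the one sketched below cannot live in any single field's Validate(); it needs to see the whole object (the struct and the rule are invented for illustration):

#include <string>

struct Booking
{
    int startDay;          // day-of-year numbers, kept simple on purpose
    int endDay;
    std::string roomId;
};

// A range check works per field, but the period check is a multiple-field check.
bool ValidDay(int day) { return day >= 1 && day <= 365; }

bool ValidPeriod(const Booking& b)
{
    return ValidDay(b.startDay) && ValidDay(b.endDay) && b.startDay <= b.endDay;
}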
CDP1802 wrote:
A small dispute with some Java developers over this (I actually only wanted to add a validation method to the data objects, not the whole bible) also cost me my last job in the end. Anything but struct-like data objects was not their 'standard'.
Presuming that you did in fact want to do nothing but add simple validation then their stance was idiotic. However you could have just as easily created a shadow tree that mimicked all of the data objects to provide validation.
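For what it's worth, a tiny sketch of that shadow idea, with an invented Order struct standing in for one of the plain data objects:

#include <string>

// Plain struct-like data object, exactly as the team wanted it.
struct Order
{
    int quantity;
    std::string customer;
};

// Shadow class that mirrors the data object and holds the validation,
// so the data object itself stays a dumb container.
struct OrderValidator
{
    static bool Validate(const Order& order)
    {
        if (order.quantity <= 0) return false;
        if (order.customer.empty()) return false;
        return true;
    }
};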
CDP1802 wrote:
and now the whole world is religiously imitating the guru's 'standard'?
-
jschell wrote:
There are two problems with that.
First it assumes that default valid values do not exist.
Second it assumes that all data types will always have an invalid 'value'. That of course is a false presumption.
No assumptions at all. At initialisation I want to have each property set to a value that says 'this property has not yet been filled'. I also do not assume that the data objects, wherever they may come from, have been properly filled and checked. When I encounter a 'not filled' value, I know that there is something wrong. I do not want this to go undetected and quietly use a valid default. That's sweeping an existing problem under the rug. As to the values themselves: Fortunately there are such things as 'NaN' for numerical types, and you can also define values for that purpose which are highly unlikely ever to be needed. How often did you need a DateTime with a value like 23 Dec 9999 23:59:59?
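A small sketch of such 'not filled yet' sentinels; the helper names are invented:

#include <cmath>
#include <ctime>
#include <limits>

// Values that no real record will ever carry, used to mark "not filled yet".
const double NotFilledDouble = std::numeric_limits<double>::quiet_NaN();

std::tm NotFilledDate()
{
    std::tm t = {};                // 23 Dec 9999 23:59:59
    t.tm_year = 9999 - 1900;
    t.tm_mon  = 11;
    t.tm_mday = 23;
    t.tm_hour = 23;
    t.tm_min  = 59;
    t.tm_sec  = 59;
    return t;
}

// NaN compares unequal even to itself, so use isnan instead of ==.
bool IsFilled(double value) { return !std::isnan(value); }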
jschell wrote:
So will unit tests and system tests. Which you must have anyways.
Having seen often enough how unit tests are treated (especially when deadlines are close), I don't invest too much trust in them. Even then a unit test will have a hard time detecting an omission when it has been filled with a valid(!) default. And, by the way, a unit test that tests a single validation method like I want can already be a nightmare. Just think of a data object with dozens of properties with more complex validation rules. Having the same nightmare in every layer redundantly does not really make anything better. Anyway, I'm much more concerned about what happens at runtime. I have seen too many imprecise specifications, unexpected data or even 'clever' users who made a sport out of trying to crash the server. That particular application had no unit tests at all, but extensive diagnostics and logging under the hood. My last test went over the entire production database and was repeated until the job was completed successfully. And then it ran without a single incident for years until I left the company. I must have done something right.
jschell wrote:
I doubt that assertion. Validation can encompass many aspects including but not limited to pattern checks, range checks, multiple field checks, cross entity checks, duplication checks, context specific checks, etc. There is no single catch-all strategy th
-
Only instantiate a typed container that you know will have all of its fields filled with valid data at the time you instantiate it? This reminds me of some user interfaces that have menu items grayed out instead of not being present at all, because it's inappropriate or there is no reason to have them exist in the list. What say you?
David
-
I think there's probably a trade-off between how smart it gets and how inefficient it becomes. Beware of doing anything which requires another programmer to have to spend a week learning how your stuff works before he can do anything with it. Unless there's a marked gain in security or elegance, it's probably not worth going to an extreme with it. Also note that by doing this, you're forcing the data model to conform to your ideas - i.e. it becomes harder to break the rules when you need to. I'd be careful of deciding ahead of time that all data will always conform to the way these objects work. Again, it's a trade-off... but make sure it's still efficient and still flexible.
-
CDP1802 wrote:
Who guarantees that the validation in the UI was complete and correct or was done at all?
I believe you're a bit paranoid... ;P Seriously, I believe the idea is good, but given that it will introduce some overhead in the normal development workflow, you must first justify why you want such validations, then make them generic enough so they can be used with (ideally) no modification in any project and finally release them as a nice open source (MIT licensed) library/framework. :)
CEO at: - Rafaga Systems - Para Facturas - Modern Components for the moment...
-
CDP1802 wrote:
At initialisation I want to have each property set to a value that says 'this property has not yet been filled'.
I understand your proposed solution. Because I have done it. And tried various variations as well. And the ONLY way to do it for all cases is to have a second flag property for every real property which indicates whether it has been set yet.
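Something along these lines, presumably; a hypothetical wrapper, not anybody's actual code:

#include <stdexcept>

// Pairs every property with a flag recording whether it has been set,
// which works for any type whether or not an 'invalid' value exists for it.
template<typename T>
class Tracked
{
    T value;
    bool filled;
public:
    Tracked() : value(), filled(false) {}
    bool IsFilled() const { return filled; }
    void Set(const T& newValue) { value = newValue; filled = true; }
    const T& Get() const
    {
        if (!filled)
            throw std::logic_error("property was never filled");
        return value;
    }
};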
CDP1802 wrote:
I do not want this to go undetected and quietly use a valid default. That's sweeping an existing problem under the rug.
A default value is often an appropriate solution and I haven't seen any evidence that the problem that you are attempting to solve is significant. (I should note that I create a lot of APIs that use data transfer objects and have been doing so for years.)
CDP1802 wrote:
As to the values themselves
By definition a magic value is magic. The value chosen doesn't alter that it is intended to be magic.
CDP1802 wrote:
I must have done something right.
And you are suggesting that the only reason for this success is this proposed idiom?
-
"I'm talking about data objects, those objects which are passed between all layers of an application from the UI down to the database." I thought "data objects" (generally) only moved between the DAL (data access layer) and the database. It was the "business object" that talked to the DAL via the "business layer" (or "model") and talked to UI via the presentation layer (or "view") (and vise versa). By themselves, data objects have no knowledge of referential integrity or what is required to complete a "transaction" (which may require "many" data objects); that's the domain of the business object and it's (business) "rules". A data object might contain some "basic" validations, but it can't know all the possibities without having some idea of the overall context it is operating in (and which may change as the transaction is being constructed).
-
If you're rewriting in C++ with the CLR, you have Nullable types, in which any object can be null (this is how most ORMs deal with ints and bools etc.).
-
Thanks, but the point is to port everything away from Microsoft. And in unmanaged C++ every type is nullable, isn't it?
Ah ok. In natural C++ though, ints, bytes, chars, and bools aren't nullable as far as I've ever known.
-
Yeah, but a pointer isn't any specific type, it's a reference to a location in memory.
-
Yeah, you could create pointers for doing nulls, but it would probably make more sense to have some sort of default convention.

int MyInt = 0;
if(MyInt == 0) // equivalent to null

or, if you need to use 0:

int MyInt = -1;
if(MyInt == -1) // null

or, if you need the entire integer:

int MyInt = 0;
bool isIntNull = true;
// do work here
if(isIntNull) // int null, regardless of the int's value

OR (it's pretty damn basic, and I just threw it together in notepad so it may not compile, but the idea would work):

template<typename theType>
class Nullable{
    bool isNull;
    theType Value;
public:
    Nullable(){
        isNull = true;
    }
    bool IsNull(){
        return isNull;
    }
    theType GetValue(){
        return Value;
    }
    void SetValue(theType val){
        isNull = false;
        Value = val;
    }
};
-
Have a look at boost::optional; these things tend to be tricky in today's C++. I've a question for the thread author: you've apparently decided to rewrite a working piece of software to "move away from Microsoft", if I may paraphrase... It appears to be more like "moving from managed to native", but why? This kind of thing is where the managed world offers you a fast, efficient and safe way to get your job done. Trying to rewrite this in native "bare metal" C++ is awkward, error prone and lengthy...
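For reference, a minimal sketch of how boost::optional could replace the magic values for the nullable-int case above:

#include <boost/optional.hpp>
#include <iostream>

int main()
{
    boost::optional<int> quantity;      // starts out "null", no magic value needed

    if (!quantity)
        std::cout << "not filled yet\n";

    quantity = 42;                      // now it holds a real value

    if (quantity)
        std::cout << *quantity << '\n'; // dereference to get the int back
}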