Case closed: NULLs are somehow valid in strings now?
-
I was beating my head on the desk all day over a stupid issue: a Perl script wasn't getting all of its arguments.
You see... .NET strings can happily contain arbitrary NULL characters. (I didn't know that until today. Beware of this behavior.) Microsoft decided in their infinite wisdom that we programmers would just love to abandon the old NULL termination method, where a NULL signaled the end of a string. So .NET considers NULL a valid character in our strings, even though the definition of NULL is "THIS DATA DOES NOT EXIST and cannot be compared or evaluated."
I was building a command line by concatenating multiple strings together. You can probably guess where this is going. Yes, an interop issue! Command lines ARE in fact NULL terminated. I use a WINAPI call that gets the short name for a long filename, because somebody wrote a Perl script that can't handle filenames with spaces. Being a venerable old WINAPI function that has been with us since the Win95 days, it returns the result as a NULL-terminated char array. So I said "return new string(buf)" to get the result as a string (that constructor copies the entire buffer, including the unused NULLs after the terminator), and good old .NET said "Oh, that's okay, we can have NULLs in strings now!"
So guess what happened? After I appended that short name to the argument string, everything after it got truncated when I passed it to a command line, and it rained on elementary school playgrounds around the world. Oh, the sadness was overwhelming. So I had to do a .Replace("\0", "") on that string before giving it to the ProcessStartInfo, and all was golden and happy and the birds sang and rainbows issued forth from the heavens. Case closed.
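To see why the short name dragged NULLs along, here is a minimal Python model of the bug. It is an illustration, not the poster's C# code. A `ctypes` wide-char buffer stands in for the char array the WINAPI call fills. `fake_short_name`, the 32-char buffer size, and the path are hypothetical. The post also doesn't say which function was used: GetShortPathName would be the usual one, but that's an assumption. Slicing the whole buffer behaves like `new string(buf)` because it keeps the unused, zero-filled tail. `as_c_string` mimics a NULL-terminated consumer by keeping only what precedes the first NULL.

```python
import ctypes

BUF_CAP = 32  # hypothetical buffer size handed to the API


def fake_short_name(cap):
    """Hypothetical stand-in for a WINAPI short-name call: returns a
    zero-initialized wide-char buffer holding a NULL-terminated result."""
    return ctypes.create_unicode_buffer("C:\\PROGRA~1\\in.txt", cap)


def as_c_string(s):
    """What a NULL-terminated consumer sees: everything before the first NULL."""
    return s.split("\0", 1)[0]


buf = fake_short_name(BUF_CAP)

whole = buf[:]     # like new string(buf): all 32 chars, padding NULLs included
exact = buf.value  # stops at the terminator, like taking only the returned length

bad_cmd = "perl script.pl " + whole + " -v"
good_cmd = "perl script.pl " + exact + " -v"

# The NULL-terminated reader silently drops " -v" from the bad command line.
print(repr(as_c_string(bad_cmd)))
print(repr(as_c_string(good_cmd)))
```

You can avoid the padding in the first place by taking only the characters up to the terminator. In C# that would be `new string(buf, 0, len)` with the length the API reports. The post's `.Replace("\0", "")` gives the same result when the NULLs are only trailing padding.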
-
djdanlib wrote:
the definition of NULL is "THIS DATA DOES NOT EXIST and cannot be compared or evaluated."
To me, that is more an indicator that the "NULL character" is named poorly, not that .NET should exclude it from allowed characters in strings. NULL terminating strings seems silly to me.
-
aspdotnetdev wrote:
NULL terminating strings seems silly to me.
I agree. I've had trouble when receiving data from a serial connection.
-
IMO one needs to choose one of two extremes for comfort:
1. Use a "printable protocol", i.e. only transmit printable characters (the ASCII range [0x20,0x7E]) and ignore everything else, including tabs, CR, LF, and NULL. Every part of the serial chain (drivers, protocol stacks, modems, ...) will let those through unmodified.
2. Use binary data, i.e. make sure your serial path is fully binary: it doesn't touch any byte, doesn't replace CR with CRLF, doesn't swallow NULL, etc.
I normally start out with #1, as it tends to work right away. If performance becomes important, I'd consider switching to #2. :)
Luc Pattyn
Nil Volentibus Arduum
Please use <PRE> tags for code snippets; they preserve indentation and improve readability.
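Luc's option #1 has a consequence worth spelling out. If the receiver throws away CR, LF, and NULL, then message boundaries must themselves be printable. Below is a minimal Python sketch of such a receiver, assuming a hypothetical `;` terminator and a hypothetical `receive` function that gets the raw serial data as a sequence of byte chunks. It drops every byte outside 0x20..0x7E and only emits a message once its terminator has arrived, so a message split across two reads is still reassembled.

```python
DELIM = ";"  # hypothetical printable message terminator


def receive(chunks):
    """Yield complete messages from raw serial chunks, ignoring every byte
    outside printable ASCII (0x20..0x7E) and splitting on DELIM."""
    pending = ""
    for chunk in chunks:
        # Keep only printable ASCII; tabs, CR, LF and NULL are discarded.
        pending += "".join(chr(b) for b in chunk if 0x20 <= b <= 0x7E)
        # Everything before the last DELIM is complete; the rest waits.
        *done, pending = pending.split(DELIM)
        yield from done


msgs = list(receive([b"TEMP=21.5;\r\n\x00ST", b"ATUS=\tOK;"]))
print(msgs)  # ['TEMP=21.5', 'STATUS=OK']
```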
-
I was communicating with some sort of device; I think the terminal server was causing the trouble.
-
.NET strings are Unicode, so the encoding matters. Could it be that, somewhere along the way, the strings are read with the wrong encoding? That could easily produce some strange results.
A while ago he asked me what he should have printed on my business cards. I said 'Wizard'. I read books which nobody else understands. Then I do something which nobody understands. After that the computer does something which nobody understands. When asked, I say things about the results which nobody understands. But everybody expects miracles from me on a regular basis. Looks to me like the classical definition of a wizard.
-
That was my thought too. Unicode text encoded as UCS-2 will commonly have every other byte NULL, because every character in the ASCII range has a zero high byte. So, yes, you can have NULLs in a string, and it's perfectly valid in that encoding.
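The two posts above can be checked with a few lines of Python. The sample text is arbitrary. UTF-16LE is used because it produces the same bytes as UCS-2 for characters in this range. Encoding ASCII text produces a zero byte after every character. Any byte-oriented, NULL-terminated reader stops after the first character. Decoding those bytes with the wrong (single-byte) encoding turns the zero bytes into embedded NULL characters, which is the "wrong encoding" scenario suggested earlier.

```python
text = "perl"

# UTF-16LE (same bytes as UCS-2 here): each ASCII char gets a zero high byte.
encoded = text.encode("utf-16-le")
print(encoded)  # b'p\x00e\x00r\x00l\x00'

# A NULL-terminated, byte-oriented reader sees only the first character.
c_view = encoded.split(b"\0", 1)[0]
print(c_view)  # b'p'

# Matching encoding restores the text; a single-byte one keeps the NULLs.
right = encoded.decode("utf-16-le")
wrong = encoded.decode("latin-1")
print(right)        # perl
print(repr(wrong))  # 'p\x00e\x00r\x00l\x00'
```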
-
Nothing's wrong with strings that can contain embedded NULLs. The fact that NULL-terminated strings are so popular is mainly historic. There are times when it's handy or even necessary for a string to contain embedded NULLs; for example, the Win32 SHFileOperation function uses strings that have embedded NULLs.
Steve
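The SHFileOperation case is a good illustration. The pFrom and pTo fields of its SHFILEOPSTRUCT take a list of file names, each terminated by a NULL, with an extra NULL marking the end of the list. Below is a minimal Python sketch of building and reading that layout. `to_multi_sz` and `from_multi_sz` are hypothetical helper names, and no Windows API is actually called. Entries must be non-empty, since an empty entry would look like the end marker.

```python
def to_multi_sz(paths):
    """Join paths into NULL-separated entries followed by an extra NULL,
    the double-NULL-terminated layout SHFILEOPSTRUCT's pFrom/pTo expect."""
    if any(not p or "\0" in p for p in paths):
        raise ValueError("entries must be non-empty and contain no NULL")
    return "".join(p + "\0" for p in paths) + "\0"


def from_multi_sz(s):
    """Read entries back until the empty entry that marks the end."""
    entries = []
    for part in s.split("\0"):
        if not part:
            break
        entries.append(part)
    return entries


block = to_multi_sz(["C:\\a.txt", "C:\\b.txt"])
print(repr(block))
print(from_multi_sz(block))  # ['C:\\a.txt', 'C:\\b.txt']
```

Because .NET strings may contain NULLs, a single string can carry the whole list. A NULL-terminated string type would see only the first file name.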