Trigraphs and C++

Jorgen Sigvardsson

In an application I'm writing, the format of an identifier is DDDDDD-DDDD, where D is a digit. In certain cases the identifier is not yet known. I thought I'd display such a "NULL" instance as ??????-???? instead of just a blank field. I was very surprised to find that the UI displayed it as ????~????. WTF?

I started debugging CString et al. to see if I was using some hidden or unknown escape sequence. No such thing. I really didn't expect to find anything like that, but I don't want any loose ends. Then I tried building ??????-???? at run time by concatenating CStrings, and it worked. So it could not have triggered some kind of escape code in the UI (a BCGSoft list control in this case).

Then it dawned on me that the compiler must be the culprit for some reason. I declared const char* lpsz = "??????-????"; and lo and behold, the debugger displayed lpsz as ????~????. Then I remembered trigraphs, an old feature that I suspect FEW programmers have ever used. It turns out that ??- is the trigraph for ~.

I thought this must surely be a compiler bug: why is the compiler messing with my strings? According to http://en.wikipedia.org/wiki/Digraphs_and_trigraphs, trigraphs are detected at stream level. A trigraph is not by itself a token! It is detected and replaced inline in the text stream, before tokenization, and will therefore be picked up in both code AND strings. Digraphs, it turns out, are only detected at token level, meaning that strings are untouched.
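To make the stream-level replacement concrete, here is a rough sketch that imitates it at run time. The names trigraph_replacement and replace_trigraphs are made up for illustration; this is not how any particular compiler implements it, only a model of the left-to-right substitution over the nine standard trigraphs. The source deliberately avoids containing any trigraph itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps the third character of a candidate trigraph
// to its replacement, or returns '\0' when there is no such trigraph.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical model of the replacement pass: scans left to right and
// substitutes every "two question marks plus a listed character" run.
// It knows nothing about tokens, so string contents are not protected.
std::string replace_trigraphs(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (i + 2 < text.size() && text[i] == '?' && text[i + 1] == '?') {
            const char r = trigraph_replacement(text[i + 2]);
            if (r != '\0') {
                out += r;
                i += 3;  // consume all three characters of the trigraph
                continue;
            }
        }
        out += text[i];
        ++i;
    }
    return out;
}
```

Fed the eleven-character placeholder (six question marks, a hyphen, four question marks), this returns ????~????: the last two question marks before the hyphen pair up with it, which matches what the debugger showed.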

-- Kein Mitleid Für Die Mehrheit

benjymous

They get detected in *comments* too, which can cause much hilarity.
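The classic case is a line comment that ends in two question marks and a slash: that trigraph becomes a backslash, the backslash-newline pair is spliced away, and the next line of code silently becomes part of the comment. A small sketch that imitates just those two steps at run time (splice_lines is a made-up name, and it models only this one trigraph):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper modelling two early translation steps.
// Step 1: "question mark, question mark, slash" becomes a backslash.
// Step 2: a backslash directly followed by a newline is deleted,
//         which joins the two physical lines into one.
std::string splice_lines(const std::string& src) {
    std::string replaced;
    std::size_t i = 0;
    while (i < src.size()) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?' &&
            src[i + 2] == '/') {
            replaced += '\\';
            i += 3;
        } else {
            replaced += src[i];
            ++i;
        }
    }

    std::string joined;
    i = 0;
    while (i < replaced.size()) {
        if (replaced[i] == '\\' && i + 1 < replaced.size() &&
            replaced[i + 1] == '\n') {
            i += 2;  // drop the backslash and the newline together
        } else {
            joined += replaced[i];
            ++i;
        }
    }
    return joined;
}
```

Given a comment line ending in that trigraph followed by a line reading x = 1;, the result is a single comment line with the assignment glued onto its end, so the assignment never runs.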

Help me! I'm turning into a grapefruit! Buzzwords!

Jorgen Sigvardsson

I'm hoping they are removed in the new C++ standard. If you can't type ordinary C++ symbols, you should switch terminals, NOT trade your sanity for typability! (Is that a word? :-D)

Saurabh Garg

Cool, I didn't even know about trigraphs. -Saurabh

supercat9

BTW, it's often useful to split string literals into separate quote-delimited parts that the compiler concatenates. For example, printf("\xAE" "abracadabra" "\xAF"); will compile, whereas "\xAEabracadabra\xAF" will likely either not compile or else yield a different string, because a hex escape keeps consuming every hex digit that follows it. Since ??" is not a trigraph, splitting string literals after double question marks should avoid trouble.
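A short sketch of both uses of the splitting trick, plus the \? escape, which the language also provides for keeping question marks apart. The function names are made up for illustration; the placeholder text is the one from the original post.

```cpp
#include <string>

// Closing the literal right after the hex escape stops the escape from
// swallowing the 'a' and 'b' of "abracadabra", which are hex digits.
std::string framed_word() {
    return "\xAE" "abracadabra" "\xAF";
}

// The placeholder split after the question marks: no single literal has
// two question marks directly followed by '-', so nothing is replaced,
// whether or not the compiler has trigraphs enabled.
std::string unknown_id_split() {
    return "??????" "-????";
}

// The same text written with escaped question marks instead of a split.
std::string unknown_id_escaped() {
    return "\?\?\?\?\?\?-\?\?\?\?";
}
```

Both placeholder variants yield the same eleven characters with the hyphen intact. Which one reads better is a matter of taste; the split form is probably easier on the eyes.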

peterchen

Amazing how long you can do C++ without knowing such things... :D

Agh! Reality! My Archnemesis!

Lost User

I came across this issue some years ago when trying to write C code on an IBM 3270 (?) terminal. The IBM keyboard was missing a few of the characters needed, so the trigraph trick was the solution. I didn't realise it had actually become part of the language.

It's time for a new signature.