Trigraphs and C++

    Jorgen Sigvardsson
    wrote on last edited by
    #1

    In an application I'm writing, the format of an identifier is DDDDDD-DDDD, where D is a digit. In certain cases the identifier is not yet known. I thought I'd display such a "NULL" instance as ??????-???? instead of just a blank field. I was very surprised to find out that the UI displayed it as ????~????. WTF?

    I started debugging CString et al to see if I was using some hidden/unknown escape sequence. No such thing. I really didn't expect to find anything like that, but I don't want any loose ends. Then I tried concatenating a CString into ??????-????, and it worked. So it could not have triggered some kind of escape code in the UI (a BCGSoft list control in this case).

    Then it dawned on me that the compiler must be the culprit for some reason. I declared const char* lpsz = "??????-????"; and lo and behold, the debugger displayed lpsz as ????~????. Then I remembered the trigraphs, an old feature that I suspect FEW programmers have ever used. Turns out that ??- is the trigraph for ~, so the last two question marks before the dash were swallowed along with the dash itself.

    I thought this must surely be a compiler bug. Why is the compiler messing with my strings? According to http://en.wikipedia.org/wiki/Digraphs_and_trigraphs, trigraphs are detected at the stream level. A trigraph is not by itself a token! It is detected and replaced inline in the text stream, and will therefore be picked up in both code AND strings. Digraphs, it turns out, are only detected at the token level, meaning that strings are untouched.
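The replacement described above can be sketched as a small simulation. This is not how any particular compiler implements it; `trigraph_replacement`, `replace_trigraphs` and `placeholder` are hypothetical names made up for illustration. The sketch only assumes the nine standard trigraphs and a left-to-right scan of the text, and it deliberately avoids spelling a trigraph in its own source so that it compiles the same way whether or not trigraphs are enabled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps the third character of a trigraph to its
// replacement. All nine trigraphs start with two question marks, so only
// the third character needs to be looked up.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';  // not a trigraph
    }
}

// Hypothetical simulation of the replacement pass: scan left to right and
// replace every trigraph, with no notion of tokens, strings or comments.
std::string replace_trigraphs(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (i + 2 < text.size() && text[i] == '?' && text[i + 1] == '?') {
            const char repl = trigraph_replacement(text[i + 2]);
            if (repl != '\0') {
                out += repl;
                i += 3;  // the whole trigraph is consumed
                continue;
            }
        }
        out += text[i];
        ++i;
    }
    return out;
}

// The placeholder from the post (six question marks, a dash, four question
// marks), assembled at run time so this file contains no trigraph itself.
std::string placeholder() {
    return std::string(6, '?') + "-" + std::string(4, '?');
}
```

Calling `replace_trigraphs(placeholder())` gives four question marks, a tilde and four more question marks, which matches what the debugger showed.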

    -- Kein Mitleid Für Die Mehrheit

      benjymous
      wrote on last edited by
      #2

      They get detected in *comments* too, which can cause much hilarity.
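The classic comment case is a `//` comment that ends in ??/ : with trigraphs enabled that becomes a trailing backslash, which splices the next source line into the comment. Below is a minimal sketch of a check for such lines. `ends_with_hidden_continuation` is a hypothetical helper, not part of any real tool, and it only looks at the last three characters of a line.

```cpp
#include <string>

// Hypothetical lint helper: true when a source line ends with the three
// characters (question mark, question mark, slash) that trigraph
// replacement would turn into a trailing backslash, i.e. a line
// continuation that silently pulls the next line into a comment.
bool ends_with_hidden_continuation(const std::string& line) {
    const std::string::size_type n = line.size();
    return n >= 3 && line[n - 3] == '?' && line[n - 2] == '?' &&
           line[n - 1] == '/';
}
```

A real check would probably also want to ignore trailing whitespace, which this sketch does not do.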

      Help me! I'm turning into a grapefruit! Buzzwords!

        Jorgen Sigvardsson
        wrote on last edited by
        #3

        I'm hoping they are removed in the new C++ standard. If you can't type ordinary C++ symbols, you should switch terminals, NOT trade your sanity for typability! (is that a word? :-D)

        -- Kein Mitleid Für Die Mehrheit

          Saurabh Garg
          wrote on last edited by
          #4

          Cool, I didn't even know about trigraphs. -Saurabh

            supercat9
            wrote on last edited by
            #5

            BTW, it's often useful to split strings into separate quote-delimited parts that will be assembled at compile time. For example, printf("\xAE" "abracadabra" "\xAF"); will compile, whereas "\xAEabracadabra\xAF" will likely either not compile or else yield a different string, because a hex escape keeps consuming hex digits and so swallows the "ab" that follows it. Since ??" is not a trigraph, splitting string literals after double question marks should avoid trouble.
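A minimal sketch of both splits, plus one alternative that is not in the post: the standard `\?` escape. The function names are hypothetical and exist only for illustration. None of these literals contains a trigraph, so the result should be the same whether or not the compiler has trigraphs enabled.

```cpp
#include <string>

// Adjacent string literals are concatenated by the compiler, but each hex
// escape ends where its own literal ends, so "\xAE" stays a single
// character instead of swallowing the "ab" that follows.
std::string framed_word() {
    return "\xAE" "abracadabra" "\xAF";
}

// Splitting right after the question marks: the closing quote follows
// them, and two question marks plus a double quote is not a trigraph.
std::string placeholder_split() {
    return "??????" "-????";
}

// Alternative: the standard escape sequence for a question mark breaks up
// the run of characters that would otherwise form the trigraph.
std::string placeholder_escaped() {
    return "?????\?-????";
}
```

Both placeholder functions return the same 11 characters: six question marks, a dash and four question marks.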

              peterchen
              wrote on last edited by
              #6

              Amazing how long you can do C++ without knowing such things... :D

              Agh! Reality! My Archnemesis![^]
              | FoldWithUs! | sighist | µLaunch - program launcher for server core and hyper-v server.

                Lost User
                wrote on last edited by
                #7

                I had come across this issue some years ago when trying to write C code from an IBM 3270 (?) terminal. IBM's keyboard was missing a few of the characters needed so the trigraph trick was the solution. I didn't realise it had actually become part of the language.

                It's time for a new signature.
