The Magician's String, what you see is not what you get.

  • Nicolas Dorier

    Magic. :) str2 is a string with a hidden character. If you copy my code, you copy the hidden character too, so this bug follows you into whatever programming language you use. You can run the code in debug mode and see that str2 is one character longer than str1. However, I have no idea how I ended up with this hidden character in my code. (The int value of this strange character is 0x200F.)
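
For anyone who wants to locate such a character without a debugger, here is a minimal sketch. It assumes the culprit belongs to the Unicode "Format" category, which is the case for 0x200F (RIGHT-TO-LEFT MARK). The class and method names are made up for illustration, and the hidden character is written as an escape so that it is visible in the source.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HiddenCharFinder
{
    // Returns (index, code point) for every char whose Unicode category is
    // Format (Cf). U+200F RIGHT-TO-LEFT MARK falls into that category.
    public static List<(int Index, int CodePoint)> FindFormatChars(string s)
    {
        var hits = new List<(int Index, int CodePoint)>();
        for (int i = 0; i < s.Length; i++)
        {
            if (char.GetUnicodeCategory(s[i]) == UnicodeCategory.Format)
                hits.Add((i, (int)s[i]));
        }
        return hits;
    }

    public static void Main()
    {
        // Same URL as in the post, with the hidden character made explicit.
        string str2 = "http://toto.com\u200F/";
        foreach (var (index, codePoint) in FindFormatChars(str2))
            Console.WriteLine($"index {index}: U+{codePoint:X4}"); // index 15: U+200F
    }
}
```

Other invisible characters (for example some whitespace variants) are in different categories, so this check is a starting point rather than a complete detector.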

    Nicholas Marty wrote (#18):

    Another thing that bit me before was a UTF-8 preamble (BOM), the bytes 0xEF, 0xBB, 0xBF, that got copied in from somewhere... :doh:
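
A small sketch of how one might check for that preamble, both on raw bytes and on an already decoded string (where the BOM shows up as the single char U+FEFF). The helper names are hypothetical; `Encoding.UTF8.GetPreamble()` is the standard .NET call that returns the three bytes mentioned above.

```csharp
using System;
using System.Text;

public static class BomCheck
{
    // True when the buffer starts with the UTF-8 preamble EF BB BF.
    public static bool HasUtf8Bom(byte[] data)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
        if (data.Length < bom.Length) return false;
        for (int i = 0; i < bom.Length; i++)
        {
            if (data[i] != bom[i]) return false;
        }
        return true;
    }

    // Drops a leading U+FEFF, which is what the BOM decodes to in a string.
    // The comparison is done on the char itself, not with a culture-aware
    // StartsWith, which can treat U+FEFF as ignorable.
    public static string StripBom(string text) =>
        text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;

    public static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' };
        Console.WriteLine(HasUtf8Bom(withBom));          // True
        Console.WriteLine(StripBom("\uFEFFhi") == "hi"); // True
    }
}
```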

      Nicolas Dorier wrote (#19), in reply to Nicholas Marty (#18):

      Been there already: if you create a text file with Visual Studio, it bites you.

      • Nicolas Dorier

        static void Main(string[] args)
        {
        String str1 = "http://toto.com/";
        String str2 = "http://toto.com‏/";
        bool eq = str1 == str2;
        Console.WriteLine(eq); //print false

        str1 = "http://toto.com/";
        str2 = "http://toto.com/";
        eq = str1 == str2;
        Console.WriteLine(eq); //print true
        

        }

        See for yourself, but copy the code, do not retype it. :) I lost hair on this one: it was a bug in an actual project for a customer. But it is a nice trick to play on your most hated co-worker if their computer is unlocked... It also works in configuration files. ;) This is pure evil though. [UPDATE] With some advice I found something even more evil than that.

        "а" == "a" //false

        Freak30 wrote (#20):

        I once read a rather ironic posting about what you could do to obscure your code (and this way make yourself irreplaceable). One of the topics was using similar-looking letters from different alphabets in variable names. They used the example of the Cyrillic 'а', which looks just like the Latin 'a' but is seen as different by the compiler. I assume you could have achieved a similar effect by using a Cyrillic 'р' instead of the Latin 'p' in the URL. :-D

        The good thing about pessimism is that you are always either right or pleasantly surprised.

          Nicolas Dorier wrote (#21), in reply to Freak30 (#20):

          I did not know that; this is even more evil than the invisible character. I am taking notes. You can spot the invisible char by checking str1.Length, but a Cyrillic 'a'... huhu. :)
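
Length does not help with a look-alike letter, but the code points still differ. A rough sketch, assuming the string is expected to be plain ASCII (as a URL or an identifier often is): list every char above 127 together with its code point. The names here are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class LookalikeFinder
{
    // Describes every char outside the ASCII range (code point above 127).
    public static List<string> DescribeNonAscii(string s)
    {
        var report = new List<string>();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] > 127)
                report.Add($"index {i}: U+{(int)s[i]:X4}");
        }
        return report;
    }

    public static void Main()
    {
        // "arnold" with a Cyrillic first letter, written as an escape.
        foreach (string line in DescribeNonAscii("\u0430rnold"))
            Console.WriteLine(line); // index 0: U+0430
    }
}
```

This only flags characters; deciding whether a non-ASCII character is legitimate is left to the reader, since plenty of real text needs them.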

            PIEBALDconsult wrote (#22), in reply to Freak30 (#20):

            Yes, Unicode can be very handy. "the Greek letter Tau (Τ) (Unicode U+03A4) which looks enough like the Latin letter T" -- Sorting 'Total' after data values

              Nicolas Dorier wrote (#23), in reply to PIEBALDconsult (#22):

              Would the Terminal font trick find the bug? :D

                Bernhard Hiller wrote (#24), in reply to Nicolas Dorier (#21):

                Well, some phishers used that in web addresses. Then, browsers were changed to show some encoded values in the address bar for such characters. www.dеutsсhеbаnk.соm looks so nice at first view, but Firefox changes it into www.xn--dutshbnk-66g8be6l.xn--m-0tbi nowadays.
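
The encoded form that the browser shows is Punycode. As a sketch only, .NET's `System.Globalization.IdnMapping` can perform the same kind of conversion; the wrapper name below is hypothetical, and the exact output is deliberately not stated here because it was not verified against the value quoted in the post.

```csharp
using System;
using System.Globalization;

public static class PunycodeDemo
{
    // Converts a Unicode host name to its ASCII (Punycode) form.
    public static string ToAsciiHost(string unicodeHost) =>
        new IdnMapping().GetAscii(unicodeHost);

    public static void Main()
    {
        // The look-alike host from the post, with the Cyrillic letters
        // written as escapes so they can be told apart from Latin ones.
        string host = "www.d\u0435uts\u0441h\u0435b\u0430nk.\u0441\u043Em";
        Console.WriteLine(ToAsciiHost(host));
    }
}
```

Labels that contain non-ASCII characters come back with an "xn--" prefix, which is what makes the spoofed name stand out.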

                • Wonde Tadesse

                  Compileonline.com will show you the buggy char in String str2. Interesting though. :)

                  Wonde Tadesse

                  VICK wrote (#25):

                  But it won't catch "а" == "a" :D

                  We should be building great things that don't exist. - Larry Page

                    Wonde Tadesse wrote (#26), in reply to VICK (#25):

                    It catches it and will tell you false. They are not the same Unicode character. See the code below.

                    private static void MystriesUniCode()
                    {
                    Console.WriteLine("{0} U+{1:x4} {2}", 'а', (int)'а', (int)'а');
                    Console.WriteLine("{0} U+{1:x4} {2}", 'a', (int)'a', (int)'a');
                    }

                    And the output is

                    ? U+0430 1072
                    a U+0061 97

                    Seeing is believing. Not this time. Compiling is believing. :-D

                    Wonde Tadesse

                      Rob Grainger wrote (#27), in reply to Nicolas Dorier:

                      I would never, ever, ever stoop so low as to unleash this on my coworkers... but I can think of some suppliers who may benefit from this (see my previous posts on Coding Horrors The Weird and the Wonderful). bwa ha ha ha bwa ha ha ha ha bwa ha ha ha ha ha bwa ha ha ha ha ha ha

                      "If you don't fail at least 90 percent of the time, you're not aiming high enough." Alan Kay.

                        Ian Shlasko wrote (#28), in reply to Nicolas Dorier:

                        So evil... Heh... Of course, the VS theme I'm using automatically underlines hyperlinks. And it doesn't underline that second one. Gee, I wonder why :-D

                        Proud to have finally moved to the A-Ark. Which one are you in?
                        Author of the Guardians Saga (Sci-Fi/Fantasy novels)

                          VICK wrote (#29), in reply to Wonde Tadesse (#26):

                          Quote:

                          Compileonline.com will show you the buggy char in String str2. Interesting though. :)

                          I was talking about the above, BTW. :) Well, strange compilation. :D Same line twice and different results...

                          We should be building great things that don't exist. - Larry Page

                            KP Lee wrote (#30), in reply to Nicolas Dorier:

                            I refused to copy because I also wanted to find out where.

                                static void Main(string[] args)
                                {
                                    String str1 = "http://toto.com/";
                                    String str2 = "http://toto.com‏/";
                                    //             123456789 123456
                                    bool eq = str1 == str2;
                                    int j = str1.Length;
                                    Console.WriteLine(string.Format("Evaluates to {0}, Length = {1},{2}", eq, j, str2.Length)); //print false
                                    for (int i = 0; i < j; i++)
                                    {
                                        if (str1[i] != str2[i])
                                            Console.WriteLine(string.Format("Mismatch found index={0}, char(2),int(2) = {1}-{2},{3}-{4}"
                                                , i, str1[i], str2[i], (int)str1[i], (int)str2[i]));
                                    }
                                    str1 = "http://toto.com/";
                                    str2 = "http://toto.com/";
                                    eq = str1 == str2;
                                    Console.WriteLine(eq); //print true
                                    Console.Read();
                                }
                            

                            PS my "find-out" code has a bug in it that I only realized after it ran successfully through pure luck. (Do you see it?)

                              Nicolas Dorier wrote (#31), in reply to KP Lee (#30):

                              Have you tried

                              String str1 = "аrnold";
                              String str2 = "arnold";

                              This is not the same problem ;)

                                KP Lee wrote (#32), in reply to Nicolas Dorier (#31):

                                Looks like the same problem to me. One character is ASCII and the other is Unicode. (Well, both are Unicode and one isn't ASCII. You could, of course, have both be non-ASCII.)

                                    static void Main(string[] args)
                                    {
                                        String str1 = "http://toto.com/";
                                        String str2 = "http://toto.com‏/";
                                        //             123456789 123456
                                        TestStrs(str1, str2);
                                        str1 = "аrnold";
                                        str2 = "arnold";
                                        TestStrs(str1, str2);
                                        str1 = "http://toto.com/";
                                        str2 = "http://toto.com/";
                                        TestStrs(str1, str2);
                                        Console.Read();
                                    }
                                    static bool TestStrs(string str1, string str2)
                                    {
                                        bool eq = str1 == str2;
                                        if (eq)
                                        {
                                            Console.WriteLine(string.Format("Two Strings ({0}) are the same", str1));
                                            return eq;
                                        }
                                        Console.WriteLine(string.Format("Mismatch, two Strings ({0}) ({1})are not the same", str1, str2));
                                        int j = str1.Length, i = str2.Length;
                                        if (j > i)
                                        {
                                            j = i;
                                        }
                                        for (i = 0; i < j; i++)
                                        {
                                            if (str1[i] != str2[i])
                                                Console.WriteLine(string.Format("Mismatch found index={0}, char(2),int(2) = {1}-{2},{3}-{4}", i, str1[i], str2[i], (int)str1[i], (int)str2[i]));
                                        }
                                        return eq;
                                    }
                                

                                Of course, this has a bug in it too. Strings with multiple true Unicode characters would blow up with an index-out-of-range error. Help says exactly what I thought it said, which is patently wrong:

                                String.Length Property (System): "The Length property returns the number of Char objects in this instance, not the number of Unicode characters. The reason is that a Unicode character might be ..." http://msdn.microsoft.com/en-us/library/system.string.length

                                If it did what it said it would do, I wouldn't have spotted the bug in the first place. No, I'm wrong. I was under the impression that char could hold one or two bytes. I guess that instead, when a true Unicode character is indexed in the string, the next index location points to the start of the next Unicode character; otherwise your string would have a bunch of mismatches in the loop. So true multiple UNICODE

                                  Nicolas Dorier wrote (#33), in reply to KP Lee (#32):

                                  The difference is that the http://toto.com/ example contains a char that is hidden, whereas "аrnold" == "arnold" uses two different "a" characters. This is why the two arnold strings have the same length, but the two http://toto.com/ strings do not. I was not aware of the StringInfo class and how a Unicode character could take two chars. Very interesting stuff; I have no idea if a "Unicode char" exists.
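
To make the Length versus StringInfo point concrete, here is a short sketch. `string.Length` counts 16-bit chars (UTF-16 code units), while `StringInfo.LengthInTextElements` counts text elements. The emoji is just an arbitrary example of a character outside the 16-bit range, and the wrapper name is invented.

```csharp
using System;
using System.Globalization;

public static class LengthDemo
{
    // Number of text elements, as opposed to the number of 16-bit chars.
    public static int TextElements(string s) =>
        new StringInfo(s).LengthInTextElements;

    public static void Main()
    {
        string emoji = "\U0001F600"; // one code point, stored as a surrogate pair
        Console.WriteLine(emoji.Length);        // 2
        Console.WriteLine(TextElements(emoji)); // 1

        // The Cyrillic look-alike fits in a single char, so the length matches.
        Console.WriteLine("\u0430rnold".Length == "arnold".Length); // True
    }
}
```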

                                    KP Lee wrote (#34), in reply to Nicolas Dorier (#33):

                                    I had been taught that char supported UNICODE format and I accepted it, then I ran into the documentation that said the Length field only represented the char length. Therefore, I figured all I'd learned was wrong. Then you mentioned the hidden character and I remembered something else. I cast the char into an int. The only way it could be larger than 256 is if it was a Unicode char. I thought for sure I could get this to blow up with more Unicode characters. By the way, you are wrong. There isn't a hidden character in the original problem. The last character in the string is an ASCII "47" character and a UNICODE "8207" character. Your "a" code is also an ASCII and UNICODE difference, though I can't prove it without going back to your post. The last version evaluates as true.

                                      KP Lee wrote (#35), in reply to Nicolas Dorier (#33):

                                      Yep. Your "a"s are ASCII 97 and Unicode 1072. Interesting: the second one also exceeds 1 byte, but still counts as one character. (2^10 is 1024, so it takes 11 bits; 2^13 is 8192, just under 8207, so the longer "/" character takes 14 bits.) You are absolutely right that debug makes it look like the / character is too long to display the interpretation.
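
Both values fit comfortably in a C# char, which is always 16 bits wide, which is why each still counts as one character. A quick sketch comparing the fixed char size with the variable number of bytes the same characters need in UTF-8 (the helper name is hypothetical):

```csharp
using System;
using System.Text;

public static class CharSizeDemo
{
    // Number of bytes a single char occupies when encoded as UTF-8.
    public static int Utf8Bytes(char c) =>
        Encoding.UTF8.GetByteCount(c.ToString());

    public static void Main()
    {
        Console.WriteLine(sizeof(char));        // 2 (bytes per char in memory)
        Console.WriteLine(Utf8Bytes('a'));      // 1 (code point 97)
        Console.WriteLine(Utf8Bytes('\u0430')); // 2 (code point 1072)
        Console.WriteLine(Utf8Bytes('\u200F')); // 3 (code point 8207)
    }
}
```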

                                        Nicolas Dorier wrote (#36), in reply to KP Lee (#34):

                                        There is a hidden character in the first version. Look, I just moved it.

                                        "http://toto‏.com/"

                                        Unicode 8207 was not the '/'; that is why the two lengths were not the same. For the arnold example, however, you get the same length.
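
Since the hidden character can sit anywhere in the string, one possible defence is to drop such characters before comparing. This is a sketch under the assumption that the input is something like a URL or a config value, where format characters have no business being; in real text some of them (bidirectional marks, joiners) are meaningful, so stripping them blindly would be wrong there. The names are made up.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class Sanitizer
{
    // Returns a copy of s without any chars in the Unicode Format category.
    public static string StripFormatChars(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (char.GetUnicodeCategory(c) != UnicodeCategory.Format)
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string clean = "http://toto.com/";
        string dirty = "http://toto\u200F.com/"; // hidden char moved, as above
        Console.WriteLine(clean == dirty);                   // False
        Console.WriteLine(clean == StripFormatChars(dirty)); // True
    }
}
```

Note that this does nothing for look-alike letters such as the Cyrillic 'а', which are ordinary letters and not format characters.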

                                          Nicolas Dorier wrote (#37), in reply to KP Lee (#35):

                                          It is not related to the "/" but to the Unicode char 8207, which you can put wherever you want (see my previous comment).
