Funny strings
-
Hello guys. Recently I've been working on some relatively complex regular expressions, and it took me a while to figure out what was wrong with an obviously correct (but very long) one. To reproduce the issue, fire up Visual Studio 2008 SP1 (Express/Pro/Team), create a new C++ Windows console application, and replace the contents of your main .cpp file with the following code:
#include "stdafx.h"
#include <stdio.h>
#include <iostream>
#include <string>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
    // Fun starts here. Check the output. It quite differs from "(??)".
    printf("(??)\n");
    string str1 = "(??)";
    cout << str1 << endl;
    // An additional backslash ends the fun...
    printf("(?\?)\n");
    string str2 = "(?\?)";
    cout << str2 << endl;
    // ...and so does the closing parenthesis removal.
    printf("(??\n");
    string str3 = "(??";
    cout << str3 << endl;
    // The same thing happens to wstring too.
    wstring str4 = L"(??)"; // bad one
    wcout << str4 << endl;
    wstring str5 = L"(?\?)"; // good one
    wcout << str5 << endl;
    wstring str6 = L"(??"; // good one
    wcout << str6 << endl;
    return 0;
}
As you can see, the output differs from the expected one in the case of the "(??)" string. Since my regular expression contained a lot of '*', '[', ']', '\', '?', '(', and ')' characters, it was quite difficult to notice that the value of the variable displayed in the debugger window differed from the one assigned to it in the source code (actually I figured it out by accident). So be careful when using regular expressions ;-) Oh, one more thing. I just tried this example with GCC. In all cases the output is as expected. Hmmm...
-- Vladimir Svrkota, AlfaNum Novi Sad, Serbia.
-
Ye olde trigraph strikes again. It's well documented in my old Kernighan & Ritchie: "The C Programming Language", 2nd edition from 1988.
Inside the string literal, "??)" is replaced with the single character ']', so "(??)" becomes "(]", which looks consistent with what you've seen. I don't have VS around, so I can't immediately reproduce the output. By default gcc ignores trigraphs, so it's no surprise you don't see the "problem" there. Try compiling with the -trigraphs switch.
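To make the replacement concrete, here is a minimal sketch that simulates trigraph replacement on an ordinary std::string at run time. The function names trigraph_replacement and replace_trigraphs are made up for this illustration and are not part of any compiler or library. A real compiler does this on the source text before tokenizing, not on string values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the character a trigraph stands for, given
// the character that follows the two question marks, or '\0' if that
// character does not complete a trigraph.
char trigraph_replacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '(':  return '[';
        case '/':  return '\\';
        case ')':  return ']';
        case '\'': return '^';
        case '<':  return '{';
        case '!':  return '|';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical helper: scans left to right and replaces every
// three-character trigraph with its single-character equivalent.
std::string replace_trigraphs(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i + 2 < text.size() && text[i] == '?' && text[i + 1] == '?') {
            const char r = trigraph_replacement(text[i + 2]);
            if (r != '\0') {
                out += r;
                i += 2;  // skip the rest of the trigraph
                continue;
            }
        }
        out += text[i];
    }
    return out;
}
```

Calling replace_trigraphs("(?\?)") returns "(]", while "(?\?" comes back unchanged because nothing follows the two question marks. The arguments are written with the \? escape so that the example itself contains no trigraph.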
-
They who don't learn from history are doomed to be bitten on the bottom by it. :-D
-
Ye olde trigraph strikes again. It's well documented in my old Kernighan & Ritchie: "The C Programming Language", 2nd edition from 1988.
OMG! :-D I have the same book. I dusted it off and there they were (trigraphs), staring right back at me, making me just stand there in shame :-D . Thanks, Søren.
-- Vladimir Svrkota, AlfaNum Novi Sad, Serbia.
-
I forgot about that! :laugh: I'll be adding a warning to my personal regular expression library documentation, because somehow that never showed up in the tests. :doh: Note: g++ gives a warning when it sees a trigraph.
INTP "Program testing can be used to show the presence of bugs, but never to show their absence." - Edsger Dijkstra "I have never been lost, but I will admit to being confused for several weeks. " - Daniel Boone
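For pattern strings that have to be pasted into C++ source, one possible guard is to break up every pair of adjacent question marks with the \? escape before pasting. The sketch below assumes its input is the body of a string literal exactly as it would appear in source code; break_question_pairs is a hypothetical name, not a function from any regex library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: inserts a backslash before every '?' that directly
// follows another '?', so the result never contains two adjacent question
// marks and therefore cannot form a trigraph. The escape \? still denotes
// a plain question mark, so the value of the literal does not change.
std::string break_question_pairs(const std::string& literal_body) {
    std::string out;
    bool previous_was_question = false;
    for (std::size_t i = 0; i < literal_body.size(); ++i) {
        const char c = literal_body[i];
        if (c == '?' && previous_was_question) {
            out += '\\';
        }
        out += c;
        previous_was_question = (c == '?');
    }
    return out;
}
```

Given the five characters of the problematic literal body from the original post, the function returns the six characters ( ? \ ? ), which is the form the "good" examples in that post already use.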
-
I don't get it. What's going on there? I get the same results you seemed to indicate with Visual C++ 2005.
-
VS has a warning for trigraphs as well (C4837); before VS2010 it was off by default. As part of our standard headers (which apply to all compile units), we have a set of warnings we always turn on, and some we turn off. Somebody had slipped that one in a while back :)
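A header fragment along these lines is one way such a warning could be switched on for every compile unit. This is only a sketch of the general approach, not the actual header described above. It assumes the pragma is seen before the code to be checked; the /w14837 compiler option is the command-line equivalent.

```cpp
// Sketch of a shared "warning policy" header fragment (assumed layout).
#ifdef _MSC_VER
// Report C4837 (trigraph detected) as a level 1 warning. Applying a level
// to a warning also enables it when it is off by default.
#pragma warning(1 : 4837)
#endif
```

The _MSC_VER guard keeps other compilers from seeing a pragma they do not understand.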