Simple C to C++ [modified]

Chuck OToole · modified on Wednesday, September 14, 2011 4:09 PM

It is my understanding that std::string does *NOT* include a NULL ('\0') character at the end of the string. One cannot assume a null termination. So, basically you are using C style assumptions on C++ string objects. The way to deal with std::string is through the member functions string.length(), string.replace(), etc. The examples others have shown you work because they stay within the object's definition of operative functions. There is a string.c_str() member function that returns a pointer to a C style null terminated char * (http://www.cplusplus.com/reference/string/string/c_str/) but that too cannot be modified by the receiving program. If you're going to convert from C to C++, you should go all the way and avoid those old char * uses and move to some string class, either std::string or MFC/ATL CString, depending on your project's needs.
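To make the c_str() point concrete, here is a minimal sketch. The helper name c_length is made up for illustration; the only thing it relies on is that c_str() hands back a read-only pointer to a null-terminated array, which is what lets C functions such as strlen work on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the length that strlen reports for the buffer
// exposed by c_str().
std::size_t c_length(const std::string& s)
{
    const char* p = s.c_str();     // read-only, null-terminated view of s
    assert(p[s.size()] == '\0');   // the terminator sits at index size()
    return std::strlen(p);         // stops at the first '\0' it meets
}
```

Note that length() and strlen can disagree: a std::string may hold embedded '\0' characters, and strlen stops at the first one, which is one more reason to prefer the member functions.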

enhzflep · modified on Wednesday, September 14, 2011 2:30 PM

The reason there's no upper limit on "i" in either of the for loops is that this would take more code: it would require a strlen be performed once before the loop, in addition to checking whether i is equal to this length. It's less clear to read and more prone to induce error during maintenance, but I believe it to be for this reason that the loop is exited with a break. Not sure why you'd go to the trouble of #defining NUL as 0x0. It would be clearer if the already provided NULL was used (less code too, since there are only 4 references to 'NUL', so 4 cases of simply adding another 'L'). In any case, the executable code is identical; it is just the source code that suffers from reduced readability, unlike the loop-terminating-condition check, which produces a smaller executable when done this way than the more readable alternative of checking the strlen first then using a terminating condition of i [...] char *htmlStr = ""; replace_html_delimiters(htmlStr); while I can see this succeeding

char *htmlStr = "";
char *htmlStrCopy = strdup(htmlStr);
replace_html_delimiters(htmlStrCopy);
..
.. other actions on htmlStrCopy
..
free(htmlStrCopy);

Add to that the fact that the "char z_buf[4095]" statement isn't terminated with a ';' in either case and you have rather a problem, considering that the C function called with a string containing "" returns the same string. Studying the function, it appears to scan through a string, quitting upon end of string (0x0), while perplexingly, when it intercepts a '<' character it copies all of the text except for this character, then it appends the '<' explicitly. It's 5am here, and I can't think of a circumstance in which the output string would be different from the input string.

Chuck OToole

You're right in that the code looks odd but I think it was because the Code Project Editor messed it up. The OP was trying to replace the < character with the sequence ampersand-l-t, a sequence which if typed into this editor will yield a <, making the code look wrong. He's really making the string bigger with the replacements.

Lost User · modified on Wednesday, September 14, 2011 2:30 PM

Software2007 wrote:

debugger would crash at msg[i] = nul in the c-like code

In that case msg is not a NULL terminated string. How big do you expect msg to be, do you know? If you do, you can add an additional check for not exceeding that length thus:

for(i=0; ; i++) //Why no upper limit here?
{
if(msg[i]== NUL)
break;
if(i > maxvmsglen)
break;
...

The other likelihood is that the code:

    msg[i] = NUL;
    strcpy(z_buf,msg);
    strcat(z_buf,"<");
    strcat(z_buf,msg+i+1);//confusing me
    strcpy(msg,z_buf);

is mashing up the msg buffer and overflowing it. I have rarely seen such a horrible piece of code, what is it supposed to be doing?
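For comparison, here is a rough sketch of how a C-style version could guard against that overflow. It assumes the intent is to turn each '<' into the four-character entity &lt;, and that the caller knows the total size of the buffer; the name replace_lt_bounded and the capacity parameter are hypothetical, not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical bounded variant: replaces each '<' in msg with "&lt;" in place.
// capacity is the total size of the msg buffer in bytes, terminator included.
// Returns false as soon as a replacement would not fit; msg stays a valid
// string, with only the earlier replacements applied.
bool replace_lt_bounded(char* msg, std::size_t capacity)
{
    assert(msg != nullptr);
    static const char kEntity[] = "&lt;";
    const std::size_t kEntityLen = sizeof(kEntity) - 1;   // 4 characters

    std::size_t len = std::strlen(msg);
    for (std::size_t i = 0; i < len; ++i) {
        if (msg[i] != '<')
            continue;
        // One character becomes four, so the string grows by three.
        if (len + (kEntityLen - 1) + 1 > capacity)
            return false;
        // Shift the tail (terminator included) right by three.
        // Source and destination overlap, hence memmove rather than memcpy.
        std::memmove(msg + i + kEntityLen, msg + i + 1, len - i);
        std::memcpy(msg + i, kEntity, kEntityLen);
        len += kEntityLen - 1;
        i += kEntityLen - 1;   // continue after the inserted entity
    }
    return true;
}
```

Because the tail is shifted in place, no scratch buffer like z_buf is needed, and the capacity check happens before every write.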

============================== Nothing to say.

Lost User · modified on Wednesday, September 14, 2011 2:30 PM

This works:

string msg = "strangebeautiful";
string::size_type index;
while ((index = msg.find('<')) != string::npos)
{
msg = msg.replace(index, 1, "&lt;");
}

Remember that strings are immutable, they cannot be altered in-place, so each replace call returns the modified string, which you must use on the next iteration. Similarly expressions such as msg[i] = '\0'; will cause an access violation. See here for all the lowdown on STL string types.

Unrequited desire is character building. OriginalGriff

Orjan Westin · modified on Wednesday, September 14, 2011 2:30 PM

If you want to replace the character '<' in a string with "&lt;", it could be done quite simply in C++ like this:

void replace_html_delimiters(std::string& str)
{
std::string::size_type pos = str.find("<");
while (std::string::npos != pos)
{
str.replace(pos, 1, "&lt;");
pos = str.find("<", pos + 4);
}
}

Or if you want to cover the closing '>' as well:

void replace_html_delimiters(std::string& str)
{
std::string::size_type pos = str.find_first_of("<>");
while (std::string::npos != pos)
{
if ('<' == str[pos])
str.replace(pos, 1, "&lt;");
else
str.replace(pos, 1, "&gt;");
pos = str.find_first_of("<>", pos + 4);
}
}
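An alternative worth considering is a single pass that builds a new string instead of editing in place. This is only a sketch: the name escape_html is hypothetical, and it also escapes '&', on the assumption that the output is meant to be pasted into HTML.

```cpp
#include <string>

// Hypothetical single-pass variant: returns an escaped copy of the input.
std::string escape_html(const std::string& in)
{
    std::string out;
    out.reserve(in.size());   // at least this many characters are needed
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        switch (in[i]) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        default:  out += in[i];   break;
        }
    }
    return out;
}
```

Each input character is visited exactly once, so there is no need to skip ahead past an inserted entity, and the ampersands written as part of an entity are never escaped a second time.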

By the way, I assume that you had &lt; in your code example, and that CodeProject converted it to < when you posted it? This can be avoided by escaping the leading ampersand (& is also a reserved character in HTML, like < and >) like this: &amp;lt;. Otherwise, your C code would simply replace the character '<' with the character '<', with lots of copying back and forth:

void replace_html_delimiters(char *msg)
{
for(i=0; ; i++)
{
if(msg[i]== NUL)
break; // End condition, so not needed in for statement
if(msg[i]=='<')
{
msg[i] = NUL; // Replace found '<' with 0 to mark end of string
strcpy(z_buf,msg); // Copy string (up to the new end) to buffer
strcat(z_buf,"<"); // Add string "<" to end of buffer
strcat(z_buf,msg+i+1); // Add remaining string to end of buffer
strcpy(msg,z_buf); // Copy back to string.
}
}
}

And that could be rewritten very effectively like this:

void replace_html_delimiters(char *msg)
{
// No need to do anything
}

:-)

Orjan Westin

Remember that strings are immutable

This is not true in standard C++. Did you think of C# or some pre-standard implementation of STL?
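A quick way to check this for yourself is the sketch below. Both helper names are made up for illustration; the behaviour they rely on (operator[] returning a reference, replace() returning a reference to the string it was called on) is standard.

```cpp
#include <string>

// Hypothetical helper: overwrites every '<' with a space, in place.
void blank_out_lt(std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '<')
            s[i] = ' ';   // operator[] yields a reference, so this edits s itself
    }
}

// replace() edits the string in place and returns a reference to that same
// string, so assigning the result back is harmless but not required.
// pos must not exceed s.size(), otherwise replace() throws std::out_of_range.
bool replace_returns_same_object(std::string& s, std::string::size_type pos)
{
    return &s.replace(pos, 1, "&lt;") == &s;
}
```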

Stefan_Lang · modified on Wednesday, September 14, 2011 2:30 PM

First, the original code should compile and work with any C++ compiler. Almost any C code should, as long as your function declarations contain the full parameter list (this was not mandatory in C, but is in C++). If the function doesn't work as intended in C++, then it didn't in C either!

If your intention is to refactor the code into something that more closely resembles C++ coding standards, here are a few pointers:

1. Do not use #define for constants. C++ introduced the const keyword for that purpose and AFAIK ANSI C as well. #define always introduces a risk, as it replaces text without concern for the context, and therefore might break your code in places that it was not meant to affect. It's even worse when #defines are used in headers, making the replacement global.

2. Do not use magic numbers. Magic numbers are numeric or string literals that are used within the code to define array boundaries, values passed to functions, or limits used for loops. It's almost always better to instead define a constant, using a name that explains its purpose or use. There are multiple advantages to doing this: First, if you ever need to change the value you only need to change it in one place, no matter how often you used it, and no matter whether others used it in places that you don't even know of; Second, the name of the constant explains what it is, saving people the effort of somehow divining it from the context or (nonexistent) comments; Third, constant names are often easier to remember than the literals they represent. And intelligent editors will even remember these names for you.

3. Use std::string instead of C-style 0-terminated strings. They are sometimes more awkward to use, but they're fast and generally safer. They also manage their own memory, so you don't need to allocate an arbitrarily sized buffer yourself, nor do you need to care about its deallocation. Also, there are already plenty of functions available in the STL, either as member functions of std::string or as generic functions found in <algorithm> (unfortunately though, none of them exactly replicates your function).

4. Be careful when using index values for std::string, or in fact any of the containers of the STL. For one, many functions in the STL require iterators, not index values; most of the time index values - if provided for a container - can be used to read (or write) an element, but nothing else! Second, checking for the end (
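As a rough illustration of the pointers on named constants, std::string and iterator-based functions, consider the sketch below. All names here are hypothetical; the value 4095 is simply the buffer size mentioned earlier in the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Named constants instead of #define or bare literals (names are made up).
const std::size_t kBufferSize = 4095;
const char kLessThan = '<';

// The constant documents what the limit means: the text plus its
// terminator must fit into a buffer of kBufferSize bytes.
bool fits_in_buffer(const std::string& text)
{
    return text.size() < kBufferSize;
}

// A generic algorithm working on iterators rather than index values.
std::size_t count_lt(const std::string& text)
{
    return static_cast<std::size_t>(
        std::count(text.begin(), text.end(), kLessThan));
}
```

If the buffer size ever changes, only kBufferSize needs to be touched, and std::count never has to know where the string ends beyond the end() iterator it is given.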

Lost User

Cool Cow Orjan wrote:

Did you think of C#

Yep, my brain can only handle one language at a time. :( However, the code still works.


Lost User

Stefan_Lang wrote:

Sorry this turned out rather longer than

Don't apologise, it's an excellent analysis of the issues, and solution.


MicroVirus

Stefan_Lang wrote:

Sorry this turned out rather longer than intended, but I hope you will appreciate it anyway ;)

Only ever apologise to yourself, for the time lost writing an excellent post ;)