XML-safe strings, and iostreams
-
Hi,

1. Surely somebody here has written a really efficient and fast function which will replace special characters in a string with their XML entity equivalents? Every programmer using C++ and XML must surely have had to write this by now? I've just knocked a quick one up here:
//typedef basic_string<TCHAR> tstring;
void XMLSafeString(tstring& str)
{
    int len = str.size();
    TCHAR * buffer = new TCHAR[len+1];
    _tcscpy(buffer, str.c_str());
    // make an attempt at reducing re-allocs...
    str.reserve(len + 20);
    str = _T("");
    for(int i = 0; i < len; i++)
    {
        switch(buffer[i])
        {
            case _T('"'): str += "&quot;"; break;
            case _T('<'): str += "&lt;"; break;
            case _T('>'): str += "&gt;"; break;
            case _T('&'): str += "&amp;"; break;
            case _T('\''): str += "&apos;"; break;
            default: str += i;
        }
    }
    delete [] buffer;
}
but I'm sure there must be better ways. Also, I would like to see an efficient solution that doesn't modify the incoming string, but returns a tstring. The reason for this is:

2. I want to write out XML using the ofstream class. Unfortunately, I can't use the function I've got above without permanently modifying my input strings - which is a no-no. Therefore, I will have to copy every string into another object, and pass the xmlsafe-ised version of that string into the ofstream. This kind of reduces the simplicity of iostreams. Does anyone have anything clever like a modifier which enables XML character escaping?

Thanks in advance for any suggestions, code, etc...

Oh, and a quick disclaimer: I haven't tested the above code yet.

--
Simon Steele
Programmers Notepad - http://www.pnotepad.org/
-
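For the non-modifying version asked for above, here is a minimal sketch. It assumes plain std::string rather than tstring so that it compiles without the Windows TCHAR headers, and the name XmlEscaped is made up for illustration. Because it reads from a const reference and builds a fresh result, the scratch buffer and _tcscpy of the original are not needed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (the name is not from the thread): returns an escaped
// copy and leaves the input string untouched.
// Assumption: std::string stands in for tstring.
std::string XmlEscaped(const std::string& in)
{
    std::string out;
    out.reserve(in.size() + 20);   // same guess at headroom as the original post
    for (std::size_t i = 0; i < in.size(); ++i) {
        switch (in[i]) {
            case '"':  out += "&quot;"; break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '\'': out += "&apos;"; break;
            default:   out += in[i];    // ordinary character, copied as-is
        }
    }
    return out;
}
```

With this, myostream << XmlEscaped(str); leaves str untouched, at the cost of one temporary string per write.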
Sounds like a great job for regular expressions, no?
"No matter where you go, there you are..." - Buckaroo Banzai
-pete
Possibly, but will this be faster than my brute force replacer? Also, it involves the inclusion of a regular expression library. I'm not against the idea, but it has drawbacks. Also, would I have to use x different regexes, one per transformation:

" -> &quot;
' -> &apos;
...

or is there a way I can do all of these with one regex?

Thanks for your help,

--
Simon Steele
Programmers Notepad - http://www.pnotepad.org/
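On the one-regex question: a single character class can match all five characters, with the replacement picked per match. The sketch below assumes a compiler with the C++11 <regex> header (an assumption, since the thread does not say which regex library is meant) and std::string in place of tstring; XmlEscapedRegex is a hypothetical name. It is unlikely to beat the switch loop on speed, since it does the same per-character work plus the regex machinery.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: one regex finds every special character, and the
// entity to emit is chosen for each match.
// Assumptions: C++11 std::regex, std::string instead of tstring.
std::string XmlEscapedRegex(const std::string& in)
{
    static const std::regex special("[&<>\"']");
    std::string out;
    std::size_t last = 0;   // first character not yet copied to out
    for (std::sregex_iterator it(in.begin(), in.end(), special), end;
         it != end; ++it) {
        const std::size_t pos = static_cast<std::size_t>(it->position());
        out.append(in, last, pos - last);   // untouched text before the match
        switch (in[pos]) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            default:   out += "&apos;"; break;   // only '\'' is left
        }
        last = pos + 1;
    }
    out.append(in, last, std::string::npos);   // tail after the last match
    return out;
}
```

So one pattern is enough, but the library has no built-in way to map each match to a different replacement, which is why the loop over matches is still needed.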
-
Firstly, the code in my original post contains a bug:
str += i;
should be:
str += buffer[i];
Also, I have found a simple way of using the above function with iostreams. It works like this:

struct FormatXML
{
    tstring str_;
    explicit FormatXML(const tstring& str) : str_(str)
    {
        XMLSafeString(str_);
    }
    friend std::ostream& operator<<(std::ostream& s, const FormatXML& x)
    {
        s << x.str_;
        return s;
    }
};
This copies the input string, and performs the XML escaping on construction. The safe string is then passed into the stream. It would be used like this:

tstring str = _T("My string with \"Quotes\".");
myostream << FormatXML(str);
This isn't very efficient, however. :(( Therefore, I'm still on the lookout for cool code. Let me know!

--
Simon Steele
Programmers Notepad - http://www.pnotepad.org/