STL std::string help needed

Daniel Pfeffer

Inside your program, the best way to represent characters is with the wchar_t-based types (e.g. std::wstring). This simplifies processing, because in most cases each character is a single wchar_t value rather than a multi-byte sequence. If you need to call a library that only supports char-based types (e.g. std::string), you must convert the wchar_t types to char types, call the library, and convert the results back. In C++11, the standard way to do this is something like:

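#include <codecvt>
#include <locale>
#include <string>

// Converts between wide strings and UTF-8 encoded byte strings.
std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;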

std::wstring wide_source;
std::string narrow_target = converter.to_bytes(wide_source);

std::string narrow_source;
std::wstring wide_target = converter.from_bytes(narrow_source);

If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill

Lost User

Thank you for your solution, sir, but my wide string is something like this:

L"F:\\dupelicateFinder\\New folder\\New folder\\检查.jpg"

When I convert it to a std::string with the method you described, I get:

"F:\\dupelicateFinder\\New folder\\New folder\\æ£€æŸ¥.jpg"

I had already found a way to convert the std::string to char* using strcpy, and the resulting char* holds the same value as the string. I need the char* because a function from a third-party library takes char* as its argument. That function returns -1 (file not found), because the name no longer matches: 检查.jpg has become æ£€æŸ¥.jpg. How can I open the file using that function? I traced the function in the debugger with breakpoints and checked the values in the Immediate window. Below is my code:

// The template parameter is passed as std::wstring, i.e. duplicates == std::wstring
template <typename duplicates>
std::string Duplicates::compute_hash(duplicates file_loc)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    std::string narrow_target = converter.to_bytes(file_loc);
    char *cstr = new char[narrow_target.length() + 1];
    strcpy(cstr, narrow_target.c_str());
    // This function takes char* as an argument
    std::string hash = CALL_MD5_Function(cstr);
    delete[] cstr;
    std::cout << hash;
    return hash;
}

Lost User

As you have discovered, converting the Unicode string to ASCII does not work.
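
To see what the conversion actually produces, here is a minimal sketch that dumps the bytes of the converted name. The helper names to_utf8 and hex_dump are made up for illustration, and the converter is assumed to be the UTF-8 one from the first reply (std::wstring_convert is deprecated since C++17 but still widely available).

```cpp
#include <codecvt>
#include <cstdio>
#include <locale>
#include <string>

// Hypothetical helper: wide string -> UTF-8 bytes, as in the first reply.
std::string to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Hypothetical helper: render each byte as two hex digits, space separated.
std::string hex_dump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(c));
        out += buf;
    }
    if (!out.empty()) {
        out.pop_back();  // drop the trailing space
    }
    return out;
}
```

For L"\u68C0\u67E5" (检查), hex_dump(to_utf8(...)) gives E6 A3 80 E6 9F A5. Those are the correct UTF-8 bytes; a debugger that displays them in a Western code page shows them as æ£€æŸ¥. A char-based file function on Windows normally interprets the name in the system code page as well, which would explain why it cannot find the file.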

Jochen Arndt

The function name CALL_MD5_Function suggests that it calculates an MD5 hash. That algorithm operates on binary data and usually requires a byte array and a length. In C/C++, char* pointers are often used to pass byte arrays (uint8_t* would be better), so a char* parameter does not always indicate a string.

You are calculating the hash of file names, which use different encodings on different platforms (e.g. UTF-16LE on Windows and UTF-8 on Linux). In such cases you have to know (or define) which encoding is used for the hash calculation, and convert the file name strings to that encoding before hashing. If the code is used on a single platform only, just cast the wide string pointer and pass the length in bytes (the length is missing from your function call; I assume it is a wrapper that passes strlen to the real function).

Finally, why do you want the MD5 sum of file names? It is usually calculated for file content, which is just binary data.
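
A minimal sketch of the "cast the pointer and pass the length in bytes" idea. Since the real MD5 function's signature is not shown in the thread, hash_bytes below is a hypothetical stand-in (64-bit FNV-1a, not MD5), and hash_wide_name is an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a byte-oriented hash function.
// This is 64-bit FNV-1a, NOT MD5; it only illustrates the calling convention.
std::uint64_t hash_bytes(const unsigned char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hash the in-memory bytes of a wide string: cast the pointer and pass the
// length in bytes, not in characters.
std::uint64_t hash_wide_name(const std::wstring& name) {
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(name.data());
    return hash_bytes(bytes, name.size() * sizeof(wchar_t));
}
```

Because sizeof(wchar_t) and the byte order differ between platforms, the result is only comparable on a single platform, which is the restriction mentioned above.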

Lost User

No, sir, the function computes the MD5 of the file itself, not of the file names. I am on the Windows platform, and this function is not going to be used on *nix platforms. So what should I do, sir?

Lost User

Thank you so much sir for your kind help, I finally found a way. Thank you once again for your time sir.

Jochen Arndt

Use a wide string version of that function. If you have the sources, change the file name parameter to be a wide string and call the wide string version of the used file open function.
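
As a rough sketch of what opening the file by its wide name can look like: the helper below (read_file_bytes is a made-up name) assumes a C++17 compiler and goes through std::filesystem::path, so that on Windows the wide name reaches the file system without a lossy narrowing step. If you are editing C sources instead, the Microsoft C runtime offers _wfopen as the wide counterpart of fopen.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// Hypothetical helper: read a whole file into memory, taking the file name
// as a wide string. Returns std::nullopt if the file cannot be opened.
std::optional<std::string> read_file_bytes(const std::wstring& file_loc) {
    const std::filesystem::path path(file_loc);
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::nullopt;
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The returned bytes (data and size) can then be passed to whatever hash routine works on a buffer, so the file name never has to be converted to char* at all.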

Lost User

Thank you for your kind help sir, I have modified the function and now it is working! Thank you once again for your time!

Daniel Pfeffer

Filenames, unfortunately, can be a problem. To work with multi-byte character filenames (rather than Unicode), you must convert them according to your operating system's requirements. On Windows, this typically means using the correct code page for your system. See the WideCharToMultiByte() and MultiByteToWideChar() APIs for details.

Lost User

Sir, if I convert it according to my OS's requirements, then there is no guarantee that it will work on other operating systems, is there? Thank you.

Daniel Pfeffer

If you are converting filenames from Unicode to multi-byte, you must do this according to the rules of the O/S. However, this can be encapsulated in a single class. Use conditional compilation (e.g. #ifdef WINDOWS or #ifdef LINUX) to choose the correct version of the class.
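
A minimal sketch of that encapsulation, written as a single function rather than a class to keep it short. The name to_native_narrow is made up for illustration. It tests _WIN32, which compilers targeting Windows predefine; macros such as WINDOWS or LINUX would have to be defined by your own build. The non-Windows branch assumes UTF-8 file names.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#  include <windows.h>
#else
#  include <codecvt>
#  include <locale>
#endif

// Hypothetical helper: convert a wide file name to the narrow form that the
// platform's char-based APIs expect.
std::string to_native_narrow(const std::wstring& wide) {
    if (wide.empty()) {
        return std::string();
    }
#ifdef _WIN32
    // Windows: convert to the system ANSI code page (CP_ACP). Characters the
    // code page cannot represent are replaced by a default character.
    const int wide_len = static_cast<int>(wide.size());
    const int len = WideCharToMultiByte(CP_ACP, 0, wide.data(), wide_len,
                                        nullptr, 0, nullptr, nullptr);
    if (len <= 0) {
        return std::string();
    }
    std::string narrow(static_cast<std::size_t>(len), '\0');
    WideCharToMultiByte(CP_ACP, 0, wide.data(), wide_len,
                        &narrow[0], len, nullptr, nullptr);
    return narrow;
#else
    // Assumption: file names are UTF-8 on the non-Windows targets.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
#endif
}
```

Note that on Windows this only helps when the system code page can represent every character in the name. A name such as 检查.jpg on a system with a Western code page would still be mangled, which is why calling the wide-string version of the file function is the more robust route.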

Lost User

Thank you, sir, for your kind help and time; I understand now. In the end, I changed some of the functions in that library to accept std::wstring, which made things easier.