Unicode comparing of file extensions?

Gary R Wheeler

Try this:

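_TCHAR *FileExt = new _TCHAR[10];
FileExt[0] = _T('\0');   // stays empty if FileName contains no '.'
int FileNameLen = _tcslen(FileName);
for (int x = FileNameLen; x >= 0; x--)
{
    if (FileName[x] == _T('.'))
    {
        // Get the extension. _tcsncpy adds no terminator when the source
        // is at least as long as the count, so terminate by hand and
        // never copy more than the 10-character buffer can hold.
        _tcsncpy(FileExt, FileName + x + 1, 9);
        FileExt[9] = _T('\0');
        break;
    }
}

_tcsupr(FileExt);

if ((_tcscmp(FileExt, _T("U")) == 0) || (_tcscmp(FileExt, _T("DLL")) == 0))
{
    delete [] FileExt;
    return 1;
}
else if (_tcscmp(FileExt, _T("UNR")) == 0)
{
    delete [] FileExt;
    return 2;
}
else
{
    delete [] FileExt;
    return 3;
}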

First, I replaced the 'unsigned short' with _TCHAR. You're using the tchar.h macros elsewhere, so this makes the code work whether you compile for MBCS or UNICODE. Second, the first if condition wasn't quite right; I changed the character constant from '.' to _T('.'). A plain '.' is always a narrow char constant, even when you are compiling for UNICODE. Third, _wcsupr upper-cases the string in place, so there is no need to assign its return value back to FileExt; just call it. Next, the conditions where you compare the extracted extension weren't right. Using FileExt == _T("U") compares the pointer to the extension buffer with the pointer to a constant string. You want a string comparison here, hence the _tcscmp calls. I also added a delete[] before each return so that this doesn't leak memory.
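To illustrate the pointer-versus-contents point, here is a minimal sketch in portable C++. It uses wcscmp, which is what _tcscmp maps to in a Unicode build; the helper name IsExtU is made up for this example.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical helper: true when ext holds exactly the text "U".
bool IsExtU(const wchar_t *ext)
{
    assert(ext != nullptr);
    // Writing ext == L"U" here would compare two addresses. A buffer
    // filled at run time lives at a different address than the literal,
    // so that test fails even when the characters are identical.
    // wcscmp compares the characters and returns 0 when they match.
    return std::wcscmp(ext, L"U") == 0;
}
```

Calling IsExtU on a buffer that was filled at run time with the single character U gives true, which the == form would not.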

Software Zen: delete this;

Fahr

Ok, since I'm compiling only Unicode I stuck with the wcs functions (Unicode is required for this, I'm never going to recompile it in MBCS). If there's a good reason for me to use _TCHAR and the _tcs functions instead, please tell me :)

The code DOES work now with the compare calls and without writing back after uppercasing, but the 1-letter-extension problem still remains for some reason: after a one-letter extension it adds a 135-char... I don't get why...

As for the delete[]s, what do I do in such a case if I want to RETURN the FileExt? I can't delete it before the return. And also, what about static variables? I assume I can't delete those either...

Thanks, - Fahr
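A possible explanation for the stray trailing character, offered as a guess since the original code is not shown: wcsncpy does not write a terminator when the source is at least as long as the count, so whatever was already in the buffer shows through. One way to sidestep both that and the "who deletes it" question is to return a std::wstring by value. The sketch below assumes that approach; GetExtUpper is a made-up name.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Hypothetical helper: returns the upper-cased extension without the dot,
// or an empty string when the name contains no dot.
std::wstring GetExtUpper(const wchar_t *fileName)
{
    assert(fileName != nullptr);
    const std::wstring name(fileName);
    const std::wstring::size_type dot = name.rfind(L'.');
    if (dot == std::wstring::npos)
        return std::wstring();

    // substr produces a string of exactly the right length, so there is
    // no fixed-size buffer to overflow or to leave unterminated.
    std::wstring ext = name.substr(dot + 1);
    for (wchar_t &ch : ext)
        ch = static_cast<wchar_t>(std::towupper(ch));
    return ext;   // returned by value; the caller has nothing to delete
}
```

Note that this sketch looks only for the last dot, so a dot in a directory name with no dot in the file name would be misread as the start of the extension.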

Michael Dunn

You have a couple of problems. First, the correct type for Unicode characters is WCHAR (yeah, it's the same thing as unsigned short due to a typedef, but using WCHAR will make the code easily readable by others). Then you're comparing strings with == which is not correct for C-style strings. There's already an API for finding the extension, then once you have that use _wcsicmp() to do case-insensitive comparison.

int LookAtExt(LPCWSTR wszFilename)
{
    LPCWSTR wszExt = PathFindExtension ( wszFilename );

    if ( 0 == _wcsicmp ( wszExt, L".U" ) || 0 == _wcsicmp ( wszExt, L".DLL" ) )
        return 1;
    else if ( 0 == _wcsicmp ( wszExt, L".UNR" ) )
        return 2;
    else
        return 3;
}

You can also use a string wrapper class like wstring or CString (which has a handy CompareNoCase() method) to make the code a little neater.

--Mike-- When 900 years old you reach, look as good you will not. Hmm. 1ClickPicGrabber - Grab & organize pictures from your favorite web pages, with 1 click! My really out-of-date homepage Sonork-100.19012 Acid_Helm
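As a rough idea of what the wstring route might look like, here is a sketch that uses only the standard library. _wcsicmp is specific to the Microsoft CRT, so the comparison is written out with towlower instead; EqualNoCase and ClassifyExt are made-up names, and the extension is expected with its leading dot, the way PathFindExtension returns it.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Hypothetical helper: case-insensitive equality for wide strings.
bool EqualNoCase(const std::wstring &a, const std::wstring &b)
{
    if (a.size() != b.size())
        return false;
    for (std::wstring::size_type i = 0; i < a.size(); ++i)
        if (std::towlower(a[i]) != std::towlower(b[i]))
            return false;
    return true;
}

// Same return codes as LookAtExt: 1 for .u/.dll, 2 for .unr, 3 otherwise.
int ClassifyExt(const wchar_t *ext)
{
    assert(ext != nullptr);
    const std::wstring e(ext);
    if (EqualNoCase(e, L".u") || EqualNoCase(e, L".dll"))
        return 1;
    if (EqualNoCase(e, L".unr"))
        return 2;
    return 3;
}
```

The character-by-character towlower comparison is adequate for plain ASCII extensions like these; it is not a full Unicode case-folding algorithm.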

Fahr

PathFindExtension results in:

error LNK2001: unresolved external symbol __imp__PathFindExtensionW@4

I DO have shlwapi.h in my includes... - Fahr

Michael Dunn

See FAQ entry 2.3: "I'm trying to call a Windows API, but the linker gives an unresolved external error (LNK2001) on the API name. Why?"
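For readers without the FAQ at hand: including shlwapi.h only declares PathFindExtension; the linker also needs the matching import library, shlwapi.lib. The sketch below shows one way to request it from source with the MSVC-specific #pragma comment(lib, ...). The wrapper name FindExt is made up, and the non-Windows branch is only a simplified stand-in so the sketch compiles elsewhere; it does not reproduce every rule of the real API.

```cpp
#include <cassert>
#include <cwchar>

#ifdef _WIN32
#include <windows.h>
#include <shlwapi.h>
// MSVC-specific: asks the linker to pull in the import library that
// defines PathFindExtensionW. Other toolchains need the library named
// on the link line instead.
#pragma comment(lib, "shlwapi.lib")
#endif

// Hypothetical wrapper: returns a pointer to the '.' that starts the
// extension, or to the terminating null when there is none.
const wchar_t *FindExt(const wchar_t *path)
{
    assert(path != nullptr);
#ifdef _WIN32
    return PathFindExtensionW(path);
#else
    // Simplified stand-in for non-Windows builds: last '.' in the string.
    const wchar_t *dot = std::wcsrchr(path, L'.');
    return dot ? dot : path + std::wcslen(path);
#endif
}
```

The same effect can be had by adding shlwapi.lib to the project's linker input list instead of using the pragma.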

Fahr

OK! That worked! Thanks a lot :) It saves a lot of trouble of getting it myself, plus the weird trailing char error is no longer there :) Also, you suggested the use of L"", while I use _T(""), a quick look in the TCHAR.h gave me the idea that _T("") is defined as L""... So what IS the actual difference? If any... - Fahr

Michael Dunn

Use the TCHAR macros (including _T) when you want to make ANSI and Unicode builds from the same code. Since you said you only need to make a Unicode build, you can go ahead and use L"" to make Unicode literals. See my article on Win32 Character Encodings for the full scoop.
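To make the _T versus L"" difference concrete, here is a simplified imitation of the mechanism behind tchar.h. The names MY_UNICODE, MyChar, MY_T and MyLen are made up so that the sketch compiles without the Windows headers; the real header keys off _UNICODE and defines _TCHAR and _T along the same lines.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the _UNICODE switch; set to 0 to get the narrow build.
#define MY_UNICODE 1

#if MY_UNICODE
typedef wchar_t MyChar;
#define MY_T(x) L##x     // pastes an L prefix onto the literal
#else
typedef char MyChar;
#define MY_T(x) x        // leaves the literal narrow
#endif

// Hypothetical helper written once against MyChar; it compiles for
// either character width without changes.
std::size_t MyLen(const MyChar *s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != MY_T('\0'))
        ++n;
    return n;
}
```

With the switch on, MY_T("DLL") is exactly L"DLL", which matches what Fahr saw in TCHAR.h. The macro only matters when the switch might be off.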

Fahr

Well, that article does shed light on a lot of issues, thanks a lot :) I actually DO need the DLL to run under Windows 98; if I use only Unicode it won't work then, I guess? I didn't quite realise that... So what do I do now? Change all my L""s back to _T("")s and all my WCHAR*s to TCHAR*s? Will that do the trick? And do I need to build 2 different DLLs for WinNT and Win9x?? - Fahr

Michael Dunn

Fahr wrote: "I actually DO need the DLL to run under Windows 98, if I use only unicode it wont work then I guess?"

Right. You can use Unicode strings in 98, however you can't call Unicode APIs because they are not implemented. So yeah, you'll need to change your wcsxxx() calls to their _tcsxxx() equivalents, and use _T around literals. The string article covers this topic and why the TCHAR system is necessary. It sounds like you're not using any NT-specific features, so you can just build an MBCS build of your code and use that on NT. Again, see the string article for the full story.

Fahr

Yeah, I changed it all around... problem is tho, as soon as I change UNICODE,_UNICODE to _MBCS, I get about 5 unresolved externals... I'm using the DLL for native coding with Unreal Script, the script can call into the DLL and uses the core and engine of the game for game-specific functions, apparently those only support Unicode... Which is terribly odd, cuz the game runs on ANY Windows... - Fahr