Code Project

Unicode comparing of file extensions?

    #1 Fahr wrote:

        == '.')
            {
                // Get the extension
                wcsncpy(FileExt, FileName + x + 1, FileNameLen - x - 1);
                break;
            }
        }

        FileExt = _wcsupr(FileExt);

        if(FileExt == _T("U") || FileExt == _T("DLL"))
        {
            return 1;
        }
        else if(FileExt == _T("UNR"))
        {
            return 2;
        }
        else
        {
            return 3;
        }

    First of all, it ALWAYS returns 3; none of the other ifs are ever triggered. Second, if I look at FileExt after the extraction and the file's extension is only ONE character, a strange extra character is appended for no apparent reason... Can anyone tell me what I'm doing wrong?

    Thanks,
    - Fahr

    #2 Gary R Wheeler wrote:

    Try this:

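        _TCHAR *FileExt = new _TCHAR[10];
        FileExt[0] = _T('\0');   // stays empty if the name has no '.'
        int FileNameLen = (int)_tcslen(FileName);
        for(int x = FileNameLen; x >= 0; x--)
        {
            if (FileName[x] == _T('.'))
            {
                // Get the extension (assumes it is at most 9 characters long)
                _tcsncpy(FileExt, FileName + x + 1, FileNameLen - x - 1);
                // _tcsncpy copies exactly that many characters and adds no
                // terminator here, so terminate the copy explicitly
                FileExt[FileNameLen - x - 1] = _T('\0');
                break;
            }
        }

        _tcsupr(FileExt);

        if ((_tcscmp(FileExt, _T("U")) == 0) || (_tcscmp(FileExt, _T("DLL")) == 0))
        {
            delete [] FileExt;
            return 1;
        }
        else if (_tcscmp(FileExt, _T("UNR")) == 0)
        {
            delete [] FileExt;
            return 2;
        }
        else
        {
            delete [] FileExt;
            return 3;
        }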

    First, I replaced the 'unsigned short' with _TCHAR. You're using the tchar.h macros elsewhere, so this will make the code work whether you compile for MBCS or UNICODE.

    Second, the first if condition wasn't quite right; I changed the character constant from '.' to _T('.'). '.' is always a narrow (char) constant, even if you are compiling for UNICODE, while _T('.') gives a character of the matching width.

    The statement FileExt = _wcsupr(FileExt) makes FileExt upper case in place. _wcsupr returns the same pointer it was given, so assigning the result back is unnecessary; calling _tcsupr(FileExt) is enough.

    Next, the conditions where you compare the extracted extension weren't right. FileExt == _T("U") compares the pointer to your extension buffer with the pointer to a constant string, so it is never true. You want a string comparison here, hence the _tcscmp calls.

    I also added the delete [] calls before the returns so that this doesn't leak memory.
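A small portable C++ sketch of the string-comparison point (not part of the thread; the function name ClassifyUpperExt is hypothetical). It uses std::wcscmp, the standard wide-character function that _tcscmp maps to in a Unicode build, and assumes the extension has already been upper-cased and carries no leading '.':

```cpp
#include <cwchar>

// Classify an already upper-cased extension (no leading '.') by comparing
// its characters, not its address. std::wcscmp returns 0 when the texts match.
int ClassifyUpperExt(const wchar_t* ext)
{
    if (std::wcscmp(ext, L"U") == 0 || std::wcscmp(ext, L"DLL") == 0)
        return 1;
    if (std::wcscmp(ext, L"UNR") == 0)
        return 2;
    return 3;
}
```

A buffer such as wchar_t ext[10] = L"DLL"; yields 1 here, whereas ext == L"DLL" is false: the array and the literal are separate objects at different addresses, which is why the original code always fell through to return 3.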


    #3 Fahr wrote:

    OK, since I'm compiling only Unicode, I stuck with the wcs functions (Unicode is required for this; I'm never going to recompile it as MBCS). If there's a good reason for me to use _TCHAR and the _tcs functions instead, please tell me :) The code DOES work now with the string comparisons and without writing the uppercased result back, but the one-letter-extension problem still remains: after a one-letter extension it adds a character with code 135... I don't get why... As for the delete[]s, what do I do if I want to RETURN FileExt? I can't delete it before the return. And what do I do with static variables? I assume I can't delete those either...

    Thanks,
    - Fahr
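On the trailing character: wcsncpy (like _tcsncpy) copies at most the requested number of characters and only writes a terminating NUL if the source ends before that count. Because the count in the original code is exactly the extension's length, no terminator is written, and whatever happens to follow in the uninitialized new[] buffer shows up after the extension. One way to avoid both that and the question of who calls delete[] is to return the extension by value. A minimal sketch in portable standard C++ (not from the thread; GetExtension is a hypothetical name, and it does not special-case dots in directory names):

```cpp
#include <string>

// Returns the characters after the last '.' in fileName, or an empty string
// when there is no '.'. The std::wstring owns its memory and is always
// terminated, so the caller has nothing to delete[].
std::wstring GetExtension(const std::wstring& fileName)
{
    const std::wstring::size_type dot = fileName.rfind(L'.');
    if (dot == std::wstring::npos)
        return std::wstring();
    return fileName.substr(dot + 1);
}
```

If you keep a raw new[] buffer instead, the caller becomes responsible for the delete[]; variables that were not allocated with new (including statics) must never be passed to delete.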

    #4 Michael Dunn wrote:

    You have a couple of problems. First, the correct type for Unicode characters is WCHAR (yeah, it's the same thing as unsigned short due to a typedef, but using WCHAR makes the code easier for others to read). Second, you're comparing strings with ==, which is not correct for C-style strings. There's already an API for finding the extension, PathFindExtension(); once you have the extension, use _wcsicmp() to do a case-insensitive comparison.

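        #include <windows.h>
        #include <shlwapi.h>  // PathFindExtensionW (link with shlwapi.lib)
        #include <wchar.h>    // _wcsicmp

        int LookAtExt(LPCWSTR wszFilename)
        {
            // Points at the extension's '.', or at the terminating NUL if there is none
            LPCWSTR wszExt = PathFindExtensionW ( wszFilename );

            if ( 0 == _wcsicmp ( wszExt, L".U" ) || 0 == _wcsicmp ( wszExt, L".DLL" ) )
                return 1;
            else if ( 0 == _wcsicmp ( wszExt, L".UNR" ) )
                return 2;
            else
                return 3;
        }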

    You can also use a string wrapper class like wstring or CString (which has a handy CompareNoCase() method) to make the code a little neater.

    --Mike--
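Along the lines of the wstring suggestion, a minimal portable sketch (not from the thread; LookAtExtStd and EqualsNoCase are hypothetical names). std::wstring has no case-insensitive comparison of its own (CString::CompareNoCase belongs to MFC/ATL), so a small helper based on std::towupper stands in for _wcsicmp. Unlike the PathFindExtension version, it compares the extension without its leading '.':

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Case-insensitive equality by simple per-character upper-casing.
// Adequate for ASCII extensions; it is not full Unicode case folding.
bool EqualsNoCase(const std::wstring& a, const std::wstring& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::towupper(a[i]) != std::towupper(b[i]))
            return false;
    return true;
}

// Same 1/2/3 classification as LookAtExt, on a std::wstring.
int LookAtExtStd(const std::wstring& fileName)
{
    // Only a '.' after the last path separator starts an extension.
    const std::size_t slash = fileName.find_last_of(L"\\/");
    const std::size_t dot = fileName.rfind(L'.');
    std::wstring ext;
    if (dot != std::wstring::npos && (slash == std::wstring::npos || dot > slash))
        ext = fileName.substr(dot + 1);

    if (EqualsNoCase(ext, L"U") || EqualsNoCase(ext, L"DLL"))
        return 1;
    if (EqualsNoCase(ext, L"UNR"))
        return 2;
    return 3;
}
```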

    #5 Fahr wrote:

    PathFindExtension results in:

        error LNK2001: unresolved external symbol __imp__PathFindExtensionW@4

    I DO have shlwapi.h in my includes...
    - Fahr

    #6 Michael Dunn wrote:

    See FAQ 2.3: "I'm trying to call a Windows API, but the linker gives an unresolved external error (LNK2001) on the API name. Why?"

    --Mike--

    #7 Fahr wrote:

    OK! That worked! Thanks a lot :) It saves me the trouble of extracting the extension myself, plus the weird trailing character is gone :) Also, you suggested using L"", while I use _T(""); a quick look in TCHAR.H gave me the idea that _T("") is defined as L""... So what IS the actual difference, if any?
    - Fahr

    #8 Michael Dunn wrote:

    Use the TCHAR macros (including _T) when you want to make ANSI and Unicode builds from the same code. Since you said you only need a Unicode build, you can go ahead and use L"" to make Unicode literals. See my article on Win32 Character Encodings for the full scoop.

    --Mike--
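A rough sketch of the mechanism behind _T (not the actual tchar.h source; MY_T, MY_T_IMPL, my_tchar and my_tcslen are hypothetical names so the example compiles anywhere). Like tchar.h, it keys off _UNICODE: in a Unicode build the macro pastes L onto the literal, otherwise it leaves the literal alone. So _T("") is the same as L"" only when _UNICODE is defined, which is why the TCHAR macros matter for code that must also build as ANSI/MBCS:

```cpp
#include <cstddef>

// Character type and literal macro selected by _UNICODE, as tchar.h does.
#ifdef _UNICODE
typedef wchar_t my_tchar;
#define MY_T_IMPL(x) L##x
#else
typedef char my_tchar;
#define MY_T_IMPL(x) x
#endif
// Two-level macro: an argument that is itself a macro is expanded
// before L is pasted onto it.
#define MY_T(x) MY_T_IMPL(x)

// Length of a my_tchar string, written once for both kinds of build.
inline std::size_t my_tcslen(const my_tchar* s)
{
    std::size_t n = 0;
    while (s[n] != MY_T('\0'))
        ++n;
    return n;
}
```

The same source then compiles to char strings by default and to wchar_t strings when _UNICODE is defined, without editing any literal.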

    #9 Fahr wrote:

    Well, that article does shed light on a lot of issues, thanks a lot :) I actually DO need the DLL to run under Windows 98; if I use only Unicode it won't work there, I guess? I didn't quite realise that... So what do I do now? Change all my L""s back to _T("")s and all my WCHAR*s to TCHAR*s? Will that do the trick? And do I need to build two different DLLs for WinNT and Win9x?
    - Fahr

    #10 Michael Dunn wrote:

    Fahr wrote:
        I actually DO need the DLL to run under Windows 98, if I use only unicode it wont work then I guess?

    Right. You can use Unicode strings in 98, but you can't call Unicode APIs because they are not implemented. So yes, you'll need to change your wcsxxx() calls to their _tcsxxx() equivalents and put _T around literals. The string article covers this topic and why the TCHAR system is necessary. It sounds like you're not using any NT-specific features, so you can just make an MBCS build of your code and use that on NT as well. Again, see the string article for the full story.

    --Mike--

    #11 Fahr wrote:

    Yeah, I changed it all around... The problem, though, is that as soon as I change UNICODE,_UNICODE to _MBCS, I get about 5 unresolved externals... I'm using the DLL for native coding with UnrealScript: the script can call into the DLL, which uses the game's Core and Engine for game-specific functions, and apparently those only support Unicode... Which is terribly odd, because the game runs on ANY Windows...
    - Fahr
