My codecvt (a.k.a. facet) never gets called.

    John R Shaw
    wrote on last edited by
    #1

    I have been at this all night and I am still no closer to a solution. :confused: The problem is that the utf16_codecvt methods never get called and, therefore, the result is wrong. I have searched the net, but all I can find are examples of what is supposed to work, and none of them has worked for me. I have also seen other posters with the same problem, but no one gave them an answer. I have tested that the stream's locale has the facet (utf16_codecvt), and it does, so I see no reason why its virtual methods are never called. Instead, the stream keeps calling the codecvt<wchar_t, char, mbstate_t> methods. Any ideas?

    class utf16_codecvt : public std::codecvt<char16_t, char16_t, std::mbstate_t>
    {
    ...//
    };

    void MyTestFunc()
    {
    ... //
    std::wifstream myFile;
    std::locale myLoc = std::locale(myFile.getloc(), new utf16_codecvt);
    myFile.imbue(myLoc);
    myFile.open(pFileName, std::ios::in | std::ios::binary);
    ... //
    myFile.read(bom_buffer, 1);
    ... //
    }

    The following link gives an example of the kind of thing I am trying to do: "Unicode Files" by P.J. Plauger, April 01, 1999, http://www.ddj.com/cpp/184403638?pgno=1

    Signed, Very <blanking> tired. :zzz:

    INTP "Program testing can be used to show the presence of bugs, but never to show their absence."Edsger Dijkstra

      Stuart Dootson
      wrote on last edited by
      #2

      From what I can tell, the C++ stream system presumes that files are sequences of bytes, not characters, even when you use wide streams. The 'wide' part of a wide stream (AFAICT) indicates how the stream object interacts with your C++ code, not with the underlying file. Thus, your codecvt facet has to take in bytes: its external type (the second template parameter) must be char. By changing the declaration of your codecvt facet to the one shown below, I was able to get breakpoints set in the replacement facet to be hit.

      #include <cwchar>   // std::mbstate_t
      #include <locale>   // std::codecvt

      // std::wifstream looks up std::codecvt<wchar_t, char, std::mbstate_t>, so the
      // facet has to derive from exactly that specialization to be called.
      // __CLR_OR_THIS_CALL is an MSVC-specific calling-convention macro.
      class utf16_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t>
      {
      typedef std::codecvt<wchar_t, char, std::mbstate_t> Base;
      typedef wchar_t ElemT; // internal (in-memory) character type
      typedef char ByteT;    // external (file) type: always char for file streams

      virtual result __CLR_OR_THIS_CALL do_in(std::mbstate_t& s,
          const ByteT* _First1, const ByteT* _Last1, const ByteT*& _Mid1,
          ElemT* _First2, ElemT* _Last2, ElemT*& _Mid2) const
      { // convert bytes [_First1, _Last1) to [_First2, _Last2)
          return Base::do_in(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
      }

      virtual result __CLR_OR_THIS_CALL do_out(std::mbstate_t& s,
          const ElemT* _First1, const ElemT* _Last1, const ElemT*& _Mid1,
          ByteT* _First2, ByteT* _Last2, ByteT*& _Mid2) const
      { // convert [_First1, _Last1) to bytes [_First2, _Last2)
          return Base::do_out(s, _First1, _Last1, _Mid1, _First2, _Last2, _Mid2);
      }

      virtual result __CLR_OR_THIS_CALL do_unshift(std::mbstate_t& s,
          ByteT* _First2, ByteT* _Last2, ByteT*& _Mid2) const
      { // generate bytes to return to default shift state
          return Base::do_unshift(s, _First2, _Last2, _Mid2);
      }

      // Note: the standard signature takes a non-const std::mbstate_t&; the const
      // form below matches the MSVC library this was written against.
      virtual int __CLR_OR_THIS_CALL do_length(const std::mbstate_t& s, const ByteT* _First1,
          const ByteT* _Last1, size_t _Count) const
      { // return min(_Count, converted length of bytes [_First1, _Last1))
          return Base::do_length(s, _First1, _Last1, _Count);
      }
      };

      So, your replacement facet will have to know that it needs to read two bytes for every character (and write two bytes per character in the other direction). The best reference for this sort of information is probably Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft, but even then, locales and facets are heavy going in C++ :-(
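
      A minimal sketch of the "two bytes per character" idea, assuming the file is little-endian UTF-16 without a BOM and contains only BMP characters (no surrogate-pair handling). The names utf16le_codecvt and decode_utf16le are hypothetical and not from this thread, and only the read direction (do_in) is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: turns pairs of UTF-16LE bytes into wchar_t code units.
// Assumptions: no BOM, BMP characters only, input direction only.
class utf16le_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t>
{
protected:
    result do_in(std::mbstate_t&,
                 const char* from, const char* from_end, const char*& from_next,
                 wchar_t* to, wchar_t* to_end, wchar_t*& to_next) const override
    {
        from_next = from;
        to_next = to;
        // Consume two bytes per output character while both ranges have room.
        while (from_end - from_next >= 2 && to_next != to_end) {
            const unsigned lo = static_cast<unsigned char>(from_next[0]);
            const unsigned hi = static_cast<unsigned char>(from_next[1]);
            *to_next++ = static_cast<wchar_t>((hi << 8) | lo);
            from_next += 2;
        }
        // A dangling odd byte or a full output buffer means "call me again".
        return from_next == from_end ? ok : partial;
    }

    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 2; }   // fixed width
    int do_max_length() const noexcept override { return 2; }
};

// Hypothetical helper that drives the facet directly, without a stream.
std::wstring decode_utf16le(const std::string& bytes)
{
    std::wstring out(bytes.size() / 2, L'\0');
    if (out.empty()) return out;

    const utf16le_codecvt cvt;
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    cvt.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
           &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

      With a stream, the facet would be installed the same way as in the original post, before the file is opened: myFile.imbue(std::locale(myFile.getloc(), new utf16le_codecvt));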

      Java, Basic, who cares

        John R Shaw
        wrote on last edited by
        #3

        Thanks, Stuart. That worked great; I suspected the problem was something like that. Something else I discovered is that the second template parameter has to be ‘char’ or it will not work; even ‘unsigned char’ will not work as the second parameter. I need to dig up a copy of the standard to see whether this is compliant and makes sense, because having a template parameter that can only be a single integral type seems illogical.
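
        One way to see why only ‘char’ works: a locale stores facets keyed by the facet's base specialization, and a wide file stream asks for std::codecvt<wchar_t, char, std::mbstate_t> specifically. The sketch below checks that lookup; tagged_codecvt and stream_would_see_facet are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>

// Hypothetical do-nothing facet; it inherits every conversion from its base.
struct tagged_codecvt : std::codecvt<wchar_t, char, std::mbstate_t> {};

// True when the locale returns our facet for the specialization that
// std::wifstream (basic_filebuf<wchar_t>) requests from its locale.
bool stream_would_see_facet(const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> stream_facet;
    const stream_facet& found = std::use_facet<stream_facet>(loc);
    return dynamic_cast<const tagged_codecvt*>(&found) != nullptr;
}
```

        A facet derived from a different specialization (for example one with ‘unsigned char’ as the second parameter) is stored under a different key, so the stream's lookup never reaches it.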

        INTP "Program testing can be used to show the presence of bugs, but never to show their absence."Edsger Dijkstra

          Stuart Dootson
          wrote on last edited by
          #4

          John R. Shaw wrote:

          Something else I discovered is that the second template parameter has to be ‘char’ or it will not work; even ‘unsigned char’ will not work as the second parameter. I need to dig up a copy of the standard to see whether this is compliant and makes sense, because having a template parameter that can only be a single integral type seems illogical.

          I think there are two pertinent ideas here. Firstly, files are streams of bytes (that's the basic concept underlying file streams in C++), which is why file streams always convert to and from bytes. Secondly, codecvt facets can be used on their own, without streams. Say you had read in a file, converting from a byte stream to UCS-2, and then wanted to write the UTF-32 equivalent to a file. You could use a codecvt facet that converts from UCS-2 to UTF-32. The example code in the codecvt::in documentation on MSDN shows this sort of scenario.
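
          As a rough illustration of using a codecvt facet without any stream, the sketch below pulls the standard facet out of the classic locale and calls in() directly. The helper name widen_with_classic_facet is hypothetical, and the example assumes plain ASCII input, which the classic locale is expected to map one to one on common implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: converts narrow text to wide text with the classic
// locale's codecvt facet, with no stream involved.
std::wstring widen_with_classic_facet(const std::string& narrow)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(std::locale::classic());

    std::wstring wide(narrow.size(), L'\0');
    if (wide.empty()) return wide;

    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    cvt.in(state, narrow.data(), narrow.data() + narrow.size(), from_next,
           &wide[0], &wide[0] + wide.size(), to_next);

    // Keep only the characters the facet actually produced.
    wide.resize(static_cast<std::size_t>(to_next - &wide[0]));
    return wide;
}
```

          The same calling pattern applies to any other codecvt specialization that the library or your own code provides.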

          Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
