Problem Parsing Pictures from Rich-Text (RTF) File

    Popeye Doyle Murray wrote (#1):

    I am writing software that parses an RTF file. One of the things I want to do is extract pictures and save them to a file. The following is taken directly from Microsoft's RTF FAQ (version 1.3):

    ---- From RTF FAQ ---
    The picture in hexadecimal or binary format follows the picture-destination control words. The following example illustrates the destination format:

    {\pict\wbitmap0\picw170\pich77\wbmbitspixel1\wbmplanes1\wbmwidthbytes22
    \picwgoal505
    \pichgoal221
    \picscalex172
    \picscaley172
    49f2000000000273023d1101a030
    3901000a000000000273023d98
    0048000200000275
    02040000200010275023e000000000
    273023d000002b90002b90002
    b90002b90002b9
    0002b90002b90002b90002b90002b90002
    b92222b90002b90002b90
    002b90002b9
    0002b90002b90002b90002b9000
    ---- End of Stuff from RTF FAQ ---

    I am able to parse out all of the bytes (in this example, beginning 49f2...) and convert them from their text representation to real bytes (i.e., the character sequence "49" is converted to 0x49), but what I get is not a readable picture.

    Here is more detail. Suppose I have a bitmap file. I take the same picture and embed it in an RTF document. I can open the document in Notepad and view the hexadecimal representation of the image. I have also written a program that reads a binary file such as a bitmap and converts it to the hexadecimal text that would appear embedded in the RTF document. I then visually compare the hex I generated from the disk file with the hex embedded in the RTF file. The two are the same... EXCEPT for the first several hundred bytes! Each source starts out different, but after a hundred bytes or so they are identical, so I know my encode/decode should be working.

    But why is the hexadecimal of the embedded picture not exactly the same as the hexadecimal of the disk file? According to the RTF FAQ, they should be. What can I do to extract the picture? Did Microsoft do something to encode the first series of bytes of a picture so that no one else can extract it? I have found this problem with both bitmaps and GIF files. I am using MS Word 95 to generate the RTF files with pictures embedded.
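
A minimal sketch of the text-to-bytes step described in this post, for readers who want to reproduce the comparison. `DecodePictHex` is a hypothetical name, not something from the RTF documentation. The sketch assumes the caller has already skipped the `\pict` control words and passes only the text that starts at the first hex digit; it does not handle pictures stored in binary form.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of any RTF library): converts the hex text
// of a \pict group into raw bytes. Whitespace and line breaks between
// digits are skipped; decoding stops at the first other non-hex character,
// such as the closing '}' of the group.
std::vector<std::uint8_t> DecodePictHex(const std::string& text)
{
    std::vector<std::uint8_t> bytes;
    int high = -1; // pending high nibble, or -1 when none is waiting

    for (unsigned char ch : text) {
        if (std::isspace(ch)) {
            continue;               // line breaks inside the hex run
        }
        if (!std::isxdigit(ch)) {
            break;                  // end of the hex data
        }
        int value = std::isdigit(ch) ? ch - '0'
                                     : std::tolower(ch) - 'a' + 10;
        if (high < 0) {
            high = value;           // first digit of a pair
        } else {
            bytes.push_back(static_cast<std::uint8_t>((high << 4) | value));
            high = -1;
        }
    }
    return bytes; // a trailing unpaired digit is silently dropped
}
```

Calling `DecodePictHex("49f2 0000")` yields the four bytes 0x49, 0xF2, 0x00, 0x00. Writing the result to disk unchanged only produces a usable file if the embedded data really is in the same layout as the file format, which is the open question in this thread.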

      Henry miller wrote (#2):

      I don't know the answer, but from your question it sounds like some picture header is being modified. I'd look for the specs for your picture format and manually compare the hex. It wouldn't surprise me if Word was changing something in the headers, but you can't know unless you start reading.

      Have you tried other programs that read RTF? There are some free ones (OpenOffice comes to mind). Perhaps there is a bug in Word 95, or it isn't implementing the same version of RTF as the one you are working from.

      If you are encoding, can Word read your document? How about other programs? If you just want to decode, save some pictures in RTF, decode them, and see if they look the same. Don't forget to extract your pictures with Word again (if you can) to see whether it saves the same thing as the original.

      The way you worded your question, I suspect your program is working correctly. If your pictures look the same visually, I wouldn't worry about it. If there are visual differences, you are on the right track to solving them.
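
The manual hex comparison suggested here can be automated. The following is only a sketch under stated assumptions: `FindAgreementStart` and `MatchOffsets` are hypothetical names, and the approach assumes the last `probeLen` bytes of the disk file appear somewhere in the decoded RTF data. It locates that tail in the embedded bytes, then walks backwards while the two buffers still agree, which reports how long the differing prefix is in each source.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: where the two byte streams begin to agree.
struct MatchOffsets {
    bool found;                 // false if the probe was not located
    std::size_t fileOffset;     // first matching byte in the disk file
    std::size_t embeddedOffset; // first matching byte in the RTF data
};

// Hypothetical helper: uses the last probeLen bytes of `file` as a probe,
// searches for them in `embedded`, then extends the match backwards.
MatchOffsets FindAgreementStart(const std::vector<std::uint8_t>& file,
                                const std::vector<std::uint8_t>& embedded,
                                std::size_t probeLen)
{
    MatchOffsets none{false, 0, 0};
    if (probeLen == 0 || file.size() < probeLen ||
        embedded.size() < probeLen) {
        return none;
    }

    auto probeBegin = file.end() - static_cast<std::ptrdiff_t>(probeLen);
    auto hit = std::search(embedded.begin(), embedded.end(),
                           probeBegin, file.end());
    if (hit == embedded.end()) {
        return none;
    }

    std::size_t f = file.size() - probeLen;
    std::size_t e = static_cast<std::size_t>(hit - embedded.begin());

    // Walk backwards while the bytes before the match are still equal.
    while (f > 0 && e > 0 && file[f - 1] == embedded[e - 1]) {
        --f;
        --e;
    }
    return MatchOffsets{true, f, e};
}
```

The two offsets give the size of the unmatched header in each source, which can then be checked against the published layout of the picture format. A short probe can match a repeated pattern (for example a run of identical pixel rows) at the wrong place, so a probe of several dozen bytes is a safer choice.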

        Joel Lucsy wrote (#3):

        I believe that the picture is embedded as an OLE stream, not as a straight picture.

        -- Joel Lucsy
