Text extraction

  KongHL wrote:

    How do I extract text from an MS Word document? Ideally, the end user should not need to have Microsoft Office installed.

    hsdok wrote (#2, in reply to KongHL):

    Do you mean the words from a .doc file?

      James Gupta wrote (#3, in reply to hsdok):

      Well, try searching for any character with an ASCII value between 32 and 126 (I think that's right: those are the characters that can be typed on a keyboard, but check it before you do).
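
A minimal sketch of that kind of scan, in the spirit of the Unix `strings` tool. It assumes the whole file has already been read into a `std::string`, treats byte values 32 through 126 as printable, and only keeps runs of at least `min_len` characters to filter out noise. The function name and the `min_len` threshold are illustrative choices, not anything from this thread, and the result will contain stray fragments from the binary structure as well as document text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect runs of printable ASCII bytes (0x20..0x7E) that are at least
// min_len characters long. Everything else is treated as a separator.
std::vector<std::string> extract_ascii_runs(const std::string& bytes,
                                            std::size_t min_len = 4) {
    assert(min_len > 0);  // a zero threshold would emit empty runs
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char c : bytes) {
        if (c >= 0x20 && c <= 0x7E) {
            current.push_back(static_cast<char>(c));
        } else {
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();
        }
    }
    // Flush a run that reaches the end of the buffer.
    if (current.size() >= min_len) runs.push_back(current);
    return runs;
}
```

Reading the file with `std::ifstream` in binary mode and passing its contents to this function is enough to try it on a real document.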

        KongHL wrote (#4, in reply to hsdok):

        What I need is to convert a Microsoft Word document to a text file. How can I do this without having Microsoft Office installed? If that is not possible, using a library that comes with Microsoft Office is fine, but how?

          toxcct wrote (#5, in reply to James Gupta):

          Wrong, MS Word uses Unicode...


          TOXCCT >>> GEII power
          [toxcct][VisualCalc 2.20][VCalc 3.0 soon...]
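
If the text is stored as UTF-16LE, which is one reading of the remark above, each Basic Latin character shows up as its ASCII byte followed by a zero byte, so a plain byte-wise ASCII scan sees every run broken up by zeros. The sketch below is only an illustration under that assumption: it looks for (printable byte, 0x00) pairs at any alignment and ignores every character outside the 32 to 126 range, so accented or non-Latin text is lost. The function name and the `min_len` threshold are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect runs of UTF-16LE code units whose high byte is 0 and whose low
// byte is printable ASCII (0x20..0x7E). Runs shorter than min_len are dropped.
std::vector<std::string> extract_utf16le_ascii_runs(const std::string& bytes,
                                                    std::size_t min_len = 4) {
    assert(min_len > 0);  // a zero threshold would emit empty runs
    std::vector<std::string> runs;
    std::string current;
    std::size_t i = 0;
    while (i + 1 < bytes.size()) {
        const unsigned char lo = static_cast<unsigned char>(bytes[i]);
        const unsigned char hi = static_cast<unsigned char>(bytes[i + 1]);
        if (hi == 0 && lo >= 0x20 && lo <= 0x7E) {
            current.push_back(static_cast<char>(lo));
            i += 2;  // consume one full 16-bit code unit
        } else {
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();
            i += 1;  // resynchronise one byte at a time
        }
    }
    if (current.size() >= min_len) runs.push_back(current);
    return runs;
}
```

Advancing by a single byte after a mismatch lets the scan pick up text that starts at an odd offset, at the cost of occasionally matching unrelated binary data.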

            Sebastian Schneider wrote (#6, in reply to KongHL):

            Geez. Open the document, parse it, but discard any formatting information - just keep the text. Write the text to a file. Done. WHY do YOU think "Office-to-anything-else" converters are so freaking expensive?

            Cheers, Sebastian -- Contra vim mortem non est medicamen in hortem.

              KongHL wrote (#7, in reply to Sebastian Schneider):

              How do I parse it?

                Sebastian Schneider wrote (#8, in reply to KongHL):

                From now on, questions are answered at a rate of 40 EUR/h. Answering your last question would take around 6-12 months. Should I go ahead? Seriously, if you really want to do it, you will have to analyze the file format yourself. I ain't got no clue how Word hides its content.

                Cheers, Sebastian -- Contra vim mortem non est medicamen in hortem.

                  grigsoft wrote (#9, in reply to KongHL):

                  I can recommend www.wordcnv.com - these guys have the fastest library, which can be supplied as a small (<50K!) lib file, and their support is great. I'm using their library myself.

                  Igor Green http://www.grigsoft.com/ - files and folders comparison tools

                    Maximilien wrote (#10, in reply to KongHL):

                    Like others wrote, you need to do your own reverse engineering of the Word format to extract the text, which is no walk in the park. I'm sure that if you google enough you will find something interesting on the subject.


                    Maximilien Lincourt Your Head A Splode - Strong Bad

                      David Crow wrote (#11, in reply to KongHL):

                      Without using Word, you'll need to know the format of a .doc file. See http://www.wotsit.org/ for this.
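
As a small first step toward reading the format, here is a hedged sketch that checks whether a buffer starts with the 8-byte signature of an OLE2 compound file (D0 CF 11 E0 A1 B1 1A E1), the container that binary .doc files from Word 97-2003 are normally stored in. That container detail is background knowledge rather than something stated in this thread, and the function name is made up for illustration. Passing the check only says the file is a compound file; locating the text streams inside it still requires the format description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if the buffer begins with the OLE2 compound file signature.
// This does not prove the file is a Word document, only that it uses the
// same container format.
bool has_compound_file_signature(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);  // a null buffer must be empty
    static const unsigned char kMagic[8] = {0xD0, 0xCF, 0x11, 0xE0,
                                            0xA1, 0xB1, 0x1A, 0xE1};
    if (size < sizeof kMagic) return false;
    return std::memcmp(data, kMagic, sizeof kMagic) == 0;
}
```

A caller would read the first 8 bytes of the file in binary mode and pass them in; anything that fails the check (for example plain text or RTF saved with a .doc extension) needs a different code path.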


                      "The greatest good you can do for another is not just to share your riches but to reveal to him his own." - Benjamin Disraeli
