Text extraction

KongHL wrote (#1):

How do I extract text from MS Word? It would be better if the end user did not need to have Microsoft Office installed.

hsdok wrote (#2), in reply to KongHL (#1):

Do you mean the words from a .doc file?

James Gupta wrote (#3), in reply to hsdok (#2):

Well, try searching for any character with an ASCII value between 32 and 126 (I think that's right: those are the characters that can be typed on a keyboard, but check before you rely on it).
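
A minimal sketch of that scan, in C++ and assuming the file has already been read into memory: it collects runs of printable ASCII bytes (0x20 to 0x7E), much like the Unix `strings` tool. The name `ascii_runs` and the `min_len` cutoff are made up for illustration. Nothing here understands the .doc layout, so expect font names, field codes and other junk mixed in with the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every run of at least min_len printable
// ASCII bytes (0x20..0x7E) found in the buffer. It treats the input as
// raw bytes and knows nothing about the Word file structure.
std::vector<std::string> ascii_runs(const std::string& bytes,
                                    std::size_t min_len = 4) {
    assert(min_len > 0);
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char c : bytes) {
        if (c >= 0x20 && c <= 0x7E) {
            current.push_back(static_cast<char>(c));
        } else {
            // A non-printable byte ends the current run.
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();
        }
    }
    // Flush a run that reaches the end of the buffer.
    if (current.size() >= min_len) runs.push_back(current);
    return runs;
}
```

Read the file in binary mode (for example with `std::ifstream` and `std::ios::binary`) and pass the bytes in. Raising `min_len` cuts down on noise at the cost of dropping short words.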

KongHL wrote (#4), in reply to hsdok (#2):

What I need is to convert a Microsoft Word document to a text file. How can I do this without having Microsoft Office? If that is not possible, using a library from Microsoft Office is fine, but how?

toxcct wrote (#5), in reply to James Gupta (#3):

Wrong, MS Word uses Unicode...
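
To illustrate the point: when text is stored as UTF-16LE, every basic Latin character is followed by a zero byte, so a plain byte-wise ASCII scan sees only one-character runs. A rough C++ sketch that looks for such two-byte pairs instead, assuming little-endian UTF-16 and only characters in the 0x20 to 0x7E range (`utf16le_ascii_runs` is a hypothetical helper, and accented or non-Latin text is simply skipped):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: finds runs of UTF-16LE code units whose high byte
// is 0 and whose low byte is printable ASCII, and returns them as plain
// strings. The scan may start at any byte offset, odd or even.
std::vector<std::string> utf16le_ascii_runs(const std::string& bytes,
                                            std::size_t min_len = 4) {
    assert(min_len > 0);
    std::vector<std::string> runs;
    std::string current;
    std::size_t i = 0;
    while (i + 1 < bytes.size()) {
        unsigned char lo = static_cast<unsigned char>(bytes[i]);
        unsigned char hi = static_cast<unsigned char>(bytes[i + 1]);
        if (hi == 0 && lo >= 0x20 && lo <= 0x7E) {
            current.push_back(static_cast<char>(lo));
            i += 2;  // consume one whole 16-bit code unit
        } else {
            if (current.size() >= min_len) runs.push_back(current);
            current.clear();
            i += 1;  // resynchronise one byte further on
        }
    }
    if (current.size() >= min_len) runs.push_back(current);
    return runs;
}
```

This is still guesswork on raw bytes, not a parser: it will happily return text from deleted revisions, metadata, or anything else that happens to look like UTF-16.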


TOXCCT >>> GEII power
[toxcct][VisualCalc 2.20][VCalc 3.0 soon...]

Sebastian Schneider wrote (#6), in reply to KongHL (#4):

Geez. Open the document, parse it, and discard any formatting information; just keep the text. Write the text to a file. Done. WHY do YOU think "Office-to-anything-else" converters are so freaking expensive?

Cheers, Sebastian
-- Contra vim mortem non est medicamen in hortem.

KongHL wrote (#7), in reply to Sebastian Schneider (#6):

How do I parse it?

Sebastian Schneider wrote (#8), in reply to KongHL (#7):

From now on, questions are answered at a rate of 40 EUR/h. Answering your last question would take around 6-12 months. Should I go ahead? Seriously, if you really want to do it, you will have to analyze the file yourself. I ain't got a clue how Word hides its content.

Cheers, Sebastian
-- Contra vim mortem non est medicamen in hortem.

grigsoft wrote (#9), in reply to KongHL (#1):

I can recommend www.wordcnv.com: these guys have the fastest library I know of, it can be supplied as a small (<50K!) lib file, and their support is great. I'm using their library myself.

Igor Green
http://www.grigsoft.com/ - files and folders comparison tools

Maximilien wrote (#10), in reply to KongHL (#1):

Like others wrote, you need to do your own reverse engineering of the Word format to extract the text, which is no walk in the park. I'm sure that if you google enough you'll find something interesting on the subject.


Maximilien Lincourt
Your Head A Splode - Strong Bad

David Crow wrote (#11), in reply to KongHL (#1):

Without using Word, you'll need to know the format of a .doc file. See http://www.wotsit.org/ for this.
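
One small, well-documented piece of that format: Word 97-2003 .doc files are OLE compound files, which start with the eight bytes D0 CF 11 E0 A1 B1 1A E1. A hedged C++ sketch that checks for this signature before attempting anything else (`looks_like_ole_compound_file` is an illustrative name; passing the check only means the container type matches, not that the file is a Word document or that the text is easy to find):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative check: does the buffer begin with the 8-byte signature
// of an OLE compound file, the container used by binary .doc files?
bool looks_like_ole_compound_file(const unsigned char* data,
                                  std::size_t size) {
    assert(data != nullptr || size == 0);
    static const unsigned char kMagic[8] = {0xD0, 0xCF, 0x11, 0xE0,
                                            0xA1, 0xB1, 0x1A, 0xE1};
    return size >= sizeof kMagic &&
           std::memcmp(data, kMagic, sizeof kMagic) == 0;
}
```

Getting from the container to the actual text still means locating the document stream inside it and following the format description, which is where the real work starts.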


"The greatest good you can do for another is not just to share your riches but to reveal to him his own." - Benjamin Disraeli
