Code Project, C# forum
Word docs

Tags: help, csharp, regex, question
jblouir wrote (#1):

This is stressing me out. I am trying to read Word documents by loading them as plain text into a hidden RichTextBox, then taking that text and using regex to "attempt" to extract the words from the Unicode. So far I have been only somewhat successful, as there is always garbled Unicode left over. Does anyone know a regex that removes this Unicode junk from a string?

If you are wondering why I'm doing it this way, it is the fastest way I have found to read Word docs. I have tried the WordApplication classes and OfficeReaders, and they are either slow or error-prone. I thought about using IWordBreaker, but that went right over my head and I don't know where to begin there; plus you need a license.

My C# app is working great now and it's almost ready to go live; it's just these bloody Word documents. If anyone can help I would greatly appreciate it.

Thanks,
Jeremy
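
A minimal C# sketch of the kind of regex being asked for, assuming that "garbled Unicode" means control characters plus anything outside printable ASCII. The trade-off is real: this also deletes legitimate non-ASCII text such as accented letters, and it cannot tell real words apart from binary residue that happens to be printable. `KeepPrintableAscii` and `CollapseSpaces` are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: remove every character outside printable ASCII
// (space through tilde), keeping tabs, carriage returns and line feeds.
// Accented letters and other legitimate non-ASCII text are removed too.
static string KeepPrintableAscii(string input) =>
    Regex.Replace(input, @"[^\x20-\x7E\t\r\n]", "");

// Hypothetical helper: collapse the runs of spaces/tabs left behind
// once the junk characters between words have been removed.
static string CollapseSpaces(string input) =>
    Regex.Replace(input, @"[ \t]{2,}", " ");

// Sample input with NUL, BEL, 0x01 control characters and a stray U+00FF.
string raw = "\u0000\u0007Quarterly report\u0001\u0001  draft\u00FF";
Console.WriteLine(CollapseSpaces(KeepPrintableAscii(raw)));
// prints "Quarterly report draft"
```

Running the filter first and the space collapse second matters: removing the junk is what exposes the extra spaces.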

Christian Graus wrote (#2):

I don't understand: a Word doc is NOT just rich text. How is this working?

      Christian Graus - Microsoft MVP - C++ "I am working on a project that will convert a FORTRAN code to corresponding C++ code.I am not aware of FORTRAN syntax" ( spotted in the C++/CLI forum )

jblouir wrote (#3):

I use the LoadFile method of a RichTextBox: I point it at a Word doc and choose the plain-text option, and it loads the text into the RichTextBox as a mix of junk and all the actual words in the document. I then copy the text from there into a string. It's just a matter of getting rid of the junk. Usually all the text from the Word doc is tucked away neatly in the middle of the junk, but sometimes you get some \'07 and \par sequences in there.
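
The `\par` and `\'07` sequences described above are RTF markup (`\par` is a paragraph mark, `\'hh` is a hex-escaped character), which suggests these particular files may be RTF rather than binary .doc. If so, loading them with `RichTextBoxStreamType.RichText` instead of `PlainText` and reading the control's `Text` property should give clean text with no regex at all. Failing that, here is a rough regex sketch. It is an approximation, not an RTF parser: `StripRtfControls` is a hypothetical helper name, it turns `\par` into line breaks, and it drops other control words, hex escapes (including accented characters such as `\'e9`) and braces. Text inside header groups such as the font table can still leak through.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rough RTF-to-text cleanup with regular expressions.
// NOT a real RTF parser: group semantics are ignored, so text inside header
// groups (e.g. {\fonttbl ...}) can survive, and \{ \} \\ escapes are not handled.
static string StripRtfControls(string rtf)
{
    // 1. Paragraph marks (\par, but not \pard) become newlines.
    string text = Regex.Replace(rtf, @"\\par\b ?", "\n");
    // 2. Drop hex-escaped characters such as \'07.
    text = Regex.Replace(text, @"\\'[0-9a-fA-F]{2}", "");
    // 3. Drop remaining control words like \rtf1, \b0, \pard, plus the
    //    single space that may delimit them.
    text = Regex.Replace(text, @"\\[a-zA-Z]+-?\d* ?", "");
    // 4. Drop group braces.
    text = Regex.Replace(text, @"[{}]", "");
    return text;
}

string sample = @"{\rtf1\ansi \pard Hello\'07 world\par Second line\par}";
Console.Write(StripRtfControls(sample));
// prints "Hello world", then "Second line" on the next line
```

The order of the passes matters: `\par` is handled before the generic control-word pass so that paragraph breaks survive as newlines instead of being deleted.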