Word docs
-
This is stressing me out. I am trying to read Word documents at the moment by loading them as plain text into a hidden RichTextBox, then I take that text and use regex to "attempt" to extract the words from the Unicode. I have so far been somewhat unsuccessful, as there is always garbled Unicode left. Does anyone know any regular expressions that remove Unicode from a string? If you are wondering why I'm doing it in this fashion, it is the fastest way I have found to read Word docs. I have tried to use WordApplication classes and OfficeReaders, and they are either slow or error prone. I thought about using IWordBreaker, but that went way over my head and I don't know where to begin there, plus you need a license. My C# app is working great now, and it's almost ready to go live; it's just these bloody Word documents. If anyone can help I would greatly appreciate it. Thanks, Jeremy
-
I don't understand. A Word doc is NOT just rich text. How is this working?
Christian Graus - Microsoft MVP - C++ "I am working on a project that will convert a FORTRAN code to corresponding C++ code.I am not aware of FORTRAN syntax" ( spotted in the C++/CLI forum )
-
I use the load method of a RichTextBox: I point it at a Word doc and choose the plain text option, and it loads the text into the RichTextBox as a combination of junk and all the actual words in the Word doc. I then transfer the text from there into a string. It's just a matter of getting rid of the junk. Usually all the text in the Word doc is tucked away neatly in the middle of the junk, but you sometimes get some \\'07 and \par's in there.
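For the junk described above, a rough regex cleanup might look like the sketch below. It is only an illustration under assumptions, not a reliable Word parser: the class name `WordTextCleaner` and the method `Clean` are made up for this example, the "load method" is presumably `RichTextBox.LoadFile` with `RichTextBoxStreamType.PlainText`, and the patterns only target the kinds of leftovers mentioned in the thread (escapes like `\'07`, control words like `\par`, and non-printable characters).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class WordTextCleaner
{
    // Hex escapes such as \'07 (one or more leading backslashes).
    private static readonly Regex HexEscape =
        new Regex(@"\\+'[0-9a-fA-F]{2}");

    // Control words such as \par or \fs24, plus one trailing space if present.
    private static readonly Regex ControlWord =
        new Regex(@"\\+[a-zA-Z]+-?\d* ?");

    // Anything that is not printable ASCII, tab, CR, or LF.
    // Assumption: the wanted text is plain ASCII; accented letters are dropped too.
    private static readonly Regex NonPrintable =
        new Regex(@"[^\u0020-\u007E\t\r\n]+");

    // Runs of whitespace left behind by the replacements above.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string Clean(string raw)
    {
        // Replace with a space rather than nothing so adjacent words do not fuse.
        string text = HexEscape.Replace(raw, " ");
        text = ControlWord.Replace(text, " ");
        text = NonPrintable.Replace(text, " ");
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Usage would be something like `string cleaned = WordTextCleaner.Clean(richTextBox.Text);`, where `richTextBox` stands for the hidden control. Note the trade-offs: a genuine backslash followed by letters in the document text (a file path, for example) is stripped as if it were a control word, and any non-ASCII text is lost, so this only suits documents known to contain plain English text.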