
How to read a file from any kind of documents and display its contents?

19 Posts 6 Posters 0 Views 1 Watching
    CoderForEver wrote (#1):

    Hi my friends. I want to do term weighting with an approach called the composite measure (tf*idf), which has its own formula. What I want to do is read files (Word 2003, Word 2007, PDF, etc.) and then put the words of each file into an array. I can read any file line by line with the following code:

        try
        {
            // Create an instance of StreamReader to read from a file.
            // The using statement also closes the StreamReader.
            // textBox1.Text holds the path of the file.
            using (StreamReader sr = new StreamReader(textBox1.Text, Encoding.ASCII, true))
            {
                string line;
                // Read and display lines from the file until the end of
                // the file is reached.
                while ((line = sr.ReadLine()) != null)
                {
                    richTextBox1.Text = richTextBox1.Text + Environment.NewLine + line;
                    //MessageBox.Show(line);
                }
            }
        }
        catch (IOException ex)
        {
            MessageBox.Show(ex.Message);
        }

    To check the content of the file I tried to display it in richTextBox1, but what it displays looks encrypted. What I want to know is:

    1. How can I put each word (separated by spaces and newlines) into an array, just to know each word? Displaying the content is not necessary here.
    2. How can I display the content of each file in richTextBox1, but not in an encrypted form?

    Thanks for your help.

      OriginalGriff wrote (#2):

      The first thing you need to know is: Word and PDF files (as well as many others) are not necessarily line-based text documents. Instead they are frequently binary files, which explains why they look encrypted. The second thing you need to know is: Word and PDF files can be encrypted, so that may be why they look encrypted. Google for "Word File Format" and "PDF File Format" - this will provide a starting point for gathering the information you are going to need.

      All those who believe in psycho kinesis, raise my hand.

      "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
      "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt

        CoderForEver wrote (#3):

        So how can I decrypt them? Or can you tell me how to work with them, please? You can see my first question. Thank you.

          CoderForEver wrote (#4):

          Or forget about PDF and Word. How do I gather the words into an array from just a plain text document? Thank you.

            realJSOP wrote (#5):

            Ummm...[^]

            .45 ACP - because shooting twice is just silly
            -----
            "Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
            -----
            "The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001

              CoderForEver wrote (#6):

              Got nothing ... just http://75.11.0.157/homenet/stupid.htm in the title bar. What is that supposed to mean?

                realJSOP wrote (#7):

                Okay, how about this one[^]?

                  Lost User wrote (#8):

                  I couldn't open them either, using Chrome. It works in FF though :)

                  I are Troll :suss:

                    Lost User wrote (#9):

                    Each document has a specific structure. Word documents and PDF files can't simply be "read" as text, because a plain text reader doesn't know how to interpret them. Those documents contain extra information like "this part of the text is in bold formatting" and "this part is in red". All that information is stored in between the words that you see when you open the thing in Word.

                    CoderForEver wrote:

                    1. How can I put each words(separated by Space and newline) in to array ... just to know each word (here displaying the content is not necessary)

                    You can't until you have something to decode the file. You can save Word-files as RTF. Take a look at the result with a text-editor, and you'll see where the extra codes are located. You can also save the file as HTML. Again, a coded form, just like the binary representation.

                      Lost User wrote (#10):

                      Use string.Split with a blank space as your separator to populate an array with each individual word. Check out the documentation[^] for more basic string manipulation.
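The splitting step suggested here can be sketched roughly as follows. This is a minimal example, not the poster's own code: the class name `WordSplitter` and the method name `SplitWords` are made up for illustration, and the separator list (space, tab, carriage return, line feed) is an assumption based on the original question about words separated by spaces and newlines.

```csharp
using System;

static class WordSplitter
{
    // Hypothetical helper: splits text on spaces, tabs, and line breaks.
    // RemoveEmptyEntries drops the empty strings that repeated separators
    // would otherwise produce.
    public static string[] SplitWords(string text)
    {
        char[] separators = { ' ', '\t', '\r', '\n' };
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Calling `WordSplitter.SplitWords(line)` on each line read by the StreamReader loop, or once on the whole file content, would give the array of words asked about in the first question.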

                      Check out the CodeProject forum Guidelines[^] The original soapbox 1.0 is back![^]

                        Pete OHanlon wrote (#11):

                        One way to do this would be to use the Index Server IFilter approach and read the words this way, outlined here[^].

                        "WPF has many lovers. It's a veritable porn star!" - Josh Smith

                        As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.

                        My blog | My articles | MoXAML PowerToys | Onyx

                          OriginalGriff wrote (#12):

                          Gets my five!

                            CoderForEver wrote (#13):

                            Eddy Vluggen wrote:

                            You can save Word-files as RTF

                            So can I read this RTF file and then display it in a RichTextBox? Or what is left to do? Thank you for your help.

                              OriginalGriff wrote (#14):

                              Decrypt: First, find out the password... Because they aren't stored as straight text, you can't just read them and identify the words. The files contain heaps of other stuff: font, size, colour, location, lines, boxes, italics, bold, pictures, spreadsheets, etc. If all you are interested in is the text of the document and doing some textual analysis, then the best thing you can do is to throw away as much of the formatting as possible and save the file as a straight .TXT file from Word and/or PDF. You can then read the whole thing in and use string.Split (with space and reasonable punctuation as separators) to break it into words.
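Once the document has been saved as a plain .TXT file, the read-then-split approach described above could look something like the sketch below. All names here (`TextFileWords`, `ExtractWords`, `ReadWords`, `CountTerms`) are hypothetical, and the punctuation list is only a guess at what "reasonable punctuation" means. The word counting at the end is an added assumption, included because raw term counts are the usual starting point for the tf part of tf*idf mentioned in the original question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TextFileWords
{
    // Assumed separator set: whitespace plus common punctuation.
    // Extend or trim it to suit your own documents.
    static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

    // Breaks a block of text into words, skipping empty entries.
    public static string[] ExtractWords(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Reads the whole plain text file at once and splits it into words.
    public static string[] ReadWords(string path)
    {
        return ExtractWords(File.ReadAllText(path));
    }

    // Counts how often each word occurs, ignoring case.
    public static Dictionary<string, int> CountTerms(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in words)
        {
            int n;
            counts.TryGetValue(word, out n); // n stays 0 for a new word
            counts[word] = n + 1;
        }
        return counts;
    }
}
```

With this in place, `TextFileWords.ReadWords(textBox1.Text)` would return the word array for the file whose path is in the text box, assuming that file is plain text.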

                                realJSOP wrote (#15):

                                It's a picture and a caption - nothing special, and certainly nothing exotic. If Chrome can't open something that simple, I'd certainly entertain the idea of using one of the alternative browsers for my regular web-browsing pleasures...

                                  Maximilien wrote (#16):

                                  If you use a CRichEditCtrl, you can directly load RTF files. Sorry 'bout that... Well, we're in the C# forum, so use the equivalent C# control; I'm certain you can do the same thing. See http://msdn.microsoft.com/en-us/library/system.windows.forms.richtextbox.loadfile.aspx[^] (is that correct? I'm no C# expert). Max.

                                  This signature was proudly tested on animals.

                                    Lost User wrote (#17):

                                    CoderForEver wrote:

                                    So can I read this RTF file .... then display it on Richtext box ?

                                    Yup. The same method can be used to read plain text files. If you want to read another format, then you'll have to provide methods to read those formats. Reading Word-files directly is a fair bit more complex.
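Loading an RTF or plain text file into a RichTextBox, as discussed in this part of the thread, might look like the sketch below. It assumes a Windows Forms project. The class name `RichTextLoader` and the idea of choosing the stream type from the file extension are illustrative assumptions, not something taken from the thread; `RichTextBox.LoadFile` and `RichTextBoxStreamType` are the real Windows Forms members.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

static class RichTextLoader
{
    // Assumption of this sketch: files ending in .rtf are rich text,
    // everything else is treated as plain text.
    public static RichTextBoxStreamType StreamTypeFor(string path)
    {
        string ext = Path.GetExtension(path);
        return string.Equals(ext, ".rtf", StringComparison.OrdinalIgnoreCase)
            ? RichTextBoxStreamType.RichText
            : RichTextBoxStreamType.PlainText;
    }

    // Loads the file into the given control with the matching stream type.
    public static void Load(RichTextBox box, string path)
    {
        box.LoadFile(path, StreamTypeFor(path));
    }
}
```

In the original form this would be called as `RichTextLoader.Load(richTextBox1, textBox1.Text);`. Word and PDF files would still need to be converted to RTF or plain text first.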

                                      Lost User wrote (#18):

                                      John Simmons / outlaw programmer wrote:

                                      It's a picture and a caption - nothing special, and certainly nothing exotic.

                                      content="IE=EmulateIE7"

                                      ..and some css to position all that :-D

                                      John Simmons / outlaw programmer wrote:

                                      If Chrome can't open something that simple, I'd certainly entertain the idea of using one of the alternative browsers for my regular web-browsing pleasures...

                                      You could also entertain the idea of installing multiple browsers. It's not a marriage, and I'm not going to commit to a single system :)

                                        realJSOP wrote (#19):

                                        Well, IE 6 and 8 show it just fine without any special compatibility tags.
