Code Project

Convert PDF to text or html

C# (tags: csharp, html, career)
  • Johan Martensson wrote (#1):

    Hi, I need to parse some text that is currently in PDF format, so I'm thinking that converting it to text or HTML would be a good place to start. There are a lot of PDF components out there for C#. Has anyone tried any of them, and can you tell me which ones do a good job? Thanks.

    http://johanmartensson.se - Home of MPEG4Watcher

      vaghelabhavesh wrote (#2):

      I have used PDFBox, but it is written in Java, so you need IKVM.NET to use PDFBox.dll in .NET. It has a class called PDFTextStripper, which has a method called getText. This only works on text PDFs; it won't work on image (scanned) PDFs. If it is not that big a task, you can also use iTextSharp. One more thing: so far I have learned that iTextSharp is great for creating PDFs and PDFBox is great for parsing/reading PDFs. :-)
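
      A minimal sketch of the PDFBox approach mentioned in this reply, assuming the IKVM-compiled PDFBox assemblies and the IKVM runtime DLLs are referenced by the project. The `org.pdfbox` namespaces are an assumption based on the older IKVM-compiled PDFBox builds (later Apache releases use `org.apache.pdfbox` instead), and `input.pdf` is a hypothetical file name.

      ```csharp
      using System;
      // Assumption: namespaces as exposed by an older IKVM-compiled PDFBox build.
      // Adjust them to match the PDFBox version you actually reference.
      using org.pdfbox.pdmodel;
      using org.pdfbox.util;

      class PdfToText
      {
          static void Main(string[] args)
          {
              // "input.pdf" is a placeholder; pass a real path as the first argument.
              string path = args.Length > 0 ? args[0] : "input.pdf";

              // Load the document through the Java API surfaced by IKVM.
              PDDocument doc = PDDocument.load(path);
              try
              {
                  // PDFTextStripper extracts the text layer only; a scanned
                  // (image-only) PDF has no text layer to extract.
                  PDFTextStripper stripper = new PDFTextStripper();
                  string text = stripper.getText(doc);
                  Console.WriteLine(text);
              }
              finally
              {
                  // PDDocument holds file resources, so close it explicitly.
                  doc.close();
              }
          }
      }
      ```

      The Java-style method names (`load`, `getText`, `close`) are kept as-is because IKVM exposes the original Java API unchanged.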

      Be careful, there is no Undo Button(Ctrl+Z) in life.

        Johan Martensson wrote (#3):

        I tried iTextSharp, but it really did a terrible job with my PDF, so now I'm playing around with PDFBox, and it seems to be doing a much better job.

        http://johanmartensson.se - Home of MPEG4Watcher
