Convert PDF to text or HTML
-
Hi, I need to parse some text that's currently in PDF format, so I'm thinking that converting it to text or HTML would be a good place to start. There are a lot of PDF components out there for C#. Has any of you tried any of them, and can you tell me which ones do a good job? Thanks.
http://johanmartensson.se - Home of MPEG4Watcher
-
I have used PDFBox, but it's written in Java, so you need IKVM.NET to use PDFBox.dll from .NET. It has a class called PDFTextStripper with a method called getText. Note that this only works on text-based PDFs; it won't work on image-only PDFs (such as scanned pages). If your task isn't too demanding, you can also use iTextSharp. One more thing: so far I have learned that iTextSharp is great for creating PDFs and PDFBox is great for parsing/reading them. :-)
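A minimal sketch of the PDFTextStripper/getText approach, written in Java because PDFBox itself is a Java library. It assumes Apache PDFBox 2.x on the classpath (in 3.x, `PDDocument.load` was replaced by `Loader.loadPDF`); the class name `PdfToText` and the helper `extractText` are hypothetical. From .NET, the same classes are reached through the IKVM-converted PDFBox.dll, but the namespaces there depend on which PDFBox version was converted.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical example class; assumes Apache PDFBox 2.x.
public class PdfToText {

    // Returns the text layer of the PDF. Image-only (scanned) PDFs have no
    // text layer, so for those the result is empty or nearly empty.
    static String extractText(File pdf) throws IOException {
        // try-with-resources closes the document even if extraction fails
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Optional: emit text in on-page reading order rather than
            // the order it appears in the content stream
            stripper.setSortByPosition(true);
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfToText <file.pdf>");
            System.exit(1);
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

Usage: `java PdfToText input.pdf > output.txt`. If the output comes back empty for a PDF that clearly shows text on screen, the file is most likely image-only, which is the limitation mentioned above.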
Be careful, there is no Undo button (Ctrl+Z) in life.
-
I tried iTextSharp, but it did a terrible job with my PDF, so now I'm playing around with PDFBox, and it seems to be doing a much better job.