Tiff Image -> Searchable PDF [modified]
-
We have several scanned documents (TIFF) on CDs, around 10 GB of data, and we are looking for a way to turn them into searchable PDFs for easy access. We use PDF Creator to convert TIFF to PDF, and we have built a custom module on top of an OCR engine that saves the text page by page in XML. So, how can I create a searchable PDF from the image PDF and the OCRed text in XML? We are not looking for high-end server-side solutions; it would be nice if we could develop our own app or use open-source software. Regards, MaulikCE -- modified at 10:13 Thursday 25th May, 2006
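For readers wondering what "searchable" means inside the PDF itself: a common approach is to draw the OCR text over the page in text rendering mode 3 (neither filled nor stroked), so it is invisible but still selectable and searchable. The sketch below is only an illustration under stated assumptions: it assumes the OCR step can supply a position for each word in PDF points, the function names (`pdf_escape`, `invisible_text_stream`, `build_pdf`) are hypothetical, and it leaves out the scanned image, which would normally be drawn on the same page before the text.

```python
def pdf_escape(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_stream(words, font_size=10):
    """Build a page content stream that places each word invisibly.

    `words` is a list of (x, y, text) tuples in PDF points (assumed input).
    """
    parts = ["BT", "3 Tr", "/F1 %d Tf" % font_size]  # 3 Tr = invisible text
    for x, y, text in words:
        # Tm sets an absolute position for each word.
        parts.append("1 0 0 1 %.2f %.2f Tm (%s) Tj" % (x, y, pdf_escape(text)))
    parts.append("ET")
    # Characters outside Latin-1 are replaced with "?" in this sketch.
    return "\n".join(parts).encode("latin-1", "replace")


def build_pdf(words, width=612, height=792):
    """Return the bytes of a one-page PDF holding only the invisible text."""
    content = invisible_text_stream(words)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>"
        % (width, height),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result of `build_pdf([(72, 720, "Invoice")])` to a file in binary mode should give a page that looks blank but can be searched for "Invoice". If the XML holds only the text per page and no coordinates, the whole page text could be placed at a single position; searching would still find the page, but the highlight would not line up with the scan.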
-
Hi Maulik, You can use an OCR engine such as Tesseract from Google, or a paid one, to convert the TIFF files to text. Store the text in a Lucene index using Lucene.NET, and then build a search interface on top of that index. Let me know if you need paid consultation for this solution; my company can provide it. We are experts in enterprise search. Thanks, Sumit Globussoft
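To make the indexing idea concrete, here is a minimal sketch of what a full-text index does with page-by-page OCR text. It is a plain-Python stand-in, not the Lucene.NET API, which adds analyzers, ranking, and on-disk storage on top of the same idea. The XML layout (`<document name="...">` containing `<page number="...">` elements) and the function names are assumptions for illustration; the actual schema produced by the OCR module is not given in the thread.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample of the per-page OCR output.
SAMPLE_XML = """<document name="scan001.tif">
  <page number="1">Invoice for office supplies</page>
  <page number="2">Payment terms and delivery schedule</page>
</document>"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_document(index, xml_text):
    """Add every term of every page to the inverted index."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    for page in root.iter("page"):
        location = (name, int(page.get("number")))
        for term in tokenize(page.text or ""):
            index[term].add(location)


def search(index, query):
    """Return the (document, page) pairs that contain all query terms."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


index = defaultdict(set)
index_document(index, SAMPLE_XML)
print(search(index, "payment terms"))
```

A search interface would then map each hit back to the source file and page number so the user can open the matching scan.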