Extracting certain text from a PDF document into a database
-
Hello. I am currently working on a project to extract data from a PDF document into structured database tables. I have solved the problem of extracting the text from a PDF into a text box on a VB6 form by using a ready-made DLL component that I referenced in my program.

Now my problem is how to extract only certain pieces of data from the text, based on some keywords that appear in the text. For example, in the PDF document:

Existing Chemical Substance ID: 50–00–0
CAS No. 50–00–0
EINECS Name formaldehyde
EINECS No. 200–001–8
Molecular Formula CH2O

I need to extract "50-00-0" that appears after the keyword "CAS No.", then the text "formaldehyde" that appears after the keyword "EINECS Name", and "200-001-8" that comes after "EINECS No.". I have a database table which contains these keywords as field names. What I want is for the table to look like this:

Sno   CAS No.   EINECS Name    EINECS No.
1     50–00–0   formaldehyde   200–001–8

I would really appreciate it if someone could point me towards the string manipulation functions that I would need to use in order to do this. Also, how can I get the count of a keyword if it appears multiple times between two other keywords?

Thanks and Regards, Kumar
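For the second part of the question (counting a keyword between two other keywords), one possible approach is to cut out the section with InStr and Mid$ and then step through it with InStr in a loop. This is only a sketch: CountBetween and its parameter names are made up for illustration, and it assumes case-insensitive matching and that only the first StartKey/EndKey pair matters.

```vb
' Hypothetical helper: counts how often Keyword occurs in InputText between
' the first StartKey and the next EndKey after it.
' Returns 0 if StartKey is not found or Keyword is empty.
Public Function CountBetween(ByVal InputText As String, ByVal StartKey As String, _
                             ByVal EndKey As String, ByVal Keyword As String) As Long
    Dim startPos As Long, endPos As Long, pos As Long
    Dim section As String

    If Len(Keyword) = 0 Then Exit Function   ' avoid an endless loop on ""

    startPos = InStr(1, InputText, StartKey, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(StartKey)      ' first character after StartKey

    ' If EndKey is not found, search up to the end of the text.
    endPos = InStr(startPos, InputText, EndKey, vbTextCompare)
    If endPos = 0 Then endPos = Len(InputText) + 1

    section = Mid$(InputText, startPos, endPos - startPos)

    ' Walk through the section, jumping past each match (non-overlapping count).
    pos = InStr(1, section, Keyword, vbTextCompare)
    Do While pos > 0
        CountBetween = CountBetween + 1
        pos = InStr(pos + Len(Keyword), section, Keyword, vbTextCompare)
    Loop
End Function
```

For example, CountBetween("A x B x x C x", "B", "C", "x") returns 2, because only the two occurrences between "B" and "C" are counted.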
-
A guide to posting questions on CodeProject[^]
Dave Kreskowiak Microsoft MVP Visual Developer - Visual Basic
2006, 2007, 2008 -
Dear Dave, Thanks a lot for the suggestion. I found just the perfect piece of code.
Public Function Extract(ByVal TextIN As String, _
                        Optional StartTag As String = " ", _
                        Optional ByVal EndTag As String = " ", _
                        Optional ByVal CheckCase As Boolean) As String
On Error GoTo LocalError
    ' Extracts Text from string using start and end "tags"
    'NB: If EndTag is omitted the entire string from:
    '    StartTag to EndOfString is returned...
    Dim lArray As Variant
    Extract = ""
    lArray = Split(TextIN, StartTag)
    If IsArray(lArray) Then
        Extract = lArray(1)
        lArray = Split(Extract, EndTag)
        If IsArray(lArray) Then
            Extract = lArray(0)
        Else
            Extract = ""
        End If
    End If
    Exit Function
LocalError:
    Extract = ""
End Function
It works beautifully. Now all I need to do is put all the keywords into a database and do a recursive search using those fields dynamically at runtime. Thanks a million for your help. Best Regards, Kumar -
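The "loop over the keywords" step could look roughly like the sketch below. It is not the code from the post above: ValueBetween and DemoExtract are hypothetical names, the keyword list is hard-coded here where the real program would read it from the keyword table, and it assumes the keywords are listed in the order they appear in the text, so that each value ends where the next keyword starts.

```vb
' Hypothetical helper: returns the trimmed text between StartKey and EndKey.
' Returns "" if StartKey is not found; if EndKey is "" or not found,
' everything after StartKey is returned.
Public Function ValueBetween(ByVal InputText As String, ByVal StartKey As String, _
                             ByVal EndKey As String) As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, InputText, StartKey, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(StartKey)      ' first character after StartKey

    If Len(EndKey) > 0 Then endPos = InStr(startPos, InputText, EndKey, vbTextCompare)
    If endPos = 0 Then endPos = Len(InputText) + 1

    ValueBetween = Trim$(Mid$(InputText, startPos, endPos - startPos))
End Function

Public Sub DemoExtract()
    Dim keys As Variant, i As Long, nextKey As String
    Dim pdfText As String

    ' Sample text from the question (plain hyphens used here for simplicity).
    pdfText = "Existing Chemical Substance ID: 50-00-0 CAS No. 50-00-0 " & _
              "EINECS Name formaldehyde EINECS No. 200-001-8 Molecular Formula CH2O"

    ' Assumption: keywords in document order; in the real program these
    ' would come from the keyword table instead of being hard-coded.
    keys = Array("CAS No.", "EINECS Name", "EINECS No.", "Molecular Formula")

    For i = LBound(keys) To UBound(keys)
        ' The next keyword marks the end of the current value.
        If i < UBound(keys) Then nextKey = keys(i + 1) Else nextKey = ""
        Debug.Print keys(i) & " = " & ValueBetween(pdfText, keys(i), nextKey)
    Next i
End Sub
```

With the sample text, DemoExtract prints "CAS No. = 50-00-0", "EINECS Name = formaldehyde", "EINECS No. = 200-001-8" and "Molecular Formula = CH2O" to the Immediate window. Using the next keyword as the end marker and trimming the result avoids getting an empty string when the value is preceded by a space.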
kshincsk wrote:
I found just the perfect piece of code.
Great. Yet another Copy'N'Paste programmer...
A guide to posting questions on CodeProject[^]
Dave Kreskowiak Microsoft MVP Visual Developer - Visual Basic
2006, 2007, 2008