Extracting certain text from a PDF document into a database

    kshincsk (#1) wrote:

    Hello.

    I am currently working on a project to extract data from a PDF document into structured database tables. I have solved the problem of extracting the text from the PDF into a text box on a VB6 form by using a ready-made DLL component that I referenced in my program.

    Now my problem is how to extract only certain pieces of data from the text, based on keywords that appear in it. For example, the PDF document contains:

        Existing Chemical Substance ID: 50–00–0
        CAS No. 50–00–0
        EINECS Name formaldehyde
        EINECS No. 200–001–8
        Molecular Formula CH2O

    I need to extract "50-00-0", which appears after the keyword "CAS No.", then "formaldehyde", which appears after the keyword "EINECS Name", and "200-001-8", which comes after "EINECS No.".

    I have a database table which contains these keywords as field names. What I want is for the table to look like this:

        Sno  CAS No.  EINECS Name   EINECS No.
        1    50–00–0  formaldehyde  200–001–8

    I would really appreciate it if someone could point me towards the string manipulation functions that I would need to use in order to do this. Also, how can I get a count of a keyword if it appears multiple times between two other keywords?

    Thanks and Regards,
    Kumar
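For the first part of the question, here is a minimal sketch using the built-in VB6 functions `InStr` and `Mid$`. It assumes each value is a single token that follows its keyword and ends at the next space, and that any line breaks in the text box have already been replaced with spaces. `ValueAfter` and `DemoValueAfter` are hypothetical names, not part of any library, and the sample string uses plain hyphens for simplicity.

```vb
Option Explicit

' Hypothetical helper: returns the space-delimited token that follows
' Keyword in Source, or "" if the keyword is not found.
Public Function ValueAfter(ByVal Source As String, ByVal Keyword As String) As String
    Dim startPos As Long
    Dim endPos As Long

    ' Locate the keyword (case-insensitive comparison)
    startPos = InStr(1, Source, Keyword, vbTextCompare)
    If startPos = 0 Then Exit Function

    ' Move past the keyword, then past any spaces that follow it
    startPos = startPos + Len(Keyword)
    Do While startPos <= Len(Source)
        If Mid$(Source, startPos, 1) <> " " Then Exit Do
        startPos = startPos + 1
    Loop

    ' The value runs up to the next space, or to the end of the text
    endPos = InStr(startPos, Source, " ")
    If endPos = 0 Then endPos = Len(Source) + 1

    ValueAfter = Mid$(Source, startPos, endPos - startPos)
End Function

Public Sub DemoValueAfter()
    Dim s As String
    s = "Existing Chemical Substance ID: 50-00-0 CAS No. 50-00-0 " & _
        "EINECS Name formaldehyde EINECS No. 200-001-8 Molecular Formula CH2O"

    Debug.Print ValueAfter(s, "CAS No.")      ' prints 50-00-0
    Debug.Print ValueAfter(s, "EINECS Name")  ' prints formaldehyde
    Debug.Print ValueAfter(s, "EINECS No.")   ' prints 200-001-8
End Sub
```

A value that contains spaces (a two-word substance name, for instance) would be cut at the first space, so this approach only suits single-token fields.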


      Dave Kreskowiak (#2) wrote:

      VB6 String Manipulation[^]
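As an illustration of the kind of string functions involved, the counting part of the question (how often a keyword appears between two other keywords) could be sketched with `InStr` and `Mid$` alone. This is only a sketch under stated assumptions: `CountBetween` is a hypothetical name, matching is case-insensitive, occurrences are counted without overlap, and the sample text in the demo is made up.

```vb
Option Explicit

' Hypothetical helper: counts non-overlapping occurrences of Keyword in the
' part of Source that lies between StartMarker and the next EndMarker.
Public Function CountBetween(ByVal Source As String, ByVal StartMarker As String, _
                             ByVal EndMarker As String, ByVal Keyword As String) As Long
    Dim fromPos As Long
    Dim toPos As Long
    Dim segment As String
    Dim pos As Long

    If Len(Keyword) = 0 Then Exit Function

    ' Find where the region of interest starts
    fromPos = InStr(1, Source, StartMarker, vbTextCompare)
    If fromPos = 0 Then Exit Function
    fromPos = fromPos + Len(StartMarker)

    ' Find where it ends; with no end marker, scan to the end of the text
    toPos = InStr(fromPos, Source, EndMarker, vbTextCompare)
    If toPos = 0 Then toPos = Len(Source) + 1

    segment = Mid$(Source, fromPos, toPos - fromPos)

    ' Step through the segment, jumping past each match
    pos = InStr(1, segment, Keyword, vbTextCompare)
    Do While pos > 0
        CountBetween = CountBetween + 1
        pos = InStr(pos + Len(Keyword), segment, Keyword, vbTextCompare)
    Loop
End Function

Public Sub DemoCountBetween()
    Dim s As String
    ' Made-up sample text, for illustration only
    s = "Section 1 formaldehyde methanol formaldehyde Section 2 formaldehyde"
    Debug.Print CountBetween(s, "Section 1", "Section 2", "formaldehyde")  ' prints 2
End Sub
```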

      A guide to posting questions on CodeProject[^]
      Dave Kreskowiak Microsoft MVP Visual Developer - Visual Basic
           2006, 2007, 2008


        kshincsk (#3) wrote:

        Dear Dave,

        Thanks a lot for the suggestion. I found just the perfect piece of code:

            Public Function Extract(ByVal TextIN As String, _
                                    Optional ByVal StartTag As String = " ", _
                                    Optional ByVal EndTag As String = " ", _
                                    Optional ByVal CheckCase As Boolean = False) As String
                ' Extracts the text between the first StartTag and the next EndTag.
                ' Returns "" if StartTag is not found. If EndTag is not found,
                ' everything after StartTag is returned.
                ' NB: EndTag defaults to a single space, so omitting it returns
                ' only the text up to the next space.
                On Error GoTo LocalError

                Dim lArray As Variant
                Dim lCompare As VbCompareMethod

                Extract = ""
                ' Honour CheckCase: binary compare is case-sensitive
                lCompare = IIf(CheckCase, vbBinaryCompare, vbTextCompare)

                ' Limit of 2: split only at the first StartTag
                lArray = Split(TextIN, StartTag, 2, lCompare)
                If UBound(lArray) >= 1 Then
                    lArray = Split(lArray(1), EndTag, 2, lCompare)
                    Extract = lArray(0)
                End If
                Exit Function

            LocalError:
                Extract = ""
            End Function

        It works beautifully. Now all I need to do is put all the keywords into a database and do a recursive search using those fields dynamically at runtime.

        Thanks a million for your help.

        Best Regards,
        Kumar
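The plan of driving the search from a keyword list at runtime could look roughly like the sketch below. It is an assumption-laden illustration, not the poster's code: `ExtractFields` is a hypothetical name, the hard-coded `Array(...)` stands in for field names that would be read from the database table, and each value is taken to run from its keyword up to whichever listed keyword comes next. Keywords are assumed to be unique, since they are used as `Collection` keys.

```vb
Option Explicit

' Hypothetical sketch: for each keyword, take the text between that keyword
' and the nearest listed keyword that follows it in Source.
Public Function ExtractFields(ByVal Source As String, Keywords As Variant) As Collection
    Dim result As New Collection
    Dim i As Long, j As Long
    Dim startPos As Long, endPos As Long, nextPos As Long

    For i = LBound(Keywords) To UBound(Keywords)
        startPos = InStr(1, Source, CStr(Keywords(i)), vbTextCompare)
        If startPos > 0 Then
            startPos = startPos + Len(CStr(Keywords(i)))
            endPos = Len(Source) + 1

            ' The value ends where the nearest following keyword begins
            For j = LBound(Keywords) To UBound(Keywords)
                nextPos = InStr(startPos, Source, CStr(Keywords(j)), vbTextCompare)
                If nextPos > 0 And nextPos < endPos Then endPos = nextPos
            Next j

            result.Add Trim$(Mid$(Source, startPos, endPos - startPos)), CStr(Keywords(i))
        Else
            ' Keyword not present: store an empty value under its name
            result.Add "", CStr(Keywords(i))
        End If
    Next i

    Set ExtractFields = result
End Function

Public Sub DemoExtractFields()
    Dim s As String
    Dim fields As Collection

    s = "Existing Chemical Substance ID: 50-00-0 CAS No. 50-00-0 " & _
        "EINECS Name formaldehyde EINECS No. 200-001-8 Molecular Formula CH2O"

    ' Stand-in for the field names stored in the database table
    Set fields = ExtractFields(s, Array("CAS No.", "EINECS Name", _
                                        "EINECS No.", "Molecular Formula"))

    Debug.Print fields("CAS No.")            ' prints 50-00-0
    Debug.Print fields("EINECS Name")        ' prints formaldehyde
    Debug.Print fields("EINECS No.")         ' prints 200-001-8
    Debug.Print fields("Molecular Formula")  ' prints CH2O
End Sub
```

Ending each value at the next keyword rather than at the next space lets multi-word values through, at the cost of needing every field label that can follow a value to be in the list. A plain loop is enough here; nothing in the task requires recursion.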


          Dave Kreskowiak (#4) wrote:

          kshincsk wrote:

          I found just the perfect piece of code.

          Great. Yet another Copy'N'Paste programmer...
