RegEx: Get Values from HTML Attribute Tags

    User 10433150 wrote (#1):

    I need to extract values from the HTML snippet below. So far I have come up with a regex that trims the input down to the part I need, but to automate this I have to combine two regex statements into one that returns "18", and that is where I am stuck. Alternatively, please suggest a better way to get the values.

    I am using the WebHarvy scraping tool. The program is based on .NET, but it doesn't support inserting .NET code, so I need a regex-only solution.

    First regex statement: (?s)(?<=attribute bathroom).+?(?=\/span)
    Result:

    " title="Bathrooms" style=" ">
    18<

    Second regex statement: (?s)(?<=).+?(?=<)
    Result: 18

    HTML snippet:

                *   xxx1
                *   Factory
                *   18
                *   18
                *   5,010m² | 9,270m²
    
      Lost User wrote (#2):

      Please do not repost the same question. You can easily edit your own questions if you need to add more details.

        Richard Deeming wrote (#3):

        Don't try to use regex to parse an HTML document. You'll end up with an extremely fragile solution, where even the slightest change to the source document will cause it to break. Use a proper HTML parsing library instead, for example AngleSharp.
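To illustrate the parser-based approach, here is a minimal sketch. It uses Python's standard-library `html.parser` rather than AngleSharp, purely so the example runs without extra dependencies. The original markup did not survive in this thread, so `SAMPLE_HTML` is a hypothetical reconstruction based on the regex output quoted in the question. The `<li>` element and the names `AttributeValueParser` and `extract_values` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real snippet was not preserved in the thread.
SAMPLE_HTML = """
<ul>
  <li class="attribute type" title="Type"><span>Factory</span></li>
  <li class="attribute bathroom" title="Bathrooms" style=" ">
    <span>
    18</span>
  </li>
</ul>
"""


class AttributeValueParser(HTMLParser):
    """Collect the text of <span> elements nested in an element carrying all wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.target_tag = None      # tag name of the matching element, while inside it
        self.inside_span = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.target_tag = tag
        elif self.target_tag is not None and tag == "span":
            self.inside_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside_span = False
        elif tag == self.target_tag:
            # Leaving the matching element: stop looking for spans.
            self.target_tag = None

    def handle_data(self, data):
        if self.inside_span and data.strip():
            self.values.append(data.strip())


def extract_values(html, wanted_classes):
    parser = AttributeValueParser(wanted_classes)
    parser.feed(html)
    return parser.values


print(extract_values(SAMPLE_HTML, ["attribute", "bathroom"]))
```

Because the parser matches on class names rather than on the exact text of the tag, reordering attributes or changing whitespace in the source page should not break the extraction, which is the fragility the post warns about.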


        "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

          User 10433150 wrote (#4):

          In my question I mentioned: "I am using WebHarvy scraping tool. The program is based on .NET but it doesn't support inserting .NET code so I need only regex command." I cannot use anything except regex in this tool. Since my two regex statements together produce the result I want, I am fairly sure a single regex can do it, but I lack the knowledge to write it and am stuck. I know parsing HTML with regex is not best practice, but I am willing to take the risk. Please suggest a solution.

            Richard Deeming wrote (#5):

            I'd suggest getting a better scraping tool, or writing your own. :) Given the sample input, this regex should match:

            (?<=class="attribute bathroom"[^>]*>\s*<span[^>]*>)[^<]+
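For anyone who wants to check the pattern outside the scraping tool, here is a rough Python sketch. Python's `re` module only accepts fixed-width lookbehind, unlike .NET, so the variable-width lookbehind above is rewritten as an ordinary prefix followed by a capture group. This is a swapped-in equivalent, not the pattern as posted. `SAMPLE_HTML` is a hypothetical reconstruction of the markup (the original snippet was not preserved here), and `bathroom_value` is an illustrative helper name.

```python
import re

# Hypothetical markup, reconstructed from the regex output quoted in the question.
SAMPLE_HTML = '''<li class="attribute bathroom" title="Bathrooms" style=" ">
    <span>
    18</span>
</li>'''

# Same structure as the .NET pattern, with the lookbehind turned into a
# plain prefix and the value captured in group 1.
PATTERN = re.compile(r'class="attribute bathroom"[^>]*>\s*<span[^>]*>([^<]+)')


def bathroom_value(html):
    """Return the trimmed text of the first matching <span>, or None if absent."""
    match = PATTERN.search(html)
    return match.group(1).strip() if match else None


print(bathroom_value(SAMPLE_HTML))
```

The single pattern does the work of both statements from the question: the prefix skips past the opening tags, and `[^<]+` stops at the next `<`. The `.strip()` call removes the surrounding whitespace that `[^<]+` would otherwise include.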



              Dave Kreskowiak wrote (#6):

              He was saying to use AngleSharp instead of WebHarvy.

              Asking questions is a skill CodeProject Forum Guidelines Google: C# How to debug code Seriously, go read these articles.
              Dave Kreskowiak
