RegEx: Get Values from HTML Attribute Tags

User 10433150

I need to extract the values from the HTML snippet below. So far I have come up with two regex statements that together trim the text down to the value I need, but to automate this I need to combine them into a single regex that returns "18", and that is where I am stuck. Alternatively, please suggest a better method for getting the values. I am using WebHarvey scraping tool. The program is based on .net but it doesn't support inserting .net code so I need only regex command.

First regex statement:

(?s)(?<=attribute bathroom).+?(?=\/span)

Result:

" title="Bathrooms" style=" ">
18<

Second regex statement:

(?s)(?<=).+?(?=<)

Result:

18

HTML snippet:

* xxx1
* Factory
* 18
* 18
* 5,010m² | 9,270m²

Lost User

Please do not repost the same question. You can easily edit your own questions if you need to add more details.

Richard Deeming

Don't try to use Regex to parse an HTML document. You'll end up with an extremely fragile solution, where even the slightest change to the source document will cause it to break. Use a proper HTML parsing library instead - for example, AngleSharp[^].
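To illustrate the parser-based approach, here is a minimal sketch in Python using only the standard library's html.parser (standing in for AngleSharp, which is a .NET library). The markup in SAMPLE_HTML is an assumption, since the tags of the original snippet are not shown in this thread; the class name AttributeValueParser and the helper extract_value are hypothetical names made up for this example.

```python
from html.parser import HTMLParser

# Assumed markup: the real snippet's tags are not visible in the thread,
# so these element names and class values are illustrative only.
SAMPLE_HTML = """
<ul>
  <li class="attribute bedroom" title="Bedrooms"><span>
18</span></li>
  <li class="attribute bathroom" title="Bathrooms" style=" "><span>
18</span></li>
</ul>
"""


class AttributeValueParser(HTMLParser):
    """Collects the text of the first element carrying all wanted classes.

    Limitation of this sketch: void tags such as <br> inside the matched
    element would throw off the depth counter.
    """

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0      # 0 means we are outside the matched element
        self.chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1  # nested tag inside the matched element
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # stop after the first match

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

    @property
    def value(self):
        return "".join(self.chunks).strip()


def extract_value(html, wanted_classes):
    """Return the trimmed text of the first matching element, or ''."""
    parser = AttributeValueParser(wanted_classes)
    parser.feed(html)
    parser.close()
    return parser.value


print(extract_value(SAMPLE_HTML, ["attribute", "bathroom"]))
```

Because the lookup is keyed on the class attribute rather than on the exact character sequence between tags, changes in attribute order or whitespace should not break it.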

"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

User 10433150

In my question I mentioned: "I am using WebHarvey scraping tool. The program is based on .net but it doesn't support inserting .net code so I need only regex command." I cannot use any solution other than regex in this tool. Since my two regex statements together produce the result I want, I am fairly sure a single regex can do it, but I lack the knowledge to combine them, so I am stuck. I know parsing HTML with regex is not best practice, but I am willing to take the risk. Please suggest a solution.

Richard Deeming

I'd suggest getting a better scraping tool, or writing your own. :) Given the sample input, this regex should match:

(?<=class="attribute bathroom"[^>]*>\s*<span[^>]*>)[^<]+
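For anyone who wants to try the idea outside a .NET regex engine, here is a rough Python sketch. The pattern above relies on a variable-length lookbehind, which .NET supports but Python's re module does not, so this version matches the same prefix normally and puts the value in a capture group instead. The markup in SAMPLE_HTML is an assumption based on the output quoted in the question, and bathroom_count is a hypothetical helper name.

```python
import re

# Assumed markup, reconstructed loosely from the question's regex output;
# the real page may differ.
SAMPLE_HTML = '''<li class="attribute bathroom" title="Bathrooms" style=" ">
<span>
18</span></li>'''

# Same prefix as the lookbehind version, but consumed as part of the match.
# Group 1 holds the text up to the next "<".
PATTERN = re.compile(
    r'class="attribute bathroom"[^>]*>\s*<span[^>]*>\s*([^<]+)'
)


def bathroom_count(html):
    """Return the trimmed value after the bathroom span, or None if absent."""
    match = PATTERN.search(html)
    return match.group(1).strip() if match else None


print(bathroom_count(SAMPLE_HTML))
```

The sketch strips surrounding whitespace from the captured text, because the value in the question's output sits on its own line.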

Demo[^]

Dave Kreskowiak

He was saying to use AngleSharp instead of WebHarvey.

Asking questions is a skill CodeProject Forum Guidelines Google: C# How to debug code Seriously, go read these articles.