Regex Problem: Extracting Text from a Table
-
Hello everyone, I am trying to extract the contents of a table from a static HTML file. The program almost gives the expected output. Program:
using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        StreamReader str = new StreamReader("C:\\AllRoles.html");
        string SFile = str.ReadToEnd();
        Regex regex = new Regex(
            @"<tr>(\s* <td[^>]*> \s* (?<value>[^<]*?) \s* </td> )+ \s*</tr>",
            RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
        // Print every captured cell, with a blank line after each table row.
        foreach (Match m in regex.Matches(SFile))
        {
            foreach (Capture item in m.Groups["value"].Captures)
            {
                Console.WriteLine(item.Value);
            }
            Console.WriteLine();
        }
        Console.ReadLine();
    }
}
I am facing one problem: there is a comment inside the table content. HTML file:
<table border="0" cellpadding="5" cellspacing="0" width="100%">
<tbody><tr>
<td class="columnheading" nowrap="nowrap">Last Name</td>
<td class="columnheading" nowrap="nowrap">First Name</td>
<td class="columnheading" nowrap="nowrap">Role</td>
<td class="columnheading">Term</td>
<td class="columnheading">Company</td>
</tr>
<tr>
<td valign="top"><!-- Logic Here--> Gottlieb </td>
<td valign="top">Pradep</td>
<td valign="top">President </td>
<td valign="top">8/15/2009 - 9/1/2010</td>
<td valign="top">DSCMIT</td>
</tr>
<tr>
<td valign="top"> <!-- Logic Here--> Rajesh </td>
<td valign="top">H</td>
<td valign="top"> President </td>
<td valign="top">8/15/2009 - 8/14/2010</td>
<td valign="top">BHSIT</td>
</tr>
The output I get is Last Name, First Name, Role, Term, Company, and then it stops at the line containing <!-- Logic Here-->. Can anyone please help me solve this problem?
Thanking you,
Naveen HS
-
I would try adding
RegexOptions.Multiline
:)
Luc Pattyn
Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
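A possible explanation for the behaviour in the question: after the opening <td ...> tag the pattern only accepts text without a "<" character, so a cell that begins with <!-- Logic Here--> cannot match and the whole row is skipped. RegexOptions.Multiline only changes how ^ and $ behave, and the pattern uses neither anchor, so that option alone may not change the result. The sketch below keeps the original pattern and adds an optional run of HTML comments before the cell text. The names TableExtractor and ExtractRows and the inline sample HTML are made up for illustration, and the approach assumes the cells contain no other nested tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are not from the original post.
static class TableExtractor
{
    // Original pattern plus "( (?> <!--.*?--> ) \s* )*", which skips any HTML
    // comments that come before the cell text. The atomic group (?> ... ) stops
    // a comment match from being stretched across several cells on backtracking.
    // Singleline lets "." match newlines inside a comment.
    static readonly Regex RowRegex = new Regex(
        @"<tr>
          (
            \s* <td[^>]*>
            \s* ( (?> <!--.*?--> ) \s* )*
            (?<value>[^<]*?)
            \s* </td>
          )+
          \s* </tr>",
        RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase |
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Returns one list of cell texts per matched table row.
    public static List<List<string>> ExtractRows(string html)
    {
        var rows = new List<List<string>>();
        foreach (Match m in RowRegex.Matches(html))
        {
            var cells = new List<string>();
            foreach (Capture c in m.Groups["value"].Captures)
            {
                cells.Add(c.Value);
            }
            rows.Add(cells);
        }
        return rows;
    }
}

class Program
{
    static void Main()
    {
        // Shortened sample in the shape of the HTML from the question.
        string html = @"<table><tbody><tr>
<td class=""columnheading"">Last Name</td>
<td class=""columnheading"">First Name</td>
</tr>
<tr>
<td valign=""top""><!-- Logic Here--> Gottlieb </td> <td valign=""top"">Pradep</td> </tr>";

        foreach (List<string> row in TableExtractor.ExtractRows(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

For anything beyond simple, regular markup like this table, an HTML parser is usually more robust than a regular expression, since nested tags or attributes containing ">" would break the pattern.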