Regex Problem, Extracting text from Table

    NaveenHS wrote:

    Hello everyone, I am trying to extract the contents of a table from a static HTML file. The program gives almost the expected output. Program:

    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // Read the whole HTML file into one string.
            StreamReader str = new StreamReader("C:\\AllRoles.html");
            string SFile = str.ReadToEnd();

            // One match per <tr>; every <td> body is captured into "value".
            Regex regex = new Regex(
                @"<tr>(\s* <td[^>]*>  \s* (?<value>[^<]*?) \s* </td> )+ \s*</tr>",
                RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

            foreach (Match m in regex.Matches(SFile))
            {
                // Print each cell of the row on its own line.
                foreach (Capture item in m.Groups["value"].Captures)
                {
                    Console.WriteLine(item.Value);
                }

                // Blank line between rows.
                Console.WriteLine();
            }
            Console.ReadLine();
        }
    }
    

    I am facing one problem: there is a comment inside the table content. HTML file:

    <table border="0" cellpadding="5" cellspacing="0" width="100%">
    <tbody><tr>
    <td class="columnheading" nowrap="nowrap">Last Name</td>
    <td class="columnheading" nowrap="nowrap">First Name</td>
    <td class="columnheading" nowrap="nowrap">Role</td>
    <td class="columnheading">Term</td>
    <td class="columnheading">Company</td>
    </tr>
    <tr>
    <td valign="top">

            <!-- Logic Here-->
            
            
            Gottlieb
            
            </td>
    		<td valign="top">Pradep</td>
    		<td valign="top">President
            
            </td>
    		<td valign="top">8/15/2009 - 9/1/2010</td>
    		<td valign="top">DSCMIT</td>
    
    	    </tr>
    		<tr>
    		<td valign="top">
            <!-- Logic Here-->
            
            
            Rajesh
            
            </td>
    		<td valign="top">H</td>
    		<td valign="top"> President
            
            </td>
    		<td valign="top">8/15/2009 - 8/14/2010</td>
            <td valign="top">BHSIT</td>
    	    </tr>
    

    The output I get is the header row (Last Name, First Name, Role, Term, Company), and then it stops at the line <!-- Logic Here-->. Can anyone please help me solve this problem? Thanking you, Naveen HS

      Luc Pattyn wrote:

      I would try adding RegexOptions.Multiline :)

      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
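
A note on the suggestion above: RegexOptions.Multiline only changes where ^ and $ match, and the posted pattern uses neither, so it may not be enough on its own. The cell group [^<]*? cannot cross the "<" that opens <!-- Logic Here-->, which would explain why only the header row matches. One possible workaround, as a minimal sketch, is to strip HTML comments before running the original pattern. The names TableExtractor and ExtractRows are hypothetical and not part of the posted program, and the inline HTML is a shortened stand-in for AllRoles.html.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original program.
static class TableExtractor
{
    public static List<List<string>> ExtractRows(string html)
    {
        // Remove HTML comments first. Singleline lets '.' match newlines,
        // so comments spanning several lines are removed as well.
        string cleaned = Regex.Replace(html, @"<!--.*?-->", "", RegexOptions.Singleline);

        // Same row pattern and options as in the question.
        Regex rowRegex = new Regex(
            @"<tr>(\s* <td[^>]*>  \s* (?<value>[^<]*?) \s* </td> )+ \s*</tr>",
            RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

        var rows = new List<List<string>>();
        foreach (Match m in rowRegex.Matches(cleaned))
        {
            // One capture per <td> in the matched row.
            var cells = new List<string>();
            foreach (Capture c in m.Groups["value"].Captures)
            {
                cells.Add(c.Value.Trim());
            }
            rows.Add(cells);
        }
        return rows;
    }
}

class Program
{
    static void Main()
    {
        // Shortened sample modelled on the HTML in the question.
        string html = @"<table>
<tr>
<td class=""columnheading"">Last Name</td>
<td class=""columnheading"">First Name</td>
</tr>
<tr>
<td valign=""top"">
        <!-- Logic Here-->

        Gottlieb
        </td>
<td valign=""top"">Pradep</td>
</tr>
</table>";

        foreach (List<string> row in TableExtractor.ExtractRows(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

With the comments removed, the second row should be matched along with the header row. This still assumes plain <tr> tags without attributes and cells that contain no nested markup, as in the posted sample.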
