Regex weakling help

    afronaut
    wrote on last edited by
    #1

    Hey all, work is really boring, so I'm going to write a screen scraper. The best way to do this seems to be regular expressions, which I need to work on because they're a particular area of weakness for me. If a page has the following structure: foo

    foo

    foo

    Is there a regex I could use to pick up what's between the tags? For example, one regex to grab the title, another for the first table, and another for the second table? Thanks much.

    *->>Always working on my game, teach me
    *->>something new.
    cout << "dav1d\n";

      Nick Parker
      wrote on last edited by
      #2

      Sure, here is a quick example; I am sure you can expand on it:

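      // Requires: using System; using System.Text.RegularExpressions;
      private void ShowContent(string s)
      {
      	// Looks for a <td> element whose content is letters only (case-insensitive).
      	Regex r = new Regex("<td>*[a-z]*</td>", RegexOptions.IgnoreCase);
      	Match m = r.Match(s);
      	while(m.Success)
      	{
      		// Strip the opening <td> (4 characters), then the closing </td> (5 characters).
      		string val = m.Value.Remove(0, 4);
      		val = val.Remove(val.Length - 5, 5);
      		Console.WriteLine(val);
      		m = m.NextMatch();
      	}
      }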
      

      - Nick Parker
      My Blog | My Articles

        Heath Stewart
        wrote on last edited by
        #3

        * is not a wildcard, though; in a regex it is a quantifier meaning "zero or more of whatever precedes it". You should actually just use "<td>[A-Za-z0-9]*</td>", which means that 0 or more alphanumeric characters (there are escape sequences you can use, too) are allowed between the TD tags. What you have now applies the quantifier to the opening TD tag as well.

        This posting is provided "AS IS" with no warranties, and confers no rights.
        Software Design Engineer, Developer Division Sustained Engineering, Microsoft
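
To make the quantifier point concrete, here is a minimal sketch built around the pattern suggested above. The class name `TdPatterns`, the method names, and the capture-group variant are assumptions added for illustration; they are not part of the original reply. The capture group is one way to read the cell text without trimming the tags off by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TdPatterns
{
    // Pattern from the reply: zero or more ASCII alphanumerics between the tags.
    public const string Strict = "<td>[A-Za-z0-9]*</td>";

    // Hypothetical variant with a capture group around the character class,
    // so the cell text is available as Groups[1].
    public const string Capturing = "<td>([A-Za-z0-9]*)</td>";

    // Returns the captured text of every <td> cell that the pattern matches.
    public static List<string> CellValues(string html)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(html, Capturing, RegexOptions.IgnoreCase))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    // Shows that '*' repeats the preceding element rather than acting as a
    // wildcard: in "<td>*" it makes the '>' optional and repeatable.
    public static bool StrayStarMatches(string input)
    {
        return Regex.IsMatch(input, "^<td>*[a-z]*</td>$", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<tr><td>foo</td><td>bar42</td><td>two words</td></tr>";
        foreach (string value in CellValues(html))
        {
            Console.WriteLine(value);
        }
        Console.WriteLine(StrayStarMatches("<tdfoo</td>"));
    }
}
```

Note that the cell containing a space is skipped, because a space is not in `[A-Za-z0-9]`; widening the character class is a choice that depends on the pages being scraped.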

          Heath Stewart
          wrote on last edited by
          #4

          Regex can be horribly unreliable and a complete pain when unforeseen formats crop up. I recommend using SgmlReader, written by a fellow Microsoftie. HTML is, if you don't know, an application of SGML, and XML is a simplified subset of SGML. XHTML is actually an XML grammar that only looks like HTML because it uses the XHTML namespace as the default namespace, so namespace prefixes aren't required.
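
As a rough illustration of the parser-based approach, the sketch below reads a small XHTML page with the standard `System.Xml.Linq` API instead of a regex. It does not use SgmlReader, whose API is not shown in this thread, and it assumes the input is well-formed XHTML; real-world HTML often is not, which is the case SgmlReader is meant to handle. The class name `XhtmlCells`, the method names, and the sample page are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlCells
{
    // The XHTML namespace. Elements belong to it even though the tags carry
    // no prefix, because the document declares it as the default namespace.
    private static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Returns the text of the <title> element, or null if there is none.
    public static string Title(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return (string)doc.Descendants(Xhtml + "title").FirstOrDefault();
    }

    // Returns the text of every <td> element, in document order.
    public static List<string> CellValues(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants(Xhtml + "td").Select(td => td.Value).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample page; assumed to be well-formed XHTML.
        string page =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>foo</title></head>" +
            "<body><table><tr><td>two words</td><td>bar</td></tr></table></body>" +
            "</html>";

        Console.WriteLine(Title(page));
        foreach (string cell in CellValues(page))
        {
            Console.WriteLine(cell);
        }
    }
}
```

Unlike the regex, this picks up cells whose text contains spaces or punctuation, and the element lookups have to be qualified with the XHTML namespace precisely because of the default-namespace declaration.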
