Help in Regular expressions

    shankbond wrote (#1):

    Hi, I am new to regular expressions, and I am having a problem matching a specific keyword under certain conditions.

    // Requires: using System.Text.RegularExpressions; (HttpUtility lives in System.Web)
    string regexstring = "\\bhtml\\b|[.]net\\b";
    Regex rgx = new Regex(regexstring, RegexOptions.IgnoreCase);
    string st_data = "I am a .net developer, but also know about asp.net, vb.net ; I work on c#/asp.net platform. I also know dhtml, html4.0, html/xml etc etc.";
    // I want the regex to capture all occurrences of html where html is a separate word
    // (delimited by \b, a word boundary) or is surrounded by digits, as in html4.0 above.
    // Strip any markup tags first, then decode HTML entities.
    st_data = System.Web.HttpUtility.HtmlDecode(Regex.Replace(st_data, @"<(.|\n)*?>", string.Empty));
    MatchCollection matcol = rgx.Matches(st_data);

    foreach (Match mat in matcol)
    {
        // mat.Value holds the matched text here.
    }
    

    I tried several variations, but to no avail. I want to match html4.0 but somehow get only html out of it, a kind of substring match. I hope you understand my point. Any help would be appreciated.

    Thanks Shankbond


      OriginalGriff wrote (#2):

      I'm not sure exactly what you are trying to do, but have a look at match-but-don't-capture groups, (?:...). Or explain exactly what you want to achieve and I'll have a look.

      Did you know: That by counting the rings on a tree trunk, you can tell how many other trees it has slept with.

      "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
      "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt


        shankbond wrote (#3):

        Hi, I tried looking into this, but maybe I did something wrong with the match-but-don't-capture groups. I want the regex to:
        1) match html56 but capture only html;
        2) match 45html but capture only html;
        3) not match or capture anything in abchtmldef (html must be a word on its own, not surrounded by letters);
        4) match html on its own.
        I used \b(?:\d*)html(?:\d*)\b. It would be nice if someone could help.

        Thanks Shankbond


          OriginalGriff wrote (#4):

          The pattern to do that is quite simple: (?:\d|\s)(?<data>html)(?:\d|\s)

          Find (but do not capture) either a digit or a whitespace,
          Find and capture in a group called data the four characters 'h', 't', 'm', 'l' in that order,
          Find (but do not capture) either a digit or a whitespace.
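As a rough illustration of the three steps above, the sketch below runs that pattern and reads the named group back through `Match.Groups["data"]`. The class name `NamedGroupDemo`, the method `CaptureData`, and the sample sentence are made up for this example; only the pattern and the group name `data` come from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Pattern from the post; "data" is the named group it defines.
    private static readonly Regex Pattern =
        new Regex(@"(?:\d|\s)(?<data>html)(?:\d|\s)", RegexOptions.IgnoreCase);

    // Returns the text captured by the "data" group for every match in the input.
    public static List<string> CaptureData(string input)
    {
        var captured = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // m.Value also contains the surrounding digit or whitespace character;
            // m.Groups["data"].Value holds only the keyword itself.
            captured.Add(m.Groups["data"].Value);
        }
        return captured;
    }

    public static void Main()
    {
        // Hypothetical sample text, chosen only for illustration.
        string sample = "I know dhtml, html4.0, 45html and html/xml.";
        foreach (string keyword in CaptureData(sample))
        {
            Console.WriteLine(keyword);
        }
    }
}
```

Note that the pattern requires one digit or whitespace character on each side, so an occurrence at the very start or end of the input, or one followed by punctuation such as `html/xml`, is not matched.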

          But I doubt that will solve your problem! What are you trying to achieve? It looks as if you are trying to process a CV and extract all the relevant job skills without manually looking at it. If so, then you may need to be a bit more clever / thorough about it, particularly with a trigger word such as "html" which appears in every web page...


            OriginalGriff wrote (#5):

            Just to add to what I said, go and get a copy of Expresso: it examines and generates regular expressions. It's free, and really can help create and understand complicated expressions. You can also feed it a sample file that you want to examine and it will show you what the Regex will capture. I wish I'd written it!


              shankbond wrote (#6):

              Thanks, but I already have one :). My query is solved now; I got the solution by using

              (?<=\\d+(\\.\\d*)?|\\b)html(?=\\d+(\\.\\d*)?|\\b)

              But I have a new query now: (?:...) is also a non-capturing group, so in theory I could use it in place of the lookahead and lookbehind, but that does not work here. Any solutions?

              Thanks Shankbond
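One way to look at the question above, offered as a sketch rather than a definitive answer: a non-capturing group `(?:...)` only avoids creating a numbered group, but the characters it matches are still consumed and appear in `Match.Value`, whereas a lookbehind `(?<=...)` or lookahead `(?=...)` checks the surrounding text without consuming it. The class `LookaroundDemo`, its members, and the sample string are assumptions made for this example; the `Grouped` pattern is a hypothetical variant that wraps the keyword in an ordinary capturing group.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Lookaround version from the post, written as a verbatim string.
    public const string Lookaround =
        @"(?<=\d+(\.\d*)?|\b)html(?=\d+(\.\d*)?|\b)";

    // Same alternatives, but as non-capturing groups instead of lookarounds.
    public const string NonCapturing =
        @"(?:\d+(\.\d*)?|\b)html(?:\d+(\.\d*)?|\b)";

    // Hypothetical variant: non-capturing context, keyword in capturing group 1.
    public const string Grouped =
        @"(?:\d+(?:\.\d*)?|\b)(html)(?:\d+(?:\.\d*)?|\b)";

    // Collects the whole-match text (Match.Value) of every match.
    public static List<string> MatchValues(string pattern, string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
        {
            values.Add(m.Value);
        }
        return values;
    }

    // Collects the text of one numbered group for every match.
    public static List<string> GroupValues(string pattern, string input, int group)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase))
        {
            values.Add(m.Groups[group].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Hypothetical sample text covering the cases discussed in the thread.
        string input = "dhtml, html4.0, html/xml, 45html, abchtmldef, html56";

        // Lookarounds are zero-width: each match value is just the keyword.
        Console.WriteLine(string.Join(" | ", MatchValues(Lookaround, input)));

        // Non-capturing groups still consume text: digits end up in the match value.
        Console.WriteLine(string.Join(" | ", MatchValues(NonCapturing, input)));

        // Reading group 1 recovers the bare keyword from the consuming pattern.
        Console.WriteLine(string.Join(" | ", GroupValues(Grouped, input, 1)));
    }
}
```

With the non-capturing pattern, `mat.Value` for the `html4.0` occurrence is the whole of `html4.0`, so code that reads `mat.Value` sees the digits. Either keep the lookarounds, or keep the non-capturing groups and read the keyword from an explicit group instead of from `mat.Value`.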


                shankbond wrote (#7):

                OriginalGriff wrote:

                particularly with a trigger word such as "html" which appears in every web page...

                Yes, you are absolutely right. I did that with the help of some JavaScript.

                Thanks Shankbond


                  shankbond wrote (#8):

                  Can someone really explain that? I am curious about it.

                  Thanks Shankbond
