Can anyone explain how this Regex works?

    fiaolle wrote (#1):

    Hi, I have the regex below to split a string into words without splitting a quoted substring ("... ..."); the whole quoted substring should become one item in a List. Example: with text = all "1 dl", after the split all[0] = "all" and all[1] = "1 dl". I found it via Google and it works, but I don't understand how it works.

    string regexSpliter = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) ";
    List<string> all = new List<string>(System.Text.RegularExpressions.Regex.Split(text, regexSpliter));

    And if I remove the space before the last " in the pattern string, it doesn't work as I want: it seems to split between all the characters in text. Can anybody please explain the string regexSpliter and why there has to be a space at the end of it? Many thanks, Fia

      PIEBALDconsult wrote (#2):

      It appears to be splitting on QUOTE characters. I don't know about the SPACE. Try asking in the Regular Expressions forum: http://www.codeproject.com/Forums/1580841/Regular-Expressions.aspx

      Edit: Also, see "(?<= subexpression): zero-width positive lookbehind assertion" here: http://msdn.microsoft.com/en-us/library/az24scfc.aspx#grouping_constructs

      Edit: On further thought, that appears to be a more complex expression than necessary; the lookbehind seems needless and abused.
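To make the lookbehind concrete, here is a minimal sketch that annotates the pattern from the question piece by piece and runs it on the sample input. It assumes a compiler that accepts top-level statements (C# 9 or later); the variable names are only illustrative. The only character the pattern consumes is the trailing space. The lookbehind merely checks that everything before that space contains complete pairs of quotes, so a space inside an open quoted section is never a split point. The quote characters themselves are not removed from the items.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the question (verbatim string, so "" means one literal quote).
//   (?<= ... )           lookbehind: must match just before the space, consumes nothing
//   ^                    the lookbehind is anchored at the start of the input
//   (?:[^"]*"[^"]*")*    zero or more complete pairs of quotes
//   [^"]*                then only non-quote characters up to the space
//   (trailing space)     the one character that is matched and removed by Split
string pattern = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) ";

string text = "all \"1 dl\"";
string[] parts = Regex.Split(text, pattern);

foreach (string part in parts)
    Console.WriteLine("[" + part + "]");
// prints [all] and then ["1 dl"] (the quotes stay in the second item)
```

Without the trailing space the pattern matches nothing but positions (it becomes zero-width), so every position outside a quoted section counts as a delimiter. That is consistent with the "splits all the characters" behaviour described in the question.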

        BillWoodruff wrote (#3):

        I truly appreciate the "deep art" of RegEx expressions, although I'd never spend time trying to reverse-engineer what any complex one, like this, does. My understanding is they are expanded internally into lots of code, but, when compiled, give excellent performance. Meanwhile, have you considered an alternative like:

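        // Requires: using System.Collections.Generic; using System.Linq;
        char[] c = new char[] {'\"'};
        string s = "all \"1 dl\"";
        // Split on the quote character and drop the empty piece after the closing quote.
        // Note: the first element keeps its trailing space ("all ").
        List<string> sList = s.Split(c).Where(str => str != "").ToList();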

        Disclaimer: the above code was created very quickly, and tested only on your input string, making the assumption that you would have to escape the internal quote delimiters. On your input it does create a two-element List<string> ("all " with its trailing space, and "1 dl"), which is close to the results of your RegEx. Whether the above code is appropriate/robust, etc., for your parsing needs, I have no idea; it's meant only to show the possibility of an alternative. Good luck, Bill
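One way the quote-splitting idea could be carried a step further is sketched below. This is not Bill's code: it assumes the quotes are balanced and that there are no escaped quotes inside a quoted section, and the sample string and variable names are made up for illustration. After splitting on the quote character, the pieces at odd indexes are the ones that were inside quotes, so they are kept whole, while the other pieces are split again on spaces. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

string s = "all \"1 dl\" flour";

// Splitting on the quote gives: "all ", "1 dl", " flour"
string[] segments = s.Split('"');

var items = new List<string>();
for (int i = 0; i < segments.Length; i++)
{
    if (i % 2 == 1)
    {
        // Odd index: this piece was between two quotes, keep it as one item.
        items.Add(segments[i]);
    }
    else
    {
        // Even index: outside quotes, so split into words and drop empty pieces.
        items.AddRange(segments[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }
}

Console.WriteLine(string.Join("|", items)); // prints all|1 dl|flour
```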

        "I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone." Bjarne Stroustrup, circa 1990

          BobJanova wrote (#4):

          It's basically saying 'split on space or on this big group which matches a quoted string'. I guess the fact that it's a parenthesised group is how it ends up being returned even though it was the split expression. I don't fully understand it, but that's the basic idea. This isn't how I would parse a command string; I have some code for that in my Lobby Server article:

          // Command line utility
          using System.Text.RegularExpressions;

          namespace RedCorona.Util {
            public class Command {
              public static string[] Parse(string text){
                // No quotes at all: a plain split on spaces is enough.
                if(text.IndexOf('"') < 0) return text.Split(' ');
                else{
                  // Either a quoted string (captured without its quotes) or a bare word;
                  // both alternatives capture into the same named group "word".
                  MatchCollection mc = Regex.Matches(text, "\"(?<word>[^\"]*)\" *|(?<word>\\w+)");
                  int len = mc.Count;
                  string[] res = new string[len];
                  for(int i = 0; i < len; i++) res[i] = mc[i].Groups["word"].Value;
                  return res;
                }
              }
            }
          }

          I'm not sure if MatchCollection implements IEnumerable<string> and therefore whether you could do a one-liner as you have done there; this code comes from pre-generic days (which is why it returns an array not a List<string>). My simple brain can only think in terms of the matched groups not the delimiters so this regex matches a quoted (first part) or unquoted (second part) 'word'.
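On the one-liner question: MatchCollection enumerates Match objects rather than strings, so a List<string> needs a projection. A minimal sketch using LINQ's Cast<Match>() is below, written with top-level statements (C# 9 or later). It assumes the regex is meant to use a named group called word in both alternatives, as the Groups["word"] lookup in the code above suggests; the sample input is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string text = "all \"1 dl\"";

// Quoted string (captured without its quotes) or a bare word.
// Both alternatives capture into the same named group "word".
string pattern = @"""(?<word>[^""]*)"" *|(?<word>\w+)";

List<string> words = Regex.Matches(text, pattern)
    .Cast<Match>()                        // MatchCollection yields Match objects
    .Select(m => m.Groups["word"].Value)  // take the captured word from each match
    .ToList();

Console.WriteLine(string.Join("|", words)); // prints all|1 dl
```

Unlike the Regex.Split approach from the question, this version drops the surrounding quotes, because only the text inside them is captured.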
