Code Project

Can anybody explain how this Regex works.

Tags: csharp, regex, tutorial, question
    fiaolle wrote (#1):

    Hi, I'm using C# and I have the regex below to split a text into words without splitting a quoted string ".. .."; instead, the whole quoted string is taken as one item in a List. Example: text="all "1 dl""; after the split, all[0]="all" and all[1]="1 dl". I found it on Google and it works, but I don't understand how it works.

    string regexSpliter = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) ";
    List<string> all = new List<string>(System.Text.RegularExpressions.Regex.Split(text, regexSpliter));

    If I remove the space before the last " in the pattern string, it doesn't work as I want: it seems to split the text into single characters, each one a separate element. Can anybody please explain the regexSpliter string and why there has to be a space at the end of it? Many thanks, Fia

      PIEBALDconsult wrote (#2):

      From what I can tell, that should match a SPACE that follows any-characters-other-than-a-quote, QUOTE, any-characters-other-than-a-quote, QUOTE, any-number-of-quotes, but you say it matches your test string, so I must be misreading it. At any rate, you can make something simpler. What exactly do you have, and what do you want from it?

        fiaolle wrote (#3):

        Hi, thanks for all the replies. I'm trying to get the words, and also any strings between quotes, that users have entered in a textbox. But I still don't understand how the regexSpliter string works, or why there has to be a space at the end of that string: when I remove it, it doesn't work as I want. Thanks, Fia

          PIEBALDconsult wrote (#4):

          There has to be something. Have you tried other characters?

            fiaolle wrote (#5):

            Hi, what do you mean by something? I can write any characters I want in a word or a string. For example, the text can contain 'hello by "to much" 10'. Thanks, Fia

              PIEBALDconsult wrote (#6):

              I mean the SPACE (or something else) needs to be there.
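
A minimal sketch of why the SPACE matters, assuming the sample text from the first post. The lookbehind on its own consumes no characters, so with nothing after it the pattern matches at every position that has an even number of quotes before it, and Regex.Split then cuts the text between characters. The variable names are illustrative only, and the code uses top-level statements (C# 9 or later); on older compilers it would need to be wrapped in a Main method.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "all \"1 dl\"";

// Lookbehind only: "the text before this position holds an even number of quotes".
string lookbehindOnly = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*)";

// With the trailing space, every match has to consume one space character.
string[] withSpace = Regex.Split(text, lookbehindOnly + " ");

// Without it, the pattern is zero-width and also matches between ordinary characters.
string[] withoutSpace = Regex.Split(text, lookbehindOnly);

Console.WriteLine(string.Join("|", withSpace));     // all|"1 dl"
Console.WriteLine(string.Join("|", withoutSpace));  // many more, mostly one-character, pieces
```

Any other separator character placed after the lookbehind would work the same way; the space is simply the character being split on.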

                Andreas Gieriet wrote (#7):

                See The 30 Minute Regex Tutorial and search for all occurrences of (?<= in that article; it explains the meaning of (?<=...). You always have to distinguish between the way you enter a pattern in C# and the pattern the Regex sees:

                C# @"..." pattern:

                @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) "

                effective Regex pattern (here delimited by /.../):

                /(?<=^(?:[^"]*"[^"]*")*[^"]*) /
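
A small sketch to confirm the difference between the two forms: in a C# verbatim string, each doubled quote stands for one literal quote, so printing the string shows exactly what the Regex receives. It assumes top-level statements (C# 9 or later).

```csharp
using System;

// In a verbatim string (@"..."), "" is the escape for a single " character.
string pattern = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) ";

// The same pattern written as a regular string literal, with \" escapes.
string sameThing = "(?<=^(?:[^\"]*\"[^\"]*\")*[^\"]*) ";

Console.WriteLine(pattern);               // the effective pattern, including the trailing space
Console.WriteLine(pattern == sameThing);  // True
```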

                From here on I'm talking only about the Regex domain (the effective pattern), not how it is entered in the C# string. Let's start with the innermost part and work outwards:

                1. ..."[^"]*"...: a quoted string "..."
                2. ...[^"]*"[^"]*"...: any number of non-" characters, followed by the "..." from 1. above
                3. ...(?:[^"]*"[^"]*")*...: any number of repetitions of the group described in 2. above
                4. ...^(?:...)*...: 3. above must match from the beginning of the text
                5. ...^(?:...)*[^"]*...: 4. above, followed by any number of non-" characters
                6. (?<=...) : match a space that is preceded by the expression from 5. above; the (?<=...) part is not part of the match
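
Taken together, steps 1 to 6 say that a space is a separator only when the text before it contains an even number of quotes. The sketch below checks that reading against the regex itself, using the sample text from post #5; the hand-written quote count is my own restatement of the lookbehind, not part of the original pattern. It assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "hello by \"to much\" 10";
string pattern = @"(?<=^(?:[^""]*""[^""]*"")*[^""]*) ";

// Indexes of the spaces that the regex accepts as separators.
int[] matched = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(m => m.Index)
    .ToArray();

// Restatement of the lookbehind: a space is a separator when the text
// before it contains an even number of quote characters.
int[] byQuoteCount = Enumerable.Range(0, text.Length)
    .Where(i => text[i] == ' ' && text.Take(i).Count(c => c == '"') % 2 == 0)
    .ToArray();

Console.WriteLine(string.Join(",", matched));            // 5,8,18
Console.WriteLine(matched.SequenceEqual(byQuoteCount));  // True
```

The space at index 12, inside "to much", has one quote before it and is therefore skipped.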

                The Regex searches for a space character and checks whether the text before that space matches the prefix expression. If it does, the match succeeds; otherwise, the Regex moves on to the next space and checks again, and so on. For the given data, the given Regex matches only one space, the one after all: there, the group from 3. repeats zero times and the trailing [^"]* of the prefix matches all. In other words, the regex splits the given data at spaces, treating spaces within "..." strings as non-separators.

                That is very complicated, though. I would do this differently, namely in positive terms (describing what you want to be part of the fields rather than what splits them):

                string pattern = @"\s*(""[^""]*""|\S+)\s*"; // maybe a more sophisticated pattern is
                // needed since the above expression seems
                // to match more,
                // but this is maybe an undesired side
                // effect of the complicated expression
                string[] split = Regex.Matches(input,

                  Andreas Gieriet wrote (#8):

                  See my explanation in reply #7 (I know this is a very old topic, but I see it was not solved in this thread, so I added a lengthy explanation there). The Regex matches spaces where the prefix expression ((?<=...)) matches. That is far too complicated for cases where one wants a string split into parts separated by spaces, ignoring spaces within "...". My preferred solution uses a positive match criterion (describing what the fields are rather than what separates them):

                  string pattern = @"\s*(""[^""]*""|\S+)\s*";
                  var fields = Regex.Matches(input, pattern).Cast<Match>().Select(m=>m.Groups[1].Value);

                  Cheers Andi
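
For anyone who wants to try the positive-match approach, here is a self-contained sketch built around the two lines above, run on the sample text from post #5. The input value, the ToArray() call and the optional quote trimming are my additions for illustration, not part of Andi's post. It assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "hello by \"to much\" 10";

// Group 1 is either a complete "..." string or a run of non-whitespace characters.
string pattern = @"\s*(""[^""]*""|\S+)\s*";

string[] fields = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

// Optional extra (an assumption about what is wanted): drop the surrounding quotes.
string[] unquoted = fields.Select(f => f.Trim('"')).ToArray();

foreach (string field in fields)
    Console.WriteLine(field);
// hello
// by
// "to much"
// 10
```

Note that the quoted field keeps its quotes in fields, just as it does with the Regex.Split version.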
