How to "match" multiple occurrences of item in text ? ( Regular expression )

    Lost User wrote (#1):

    I am asking a question about regular expressions. I am NOT looking for references to AI regular expression generators. Here is my text; I want to find ALL matches of "hci" in it:

    "Devices:\n\thci1\t00:50:B6:80:4D:5D\n\thci0\t00:15:83:15:A2:CB\n"

    Here is my attempt to accomplish that task:

    QRegularExpression re("(hci[0-9]+)"); // matches first hci1 twice

    It matches only the first occurrence, "hci1", and reports it twice. Please follow the established CodeProject instructions and read the post. PLEASE, no references to AI regular expression generators. I need to learn TO BUILD a regular expression, specifically how to write it so that it will match multiple items. PS: I did try to use "global match"; no go.

      Lost User wrote (#2):

      Salvatore Terress wrote:

      Please follow the established CodeProject instruction and read the post. PLEASE no references to AI regular expression generators.

      Please stop giving us orders. You have already been kicked off this forum once for your bad attitude. Everyone who tries to answer questions here does it in their own time and at no cost to you. If the answer is not what you were hoping for then feel free to ignore it.

        CPallini wrote (#3):

        Why didn't you post the code? I mean, you commented

        // matches first hci1 twice

        But only the regular expression constructor is shown.

        "In testa che avete, Signor di Ceprano?" -- Rigoletto

          jschell wrote (#4):

          Salvatore Terress wrote:

          It matches only first occurrence of hci1" and twice

          Did you mean something like 'it only matches once but there are two'?

          Salvatore Terress wrote:

          I need to learn TO BUILD regular expression

          Your regular expression is correct, so the problem is in how you are using the regular expression engine/library. It has nothing to do with the regular expression itself. Typically an engine/library has an iteration idiom in which each pass of a loop returns the next match. I didn't look in detail, but the following Google search seems to return results that would be relevant:

          "QRegularExpression" iterate through matches
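A minimal sketch of the iteration idiom described above. It assumes the standard library's std::regex as a stand-in for QRegularExpression so that it builds without Qt; the helper name allMatches is hypothetical. Each increment of the iterator moves to the next match, which is the role QRegularExpressionMatchIterator plays in Qt. Because the pattern wraps everything in one pair of parentheses, group 0 (the whole match) and group 1 hold the same text. If the original test printed both, that may be why "hci1" appeared twice, but this is a guess since the calling code was not shown.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collect the text of every match of `pattern` in `subject`, one entry per match.
std::vector<std::string> allMatches(const std::string& subject,
                                    const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> found;
    // A default-constructed std::sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(subject.begin(), subject.end(), re), end;
         it != end; ++it) {
        // (*it)[0] is the whole match. With "(hci[0-9]+)" the parentheses
        // wrap the entire pattern, so (*it)[1] would hold the same text.
        found.push_back((*it)[0].str());
    }
    return found;
}
```

Called with the device listing from the first post and the pattern "(hci[0-9]+)", this should return the two names in the order they appear in the text.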

            Lost User wrote (#5):

            Since my last post is nowhere to be found here... I do appreciate all the help in resolving the issue, and since posting code was requested, here it is. It generally works for retrieving multiple matches, such as "hci1 hci0". What is missing is retrieving a PAIR of matches, such as "hci1 00:xx:yy...". In other words, I can retrieve the name or the address, but not BOTH. I would be grateful if somebody could give me the actual code and an explanation of how to match a PAIR of values. As can be seen, I did try different placements of parentheses "(...)", but it did not work.

            { // global matching code block
                text = " START MATCH NAME AND ADDRESS CODE BLOCK ";
                textEditPtr_DEBUG->append(text);

            #ifdef TASK
                DELETED
            #endif

                QString word;
                QStringList words;
                // Note: despite its name, "pattern" holds the text being searched, not the regex.
                QString pattern = "Devices:\n\thci1\t00:50:B6:80:4D:5D\n\thci0\t00:15:83:15:A2:CB\n";
                //QRegularExpression re("(\\w+)");
                //QRegularExpression re("([0-F]{2}[:-]){5})");
                //QRegularExpression re("([0-F]{2}[:])"); // match xx:
                //QRegularExpression re("(([0-F]{2}[:]){5}[0-F]{2})"); // TOK match full BOTH addressxx:
                //QRegularExpression re("((hci[0-9])(([0-F]{2}[:]){5}[0-F]{2}))"); // NO GO
                //QRegularExpression re("(hci[0-9])"); // TOK
                QRegularExpression re("(([0-F]{2}[:]){5}[0-F]{2})");
                QRegularExpressionMatchIterator i = re.globalMatch(pattern);

                text = " HAS MATCH MULTI";
                qDebug() << text;
                textEditPtr_DEBUG->append(text);

            #ifdef BYPASS // no go
                foreach (auto item, i)
                {
                    text = " TEST foreach ";
                    text += item;
                    qDebug() << text;
                    textEditPtr_DEBUG->append(text);
                }
            #endif

                // Walk the matches one at a time; group 1 is the full address.
                while (i.hasNext()) {
                    QRegularExpressionMatch match = i.next();
                    word = match.captured(1);
                    words << word;
                    qDebug() << word;

                    text = " Full match name and address...  ";
                    text += word;
                    qDebug() << text;
                    textEditPtr_DEBUG->append(text);
                }
            }
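One possible way to get the name and the address from a single match, sketched under some assumptions. It uses the standard library's std::regex rather than Qt so that it builds on its own, and the helper name nameAddressPairs is hypothetical. The "NO GO" attempt in the code above puts the name group directly against the address group, so nothing in the pattern matches the tab that separates them in the text. The sketch adds `\s+` between the two groups and uses a non-capturing `(?:...)` for the repeated "xx:" part, so that group 1 is the name and group 2 is the address. It also swaps `[0-F]` for `[0-9A-F]`, which is an editorial choice rather than something the post used.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Return (name, address) pairs such as ("hci1", "00:50:B6:80:4D:5D").
std::vector<std::pair<std::string, std::string>>
nameAddressPairs(const std::string& subject)
{
    static const std::regex re(
        "(hci[0-9]+)"                           // group 1: adapter name
        "\\s+"                                  // whitespace (the tab) between fields
        "((?:[0-9A-F]{2}:){5}[0-9A-F]{2})");    // group 2: six hex pairs, colon separated
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::sregex_iterator it(subject.begin(), subject.end(), re), end;
         it != end; ++it) {
        pairs.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return pairs;
}
```

With QRegularExpression the same pattern string should be usable as-is, reading match.captured(1) and match.captured(2) inside the existing while (i.hasNext()) loop.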
            
              jschell wrote (#6):

              Salvatore Terress wrote:

              Here in my text to find matches of ALL "hci" in it:

              So, based on your sub-reply to me, for this input

              "Devices:\n\thci1\t00:50:B6:80:4D:5D\n\thci0\t00:15:83:15:A2:CB\n"

              What you actually want is the following

              hci 00:50:B6:80:4D:5D
              hci 00:15:83:15:A2:CB

              Your text sample represents a multi-line input, presumably with tab separators. For that sample it is actually pointless to use a regex, since normal parsing (or a CSV parser) would work. But in terms of regex you have several problems:

              1. Dealing with multiple lines.
              2. Correctly matching the values.
              3. Dealing with special characters (including perhaps #1 above).
              4. Dealing with a list of results, as I stated in my other reply.

              So your regex is not even close to correct for matching the address. Looking at #2 only, because that is the part most likely to be usable as a regex, the equivalent regex for Perl would look like the following. Since you are not using Perl yours might vary, but I suspect not.

              (hci\w+([a-fA-F0-9][a-fA-F0-9]:)+[a-fA-F0-9][a-fA-F0-9])

              There are variations on the above:

              - Making the tab explicit
              - Restricting the id to the exact length
              - Variation on the hex digit

              but I prefer the above form.
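The "normal parsing" alternative mentioned in this reply could look roughly like the sketch below. It assumes every device line has the shape tab, name, tab, address, as in the sample text, and it skips anything else, such as the "Devices:" header. The function name parseDeviceLines is hypothetical, and only the C++ standard library is used.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse "\t<name>\t<address>" lines without any regex.
std::vector<std::pair<std::string, std::string>>
parseDeviceLines(const std::string& text)
{
    std::vector<std::pair<std::string, std::string>> devices;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty() || line[0] != '\t') {
            continue;   // header or blank line
        }
        const std::size_t sep = line.find('\t', 1);   // tab after the name
        if (sep == std::string::npos) {
            continue;   // no address field on this line
        }
        // Name sits between the two tabs; address is the rest of the line.
        devices.emplace_back(line.substr(1, sep - 1), line.substr(sep + 1));
    }
    return devices;
}
```

This does no validation of the address itself; if that matters, each address could still be checked against a small regex afterwards.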

                Lost User wrote (#7):

                Thanks, but it still does not do what I want, as you expected it might not. Would it be OK to actually discuss and analyze this? I am asking; I am NOT telling the group it should do that.

                QRegularExpression re("(hci\w+([a-fA-F0-9][a-fA-F0-9]:)+[a-fA-F0-9][a-fA-F0-9])");

                There are, IMHO, important parts to the regular expression, and they are OK when used alone.

                1. If I opt to use QRegularExpressionMatchIterator i = re.globalMatch(pattern); there should be at least three "code groups" delimited by "(...code group...)".
                2. There are TWO base "regular expressions". One works OK as is: "hci[0-9]" functions as "match hciX where X = [0-9]" (some docs call [0-9] a range). hci\w should perform the SAME task.
                3. The analysis of "xx:" using [a-fA-F0-9] is simply "too cute"; [0-F] with {2} ("how many times to repeat the previous item") works as well. I have no need to check for lower-case "hexadecimal" since that does not actually exist anyway...

                So I will, perhaps stubbornly, try to continue using at least ONE of the Qt classes, and maybe I will find the correct modification of the above. If I end up with plain "reg_exp", so be it... Thanks
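A small sketch comparing the two character classes discussed in point 3. In ASCII, the range from '0' to 'F' also covers the characters : ; < = > ? @, which sit between '9' and 'A', so `[0-F]` accepts more than hexadecimal digits. The sketch assumes std::regex with default flags in place of QRegularExpression, and both function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `s` is exactly two characters from the range '0'..'F'.
bool matchesLooseHexPair(const std::string& s)
{
    static const std::regex loose("[0-F]{2}");
    return std::regex_match(s, loose);
}

// True if `s` is exactly two hexadecimal digits, in either case.
bool matchesStrictHexPair(const std::string& s)
{
    static const std::regex strict("[0-9A-Fa-f]{2}");
    return std::regex_match(s, strict);
}
```

For input that is known to contain only well-formed upper-case addresses, both classes find the same matches; the difference only shows up on text such as ":;", which the loose class accepts and the strict class rejects.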

                  jschell wrote (#8):

                  When you post code use code blocks.

                  Salvatore Terress wrote:

                  If I opt to use QRegularExpressionMatchIterator i = re.globalMatch(pattern);

                  Not sure how to emphasize what I have already said several times. That question is NOT about regular expressions. It is about how to use that specific library. I suggest you start with a less difficult example to figure out how iteration works. And google for examples as I documented in the other reply.

                  Salvatore Terress wrote:

                  There should be at least three "code groups " delimited by "(,,,code group... )"

                  You mean the parens. Parens are used both for 'capturing' and for 'grouping'. In the case I gave, the inner ones were for grouping and the outer ones for capturing. And yes, that is a problem when you iterate. The library you are using might have a way to specify that parens are not 'capturing'. In Perl it looks like the following; there would be only one capturing group, which would match either 'abcxyz' or 'defxyz'.

                  ((?:abc|def)xyz)

                  I will note that I use that feature so rarely that I had to look it up.
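A short sketch of the difference between capturing and non-capturing parens, using the pattern from this reply. It assumes std::regex (default ECMAScript grammar, which accepts the `(?:...)` syntax) rather than Perl or Qt, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Number of capture groups in a pattern, not counting group 0 (the whole match).
unsigned captureGroupCount(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}

// Text of group 1 when `s` matches `pattern` in full, or "" when it does not.
std::string firstGroup(const std::string& pattern, const std::string& s)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_match(s, m, re) ? m[1].str() : std::string();
}
```

With plain parens, "((abc|def)xyz)" has two capture groups; with "((?:abc|def)xyz)" only the outer one remains, so group numbering stays stable when more grouping is added inside.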

                  Salvatore Terress wrote:

                  the analysis of "xx:"

                  As I mentioned there are variations to what I suggested. Myself I don't care for using '{2}' in a case like this. Just a preference. (Had to edit the above because I messed up the code block)

                    Lost User wrote (#9):

                    SOLVED and CLOSED as initially asked for, by using the named subgroup option.

                    QRegularExpression re("(((?hci[0-9])(.*))?(?(([0-F]{2}[:]){5}[0-F]{2}))(.*))");

                    Thanks very much for all the support given.
