How to extract all words - using regular expression

  • Lost User

    EDIT: OK, using g (as in "global") should work, but it does not:

        result = BTUL->EditLine_RegExp(result, "(\\w+ \g)");

    Where is my error?

    I am having a mental block and forgot how to add "match all words". Here is my failing attempt:

        result = BTUL->EditLine_RegExp(result, "(\\w+*)" );

    Can somebody help me modify my regular expression to match all words, PLEASE? No references to AI regular expression generators; they do not do well with multiple entries.

    Richard Andrew x64
    wrote on last edited by
    #3

    Try this: Match all words

    The difficult we do right away... ...the impossible takes slightly longer.

      Lost User
      wrote on last edited by
      #4

      The following will find all words:

      /\w+/g

      You can test it quite quickly at RegExr (Learn, Build, & Test RegEx). But it would help if you showed us the actual text you are working with, and the results you get. And please use <pre> tags around the code parts so your question is clear.
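A hedged sketch of what "find all words" can look like from C++. The slashes and the trailing g in /\w+/g belong to JavaScript-style regex literals (the notation RegExr displays), not to the pattern itself, so only \w+ goes into the string, and the "global" part becomes a loop over successive matches. The sketch assumes the standard library's std::regex rather than whatever the poster's EditLine_RegExp wraps, and the helper name all_words is made up for illustration. In Qt, QRegularExpression::globalMatch() plays the role of the iterator used here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every non-overlapping match of \w+.
// The pattern carries no delimiters and no "g" flag; repeating the
// search is the job of the iterator, not of the expression.
std::vector<std::string> all_words(const std::string& input)
{
    static const std::regex word(R"(\w+)");
    std::vector<std::string> words;
    for (std::sregex_iterator it(input.begin(), input.end(), word), end; it != end; ++it) {
        words.push_back(it->str());
    }
    return words;
}
```

Calling all_words("Agent registered") yields the two words "Agent" and "registered".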

        jschell
        wrote on last edited by
        #5

        Salvatore Terress wrote:

        Where is my error ??

        Your question is not specific to regular expressions; it also depends on what is running the regular expression engine, and you did not provide that information. A 'g' is something external to the regular expression itself, so where you are using it matters. The only clue you provided is 'EditLine_RegExp', and googling for that returned no results. But certainly, since you didn't escape the backslash before the g, that would never work.

        Other than that, there is also the following:

        - A space is not considered part of a word. Your capture group includes one.
        - There could be more than one space.
        - Are you matching on a single line? If not, there are other complications.
        - If there are ONLY words in your line, then it is pointless to use regex at all. Just split it.
        - If there are OTHER things besides words, then I don't believe what you are doing will work (but again, you didn't state what, so maybe it will).
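For the "just split it" case, a minimal sketch using only the C++ standard library, assuming the line really contains nothing but whitespace-separated words. The helper name split_words is hypothetical; Qt code would more likely reach for QString::split().

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a line on runs of whitespace.
// operator>> skips any amount of leading whitespace (spaces, tabs,
// \r, \n), so repeated separators do not produce empty entries.
std::vector<std::string> split_words(const std::string& line)
{
    std::istringstream stream(line);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) {
        words.push_back(word);
    }
    return words;
}
```

This only separates on whitespace; punctuation or terminal escape sequences stay attached to the neighbouring word, which is where a regular expression starts to earn its keep.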

          Lost User
          wrote on last edited by
          #6

          Many thanks for all the support. FYI, I did use "[ -~]+" to extract the first continuous run of words. Now I am working on extracting ALL words. I tried passing "[/\w+/g]+" to my otherwise working function and ended up with [/w+/g]+ (the backslash is missing). This is the result, part of my debug messages:

          " instring text \t\n Waiting to connect to bluetoothd..."
          " regular expression \t\n [/w+/g]+"
          " Has all match g"

          You asked for the string I am trying to extract from. Here it is:

                  "Waiting to connect to bluetoothd...\r\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \r\r\u001B[0;94m[bluetooth]\u001B[0m# \r                        \rAgent registered\n\u001B[0;94m[bluetooth]\u001B[0m# "
          

          As you can see - the extracted "g" is from "Agent". No good... I suspect Qt is messing with passing the backslash...

              // result = BTUL->EditLine_RegExp_Ext(result, "[ -~]+",textEditPtr_DEBUG, textEditPtr_DEBUG);
              result = BTUL->EditLine_RegExp_Ext(result, "[/\w+/g]+",textEditPtr_DEBUG, textEditPtr_DEBUG);
          

          "Waiting to connect to bluetoothd..."
          "Waiting to connect to bluetoothd..."
          "START EditLine_RegExp...QString BT_Utility_Library::EditLine_RegExp_Ext(QString, QString, QTextEdit *, QTextEdit *)1321"
          " instring text \t\n Waiting to connect to bluetoothd..."
          " regular expression \t\n [/w+/g]+"
          " Has all match g"

            Lost User
            wrote on last edited by
            #7

            jschell wrote:

            Your question is not specific to regular expressions but also to what is running the regular expression engine. But you did not provide that information.

            Since most of the "AI reg expression" generators work and the SAME expression does not work in Qt, you have a point. I guess I will ask in the Qt forum about that.

            Yes, there are other means to verify that the string contains the desired word (the QString "contains" method works peachy); however, I would like to learn more about using regular expressions, so I will stick with them for now.

            ADDENDUM: My post is about using regular expressions; it is NOT about the function I am using to actually apply the regular expression. That function works as expected and there is no need to evaluate that function here. If it did not work as desired, I would say so.

              Lost User
              wrote on last edited by
              #8

              The backslash character is used as the escape character within strings, so you need to escape it:

              "[/\\w+/g]+"
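A small sketch of why the escape matters, again assuming std::regex as a stand-in for the Qt engine. In a C++ string literal "\w" is not a recognised escape sequence, and compilers typically warn and drop the backslash, which would explain the [/w+/g]+ in the debug output. With the backslash gone, the character class only contains '/', 'w', '+' and 'g', so the first match in the sample text is most likely the final "g" of "Waiting". The helper first_match and the three constants are illustrative names only.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the first match of `pattern` in `input`,
// or an empty string if nothing matches.
std::string first_match(const std::string& input, const std::string& pattern)
{
    std::smatch m;
    return std::regex_search(input, m, std::regex(pattern)) ? m.str() : std::string();
}

// Both spellings produce the same pattern text: [/\w+/g]+
const std::string escaped = "[/\\w+/g]+";
const std::string raw     = R"([/\w+/g]+)";

// What the regex engine receives when the backslash is lost.
const std::string broken  = "[/w+/g]+";
```

On "Waiting to connect", the broken pattern returns "g" while the escaped one returns "Waiting". A raw string literal (R"(...)") avoids the double backslash altogether.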

                Lost User
                wrote on last edited by
                #9

                Here is the actual snippet of the code. I have hard-coded the RegExp:

                    // "[/\\w+/g]+"
                    RegExp = "[/\\w+/g]+";
                    text = " validate regular expression   ";
                    text += RegExp;
                    qDebug() << text;
                    textDEBUG->append(text);

                    // RegExp = "[/\\w+/g]+";
                    text = " validate inString   ";
                    text += inString;
                    qDebug() << text;
                    textDEBUG->append(text);

                    QRegularExpression re(RegExp);

                    // QRegularExpression re("/([A-Z])\w+/g");
                    // QRegularExpression re("([A-Z])\w+");

                    QRegularExpressionMatch match = re.match(inString);

                    if (match.hasMatch()) { // matches all
                        text = " Has all match ";
                        QStringList result = match.capturedTexts();

                        text += result.at(0);    // test show only first

                        qDebug() << text;
                        textDEBUG->append(text);

                        return result.at(0);

                Here is the relevant debug output

                "START EditLine_RegExp...QString BT_Utility_Library::EditLine_RegExp_Ext(QString, QString, QTextEdit *, QTextEdit *)1321"
                " instring text \t\n Waiting to connect to bluetoothd...\r\u001B[0;94m[bluetooth]\u001B[0m# \r\r\u001B[0;94m[bluetooth]\u001B[0m# \r \rAgent registered\n\u001B[0;94m[bluetooth]\u001B[0m# "
                " regular expression \t\n (\\w+\\s:?)"
                " validate regular expression [/\\w+/g]+"
                " validate inString Waiting to connect to bluetoothd...\r\u001B[0;94m[bluetooth]\u001B[0m# \r\r\u001B[0;94m[bluetooth]\u001B[0m# \r \rAgent registered\n\u001B[0;94m[bluetooth]\u001B[0m# "
                " Has all match Waiting"
                10:12:17: /mnt/RAID_124/BT/BT_Oct23_BASE_/mdi/MDI exited with code 0

                The expression matches ONLY the first word it finds. My goal is to match ALL the words in the inString. I am going to try one of the AI reg exp generators, but from experience using them this RegExp MAY work....
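A likely explanation, sketched with std::regex as an assumed stand-in for the Qt classes: QRegularExpression::match() stops at the first match, and capturedTexts() lists the capture groups of that one match (entry 0 being the whole match), not successive matches. Getting every word needs an iteration over matches, which in Qt is what QRegularExpression::globalMatch() provides. The two function names below are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Capture groups of ONE match: index 0 is the whole match,
// indices 1.. are the parenthesised groups of that same match.
std::vector<std::string> groups_of_first_match(const std::string& input, const std::regex& re)
{
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(input, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}

// Whole text of EVERY match, found by restarting the search
// after the end of the previous match.
std::vector<std::string> every_match(const std::string& input, const std::regex& re)
{
    std::vector<std::string> matches;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}
```

With the pattern (\w+) and the text "Agent registered", the first function returns "Agent" twice (whole match plus group 1), while the second returns "Agent" and "registered".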

                  k5054
                  wrote on last edited by
                  #10

                  Maybe this SO page can help you? https://stackoverflow.com/questions/37003623/how-to-capture-multiple-repeated-groups

                  Keep Calm and Carry On

                    Lost User
                    wrote on last edited by
                    #11

                    After some "RTFM" I came up with this code

                    if (inString.contains("Agent") && inString.contains("registered")) {
                        text = "Match ";
                    } else {
                        text = " No match ";
                    }

                    qDebug() << text;
                    textDEBUG->append(text);

                    "contains" accepts either a regular expression or a plain string, so I am not sure how to tell the difference. But it does what I want it to do.
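On telling the two apart: QString::contains() is overloaded, and the argument type decides which version runs. A string literal such as "Agent" becomes a QString and is searched for literally; a regular expression search only happens when a QRegularExpression object is passed. The sketch below imitates that arrangement with the standard library. The two contains functions are hypothetical stand-ins, not Qt's implementation.

```cpp
#include <regex>
#include <string>

// Plain substring search: every character of `needle` is taken literally.
bool contains(const std::string& text, const std::string& needle)
{
    return text.find(needle) != std::string::npos;
}

// Regular expression search: selected only when a std::regex object is passed.
bool contains(const std::string& text, const std::regex& pattern)
{
    return std::regex_search(text, pattern);
}
```

Here contains(text, "A.ent") looks for the literal five characters A.ent, whereas contains(text, std::regex("A.ent")) treats the dot as "any character" and would find "Agent".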

                      jschell
                      wrote on last edited by
                      #12

                      Salvatore Terress wrote:

                      learn more about using regular expression....RegExp = "[/\\w+/g]+";

                      Keep in mind that that form of regular expression is unlikely to work in any other regular expression interpreter. Perl, JavaScript, C# and Java (and perhaps others) all use the same rules for most of the regex basics, and that will not work with any of them. For those, the expression means the following:

                      - Match A-Za-z0-9 (and the underscore).
                      - Match a forward slash (redundantly listed twice).
                      - Match a 'g', which is redundant with the word class.
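To make the character-class reading concrete, here is a minimal check using std::regex with its default ECMAScript grammar, which is an assumption standing in for the engines named above. Inside the brackets the '+' is a literal as well, so the class accepts word characters, '/' and '+', and nothing about it means "global". The names char_class, plain_words and whole_string_matches are illustrative only.

```cpp
#include <regex>
#include <string>

// The expression from the thread, and the plain word pattern for comparison.
const std::string char_class  = R"([/\w+/g]+)";
const std::string plain_words = R"(\w+)";

// Hypothetical helper: true if the ENTIRE text is matched by the pattern.
bool whole_string_matches(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}
```

With these definitions, "/", "+" and "a/b+c" are all accepted by char_class, while plain_words rejects "/" and accepts "Agent".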
