Code Project

How to extract all words - using regular expression

    Lost User wrote (#1):

    EDIT: OK, using g (as in "global") should work, but it does not:

        result = BTUL->EditLine_RegExp(result, "(\\w+ \g)");

    Where is my error?

    I am having a mental block: I forgot how to add "match all words". Here is my failing attempt to do so:

        result = BTUL->EditLine_RegExp(result, "(\\w+*)");

    Can somebody please help me modify my regular expression to match all words? No references to AI regular expression generators, please; they do not do well with multiple entries.


      k5054 wrote (#2):

      Are you sure you want \w? That includes 0-9 and underscore (_) as well as alphabetic characters, which may not be what you want. Is your intention to capture the set of all words in the input text? Is that what BTUL->EditLine_RegExp() does? I googled for EditLine_RegExp and got precisely zero results, so maybe you can explain where it comes from and what it returns?
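
To make the point about \w concrete, here is a minimal sketch that uses std::regex from the C++ standard library as a stand-in for whatever engine EditLine_RegExp wraps (that is an assumption; the thread never shows the function itself). The helper names isWordToken and isLettersOnly are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the whole token consists of \w characters:
// letters, digits and the underscore.
bool isWordToken(const std::string& token) {
    static const std::regex re(R"(\w+)");
    return std::regex_match(token, re);
}

// True if the whole token consists of ASCII letters only.
bool isLettersOnly(const std::string& token) {
    static const std::regex re("[A-Za-z]+");
    return std::regex_match(token, re);
}
```

With these, isWordToken("agent_1") is true while isLettersOnly("agent_1") is false, so the choice between \w and an explicit letter class decides whether digits and underscores count as part of a word.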

      Keep Calm and Carry On


        Richard Andrew x64 wrote (#3):

        Try this: Match all words

        The difficult we do right away... ...the impossible takes slightly longer.


          Lost User wrote (#4):

          The following will find all words:

          /\w+/g

          You can test it quite quickly at RegExr: Learn, Build, & Test RegEx. But it would help if you showed us the actual text you are working with and the results you get. And please use <pre> tags around the code parts so your question is clear.
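
Note that /\w+/g is JavaScript literal notation: the slashes delimit the pattern and g is a flag, so neither belongs inside a C++ pattern string. In C++ the "global" part is expressed by iterating over the matches. The sketch below uses std::regex and std::sregex_iterator as a stand-in for Qt; in Qt itself the corresponding call should be QRegularExpression::globalMatch(). The helper name findAll is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collect every non-overlapping match of `pattern` in `text`,
// which is what the g flag does in JavaScript.
std::vector<std::string> findAll(const std::string& text,
                                 const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> matches;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}
```

Calling findAll("Agent registered", R"(\w+)") yields the two words Agent and registered rather than only the first one.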


            jschell wrote (#5):

            Salvatore Terress wrote:

            Where is my error ??

            Your question is not only about regular expressions but also about what is running the regular expression engine, and you did not provide that information. A 'g' is something external to the regular expression itself, so where you are using it is important, and the only clue you provided is 'EditLine_RegExp', which returned no results when I googled it. But certainly, since you didn't escape the backslash before the g, that would never work.

            Other than that, of course, there is also the following:

            - A space is not considered a word, yet your capture group includes it.
            - There could be more than one space.
            - Are you matching on a single line? If not, there are other complications.
            - If there are ONLY words in your line, then it is pointless to use regex at all. Just split it.
            - If there are OTHER things besides words, then I don't believe what you are doing will work (but again, you didn't state what, so maybe it will).
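
The "just split it" suggestion can be sketched without any regex at all. This is a minimal standard C++ version, assuming words are separated by whitespace only; splitOnWhitespace is a name chosen here for illustration (in Qt, QString::split would be the natural counterpart).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a line into whitespace-separated words.
// operator>> skips runs of spaces, tabs and newlines,
// so multiple consecutive spaces do not produce empty entries.
std::vector<std::string> splitOnWhitespace(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

This also covers the "more than one space" case from the list, since stream extraction treats any run of whitespace as a single separator.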


              Lost User wrote (#6):

              Many thanks for all the support. FYI, I did use "[ -~]+" to extract the first run of continuous words. Now I am working on extracting ALL words. I tried passing "[/\w+/g]+" to my otherwise working function and ended up with [/w+/g]+; the backslash is missing. This is the result, part of my debug messages:

              " instring text \t\n Waiting to connect to bluetoothd..."
              " regular expression \t\n [/w+/g]+"
              " Has all match g"

              You asked for the string I am trying to extract stuff from. Here it is:

                      "Waiting to connect to bluetoothd...\r\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \r\r\u001B[0;94m[bluetooth]\u001B[0m# \r                        \rAgent registered\n\u001B[0;94m[bluetooth]\u001B[0m# "
              

              As you can see, the extracted "g" is from "Agent". No good... I suspect Qt is messing with passing the backslash...

                  // result = BTUL->EditLine_RegExp_Ext(result, "[ -~]+", textEditPtr_DEBUG, textEditPtr_DEBUG);
                  result = BTUL->EditLine_RegExp_Ext(result, "[/\w+/g]+", textEditPtr_DEBUG, textEditPtr_DEBUG);
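
On the backslash suspicion: the likely cause is the C++ compiler rather than Qt. In an ordinary string literal, \w is not a recognized escape sequence, and compilers such as GCC typically warn and drop the backslash, which is consistent with the [/w+/g]+ shown in the debug messages. The sketch below shows two ways to get a real backslash into the pattern; it uses std::regex as a stand-in for Qt, and the names escapedPattern, rawPattern and matchesWholeWord are illustrative only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Both literals hold the same three characters: backslash, 'w', '+'.
const std::string escapedPattern = "\\w+";    // backslash doubled for the compiler
const std::string rawPattern = R"(\w+)";      // raw string literal, no doubling

// True if the whole token matches the given pattern.
bool matchesWholeWord(const std::string& token, const std::string& pattern) {
    return std::regex_match(token, std::regex(pattern));
}
```

A raw string literal tends to be easier to read once a pattern contains several backslashes, because what is typed is exactly what the regex engine receives.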
              

              "Waiting to connect to bluetoothd..."
              "Waiting to connect to bluetoothd..."
              "START EditLine_RegExp...QString BT_Utility_Library::EditLine_RegExp_Ext(QString, QString, QTextEdit *, QTextEdit *)1321"
              " instring text \t\n Waiting to connect to bluetoothd..."
              " regular expression \t\n [/w+/g]+"
              " Has all match g"


                Lost User wrote (#7):

                jschell wrote:

                Your question is not specific to regular expressions but also to what is running the regular expression engine. But you did not provide that information.

                Since most of the "AI reg expression" generators work and the SAME expression in Qt does not, you have a point. I guess I will ask in the Qt forum about that.

                Yes, there are other means to verify that the string contains the desired word (the QString "contains" method works peachy). However, I would like to learn more about using regular expressions, so I will stick with them for now.

                ADDENDUM: My post is about using regular expressions. It is NOT about the function I am using to actually apply the regular expression. That function works as expected and there is no need to evaluate that function here. If it did not work as desired I would say so.


                  Lost User wrote (#8):

                  The backslash character is used as the escape character within strings, so you need to escape it:

                  "[/\\w+/g]+"


                    Lost User wrote (#9):

                    Here is the actual snippet of the code. I have "hard coded" the RegExp:

                        // "[/\\w+/g]+"
                        RegExp = "[/\\w+/g]+";
                        text = " validate regular expression   ";
                        text += RegExp;
                        qDebug() << text;
                        textDEBUG->append(text);

                        // RegExp = "[/\\w+/g]+";
                        text = " validate inString   ";
                        text += inString;
                        qDebug() << text;
                        textDEBUG->append(text);

                        QRegularExpression re(RegExp);

                        // QRegularExpression re("/([A-Z])\w+/g");
                        // QRegularExpression re("([A-Z])\w+");

                        QRegularExpressionMatch match = re.match(inString);

                        if (match.hasMatch()) { // matches all
                            text = " Has all match ";
                            QStringList result = match.capturedTexts();

                            text += result.at(0);    // test show only first

                            qDebug() << text;
                            textDEBUG->append(text);

                            return result.at(0);
                    

                    Here is the relevant debug output

                    "START EditLine_RegExp...QString BT_Utility_Library::EditLine_RegExp_Ext(QString, QString, QTextEdit *, QTextEdit *)1321"
                    " instring text \t\n Waiting to connect to bluetoothd...\r\u001B[0;94m[bluetooth]\u001B[0m# \r\r\u001B[0;94m[bluetooth]\u001B[0m# \r \rAgent registered\n\u001B[0;94m[bluetooth]\u001B[0m# "
                    " regular expression \t\n (\\w+\\s:?)"
                    " validate regular expression [/\\w+/g]+"
                    " validate inString Waiting to connect to bluetoothd...\r\u001B[0;94m[bluetooth]\u001B[0m# \r\r\u001B[0;94m[bluetooth]\u001B[0m# \r \rAgent registered\n\u001B[0;94m[bluetooth]\u001B[0m# "
                    " Has all match Waiting"
                    10:12:17: /mnt/RAID_124/BT/BT_Oct23_BASE_/mdi/MDI exited with code 0

                    The expression matches ONLY the first word it finds. My goal is to match ALL the words in the inString. I am going to try one of the AI reg exp generators, but from experience using them this RegExp MAY work....
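
As far as I can tell, seeing only the first word is the expected behaviour here: QRegularExpression::match() returns a single match, and capturedTexts() lists the capture groups of that one match, not all matches in the string. Getting every word means iterating. The sketch below shows the idea with std::regex as a stand-in for Qt. It also strips the terminal colour sequences first, on the assumption that tokens such as 94m are not wanted as words. The helper names stripAnsi and extractWords and the letters-only word class are my own choices, not something from the posted code.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Remove ANSI colour sequences such as ESC[0;94m and ESC[0m.
// The pattern starts with the literal ESC character (0x1B).
std::string stripAnsi(const std::string& text) {
    static const std::regex ansi("\x1B\\[[0-9;]*[A-Za-z]");
    return std::regex_replace(text, ansi, "");
}

// Collect every run of ASCII letters left after stripping the colour codes.
std::vector<std::string> extractWords(const std::string& text) {
    static const std::regex word("[A-Za-z]+");
    const std::string clean = stripAnsi(text);
    std::vector<std::string> words;
    for (std::sregex_iterator it(clean.begin(), clean.end(), word), end;
         it != end; ++it) {
        words.push_back(it->str());
    }
    return words;
}
```

On a shortened version of the posted inString this gives Waiting, to, connect, to, bluetoothd, bluetooth, Agent, registered. Without the stripping step, a \w+ pattern would also pick up fragments of the colour codes.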


                      k5054 wrote (#10):

                      Maybe this SO page can help you? https://stackoverflow.com/questions/37003623/how-to-capture-multiple-repeated-groups

                      Keep Calm and Carry On


                        Lost User wrote (#11):

                        After some "RTFM" I came up with this code:

                            // && (logical AND) instead of & (bitwise AND): same result here,
                            // but it skips the second check when the first one fails
                            if (inString.contains("Agent") && inString.contains("registered")) {
                                text = "Match ";
                            } else {
                                text = " No match ";
                            }

                            qDebug() << text;
                            textDEBUG->append(text);

                        The "contains" method accepts either a regular expression or a string, so I am not sure how to tell the difference. But it does what I want it to do.
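
On telling the two apart: as far as I know, QString::contains() has separate overloads for a QString and for a QRegularExpression, so a plain string literal such as "Agent" is treated as literal text, and a pattern is only used when a QRegularExpression object is passed. The difference can be sketched in standard C++ with std::string::find versus std::regex_search, used here as stand-ins for the two Qt overloads. The names containsText and containsPattern are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Literal substring test: every character in `needle` is taken as-is.
bool containsText(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}

// Pattern test: `pattern` is interpreted as a regular expression.
bool containsPattern(const std::string& haystack, const std::string& pattern) {
    return std::regex_search(haystack, std::regex(pattern));
}
```

For "Agent registered", containsText(..., "A.ent") is false because there is no literal dot in the text, whereas containsPattern(..., "A.ent") is true because the dot matches any character.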


                          jschell wrote (#12):

                          Salvatore Terress wrote:

                          learn more about using regular expression....RegExp = "[/\\w+/g]+";

                          Keep in mind that that form of regular expression is unlikely to work in any other regular expression interpreter. Perl, JavaScript, C# and Java (perhaps others) all use the same rules for most of the basics of regex, and that will not work with any of them. For those, it means the following:

                          - Match A-Za-z0-9.
                          - Match a forward slash (redundantly, twice).
                          - Match a 'g', which is redundant with the word class match.
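
The breakdown above can be checked with a small experiment. The sketch uses std::regex (ECMAScript grammar, the same family as JavaScript) as a stand-in for Qt, and firstMatch is a hypothetical helper. Because everything between [ and ] is a set of single characters, the slashes, the plus sign and the g are just members of that set.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Return the first match of `pattern` in `text`, or an empty string if none.
std::string firstMatch(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m.str(0);
    }
    return "";
}
```

On "Waiting to connect to bluetoothd...", the unescaped class [/w+/g]+ first matches the lone g at the end of Waiting, and the escaped class [/\w+/g]+ first matches Waiting, which agrees with both debug outputs earlier in the thread. On "a/b+c d" the escaped class returns a/b+c as one token, while a plain \w+ returns just a.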
