How to match multiple occurrences of an item in text? (Regular expression)

Lost User

I am asking a question about regular expressions. I am NOT looking for references to AI regular expression generators. Here is my text, in which I want to find ALL matches of "hci":

"Devices:\n\thci1\t00:50:B6:80:4D:5D\n\thci0\t00:15:83:15:A2:CB\n"

Here is my attempt to accomplish that task:

QRegularExpression re("(hci[0-9]+)"); // matches first hci1 twice

It matches only the first occurrence of "hci1", and twice. Please follow the established CodeProject instructions and read the post. PLEASE, no references to AI regular expression generators. I need to learn TO BUILD regular expressions, especially how to write one so that it will attempt to match multiple items. PS: I did try to use "global match", no go.

Lost User

Salvatore Terress wrote:

Please follow the established CodeProject instructions and read the post. PLEASE, no references to AI regular expression generators.

Please stop giving us orders. You have already been kicked off this forum once for your bad attitude. Everyone who tries to answer questions here does it in their own time and at no cost to you. If the answer is not what you were hoping for then feel free to ignore it.

CPallini

Why didn't you post the code? I mean, you commented

// matches first hci1 twice

But only the regular expression constructor is shown.

"In testa che avete, Signor di Ceprano?" -- Rigoletto

jschell

Salvatore Terress wrote:

It matches only the first occurrence of "hci1", and twice.

Did you mean something like 'it only matches once but there are two'?

Salvatore Terress wrote:

I need to learn TO BUILD regular expressions

Your regular expression is correct. That means your usage of the regular expression engine/library is incorrect, which has nothing to do with the regular expression itself. Typically an engine/library will have an iteration idiom where each pass through the loop matches the next occurrence. I didn't look into it in depth, but the following Google search seems to return relevant results:

"QRegularExpression" iterate through matches
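The iteration idiom described above can be sketched in standard C++. This is only an illustration under stated assumptions, not Qt code: it uses `std::regex` and `std::sregex_iterator` from the standard library as a stand-in for `QRegularExpression::globalMatch()`, and the helper name `findAll` is made up for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group 1 of every match of
// `pattern` in `text`. Each loop iteration advances to the next match,
// which is the same idea as hasNext()/next() on a Qt match iterator.
std::vector<std::string> findAll(const std::string& text,
                                 const std::string& pattern)
{
    const std::regex re(pattern);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        found.push_back(it->str(1)); // text captured by the first group
    }
    return found;
}
```

Called with the sample text from the question and the pattern `(hci[0-9]+)`, this should yield two entries, `hci1` and `hci0`, because the loop keeps asking for the next match instead of stopping at the first one.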

Lost User

Since my last post is nowhere to be found here: I do appreciate all the help in resolving the issue. And since posting code was requested, here it is. It generally works for retrieving multiple matches, such as "hci1 hci0". What is missing is retrieving a PAIR of matches, such as "hci1 00:xx:yy...". In other words, I can retrieve the name or the address, but not BOTH. I would be grateful if somebody could give me the actual code and an explanation of how to match a PAIR of values. As can be seen, I did try different placements of the parentheses "(...)", but it did not work.

{ // global matching code block
    text = " START MATCH NAME AND ADDRESS CODE BLOCK ";
    textEditPtr_DEBUG->append(text);

#ifdef TASK
DELETED
#endif

    QString word;
    QStringList words;
    // Despite its name, "pattern" holds the text to search, not the regex.
    QString pattern = "Devices:\n\thci1\t00:50:B6:80:4D:5D\n\thci0\t00:15:83:15:A2:CB\n";
    //QRegularExpression re("(\\w+)");
    //QRegularExpression re("([0-F]{2}[:-]){5})");
    //QRegularExpression re("([0-F]{2}[:])"); // match xx:
    //QRegularExpression re("(([0-F]{2}[:]){5}[0-F]{2})"); // TOK match full BOTH addressxx:
    // QRegularExpression re("((hci[0-9])(([0-F]{2}[:]){5}[0-F]{2}))"); NO GO: nothing matches the tab between name and address
    //QRegularExpression re("(hci[0-9])"); TOK
    // Note: [0-F] is an ASCII range, so besides 0-9 and A-F it also accepts : ; < = > ? @
    QRegularExpression re("(([0-F]{2}[:]){5}[0-F]{2})");
    QRegularExpressionMatchIterator i = re.globalMatch(pattern);

    text = " HAS MATCH MULTI";
    qDebug() << text;
    textEditPtr_DEBUG->append(text);

#ifdef BYPASS // no go
    foreach (auto item, i)
    {
        text = " TEST foreach ";
        text += item;
        qDebug() << text;
        textEditPtr_DEBUG->append(text);
    }
#endif
    // Each pass through the loop retrieves the next match.
    while (i.hasNext()) {
        QRegularExpressionMatch match = i.next();
        word = match.captured(1); // first capture group: the full address
        words << word;
        qDebug() << word;

        text = " Full match name and address...  ";
        text += word;
        qDebug() << text;
        textEditPtr_DEBUG->append(text);
    }
}

jschell

Salvatore Terress wrote:

Here is my text, in which I want to find ALL matches of "hci":

So, based on your sub-reply to me, for this input:

"Devices:\n\thci1\t00:50:B6:80:4D:5D\n\thci0\t00:15:83:15:A2:CB\n"

What you actually want is the following

hci 00:50:B6:80:4D:5D
hci 00:15:83:15:A2:CB

Your text sample represents multi-line input, presumably with tab separators. For that sample it is actually pointless to use a regex, since normal parsing (or a CSV parser) would work. But in terms of regex you have several problems:

1. Dealing with multiple lines.
2. Correctly matching the values.
3. Dealing with special characters (including perhaps #1 above).
4. Dealing with a list of results, as I stated in my other reply.

So your regex is not even close to correct for matching the address. Looking at #2 only, because that is the part most likely to be usable as a regex, the equivalent regex for Perl would look like the following. Since you are not using Perl yours might vary, but I suspect not.

(hci\w+([a-fA-F0-9][a-fA-F0-9]:)+[a-fA-F0-9][a-fA-F0-9])

There are variations on the above:

- Making the tab explicit
- Restricting the id to the exact length
- Variation on the hex digit

But I prefer the above form.
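One way to get both values per device is to give the name and the address their own capture groups. The sketch below is an assumption-laden variant of the pattern above, not the poster's exact expression: the name is captured separately, and `\s+` is added because, as written, nothing in the pattern consumes the tab that sits between the name and the address in the sample text. It uses standard C++ `std::regex` rather than Qt so that it is self-contained, and `findDevices` is a made-up helper name.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (name, address) pairs found in `text`.
std::vector<std::pair<std::string, std::string>>
findDevices(const std::string& text)
{
    // Group 1: device name, e.g. "hci1".
    // \s+    : the whitespace (tab) separating name and address.
    // Group 2: the whole address; group 3 is the inner repeated "xx:" part
    //          and is ignored here.
    static const std::regex re(
        R"((hci\w+)\s+(([a-fA-F0-9][a-fA-F0-9]:)+[a-fA-F0-9][a-fA-F0-9]))");

    std::vector<std::pair<std::string, std::string>> devices;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        devices.emplace_back(it->str(1), it->str(2));
    }
    return devices;
}
```

For the sample text this should produce the pairs `hci1` / `00:50:B6:80:4D:5D` and `hci0` / `00:15:83:15:A2:CB`. The same pattern text should carry over to `QRegularExpression`, where the two values would be read with `captured(1)` and `captured(2)` on each match.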

Lost User

Thanks, but it still does not do what I want, as you expected it might. Would it be OK to actually discuss and analyze this? I am asking; I am NOT telling the group it should do that.

QRegularExpression re("(hci\w+([a-fA-F0-9][a-fA-F0-9]:)+[a-fA-F0-9][a-fA-F0-9])");

There are, IMHO, important parts to the regular expression, and they are OK when used alone.

1. If I opt to use QRegularExpressionMatchIterator i = re.globalMatch(pattern); there should be at least three "code groups" delimited by "(...code group...)".
2. There are TWO base regular expressions. One works OK as is: "hci[0-9]" functions as "match hcix where x = [0-9]" (some docs call [0-9] a range), or hci\w should perform the SAME task.
3. The analysis of "xx:" using [a-fA-F0-9] is simply "too cute"; [0-F] with "how many times to repeat the previous", {2}, works as well. I have no need to check for lowercase hexadecimal, since that actually does not exist anyway.

So I will, perhaps stubbornly, try to continue using at least ONE of the Qt classes, and maybe I will find the correct modification of the above. If I end up with a plain "reg_exp", so be it. Thanks.

jschell

When you post code use code blocks.

Salvatore Terress wrote:

If I opt to use QRegularExpressionMatchIterator i = re.globalMatch(pattern);

Not sure how to emphasize what I have already said several times. That question is NOT about regular expressions. It is about how to use that specific library. I suggest you start with a less difficult example to figure out how iteration works. And google for examples as I documented in the other reply.

Salvatore Terress wrote:

There should be at least three "code groups" delimited by "(...code group...)"

You mean the parens. Parens are used for 'capturing' and for 'grouping'. In the case I gave, the inner ones were for grouping and the outer ones for capturing. And yes, that is a problem when you iterate. The library you are using might have a way to specify that the parens are not 'capturing'. In Perl it looks like the following. There would be only one capturing group, which would match either 'abcxyz' or 'defxyz'.

((?:abc|def)xyz)

I will note that I use that feature so rarely that I had to look it up.
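The difference between grouping and capturing can be checked directly. The following is a small sketch in standard C++ (`std::regex`, ECMAScript syntax), used here only because it is self-contained; the `(?:...)` syntax is also part of the Perl-compatible syntax that `QRegularExpression` accepts. The helper names `captureGroupCount` and `firstCapture` are invented for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: number of capturing groups in `pattern`.
unsigned captureGroupCount(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}

// Hypothetical helper: capture group 1 of the first match in `text`,
// or an empty string when nothing matches.
std::string firstCapture(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(1) : std::string();
}
```

With `((?:abc|def)xyz)` the count should be 1, while `((abc|def)xyz)` should report 2, since the inner parentheses then capture as well. In both cases group 1 holds the whole `abcxyz` or `defxyz` text, so switching the inner group to `(?:...)` keeps the group numbering simple when iterating over matches.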

Salvatore Terress wrote:

the analysis of "xx:"

As I mentioned there are variations to what I suggested. Myself I don't care for using '{2}' in a case like this. Just a preference. (Had to edit the above because I messed up the code block)

Lost User

SOLVED and CLOSED, as initially asked for, by using the named subgroup option.

QRegularExpression re("(((?hci[0-9])(.*))?(?(([0-F]{2}[:]){5}[0-F]{2}))(.*))");

Thanks very much for all the support given.