Where to start - need to extract...using regular expression
-
I am asking for help on where to start looking to resolve this issue. I really do not want a solution without knowing how it was formed: been there, done that, and did not learn much from that approach. I do not need RTFM instructions either...
Here is my task: I am using C++ "system calls" to run what is normally a command typed in a "terminal" in Linux (do not ask for reasons...). I get what I call "raw" output, which includes "control characters"; these are NOT visible when the command is run in a "terminal". I have had some success using a regular expression to remove these control characters...
Now I want to use a regular expression to extract the first word on each line which does NOT have option(s), and eventually retrieve the command description located on the same line, so that I can build a "dictionary of commands without options"...
Here is an example: the command "list" does not have "options", while "system-alias" expects an option.
[0;94madmin [0mAdmin Policy Submenu
[1;39mlist [0mList available controllers
[1;39mshow [ctrl] [0mController information
[1;39mselect [0mSelect default controller
[1;39mdevices [0mList available devices
[1;39mpaired-devices [0mList paired devices
[1;39msystem-alias [0mSet controller alias
[1;39mreset-alias [0mReset controller alias
I would appreciate a reply in the style of "try this xyz resource and pay attention to chapter such and such..." Thanks, any help will be greatly appreciated.
PS: I know how to build regular expressions using Internet resources...
-
First off, have you tried prepending TERM=dumb to your command string? That *should* remove all the control chars from the command output, e.g.
[k5054@localhost ~]$ TERM=vt100 infocmp
Reconstructed via infocmp from file: /usr/share/terminfo/v/vt100
vt100|vt100-am|DEC VT100 (w/advanced video),
am, mc5i, msgr, xenl, xon,
cols#80, it#8, lines#24, vt#3,
acsc=``aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
bel=^G, blink=\E[5m$<2>, bold=\E[1m$<2>,
clear=\E[H\E[J$<50>, cr=\r, csr=\E[%i%p1%d;%p2%dr,
cub=\E[%p1%dD, cub1=^H, cud=\E[%p1%dB, cud1=\n,
cuf=\E[%p1%dC, cuf1=\E[C$<2>,
cup=\E[%i%p1%d;%p2%dH$<5>, cuu=\E[%p1%dA,
cuu1=\E[A$<2>, ed=\E[J$<50>, el=\E[K$<3>, el1=\E[1K$<3>,
enacs=\E(B\E)0, home=\E[H, ht=^I, hts=\EH, ind=\n, ka1=\EOq,
ka3=\EOs, kb2=\EOr, kbs=^H, kc1=\EOp, kc3=\EOn, kcub1=\EOD,
kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA, kent=\EOM, kf0=\EOy,
kf1=\EOP, kf10=\EOx, kf2=\EOQ, kf3=\EOR, kf4=\EOS, kf5=\EOt,
kf6=\EOu, kf7=\EOv, kf8=\EOl, kf9=\EOw, lf1=pf1, lf2=pf2,
lf3=pf3, lf4=pf4, mc0=\E[0i, mc4=\E[4i, mc5=\E[5i, rc=\E8,
rev=\E[7m$<2>, ri=\EM$<5>, rmacs=^O, rmam=\E[?7l,
rmkx=\E[?1l\E>, rmso=\E[m$<2>, rmul=\E[m$<2>,
rs2=\E<\E>\E[?3;4;5l\E[?7;8h\E[r, sc=\E7,
sgr=\E[0%?%p1%p6%|%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;m%?%p9%t\016%e\017%;$<2>,
sgr0=\E[m\017$<2>, smacs=^N, smam=\E[?7h, smkx=\E[?1h\E=,
smso=\E[7m$<2>, smul=\E[4m$<2>, tbc=\E[3g,
[k5054@localhost ~]$ TERM=dumb infocmp
Reconstructed via infocmp from file: /usr/share/terminfo/d/dumb
dumb|80-column dumb tty,
am,
cols#80,
bel=^G, cr=\r, cud1=\n, ind=\n,
[k5054@localhost ~]$
In general you can set any environment variable this way, so you might do something like
LD_LIBRARY_PATH=/home/k5054/lib DEBUG=1 ./foo
Which would add LD_LIBRARY_PATH and DEBUG variables to the environment, but only for the duration of the given command.
But on to your problem. Assuming you've managed to remove your control characters, what it looks like you want to do is to match any line that does not have an option to it. Based on what you have here, you could match on any line that does not contain either a '[' (by convention, an optional argument) or a '<' (by convention, a required argument). So the character class for that would be [^<\[], and a line made up only of such characters, i.e. ^[^<\[]*$, is a line without an option. Note we need to esc...
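To show how those pieces could fit together, here is a minimal C++ sketch, not a definitive implementation. It assumes the colour codes are ordinary ANSI sequences (an ESC byte, then '[', digits and semicolons, then a letter) and that the description is everything after the first word. The function names strip_ansi, commands_without_options and run_with_dumb_term are made up for illustration, and the last one assumes a POSIX system where popen is available.

```cpp
#include <stdio.h>   // popen, pclose, fgets (POSIX)
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Run a shell command with TERM=dumb set only for that command and
// return whatever it wrote to standard output.
std::string run_with_dumb_term(const std::string& command) {
    std::string output;
    FILE* pipe = popen(("TERM=dumb " + command).c_str(), "r");
    if (!pipe) return output;            // could not start the command
    char buffer[256];
    while (fgets(buffer, sizeof buffer, pipe)) output += buffer;
    pclose(pipe);
    return output;
}

// Remove ANSI sequences of the form ESC [ digits/semicolons letter,
// for example ESC[1;39m or ESC[0m.
std::string strip_ansi(const std::string& text) {
    static const std::regex csi("\x1B\\[[0-9;]*[A-Za-z]");
    return std::regex_replace(text, csi, "");
}

// Build a command -> description map, skipping every line that still
// contains '[' or '<' once the escape sequences are gone.
std::map<std::string, std::string>
commands_without_options(const std::string& raw) {
    // regex_match must consume the whole line, so no ^ or $ is needed.
    static const std::regex no_args(R"([^<\[]*)");
    // Group 1: first word. Group 2: the rest of the line, trimmed.
    static const std::regex word_and_rest(R"(\s*(\S+)\s+(.*\S)\s*)");

    std::map<std::string, std::string> result;
    std::istringstream lines(strip_ansi(raw));
    std::string line;
    std::smatch m;
    while (std::getline(lines, line)) {
        if (!std::regex_match(line, no_args)) continue;   // has an option
        if (std::regex_match(line, m, word_and_rest))
            result[m[1].str()] = m[2].str();
    }
    return result;
}
```

The escape sequences are stripped first on purpose: each of them contains a '[' of its own, so testing the raw line against [^<\[]* would reject every coloured line. Splitting the job into "does the line qualify" and "split word from description" keeps each expression small enough to reason about on its own.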
-
Thanks for the prompt reply. Unfortunately I need to limit my reply... I had eye surgery and am having a heck of a time reading small fonts... and there is no easy way to set EVERYTHING to a larger font... each app has its own setting... I should have thought about that BEFORE getting my eyeballs refurbished... Now if I use CAPS some people will get offended... again... CHEERS