Regex select from list
-
Hello, I am trying to learn regex on my own, but I got stuck selecting from an unordered list. So far I have managed to list the categories, but I can't figure out how to 1) capture the node ID for each category (for example, 560884 for the first one) and 2) keep the "›" separator from being listed as a category. Here is my code:
select div#wayfinding-breadcrumbs_feature_div li >>> category_tree {
select span.a-list-item >> category_name;
select div#wayfinding-breadcrumbs_container .a-link-normal >> attr(href) >> capture "[node=\\d+]" >> node_id;
}
Here is part of the output:
"category_tree": [{
"category_name": "Portable Sound & Video",
"node_id": null
}, {
"category_name": "›",
"node_id": null
}, {
"category_name": "Accessories",
"node_id": null
Here is the source code:
* [Portable Sound & Video](/mp3-ipod-headphones-DAB-radio/b/ref=dp_bc_aui_C_1?ie=UTF8&node=560884)
* ›
* [Accessories](/Accessories-Portable-Sound-Vision-Tapes/b/ref=dp_bc_aui_C_2?ie=UTF8&node=560910)
* ›
* [Portable Speakers & Docks](/b/ref=dp_bc_aui_C_3?ie=UTF8&node=16700222031)
Thank you for your help
Jukec wrote:
"[node=\\d+]"
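I have no idea what language that is, but all of the major ones use mostly the same regex semantics. The square brackets should not be there: in a regex they define a character class, so the pattern matches a single character from the set (n, o, d, e, =, a digit, or +) rather than the text 'node=' followed by digits. Presumably the rest of the code is actually going to 'capture' what is matched, which is a specific term in regex. If so, without the brackets the result will look like 'node=16700222031', which means you would need to parse it again to get the number out. If the tool supports capture groups, something like node=(\\d+) might return only the digits, but I can't say whether it does.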
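Since I don't know the scraping language, here is a rough sketch in Python of the difference the brackets make. The hrefs and link texts are copied from the source in the question; the function names and the (text, href) layout are made up for illustration, and the separator items are assumed to carry no link, as the source suggests. Your tool's capture step may behave differently.

```python
import re

# Breadcrumb items from the question as (text, href) pairs.
# Assumption: the "›" separator items have no link, so href is None.
items = [
    ("Portable Sound & Video",
     "/mp3-ipod-headphones-DAB-radio/b/ref=dp_bc_aui_C_1?ie=UTF8&node=560884"),
    ("›", None),
    ("Accessories",
     "/Accessories-Portable-Sound-Vision-Tapes/b/ref=dp_bc_aui_C_2?ie=UTF8&node=560910"),
    ("›", None),
    ("Portable Speakers & Docks",
     "/b/ref=dp_bc_aui_C_3?ie=UTF8&node=16700222031"),
]

# With brackets: a character class that matches ONE character from the set.
bracketed = re.compile(r"[node=\d+]")
# Without brackets, and with a group around the digits only.
node_pattern = re.compile(r"node=(\d+)")


def first_bracketed_match(href):
    """Return what the bracketed pattern matches first (a single character)."""
    m = bracketed.search(href)
    return m.group(0) if m else None


def extract_node_id(href):
    """Return only the digits after 'node=', or None if there is no match."""
    m = node_pattern.search(href)
    return m.group(1) if m else None


def build_category_tree(pairs):
    """Skip items without a link (the separators) and pull out each node ID."""
    return [
        {"category_name": text, "node_id": extract_node_id(href)}
        for text, href in pairs
        if href is not None
    ]


first_href = items[0][1]
print(first_bracketed_match(first_href))            # 3
print(node_pattern.search(first_href).group(0))     # node=560884
print(build_category_tree(items))
```

The bracketed pattern stops at the first character it finds in the set, which here is the "3" in "mp3". The whole match of the unbracketed pattern is 'node=560884', and the group pulls out just the number. Dropping the items with no link is one way to keep the "›" separators out of the list; in the scraping tool itself, selecting only the link elements instead of every list item might do the same thing.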