Regex select from list

Jukec wrote:

    Hello, I am trying to learn regex on my own, but I got stuck selecting from an unordered list. So far I have managed to list the categories, but I can't figure out how to 1) capture the node ID for each category (for example, 560884 for the first one), and 2) keep the ">" separator (shown as "›" in the output) from being listed as a category. Here is my code:

    select div#wayfinding-breadcrumbs_feature_div li >>> category_tree {
    select span.a-list-item >> category_name;
    select div#wayfinding-breadcrumbs_container .a-link-normal >> attr(href) >> capture "[node=\\d+]" >> node_id;
    }

    Here is part of the output:

    "category_tree": [{
    "category_name": "Portable Sound & Video",
    "node_id": null
    }, {
    "category_name": "›",
    "node_id": null
    }, {
    "category_name": "Accessories",
    "node_id": null

    Here is source code:

    * [Portable Sound & Video](/mp3-ipod-headphones-DAB-radio/b/ref=dp_bc_aui_C_1?ie=UTF8&node=560884)
    * ›
    * [Accessories](/Accessories-Portable-Sound-Vision-Tapes/b/ref=dp_bc_aui_C_2?ie=UTF8&node=560910)
    * ›
    * [Portable Speakers & Docks](/b/ref=dp_bc_aui_C_3?ie=UTF8&node=16700222031)

    Thank you for your help


      jschell wrote:

      Jukec wrote:

      "[node=\\d+]"

      I have no idea what language that is, but the major ones mostly share the same regex semantics. The square brackets should not be there: they define a character class, so the pattern matches a single character rather than the text 'node=' followed by digits. Presumably the rest of the code will actually 'capture' what is matched; 'capture' is a specific term in regex. If so, the result will look like 'node=16700222031', which means you would need to parse it again to get the number out.
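
As a rough illustration of the point about the brackets, here is a minimal sketch in Python. Python is only a stand-in, since the scraping language in the question is not identified, and the `items` list and the `extract_node_id` helper are hypothetical names made up for this example. It shows that the bracketed pattern matches a single character, that a capture group around the digits avoids the second parsing step, and that the separator items can be skipped because they carry no link.

```python
import re

# Breadcrumb items as (text, href) pairs, copied from the source in the question.
# Assumption: separator items have no link, represented here as None.
items = [
    ("Portable Sound & Video",
     "/mp3-ipod-headphones-DAB-radio/b/ref=dp_bc_aui_C_1?ie=UTF8&node=560884"),
    ("›", None),
    ("Accessories",
     "/Accessories-Portable-Sound-Vision-Tapes/b/ref=dp_bc_aui_C_2?ie=UTF8&node=560910"),
    ("›", None),
    ("Portable Speakers & Docks", "/b/ref=dp_bc_aui_C_3?ie=UTF8&node=16700222031"),
]

# With brackets the pattern is a character class: it matches exactly one
# character that is n, o, d, e, =, a digit, or a literal +.
bracketed = re.compile(r"[node=\d+]")

# Without brackets, and with a group around the digits only, the number
# can be read directly from group 1.
node_pattern = re.compile(r"node=(\d+)")


def extract_node_id(href):
    """Return the digits after 'node=' in href, or None if there are none."""
    match = node_pattern.search(href)
    return match.group(1) if match else None


# Keep only items that have a link, which drops the "›" separators.
category_tree = [
    {"category_name": text, "node_id": extract_node_id(href)}
    for text, href in items
    if href is not None
]

for entry in category_tree:
    print(entry)
```

Whether the tool in the question returns the whole match or only the group is not something this sketch can confirm; if it returns the whole match, the 'node=' prefix would still need to be stripped as described above.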
