Find sub-url

C / C++ / MFC
Tags: com, tutorial, question
jeff6 wrote (#1):
Hi, I have a URL like http://my.lotro.com/home. Is it possible to find sub-URLs like http://my.lotro.com/home/character/4368675/150026162587528896 (similar to FindFirstFile(), ...)? Web spiders find them, for example. Thanks.

Jochen Arndt wrote (#2), replying to jeff6:

Web spiders request documents and parse them. During parsing, all links found in the document are stored to be processed later. If you want to find sub-URLs, you must do something similar, but this will not find URLs that are not linked anywhere. For some URLs without a file specification you may get a directory listing containing all files and sub-directories, but most servers will send a default page (often index.html) when no file is specified, or deny the request (directory listing prohibited). UPDATE: You can use the GNU wget utility (also available for Windows) to perform such scanning. The following command downloads files recursively (-r) without recreating the directory hierarchy (-nd), parses them, deletes them afterwards (--delete-after), and prints a line for each URL it visits:

      wget -r -nd --delete-after http://my.lotro.com/home/
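
As an illustration of that fetch-and-parse step in C++, here is a minimal sketch using WinInet and std::regex. The agent name "LinkLister" is an arbitrary placeholder, error handling is reduced to the essentials, and the regex is only a rough stand-in for a proper HTML parser; a real spider would also resolve relative links against the base URL, filter them by prefix, and queue them for fetching.

    // Fetch one page with WinInet and print every href target found in it.
    #include <windows.h>
    #include <wininet.h>
    #include <cstdio>
    #include <regex>
    #include <string>

    #pragma comment(lib, "wininet.lib")

    int main()
    {
        HINTERNET hNet = InternetOpenA("LinkLister",
                                       INTERNET_OPEN_TYPE_PRECONFIG,
                                       NULL, NULL, 0);
        if (!hNet) return 1;

        // The URL from the question; any page works here.
        HINTERNET hUrl = InternetOpenUrlA(hNet, "http://my.lotro.com/home",
                                          NULL, 0, INTERNET_FLAG_RELOAD, 0);
        if (!hUrl) { InternetCloseHandle(hNet); return 1; }

        // Read the whole document into memory.
        std::string html;
        char buf[4096];
        DWORD read = 0;
        while (InternetReadFile(hUrl, buf, sizeof(buf), &read) && read > 0)
            html.append(buf, read);

        InternetCloseHandle(hUrl);
        InternetCloseHandle(hNet);

        // Collect every href="..." value from the page; each printed line
        // is a candidate sub-URL that could be fed back into the same loop.
        std::regex href("href\\s*=\\s*\"([^\"]+)\"", std::regex::icase);
        for (std::sregex_iterator it(html.begin(), html.end(), href), end;
             it != end; ++it)
            std::printf("%s\n", (*it)[1].str().c_str());

        return 0;
    }

Build with MSVC and the #pragma pulls in wininet.lib automatically; with other toolchains, link against wininet explicitly.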

pasztorpisti wrote (#3), replying to jeff6:

There is no standard directory-listing method in the HTTP protocol. Directory listing is usually disabled (it is a security hole) for most of the site, if not all of it, and even if it is enabled you get back an index file if one is present (as with http://my.lotro.com/home). Even if a GET on the directory does return a listing, it is still a non-standard, server-generated HTML page that you have to parse somehow. It is a waste of time trying to solve this problem, because it cannot be solved in the general case. Web spiders just follow links found on websites; they do not do directory listings.
