Need help getting a list of URLs on a webpage/website
-
Hi, is it possible to write a C++ program that gathers all the URLs contained in a website/webpage? Or, failing that, does anyone have code to find the URLs in a text file? Does anyone have sample code or a link with more information about this? Your help will be appreciated. Thanks in advance :-D
-
Use the following regular expression:
"<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\""
I've taken it from an example in the Boost.Regex library.
-
If you haven't already done so, download the Boost library. The regex library includes an example, "regex_split_example_2.cpp", that scans a file and returns a list of the URLs it contains.