I need a privileged web crawler
-
I've been asked to write a web crawler that will scan our corporate website and report on how its pages link to one another. I've searched the web and haven't found anything that meets my needs. Specifically, I'm looking for:
- Source code, not a compiled app, in either C# or VB.NET. For security reasons, I need to be able to review the code itself and make sure there are no backdoors or data siphoning.
- It needs to be able to run under user credentials, as most of the site is behind a login.
It also needs to record in a SQL database which pages it checks, what each page links to, and which pages link to it, but if I have the source I can easily add that functionality myself. So, any recommendations on where I can find such code?
-
Try these: https://code.google.com/p/abot/ and the article "A Simple Crawler Using C# Sockets".
-
Thanks, I will!
-
You could start here: http://www.heatonresearch.com/articles/series/20 They have a download for their book 'HTTP Programming Recipes for C# Bots', which is pretty cheap and shows you how to build a web spider/crawler. Look at the download for it here: http://www.heatonresearch.com/download [Edit] Here's a more direct link to the ebook for purchase, although the content is online (not sure if all of it): http://www.heatonresearch.com/book/cat/2 [/edit] 'g'
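For anyone weighing the suggestions above, here is a rough sketch of the core of what the question describes: follow same-host links, remember what each page links to, and invert that to get what links to each page. It is not taken from any of the linked projects. The names (`LinkCrawler`, `ExtractLinks`, `CrawlAsync`, `Invert`, `WindowsAuthFetcher`) are made up for illustration, the regex is a naive stand-in for a real HTML parser, and the fetcher assumes the site accepts integrated Windows authentication; a forms-based login would probably need an `HttpClientHandler` with a `CookieContainer` and a login POST first.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class LinkCrawler
{
    // Naive href matcher (assumption: good enough for a sketch; it skips
    // fragment-only links and will miss unquoted or script-generated hrefs).
    private static readonly Regex Href = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"'#]+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the absolute http(s) URLs on the same host linked from the HTML.
    public static HashSet<Uri> ExtractLinks(Uri page, string html)
    {
        var links = new HashSet<Uri>();
        foreach (Match m in Href.Matches(html))
        {
            if (Uri.TryCreate(page, m.Groups[1].Value, out Uri target)
                && (target.Scheme == Uri.UriSchemeHttp || target.Scheme == Uri.UriSchemeHttps)
                && target.Host == page.Host)
            {
                links.Add(target);
            }
        }
        return links;
    }

    // Breadth-first crawl. The result maps each visited page to its outbound links.
    // The fetch delegate is injected so authentication stays outside the crawl logic.
    public static async Task<Dictionary<Uri, HashSet<Uri>>> CrawlAsync(
        Uri start, Func<Uri, Task<string>> fetch, int maxPages = 500)
    {
        var outbound = new Dictionary<Uri, HashSet<Uri>>();
        var queue = new Queue<Uri>();
        queue.Enqueue(start);
        while (queue.Count > 0 && outbound.Count < maxPages)
        {
            Uri page = queue.Dequeue();
            if (outbound.ContainsKey(page)) continue;   // already visited
            string html;
            try { html = await fetch(page); }
            catch (HttpRequestException) { html = ""; } // failed request: record no links
            HashSet<Uri> links = ExtractLinks(page, html);
            outbound[page] = links;
            foreach (Uri link in links)
                if (!outbound.ContainsKey(link)) queue.Enqueue(link);
        }
        return outbound;
    }

    // Inverts the outbound map to answer "which pages link to this one?".
    public static Dictionary<Uri, HashSet<Uri>> Invert(Dictionary<Uri, HashSet<Uri>> outbound)
    {
        var inbound = new Dictionary<Uri, HashSet<Uri>>();
        foreach (var pair in outbound)
            foreach (Uri target in pair.Value)
            {
                if (!inbound.TryGetValue(target, out var sources))
                    inbound[target] = sources = new HashSet<Uri>();
                sources.Add(pair.Key);
            }
        return inbound;
    }

    // Fetcher that sends the current Windows user's credentials
    // (assumption: the site uses integrated Windows authentication).
    public static Func<Uri, Task<string>> WindowsAuthFetcher()
    {
        var handler = new HttpClientHandler { UseDefaultCredentials = true };
        var client = new HttpClient(handler);
        return uri => client.GetStringAsync(uri);
    }
}
```

Something like `await LinkCrawler.CrawlAsync(new Uri("https://intranet.example/"), LinkCrawler.WindowsAuthFetcher())` (placeholder URL) would produce the outbound map, and `Invert` gives the inbound one. Writing both maps to a SQL table of (source, target) rows is left out, since the original post says that part is easy to add.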