I need a privileged web crawler

Tags: csharp, database, security, question
    Gregory Gadow wrote (#1):

    I've been asked to write a web crawler that will scan our corporate website and report on how its pages link to one another. I've searched the web and have not found anything that meets my needs. What I am specifically looking for:

    • I want source code, not an app, in either C# or VB.NET. For security reasons, I need to be able to review the code itself and make sure that there are no backdoors or data siphoning.
    • It needs to be able to run under user credentials, as most of the site is behind a login.

    It also needs to record in a SQL database each page it checks, what that page links to, and what pages link to it, but if I have the source I can easily add that functionality myself. So, any recommendations on where I can find such code?
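
The recording requirement above (each page checked, what it links to, and what links to it) can be covered by two small tables. What follows is only a rough sketch under stated assumptions: it is written in Python with the standard-library `sqlite3` module rather than C# and the site's real SQL database, and the table names (`pages`, `links`) and helper functions are hypothetical, chosen for illustration. The same schema and queries should port to C# with little change.

```python
import sqlite3


def open_link_store(path=":memory:"):
    """Open the database and create the two hypothetical tables if needed."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS pages (
            url        TEXT PRIMARY KEY,
            checked_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS links (
            from_url TEXT NOT NULL,
            to_url   TEXT NOT NULL,
            PRIMARY KEY (from_url, to_url)
        );
    """)
    return db


def record_page(db, url, outgoing):
    """Record one checked page and every link found on it (duplicates ignored)."""
    db.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    db.executemany(
        "INSERT OR IGNORE INTO links (from_url, to_url) VALUES (?, ?)",
        [(url, target) for target in outgoing],
    )
    db.commit()


def links_from(db, url):
    """Pages that the given page links to."""
    rows = db.execute(
        "SELECT to_url FROM links WHERE from_url = ? ORDER BY to_url", (url,)
    )
    return [row[0] for row in rows]


def links_to(db, url):
    """Pages that link to the given page."""
    rows = db.execute(
        "SELECT from_url FROM links WHERE to_url = ? ORDER BY from_url", (url,)
    )
    return [row[0] for row in rows]
```

Storing each link once as a (from, to) pair means "what links to this page" is just the reverse query on the same table, so no second structure has to be kept in sync.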

      Kornfeld Eliyahu Peter wrote (#2):

      Try these:
      • https://code.google.com/p/abot/
      • A Simple Crawler Using C# Sockets

      I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)

      "It never ceases to amaze me that a spacecraft launched in 1977 can be fixed remotely from Earth." ― Brian Cox

        Gregory Gadow wrote (#3):

        Thanks, I will!

          Garth J Lancaster wrote (#4):

          You could start here: http://www.heatonresearch.com/articles/series/20. They have a download for their book 'HTTP Programming Recipes for C# Bots', which is pretty cheap and shows you how to build a web spider/crawler. Look for the book's download here: http://www.heatonresearch.com/download. [Edit] Here's a more direct link to the ebook for purchase, although the content is online (not sure if all of it): http://www.heatonresearch.com/book/cat/2 [/Edit] 'g'
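
For anyone sizing up how much code a spider/crawler like this involves, here is a minimal sketch of the core loop. It is not taken from any of the resources linked in this thread. It is in Python (standard library only) rather than C#, and the names `LinkExtractor`, `extract_links`, `crawl` and the `fetch` parameter are hypothetical. The login part is deliberately left out: `fetch` is assumed to be a caller-supplied function that returns a page's HTML (or `None` on failure), which is where a site-specific authenticated session would go.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the raw href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return the sorted, de-duplicated absolute http(s) links on a page."""
    parser = LinkExtractor()
    parser.feed(html)
    links = set()
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.add(absolute)
    return sorted(links)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl of one host; returns {page_url: [linked urls]}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    graph = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url) or ""  # treat a failed fetch as an empty page
        links = extract_links(url, html)
        graph[url] = links
        for link in links:
            # Record external links, but only follow links on the same host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return graph
```

Passing `fetch` in as a parameter keeps the credentials handling separate from the crawl logic, and it also lets the loop be exercised against a plain dictionary of canned pages (`crawl(start, pages.get)`) before it is pointed at a real site.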
