The saga continues

Posted in The Lounge
Tags: csharp, com, sysadmin, beta-testing, performance
13 posts, 6 posters

David Cunningham wrote:

So, I thought I'd give you an update on the CodeProject performance project. After dealing with the cached ASP page backlog bug, the errant .NET Beta 2 ADO bug, and the Windows Load Balancing conflicting with the network redirector bug :| we now have a series of web-spiders that mine the site multiple times a day and swamp it. The next step is to write an ISAPI filter to combat this. In a weird way this is pretty cool, although I think everyone's patience is running thin and we'd just like to get to the point where CP just rocks.

David
http://www.dundas.com

Konstantin Vasserman (#2) replied:

Isn't there some kind of robots.txt file you can drop in your virtual root to instruct spiders not to parse your site? ;)
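
For reference, the robots exclusion convention Konstantin means is just a plain-text robots.txt file served from the root of the site; well-behaved crawlers fetch it first and skip whatever it disallows. A minimal sketch is below; the paths are hypothetical, for illustration only, not CP's actual layout:

    # robots.txt, served from the site root
    # The paths below are illustrative only.

    # Rules for every crawler that honours the protocol
    User-agent: *
    Disallow: /script/
    Disallow: /lounge/

    # A specific, badly behaved crawler can be shut out entirely
    User-agent: BadSpider
    Disallow: /

As noted further down the thread, only polite crawlers obey this file, so it helps with search engines but does nothing against a rogue spider.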

Konstantin Vasserman (#3), replying to David Cunningham:

Another anti-spider idea would be to block spider IPs on your firewall. I believe that most well-known spiders operate from static, published IP addresses. It just seems overkill to write an ISAPI filter just to stop traffic from a few well-known places. I am sorry if you guys have already thought about all of this; I am just trying to help... Thank you for all your work.

Brad Bruce (#4), replying to Konstantin Vasserman:

I could run a spider from my Dial-Up account. You can't just filter addresses! The only real pattern to determine if a spider is loose is to see how many hits you're getting from a single IP address in a certain amount of time. If you get 500 hits/minute... you have a spider...
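
The kind of check Brad describes comes down to counting requests per client IP over a short window and flagging anything over a budget such as 500 hits/minute. A rough C++ sketch of that bookkeeping, assuming a simple one-minute fixed window (the names and structure are illustrative, not CP's actual code):

    // Per-IP hit counting over a fixed one-minute window - illustrative only.
    #include <ctime>
    #include <map>
    #include <string>

    struct HitWindow {
        std::time_t windowStart;  // when the current one-minute window began
        int         hits;         // requests seen from this IP in that window
    };

    class SpiderDetector {
    public:
        explicit SpiderDetector(int maxHitsPerMinute = 500)
            : m_limit(maxHitsPerMinute) {}

        // Record one request from 'ip'; returns true once the IP exceeds the budget.
        bool RecordHit(const std::string& ip) {
            const std::time_t now = std::time(0);
            HitWindow& w = m_table[ip];          // zero-initialised on first sight
            if (w.hits == 0 || now - w.windowStart >= 60) {
                w.windowStart = now;             // start a fresh window
                w.hits = 0;
            }
            ++w.hits;
            return w.hits > m_limit;             // over budget: probably a spider
        }

    private:
        int m_limit;
        std::map<std::string, HitWindow> m_table;
    };

A real filter would also need locking around the table for concurrent requests and periodic pruning of idle entries, but the threshold test itself is that simple.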

Konstantin Vasserman (#5), replying to Brad Bruce:

If you run a spider from your dial-up account, your bandwidth, and therefore your traffic to this site, will be very insignificant. Not enough to cause a lot of problems. Of course, there are people with DSL and cable out there, but do you know anyone who runs spiders on their home computer? Maybe it's just me, but I've never heard of people operating spiders from home...

Brad Bruce (#6), replying to Konstantin Vasserman:

When Codeguru started, my company had a very slow connection. I would run a spider and archive the entire site to my hard drive (overnight), and reference everything offline. Zafir even had zipped archives of the site for a while because so many people were running spiders on his site.

ColinDavies (#7), replying to Konstantin Vasserman:

Konstantin Vasserman wrote: "Isn't there some kind of robots.txt file you can drop in your virtual root to instruct spiders not to parse your site?"

The robots.txt is only accessed by nice spiders! Those are the ones you want: Google, AltaVista, etc. My guess is that some spiders are monitoring CP for whatever purpose they have.

Regardz,
Colin J Davies

Sonork ID 100.9197:Colin
I live in Bob's HungOut now
Click here for free technical assistance!

Konstantin Vasserman (#8), replying to Brad Bruce:

OK. Another example of what you are talking about is Microsoft ISA Server. It comes with a feature that can pre-fetch web sites in advance (overnight). It actually reduces the number of hits to this site because of the caching, so CP shouldn't block "spiders" like this anyway. Again, I might be totally off target here. I guess we need to know what kind of spiders David is actually talking about. If they are seeing a lot of traffic from the search-engine kind of spiders, my idea will work. If they are talking about some other spiders, then you are right and address blocking might not help.

David Cunningham (#9), replying to Brad Bruce:

Brad, this is pretty much the conclusion we've come to. We need to be able to dynamically limit the number of hits/minute that are allowed per IP address so that regular viewers don't see any change, but spiders are limited to a reasonable pace. Colin's point is also important: you don't want to cut off valuable spidering from search engines.

David
http://www.dundas.com
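
For the ISAPI route David mentions, the natural hook is the filter's early request notification: look up the client address, bump its counter, and finish the request cheaply once that IP is over budget. A rough, hedged skeleton follows; OverPerMinuteBudget() is a hypothetical stand-in for per-IP bookkeeping like the counter sketched earlier in the thread, and the constants and fields should be checked against httpfilt.h:

    // Skeleton of a throttling ISAPI filter - a sketch, not CodeProject's actual filter.
    #include <windows.h>
    #include <httpfilt.h>

    // Hypothetical per-IP budget check (e.g. the counter sketched earlier in the thread).
    bool OverPerMinuteBudget(const char* clientIp);

    BOOL WINAPI GetFilterVersion(PHTTP_FILTER_VERSION pVer)
    {
        pVer->dwFilterVersion = HTTP_FILTER_REVISION;
        lstrcpyA(pVer->lpszFilterDesc, "Per-IP request throttle (sketch)");
        // Ask IIS to notify us early, before request headers are processed.
        pVer->dwFlags = SF_NOTIFY_ORDER_DEFAULT | SF_NOTIFY_SECURE_PORT |
                        SF_NOTIFY_NONSECURE_PORT | SF_NOTIFY_PREPROC_HEADERS;
        return TRUE;
    }

    DWORD WINAPI HttpFilterProc(PHTTP_FILTER_CONTEXT pfc,
                                DWORD dwNotificationType,
                                LPVOID /*pvNotification*/)
    {
        if (dwNotificationType == SF_NOTIFY_PREPROC_HEADERS)
        {
            char  szAddr[64] = "";
            DWORD cbAddr = sizeof(szAddr);
            char  szVar[] = "REMOTE_ADDR";
            if (pfc->GetServerVariable(pfc, szVar, szAddr, &cbAddr) &&
                OverPerMinuteBudget(szAddr))
            {
                // Over the per-minute budget: answer cheaply and stop processing.
                char szStatus[] = "503 Service Unavailable";
                pfc->ServerSupportFunction(pfc, SF_REQ_SEND_RESPONSE_HEADER,
                                           szStatus, 0, 0);
                return SF_STATUS_REQ_FINISHED;
            }
        }
        return SF_STATUS_REQ_NEXT_NOTIFICATION;
    }

The appeal of doing this in a filter is that an over-budget request is rejected before any ASP or database work happens, which is exactly where the cost is.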

Konstantin Vasserman (#10), replying to David Cunningham:

Brad, David and Colin: I apologize for the confusion. You guys were correct. My bad.

David Cunningham (#11), replying to Konstantin Vasserman:

Konstantin, hey man, no apologies are necessary. We're ecstatic that everyone cares so much about making CP successful, so bring on the good ideas!

David
http://www.dundas.com

seanix99 (#12), replying to David Cunningham:

Just a simple question: why are spiders slowing CP down more than other sites? If spiders are really the problem, CP would not be the only site having trouble. Can we have the average hits/min for a day?

I remember that at work we once made a simple test program to test our new web server (PIII 1000 MHz); we just wanted to know whether it could serve 10,000 hits/min. We used five computers, each firing 2,000 hits/min at the server, spread over 50 different pages. Granted, our site is a simple HTML site, very basic text with images, but the server passed the test. I only mention this because I think the CP site may be too big or too fat. We all like the CP site, but I think all the Lounge-style features everywhere are slowing CP down. I'm curious to see whether the site would be faster if you deactivated all the Lounge and profile features. Just another question: do you cache the banners?

If I'm off track with my questions, just tell me; I'm not an expert and I have never built a web site. OK, enough, that was just my opinion. I'm sure you will end up making CP rock. In the meantime, just add some good classic rock music when the home page loads, so people who are not patient enough can at least listen to the music.

Sylvain

P.S. If you don't like my written English, imagine when I try to speak it...

Chris Maunder (#13), replying to seanix99:

There is always the battle between performance and features. We've traditionally erred on the side of having a more functional site at the expense of lightning fast response. With the move to a more scalable hardware setup we are now in the position of being able to take the easy route of throwing new hardware at the site when traffic increases demand it (and, of course, funds allow it ;))

Part of the problem is that we've been running a site that is extremely database intensive and gets nearly 9 million page views a month off a single web server. I've turned off the discussion boards, ads, rating systems etc. and the speed does go up nicely.

Even so, with a web farm and SQL cluster we'll reduce request execution times from under a second to a mere fraction of a second. This won't necessarily make the site any faster than it is today, but will allow more visitors to hit the site and will lower the chances of slow-downs caused by traffic overloads.

A lot of stuff is cached, and we can always do more caching - but is it worth spending a week every so often rewriting the internals to get an extra 10%-15% performance overall when we can get an instant 30% increase by throwing in another $US500 box?

In any case it's always a race of time and resources against ever increasing visitor numbers. Soon we'll start maxing out the SQL backend, then we'll need another web server, then a faster file server, then an even fatter connection, and then someone will invent the 42hr day and everything will be perfect ;)

cheers,
Chris Maunder
