.htaccess bot protection rule

    GregStevens
    wrote:

    I've been trying to create a group of rules in .htaccess that will deny bots access to specific groups of pages on my site. Specifically, my site is a wiki (running MediaWiki), and I would like to prevent bots from accessing any pages in the "User", "Talk" or "Special" namespaces, while allowing them to spider other pages on the site. Below is my attempt, but it's not working.

    The basic approach I'm trying to use is this:

    1. Set an environment variable that identifies the request URIs I want to exclude.
    2. Set an environment variable that detects whether the user agent is a bot.
    3. If both of the above environment variables are set, deny the page.

    Can anyone give me a tip as to why the code below is NOT doing what I describe above?

    RewriteEngine on

    # Identify non-bot pages with environment variable

    RewriteCond %{REQUEST_URI} ^/reference/index.php?title=User:.* [OR]
    RewriteCond %{REQUEST_URI} ^/reference/index.php?title=Talk:.* [OR]
    RewriteCond %{REQUEST_URI} ^/reference/index.php?title=Special:.*
    RewriteRule ^.* - [E=PAGE_NO_BOT:1]

    # Identify bot user agents with environment variable

    RewriteCond %{HTTP_USER_AGENT} ^.*Googlebot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*robot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Slurp.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Scooter.*
    RewriteRule ^.* - [E=CLIENT_IS_BOT:1]

    # If it is a bot AND it is looking at a non-bot page, deny

    RewriteCond %{ENV:PAGE_NO_BOT} ^1$
    RewriteCond %{ENV:CLIENT_IS_BOT} ^1$
    RewriteRule ^.* - [F,L]
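
One possible explanation, offered as a sketch rather than a confirmed fix: in mod_rewrite, %{REQUEST_URI} holds only the URL path, while everything after the "?" is available separately in %{QUERY_STRING}. An unescaped "?" in a pattern is also a regex quantifier (it makes the preceding "p" optional) rather than a literal question mark. Under both readings the first three conditions would never match a URL such as /reference/index.php?title=User:Foo, so PAGE_NO_BOT would never be set. The variant below keeps the same three-step approach but tests the query string. It assumes the wiki really is served from /reference/index.php; the [NC] flags and the %3A alternative for a URL-encoded colon are additions that are not in the original rules.

```apache
RewriteEngine on

# Step 1: flag requests for the protected namespaces.
# REQUEST_URI is the path only, so the title parameter is
# matched against QUERY_STRING instead.
# Assumption: the wiki entry point is /reference/index.php.
RewriteCond %{REQUEST_URI} ^/reference/index\.php$
RewriteCond %{QUERY_STRING} (^|&)title=(User|Talk|Special)(:|%3A) [NC]
RewriteRule ^ - [E=PAGE_NO_BOT:1]

# Step 2: flag crawler user agents (same list as above, combined
# into one alternation; [NC] is an added assumption).
RewriteCond %{HTTP_USER_AGENT} (Googlebot|robot|Slurp|Scooter) [NC]
RewriteRule ^ - [E=CLIENT_IS_BOT:1]

# Step 3: deny only when both flags are set.
RewriteCond %{ENV:PAGE_NO_BOT} =1
RewriteCond %{ENV:CLIENT_IS_BOT} =1
RewriteRule ^ - [F]
```

This sketch only covers index.php?title=... style URLs. If the wiki also uses short URLs (for example /reference/User:Foo), a path-based condition would be needed as well.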
