Rewrite

The Apache web server has an incredibly powerful module called "mod_rewrite". (Other web servers generally have their own way of rewriting URLs; the commands described here are Apache's.) The purpose of its commands is to allow URLs to be modified based upon any of a number of conditions. The advantage of mod_rewrite is that it is extremely powerful. The disadvantage is that its commands are very complex and touchy, and a single mistake could cause your web site to stop working altogether.

Rewriting is controlled by a series of commands (Apache calls them directives), usually placed in the .htaccess file.

RewriteEngine On - Turn on the rewriting engine.

RewriteCond - The condition to be tested for.

RewriteRule - The pattern to match against the requested URL, and the value to change the URL to if the pattern matches and the RewriteCond lines just before it are true.

To put it simply, what these commands allow you to do is say "if this is true then change the current URL to a specific value".
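
As a minimal sketch of how the three commands fit together, the lines below send visitors who arrive without the "www" prefix to the same page on the "www" host. The host name www.example.com is a placeholder, not something taken from this article, and the sketch assumes the lines sit in the .htaccess file at the top of the site.

```apache
# Turn on the rewriting engine (needed once per .htaccess file).
RewriteEngine On

# Condition: the request was NOT made to www.example.com (placeholder host).
# [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]

# Rule: capture the whole path and redirect to the same path on the www host.
# R=301 sends a permanent redirect; L stops processing further rules.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Read aloud, this says: "if the host name is not www.example.com, then change the current URL to the same path on www.example.com".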

Some common uses for "mod_rewrite" are described in the sections below.

Restrict Access to the .htaccess File

You may want to keep people from using their browsers to read your .htaccess file. This is important because the file may reveal details of your configuration that could make your server or web site less secure.

RewriteRule ^\.htaccess$ - [F]

This translates to:

^ start of string to be checked
\.htaccess is the URL to be checked (the backslash makes the dot match a literal ".")
$ end of string to be checked
- means leave the URL unchanged
F means return Forbidden error
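
The same idea can be widened to cover every file whose name begins with ".ht" (for example a password file named .htpasswd). The pattern below is only a sketch and assumes the rule lives in the top-level .htaccess file. Many Apache installations already refuse to serve such files in their main configuration, in which case this rule is simply a second line of defense.

```apache
RewriteEngine On
# (^|/) matches at the start of the path or right after a slash,
# so matching files in subdirectories are covered too.
# \.ht matches a literal ".ht"; "-" leaves the URL unchanged;
# [F] returns a Forbidden error.
RewriteRule (^|/)\.ht - [F]
```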

Restrict Hostile Spiders

One very common use for mod_rewrite is the detection of which user agent (spider or browser) is accessing the website. Based upon the value for HTTP_USER_AGENT, a decision could be made to reject the agent, send it to a different page or simply allow it to continue unchanged.
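
Sending an agent to a different page, the second of those choices, might look like the sketch below. The agent name "ExampleBot" and the page "bots.html" are made-up names used only for illustration.

```apache
RewriteEngine On
# Condition: the user agent string starts with "ExampleBot" (hypothetical name).
RewriteCond %{HTTP_USER_AGENT} ^ExampleBot
# Rule: serve /bots.html (hypothetical page) for every request except
# bots.html itself, so the rewrite cannot loop back on its own result.
RewriteRule !^bots\.html$ /bots.html [L]
```

Because no R flag is given, the agent is not told that it was sent elsewhere; it simply receives the contents of the other page.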

As an example, there are quite a few hostile spiders written by highly unethical people to gather email addresses from web pages. These addresses are then spammed ruthlessly. The code below will stop some of these (at least those that identify themselves) from their hostile endeavors.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Crescent              [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker          [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerSE        [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerElite     [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector        [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon           [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf             [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro          [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archive            [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver           [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT         [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit       [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.*      [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO             [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft              [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage           [OR]
RewriteCond %{HTTP_USER_AGENT} ^TV33_Mercator         [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL
RewriteRule ^.*$ /lists/ [F,L]

Once I installed these exact lines in my own .htaccess file, the level of spamming dropped sharply. It is important to note that these lines will result in "Forbidden" (403) errors, which will appear in your log files.
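
A list like this can also be written more compactly. Because each pattern is anchored only at the start with ^, it matches any agent whose name begins with that text, so ^CherryPicker already covers CherryPickerSE and CherryPickerElite. The sketch below, which is not the set of lines I installed, folds a handful of the names into one condition; the choice of names is just an illustration.

```apache
RewriteEngine On
# One condition with alternation (a|b|c) replaces several [OR] lines.
# ^ still anchors the match at the start of the user agent string,
# and [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_USER_AGENT} ^(CherryPicker|EmailCollector|EmailSiphon|EmailWolf|ExtractorPro|WebBandit) [NC]
# "-" leaves the URL unchanged; [F] returns a Forbidden error.
RewriteRule ^ - [F]
```

The long form is easier to edit one spider at a time; the short form is easier to read at a glance. Either way, only agents that identify themselves honestly are stopped.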

Here is an example of a very complete .htaccess file (from a post on webmasterworld.com by "Superman").

RewriteEngine On 
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] 
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] 
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] 
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] 
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] 
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] 
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] 
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Zeus 
RewriteRule ^.*$ http://www.site-where-you-want-to-send-the-bot [L,R]