Rewrite

The Apache web server has an incredibly powerful module called "mod_rewrite", which provides a set of commands (directives) for modifying URLs based upon any of a number of conditions. The examples here are written for Apache; other web servers have their own rewriting mechanisms with different syntax. The advantage of mod_rewrite is that it is extremely powerful. The disadvantage is that these directives are complex and touchy: a mistake, even a small syntax error, can cause your web site to stop working altogether.

Rewriting is controlled by a series of commands placed in the .htaccess file:

RewriteEngine On - Turn on the rewriting engine.

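RewriteCond - A condition to be tested. It applies to the RewriteRule that follows it. Several RewriteCond lines can be stacked; they are combined with AND by default, or with OR when the [OR] flag is used.

RewriteRule - A pattern to match against the current URL, followed by the value to substitute if the pattern matches and the RewriteCond lines before it are true.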

To put it simply, these commands let you say "if this is true, then change the current URL to a specific value".
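
As a minimal sketch of that "if this is true, then change the URL" pattern, the lines below send visitors who arrive at a bare domain to its www version. The domain example.com is a placeholder chosen for illustration, not something taken from this site.

RewriteEngine On
# If: the requested host name is example.com (NC = ignore case)
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Then: redirect to the same path on www.example.com
# (R=301 = permanent redirect, L = stop processing further rules)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

The RewriteCond line is the "if", and the RewriteRule line directly after it is the "then".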

Some common uses for "mod_rewrite" are described in the sections below.

Restrict Access to the .htaccess File

You may want to keep people from using their browsers to read your .htaccess file. This matters because the file may reveal details of your configuration that could make your server or web site less secure.

RewriteRule ^\.htaccess$ - [F]

This translates to:

^ start of string to be checked
$ end of string to be checked
\.htaccess is the URL to be checked (the backslash makes the dot match a literal ".")
- means leave the URL unchanged
F means return a Forbidden (403) error
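
The rule only takes effect once the rewriting engine is on, and in an .htaccess file the pattern is compared with the path relative to the directory holding the rule, so a request for sub/.htaccess would not match ^\.htaccess$. A slightly broader sketch, assuming you also keep a .htpasswd file (an assumption; this page does not say so), might look like this:

RewriteEngine On
# (^|/) matches at the start of the path or right after a slash,
# so a path such as sub/.htaccess matches as well
RewriteRule (^|/)\.ht(access|passwd)$ - [F]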

Restrict Hostile Spiders

One very common use for mod_rewrite is the detection of which user agent (spider or browser) is accessing the website. Based upon the value for HTTP_USER_AGENT, a decision could be made to reject the agent, send it to a different page or simply allow it to continue unchanged.

As an example, there are quite a few hostile spiders written by highly unethical people to gather email addresses from web pages. These addresses are then spammed ruthlessly. The code below will stop some of these (at least those that identify themselves) from their hostile endeavors.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Crescent              [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker          [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerSE        [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerElite     [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector        [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon           [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf             [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro          [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archive            [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver           [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT         [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit       [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.*      [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO             [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft              [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage           [OR]
RewriteCond %{HTTP_USER_AGENT} ^TV33_Mercator         [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL
RewriteRule ^.*$ - [F]

Once I installed these lines in my own .htaccess file, the level of spamming dropped sharply. Note that the F flag ignores any substitution value, so "-" (leave the URL unchanged) is used here. Blocked requests receive "Forbidden" (403) errors, which will appear in your log files.
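
Because each RewriteCond pattern is a regular expression, several agents can be folded into one line with alternation. The following is a shorter sketch of the same idea rather than a tested replacement for the list above; the names are just a sample taken from it.

RewriteEngine On
# One condition covering several harvesters; NC ignores upper/lower case
RewriteCond %{HTTP_USER_AGENT} ^(CherryPicker|EmailCollector|EmailSiphon|EmailWolf|ExtractorPro) [NC]
# "-" leaves the URL alone; F returns 403 Forbidden
RewriteRule ^ - [F]

Since ^CherryPicker matches any agent string that starts with those letters, it also covers CherryPickerSE and CherryPickerElite.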

Here is a more extensive example (from a post on webmasterworld.com by "Superman"). Instead of returning a Forbidden error, it redirects the listed agents to another URL.

RewriteEngine On 
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR] 
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR] 
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR] 
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR] 
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR] 
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR] 
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR] 
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR] 
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR] 
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR] 
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR] 
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR] 
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR] 
RewriteCond %{HTTP_USER_AGENT} ^Zeus 
RewriteRule ^.*$ http://www.site-where-you-want-to-send-the-bot [L,R]

Additional Reading

  • Htaccess file: If you have direct access to the .htaccess file, then you are in luck. You can do some very cool things.
  • Htaccess file - Protecting directories: One of the most common uses of .htaccess is to restrict access to all of the files within a directory.
  • Htaccess file - MIME types: You can use .htaccess to define your own MIME types. This means you can associate file types with actions to be performed.
  • Htaccess file - Custom error pages: You can use .htaccess to define custom error pages to trap 404 and other error conditions.
  • Htaccess file - Redirect: You can redirect visitors to other pages using the redirect function of .htaccess.
  • Htaccess file - Deny: Deny is especially useful for blocking unwanted spiders and malicious visitors.
  • Htaccess file - Rewrite: Rewrite is a complex but useful feature which can help you stop spam harvesters.
  • Htaccess file - Redirect Worms Away: Is your site running on an Apache server, and is your log file being filled with useless errors from the recent worm penetrations? Is your bandwidth being wasted? Here's a solution which might help.

Unless otherwise noted, all photos and text are Copyright © Richard G Lowe, Jr.