Robots in the Wiki

Posted on August 14th, 2006 by

We made some minor adjustments to our installation of MediaWiki to prevent robots such as Googlebot from indexing irrelevant pages like article edit pages and history pages.

Essentially, we prepended “/w/” to the URLs of all non-article pages and then used mod_rewrite to strip the /w/ again so the pages still work normally. The robots.txt file then tells well-behaved robots not to visit any URL under /gts/w/.
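
The intended URL mapping can be sketched in a few lines of Python. This is only an illustration of the idea, not part of our setup: the function name strip_w_prefix and the article title Main_Page are made up for the example, and BASE is assumed to match the /gts/ path used on our wiki.

```python
import re

BASE = "/gts/"  # assumed to match the wiki's script path


def strip_w_prefix(path):
    """Map a /gts/w/... URL back to the real /gts/... path; leave other URLs alone."""
    return re.sub(r"^" + re.escape(BASE) + r"w/", BASE, path, count=1)


# Non-article URLs carry the extra w/ segment, which is removed again.
print(strip_w_prefix("/gts/w/index.php"))  # /gts/index.php
# Article URLs have no w/ segment and pass through unchanged.
print(strip_w_prefix("/gts/Main_Page"))    # /gts/Main_Page
```

Robots only ever see the public form of the URL, so a rule on the /gts/w/ prefix covers every edit and history link at once.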

Here is a snapshot of our MediaWiki configuration file:
$wgScriptPath = '/gts/';
$wgScript = $wgScriptPath . 'w/index.php';
$wgRedirectScript = $wgScriptPath . 'redirect.php';
$wgArticlePath = $wgScriptPath . '$1';

Our .htaccess file:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /gts
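# Serve existing files and directories as-is by skipping the rules below
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [S=40]
# Remove the w/ from the edit links. Patterns in .htaccess are matched
# without a leading slash, and this rule must come before the catch-all.
RewriteRule ^w/(.*)$ $1 [L,QSA]
# Everything else is treated as an article title
RewriteRule ^(.*?)/?$ index.php?title=$1 [L,QSA]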
</IfModule>

And our robots.txt:
User-agent: *
Disallow: /gts/w/
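
One way to sanity-check these rules is Python's standard urllib.robotparser module, which applies robots.txt the way a well-behaved crawler would. The sketch below is not part of our setup: the helper name allowed, the Googlebot agent string, and the Main_Page title are just assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gts/w/
"""


def allowed(url_path, agent="Googlebot"):
    """Return True if a robot honouring ROBOTS_TXT may fetch url_path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url_path)


# Article pages stay crawlable.
print(allowed("/gts/Main_Page"))                                # True
# Edit and history links live under /gts/w/ and are blocked.
print(allowed("/gts/w/index.php?title=Main_Page&action=edit"))  # False
```

This only tells you what a compliant robot would do; robots that ignore robots.txt can still request the /gts/w/ URLs.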

 
