per subdomain robots.txt in Apache

When I develop a website, I tend to go subdomain crazy. If the site lives at one domain, I'll probably configure a handful of subdomains of it as well. Some allow concurrent logins as different users: yourself, an admin account, and maybe a particular live user. Others keep your static/media resources cookie-free, and also allow parallel sockets for html/media content.

In my typical use case, all of these subdomains point to the same Apache instance. So you can go to a particular relative URL at any of the subdomains, and you will get the same page. But I certainly don't want Google to index all those subdomains; I want a single canonical domain for the site. I'm hardly an expert in Apache configuration, so it took me an hour to track down the solution to this problem.

Essentially, I wanted to serve up a permissive robots.txt file, as well as a sitemap.

# robots.txt @ the canonical domain
User-agent: *
Disallow:

However, all subdomains should return a different, restrictive robots.txt for the same URL.

# norobots.txt @ every other subdomain
User-agent: *
Disallow: /
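As a sanity check, the two files can be exercised with Python's standard urllib.robotparser to confirm that a well-behaved crawler reads them the way I intend. This is a minimal sketch: the file contents are inlined rather than fetched, and the user-agent name is made up.

```python
from urllib.robotparser import RobotFileParser

# Permissive robots.txt served at the canonical domain
permissive = RobotFileParser()
permissive.parse(["User-agent: *", "Disallow:"])

# Restrictive norobots.txt served at every other subdomain
restrictive = RobotFileParser()
restrictive.parse(["User-agent: *", "Disallow: /"])

# "MyBot" is a hypothetical crawler name; the wildcard rules apply to it
print(permissive.can_fetch("MyBot", "/any/page"))   # True
print(restrictive.can_fetch("MyBot", "/any/page"))  # False
```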

Here is a sample of my /etc/apache2/sites-available/default config file that made this happen.


# Global scope, inherited by every VirtualHost below (requires mod_expires)
ExpiresActive On
ExpiresByType text/css "access plus 12 years"
ExpiresByType application/javascript "access plus 12 years"
ExpiresByType image/png "access plus 12 years"
ExpiresByType image/gif "access plus 12 years"
ExpiresByType image/jpeg "access plus 12 years"
FileETag none


<VirtualHost *:80>
    # ServerName/ServerAlias for the non-canonical subdomains go here
    Alias /robots.txt /home/apache/website/norobots.txt
</VirtualHost>

<VirtualHost *:80>
    # ServerName for the canonical domain goes here
    Alias /robots.txt /home/apache/website/robots.txt
    Alias /sitemap.txt /home/apache/website/sitemap.txt
</VirtualHost>

Prior to this, all my directives were in a single VirtualHost. The key revelation on my part was that they don't need to be: Apache configs support inheritance from a global scope. So my VirtualHosts end up containing only what differs between the canonical domain and any other ServerName.
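To make the inheritance concrete, here is a skeleton of the full layout. The example.com names are hypothetical stand-ins for your own domains, and ordering matters: with name-based virtual hosts, Apache serves the first matching VirtualHost, so the canonical host is listed first and the wildcard alias catches everything else.

```apache
# Global scope: shared by every VirtualHost below
ExpiresActive On
FileETag none

# Canonical site: permissive robots.txt plus sitemap
<VirtualHost *:80>
    ServerName www.example.com
    Alias /robots.txt /home/apache/website/robots.txt
    Alias /sitemap.txt /home/apache/website/sitemap.txt
</VirtualHost>

# Every other subdomain: restrictive robots.txt
<VirtualHost *:80>
    ServerName static.example.com
    ServerAlias *.example.com
    Alias /robots.txt /home/apache/website/norobots.txt
</VirtualHost>
```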