Per-subdomain robots.txt in Apache

When I develop a website, I tend to go subdomain crazy. If the site is crazy.net, I probably configure www.crazy.net, admin.crazy.net, static.crazy.net, youare.crazy.net, etc. Some exist so I can be logged in concurrently as different users: as myself, as an admin account and maybe as a particular live user. Others keep static/media resources cookie-free and let the browser open parallel connections for HTML and media content.

In my typical use case, all of these subdomains point to the same Apache instance, so the same path on any of the subdomains returns the same page. But I certainly don't want Google to index all those subdomains; I want a single canonical domain for the site. I'm hardly an expert in Apache configuration, so it took me an hour to track down the solution to this problem.

Essentially, I wanted www.crazy.net to serve up a permissive robots.txt file, as well as a sitemap.

# robots.txt @ http://www.crazy.net/robots.txt
User-agent: *
Disallow:
Sitemap: http://www.crazy.net/sitemap.txt

However, all the other subdomains should return a different, restrictive robots.txt for the same URL.

# norobots.txt @ http://admin.crazy.net/robots.txt, http://static.crazy.net/robots.txt, etc
User-agent: *
Disallow: /

Here is a sample of my /etc/apache2/sites-available/default config file that made this happen.

ServerAdmin admin@example.com

ExpiresActive On
ExpiresByType text/css "access plus 12 years"
ExpiresByType application/javascript "access plus 12 years"
ExpiresByType image/png "access plus 12 years"
ExpiresByType image/gif "access plus 12 years"
ExpiresByType image/jpeg "access plus 12 years"
FileETag None

...

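# Listed first, so this is the default for any hostname not matched below
<VirtualHost *:80>
    Alias /robots.txt /home/apache/website/norobots.txt
</VirtualHost>

# Only www.crazy.net gets the permissive robots.txt and the sitemap
<VirtualHost *:80>
    ServerName www.crazy.net
    Alias /robots.txt /home/apache/website/robots.txt
    Alias /sitemap.txt /home/apache/website/sitemap.txt
</VirtualHost>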

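Prior to this, all my directives were in a single VirtualHost. The key revelation for me was that they don't need to be: directives in the global scope are inherited by every VirtualHost. So my VirtualHosts end up containing only what differs between www.crazy.net and every other hostname. The restrictive VirtualHost works as the catch-all because it is listed first, and Apache falls back to the first listed VirtualHost for any requested hostname that does not match another VirtualHost's ServerName.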
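
For what it's worth, here is a slightly more explicit sketch of the same layout. Two things in it are assumptions that are not in my original config: the global DocumentRoot (the real global directives are elided above) and the placeholder name default.crazy.net on the catch-all host. I would add the placeholder because, as far as I can tell, Apache's documentation discourages leaving ServerName off a name-based virtual host, since the implicitly chosen name can lead to surprising matches.

```apache
# Global scope: every <VirtualHost> below inherits this.
# ASSUMPTION: DocumentRoot is illustrative; the original global config is elided.
DocumentRoot /home/apache/website

# Must stay first: Apache uses the first listed virtual host for any
# hostname that matches no ServerName or ServerAlias
# (admin.crazy.net, static.crazy.net, and so on).
<VirtualHost *:80>
    # ASSUMPTION: hypothetical placeholder name, never requested directly
    ServerName default.crazy.net
    Alias /robots.txt /home/apache/website/norobots.txt
</VirtualHost>

# Matches only requests whose Host header is www.crazy.net.
<VirtualHost *:80>
    ServerName www.crazy.net
    Alias /robots.txt /home/apache/website/robots.txt
    Alias /sitemap.txt /home/apache/website/sitemap.txt
</VirtualHost>
```

One way to check the result is to request /robots.txt with different Host headers (for example with curl's -H option) and compare the responses. Only www.crazy.net should come back without the Disallow: / rule.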



