Varnish caching for unauthenticated Django views

Varnish is an HTTP accelerator designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, such as Squid, which began life as a client-side cache, or Apache, which is primarily an origin server, Varnish was designed from the ground up as an HTTP accelerator. -- Wikipedia

Varnish is pretty awesome. Typical usage would be to stick a caching layer in front of a slow CMS like Drupal, and tell it cache all page requests for an hour. You can serve thousands of hits to the homepage over the course of that hour for just one back-end call. Not only are you taking a lot of load off the backend, but the actual responses will be lightning quick. By default, Varnish doesn't cache pages with cookies set, though.

A lot of people running Varnish are not aware of one simple fact. If the client sends you a cookie Varnish will not issue a cached page but instead fetch a new page from the backend servers. Varnish takes a cookie as a good indication that the content is specially adapted and will, with the default configuration, not cache it. So, having a cookie that is not handled correctly will completely kill your performance. -- Varnish Blog

This makes complete sense. Most web frameworks use cookie based sessions. If you didn't look at the cookies to determine whether to cache, you would be sticking the private dynamic content for the first user to come along into the cache, and then serving it up for other users. If Facebook did that, you would login and see another users feed.

To use Varnish with Django, you just have to be careful about when you're using the request.session scope. If you put anything in there, even if the user is not logged in, the user will get a sessionid cookie, and Varnish will stop caching. Similarly, if you show the user a form with CSRF protection enabled (which it is by default), they will get a csrftoken cookie.

In general, those are the only two cookie Django will set, and either of those constitutes a page that should not be cached. In addition, you may have other cookies, such as Google Analytics. One solution is to strip these out in vcl_recv(), before they get passed to the back-end. Note: this does not mean that the client will no longer have access to them. As far as the client is concerned, the cookie is still set on your site. You're just not passing to to the webservers.

sub vcl_recv {

  # unless sessionid/csrftoken is in the request, don't pass ANY cookies (referral_source, utm, etc)
  if (req.request == "GET" && (req.url ~ "^/static" || (req.http.cookie !~ "sessionid" && req.http.cookie !~ "csrftoken"))) {
    remove req.http.Cookie;
  }

  # normalize accept-encoding to account for different browsers
  # see: https://www.varnish-cache.org/trac/wiki/VCLExampleNormalizeAcceptEncoding
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      # unkown algorithm
      remove req.http.Accept-Encoding;
    }
  }

}

sub vcl_fetch {

  # static files always cached
  if (req.url ~ "^/static") {
       unset beresp.http.set-cookie;
       return (deliver);
  }

  # pass through for anything with a session/csrftoken set
  if (beresp.http.set-cookie ~ "sessionid" || beresp.http.set-cookie ~ "csrftoken") {
    return (pass);
  } else {
    return (deliver);
  }

}

This config is also accounting for differences in Accept-Encoding between various browsers and versions. Otherwise, you will end up with one copy of each page per unique Accept-Encoding value. Also, I'm making sure to always cache anything under /static, which is where all my media files are located.



I'm currently working at NerdWallet, a startup in San Francisco trying to bring clarity to all of life's financial decisions. We're hiring like crazy. Hit me up on Twitter, I would love to talk.

Follow @chase_seibert on Twitter