Per-request Query Caching in Django
The Django ORM is a wonderful thing. It makes it so easy to access the database, that sometimes you forget that it’s even happening. That is, until you open django-debug-toolbar and see that you’re suddenly running hundreds of queries! Not only that, but looking at the actual queries, many of them are duplicates! You think “Where did all these queries come from? Stupid coworkers, not writing efficient code!” Then you inevitably realize that half of the extra queries were ones you wrote yourself. How does this happen?
It’s all too easy. Maybe you have a User
object with a helper method on it that performs a join to get their recent activity. You’re passing user
instances around in many of your method calls. So as not to assume a wider contract than necessary with the caller, utility methods all over the place are calling this helper method. Your code is nice and tight; you’re not repeating yourself anywhere, but some page requests are calling this function from various places in the stack half a dozen times!
Why is this a big deal? After the first query, the database will probably have a nice warm version in its cache. What you will likely see in the debug toolbar is that many of your duplicate queries will return in less than 2 milliseconds. However, any latency to the database server can still kill you. Plus, even tiny queries are still causing some contention and load on the database.
There are various existing solutions for query caching in Django. In general, they all require that you expire cache results manually if you have edge cases like writing data to your database. In other words, they are likely to introduce bugs.
What I have come up with is a monkey-patch for some Django internals to cache the results of individual SQL statements, but only inside the life cycle of a single request. This will take zero load off of your database if you have perfect code. For mere mortals, it could likely reduce your database calls significantly.
You start by adding a piece of middlware:
Then, you have to enable that middleware in settings.py
:
Finally, here is the query_cache
patch itself.
What’s going on here is that I’m replacing the Django internal execute_sql
method with a wrapper that caches results in a thread local dictionary. It only caches small result sets. For any result more that 100 rows, Django will fire up a database cursor and a generator. Caching those without eagerly querying for the entire dataset would be quite tricky, so I bail in that case. I have noticed that in my codebase, the majority of repeated calls are for a single record, or a small set of records.
So as not to have to deal with any tricky invalidation cases, I simply delete the cache if any UPDATE, INSERT or DELETE statement is run.
Of course, this will not work if you have long running page requests that purposely make the same request over and over, waiting for a particular result.