Python/Django: disk based caching decorator

One of the great things about Django's caching framework is that you can cache complex objects. Say you have a list of dictionaries representing favorite movies from Netflix. You can jam that sucker right into the cache as is. No need to normalize it into a relational database schema, unless you actually need to deal with it relationaly.

The most popular Django cache back-end is memcached. If you do such caching a lot, you will eventually run into a limit on the maximum size of those objects, which is 1MB. Now, you could just up the configured limit. But you need to worry about potentially kicking other items out of the cache early as you increase the average cached object's size.

In the case of Netflix movies, maybe the speed of the cache isn't important. If you're just caching them to save an API hit, and you only need the data occasionally, maybe you can get away to caching to disk. This will keep the hit rate on your regular cache high.

Here is a quick a dirty decorator that takes any object that pickle can serialize, and writes it to a file. Subsequent calls to the decorator just bring backed the cached object for a configurable duration of time.

def cache_disk(seconds = 900, cache_folder="/tmp"):
    def doCache(f):
        def inner_function(*args, **kwargs):

            # calculate a cache key based on the decorated method signature
            key = sha1(str(f.__module__) + str(f.__name__) + str(args) + str(kwargs)).hexdigest()
            filepath = os.path.join(cache_folder, key)

            # verify that the cached object exists and is less than $seconds old
            if os.path.exists(filepath):
                modified = os.path.getmtime(filepath)
                age_seconds = time.time() - modified
                if age_seconds < seconds:
                    return pickle.load(open(filepath, "rb"))

            # call the decorated function...
            result = f(*args, **kwargs)

            # ... and save the cached object for next time
            pickle.dump(result, open(filepath, "wb"))

            return result
        return inner_function
    return doCache

You can then wrap any function in this decorator to cache the results to disk.

@cache_disk(seconds = 900, cache_folder="/tmp"):
def get_netflix_favorites(account_id):
   ... do somthing really expensive
   return {
      "account_id": account_id,
      "data": {
           ... more stuff here

I'm currently working at NerdWallet, a startup in San Francisco trying to bring clarity to all of life's financial decisions. We're hiring like crazy. Hit me up on Twitter, I would love to talk.

Follow @chase_seibert on Twitter