Python/Django: disk-based caching decorator
One of the great things about Django's caching framework is that you can cache complex objects. Say you have a list of dictionaries representing favorite movies from Netflix. You can jam that sucker right into the cache as is. No need to normalize it into a relational database schema, unless you actually need to deal with it relationally.
The most popular Django cache back-end is memcached. If you cache complex objects a lot, you will eventually run into memcached's limit on the size of a single cached item, which is 1MB by default. Now, you could just up the configured limit. But as the average cached object grows, you need to worry about kicking other items out of the cache early.
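A rough way to tell whether an object is getting close to that limit is to measure its pickled size. The sketch below assumes the value is pickled before it is stored and that the server uses the default 1MB item limit. The helper names and the sample data are made up for illustration, and the real stored size also depends on the client library's serialization and any compression it applies.

```python
import pickle

# Memcached's default per-item limit (1MB); adjust if your server is configured differently.
MEMCACHED_ITEM_LIMIT = 1024 * 1024


def pickled_size(obj):
    """Return the number of bytes obj occupies once pickled."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_memcached(obj, limit=MEMCACHED_ITEM_LIMIT):
    """Rough check: does the pickled object stay under the item limit?"""
    return pickled_size(obj) < limit


# Hypothetical sample data: a list of dictionaries, as in the Netflix example.
favorites = [{"title": "Movie %d" % i, "rating": i % 5} for i in range(10)]
print(pickled_size(favorites), fits_in_memcached(favorites))
```

Anything that fails this check is a candidate for a slower, roomier cache.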
In the case of Netflix movies, maybe the speed of the cache isn't important. If you're just caching them to save an API hit, and you only need the data occasionally, maybe you can get away with caching to disk. This will keep the hit rate on your regular cache high.
Here is a quick and dirty decorator that takes any object that pickle can serialize and writes it to a file. For a configurable duration, subsequent calls to the decorated function just bring back the cached object.
    import os
    import pickle
    import time
    from functools import wraps
    from hashlib import sha1


    def cache_disk(seconds=900, cache_folder="/tmp"):
        def doCache(f):
            @wraps(f)
            def inner_function(*args, **kwargs):
                # calculate a cache key based on the decorated function's signature
                raw_key = str(f.__module__) + str(f.__name__) + str(args) + str(kwargs)
                key = sha1(raw_key.encode("utf-8")).hexdigest()
                filepath = os.path.join(cache_folder, key)

                # verify that the cached object exists and is less than `seconds` old
                if os.path.exists(filepath):
                    modified = os.path.getmtime(filepath)
                    age_seconds = time.time() - modified
                    if age_seconds < seconds:
                        with open(filepath, "rb") as cache_file:
                            return pickle.load(cache_file)

                # call the decorated function...
                result = f(*args, **kwargs)

                # ... and save the cached object for next time
                with open(filepath, "wb") as cache_file:
                    pickle.dump(result, cache_file)
                return result
            return inner_function
        return doCache
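The cache key is the part most likely to surprise you, so here is a minimal sketch of how it behaves on its own. The `make_key` helper and the `movies` module name are hypothetical, pulled out only to illustrate the hashing step.

```python
from hashlib import sha1


def make_key(module, name, args, kwargs):
    """Hash the function identity and call arguments into a file name."""
    raw = module + name + str(args) + str(kwargs)
    return sha1(raw.encode("utf-8")).hexdigest()


same_a = make_key("movies", "get_netflix_favorites", (42,), {})
same_b = make_key("movies", "get_netflix_favorites", (42,), {})
other = make_key("movies", "get_netflix_favorites", (43,), {})

print(same_a == same_b)  # True: identical calls share a cache file
print(same_a == other)   # False: a different argument gets its own file

# Caveat: positional and keyword forms of the same call hash differently.
by_keyword = make_key("movies", "get_netflix_favorites", (), {"account_id": 42})
print(same_a == by_keyword)  # False
```

Because the key comes from `str()` of the arguments, this works best with simple arguments such as numbers and strings. Objects whose string form includes a memory address will produce a new key on every run.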
You can then wrap any function in this decorator to cache the results to disk.
    @cache_disk(seconds=900, cache_folder="/tmp")
    def get_netflix_favorites(account_id):
        # ... do something really expensive
        return {
            "account_id": account_id,
            "data": {
                # ... more stuff here
            },
        }
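To see the caching behaviour end to end, here is a small self-contained sketch. It repeats a condensed copy of the decorator so the snippet runs on its own, points it at a throwaway temp folder, and uses a hypothetical `get_favorites_stub` function with placeholder data in place of a real API call.

```python
import os
import pickle
import tempfile
import time
from functools import wraps
from hashlib import sha1


def cache_disk(seconds=900, cache_folder="/tmp"):
    def do_cache(f):
        @wraps(f)
        def inner_function(*args, **kwargs):
            raw = f.__module__ + f.__name__ + str(args) + str(kwargs)
            filepath = os.path.join(cache_folder, sha1(raw.encode("utf-8")).hexdigest())
            if os.path.exists(filepath):
                if time.time() - os.path.getmtime(filepath) < seconds:
                    with open(filepath, "rb") as fh:
                        return pickle.load(fh)
            result = f(*args, **kwargs)
            with open(filepath, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return inner_function
    return do_cache


calls = []  # records every real (uncached) invocation
demo_folder = tempfile.mkdtemp()  # throwaway cache folder for the demo


@cache_disk(seconds=900, cache_folder=demo_folder)
def get_favorites_stub(account_id):
    # Hypothetical stand-in for the expensive API call.
    calls.append(account_id)
    return {"account_id": account_id, "data": {"favorites": ["placeholder"]}}


first = get_favorites_stub(42)   # runs the function and writes the cache file
second = get_favorites_stub(42)  # served from disk, function body is skipped
third = get_favorites_stub(7)    # different argument, so a new cache file

print(first == second)               # True
print(calls)                         # [42, 7]
print(len(os.listdir(demo_folder)))  # 2
```

The function body only ran twice for three calls, and each distinct argument ended up in its own file.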