Enabling Solr autocommit with a custom Haystack backend
02 Jul 2014
By default, Django Haystack makes updates to your Solr index available for
searching immediately. It does this in the simplest way possible: it commits every single update individually.
That can be quite slow. I have an index with 35 million records, and under heavy write load, committing chunks
of 1,000 records can take up to 5 seconds per chunk. In extreme cases, Solr can refuse to accept
that much write load at once and throw an exception rather than process the update.
You can see the basic issue in the logs that Haystack produces each time it issues a write request to the
Solr REST API: every single update request carries its own commit.
As of Solr 4.0, we have much more performant options for bulk indexing. A common setup
is to use autocommit (set by default to 15 seconds) and abstain from manually committing by passing commit=false on
the REST API URL. Though Haystack supports passing a commit boolean to the various back-end implementations of update,
remove, and clear, it never explicitly sets this parameter. Instead, you can implement your own
search back-end subclass to pass this value.
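A minimal sketch of such a subclass, written against the Haystack Solr backend API of that era (the module path and class names are my own choices, not anything Haystack prescribes):

```python
# Assumed location: e.g. myapp/search_backends.py (path is illustrative).
from haystack.backends.solr_backend import SolrEngine, SolrSearchBackend


class AutoCommitSolrSearchBackend(SolrSearchBackend):
    """Defaults commit to False on every write, deferring to Solr's
    autocommit/autoSoftCommit settings instead of committing per chunk."""

    def update(self, index, iterable, commit=False):
        super(AutoCommitSolrSearchBackend, self).update(index, iterable,
                                                        commit=commit)

    def remove(self, obj_or_string, commit=False):
        super(AutoCommitSolrSearchBackend, self).remove(obj_or_string,
                                                        commit=commit)

    def clear(self, models=[], commit=False):
        super(AutoCommitSolrSearchBackend, self).clear(models, commit=commit)


class AutoCommitSolrEngine(SolrEngine):
    """Engine wrapper so settings can point at the auto-commit backend."""
    backend = AutoCommitSolrSearchBackend
```

The only change from the stock backend is flipping the default of the commit keyword; callers that explicitly pass commit=True still get an immediate commit.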
Then you can use this new AutoCommitSolrEngine in your HAYSTACK_CONNECTIONS setting.
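For example (the dotted module path and Solr URL are assumptions for illustration):

```python
# settings.py -- point Haystack at the custom engine (paths illustrative).
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'myapp.search_backends.AutoCommitSolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/mycore',
    },
}
```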
Note: By default, indexed items will not show up in searches right away. That's what soft-commit is for.
To make your auto-committed items available to search in a timely fashion, you must set an autoSoftCommit.maxTime
in your Solr config. This is NOT set by default.
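In solrconfig.xml that could look like the following sketch (the 15-second hard commit matches the common example config; the one-second soft-commit interval is just an illustration, tune it to your freshness needs):

```xml
<!-- solrconfig.xml: hard commits persist data without opening a searcher;
     soft commits make newly indexed documents visible to queries. -->
<autoCommit>
  <maxTime>15000</maxTime>           <!-- hard commit every 15s -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>            <!-- soft commit (visibility) every 1s -->
</autoSoftCommit>
```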
Alternatively, you can set autoCommit.openSearcher to true, which causes Solr to open a new searcher
on every autocommit. This can slow down the first searches that arrive after each commit, however.
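That alternative would look like this (again a sketch; the interval is illustrative):

```xml
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>true</openSearcher>  <!-- new searcher on every hard commit -->
</autoCommit>
```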
I'm currently working at NerdWallet, a startup in San Francisco trying to bring clarity to all of life's financial decisions. We're hiring like crazy. Hit me up on Twitter, I would love to talk.