Hibernate Search - Shard Query Optimization

Hibernate Search shards allow you to break down your index data into separate Lucene directories. Typically, indexes are split either into N equal chunks (using a hashing algorithm) or by some logical criterion (customer, location, etc.). The former is done for performance; smaller indexes mean faster indexing. The latter was typically done to make customers feel better.
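To make the two approaches concrete, here is a minimal sketch in plain Java with no Hibernate Search dependency. The names ShardChooser, byHash and byCustomer are hypothetical, and the hash shown is a generic one, not necessarily what IdHashShardingStrategy does internally.

```java
import java.util.Map;

// Hypothetical helper, not a Hibernate Search class: two ways to pick a shard.
final class ShardChooser {

    // Hash-based: spreads documents roughly evenly over shardCount shards.
    static int byHash(String documentId, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative hash codes
        return Math.floorMod(documentId.hashCode(), shardCount);
    }

    // Logical: each customer is explicitly assigned one shard.
    static int byCustomer(String customerId, Map<String, Integer> customerToShard) {
        Integer shard = customerToShard.get(customerId);
        if (shard == null) {
            throw new IllegalArgumentException("No shard assigned to customer " + customerId);
        }
        return shard;
    }
}
```

With hashing, a query cannot know in advance which shard holds a match. With a customer mapping, every document for a customer lives in one known shard.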

In my current project, we have another reason. We're breaking down indexes by customer, but for purely technical reasons. One is robustness: if an index is fatally corrupted, you only have to re-index a fraction of your data. The other is speed. Since there is no reason for searches to be cross-customer, why not take advantage of smaller indexes for query performance?

Unfortunately, Hibernate Search defaults to searching ALL the shards, and then merging the result sets. While some of this can be done in parallel, in the end the search is much slower than before. While this strategy is necessary in the hashing case, it's needlessly wasteful in the customer case.
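The difference between the two behaviours can be illustrated with a toy model, where each shard is just a list of strings. This is only a sketch of the idea, not how Hibernate Search or Lucene search internally, and ShardSearchDemo and its methods are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each shard is a list of documents (plain strings).
final class ShardSearchDemo {

    // Default behaviour described above: visit every shard, then merge the hits.
    static List<String> searchAllShards(List<List<String>> shards, String term) {
        List<String> merged = new ArrayList<>();
        for (List<String> shard : shards) {
            merged.addAll(searchShard(shard, term));
        }
        return merged;
    }

    // Targeted search: visit only the shard known to hold the customer's data.
    static List<String> searchOneShard(List<List<String>> shards, int shardIndex, String term) {
        return searchShard(shards.get(shardIndex), term);
    }

    // Naive substring match standing in for a real index lookup.
    private static List<String> searchShard(List<String> shard, String term) {
        List<String> hits = new ArrayList<>();
        for (String document : shard) {
            if (document.contains(term)) {
                hits.add(document);
            }
        }
        return hits;
    }
}
```

When all of a customer's documents live in one shard and the query is scoped to that customer, both methods return the same hits, but the second one touches a single shard instead of all of them.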

Granted, the customer case was definitely not the initial shard use case. But it did get enough demand to warrant a new JIRA issue, HSEARCH-251.

I actually got to work with the Hibernate Search maintainers to provide this functionality.

http://anonsvn.jboss.org/repos/hibernate/search/trunk
Revision: 16755
Author: epbernard
Date: 8:43:09 PM, Wednesday, June 10, 2009
Message:
HSEARCH-251 Query on a shard subset based on a filter activation
----
Modified : /search/trunk/src/main/docbook/en-US/modules/configuration.xml
Modified : /search/trunk/src/main/docbook/en-US/modules/query.xml
Modified : /search/trunk/src/main/java/org/hibernate/search/filter/ChainedFilter.java
Added : /search/trunk/src/main/java/org/hibernate/search/filter/FullTextFilterImplementor.java
Added : /search/trunk/src/main/java/org/hibernate/search/filter/ShardSensitiveOnlyFilter.java
Modified : /search/trunk/src/main/java/org/hibernate/search/query/FullTextFilterImpl.java
Modified : /search/trunk/src/main/java/org/hibernate/search/query/FullTextQueryImpl.java
Modified : /search/trunk/src/main/java/org/hibernate/search/store/IdHashShardingStrategy.java
Modified : /search/trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java
Modified : /search/trunk/src/main/java/org/hibernate/search/store/NotShardedStrategy.java
Modified : /search/trunk/src/test/java/org/hibernate/search/test/configuration/UselessShardingStrategy.java
Added : /search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategy.java
Added : /search/trunk/src/test/java/org/hibernate/search/test/shards/CustomerShardingStrategyTest.java

You can download the latest build now, and give it a shot. Here is an example of how to use the new feature.

Here is your entity, with the filter defined.

@Indexed(index="Email")
// this "impl" is only a flag, not the actual filter class
@FullTextFilterDef(name="shard", impl=ShardSensitiveOnlyFilter.class)
public class Email {
...

Here is the filter.

public class ShardFilter {

    private Integer index;

    public void setIndex(Integer setIndex) {
        this.index = setIndex;
    }

    @Key
    public FilterKey getKey() {
        StandardFilterKey key = new StandardFilterKey();
        key.addParameter(index);
        return key;
    }

    @Factory
    public Filter getFilter() {
        Query query = new TermQuery(new Term("index", index.toString()));
        return new CachingWrapperFilter(new QueryWrapperFilter(query));
    }

}

Here is your sharding strategy, which implements the new method getDirectoryProvidersForQuery(). From here, you can define which shards a given Filter could possibly return data from.

public class SpecificShardingStrategy extends IdHashShardingStrategy {

    @Override
    public DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
        FullTextFilter filter = getFilter(filters, "shard");
        if (filter == null) {
            // no shard filter enabled: search every shard, as before
            return getDirectoryProvidersForAllShards();
        }
        else {
            // the "index" filter parameter names the only shard that can hold matches
            int index = Integer.parseInt(filter.getParameter("index").toString());
            return new DirectoryProvider[] { getDirectoryProvidersForAllShards()[index] };
        }
    }

    // returns the enabled filter with the given name, or null if it is not enabled
    private FullTextFilter getFilter(FullTextFilterImplementor[] filters, String name) {
        for (FullTextFilterImplementor filter : filters) {
            if (filter.getName().equals(name)) return filter;
        }
        return null;
    }

}
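One caveat with the strategy above: if the "index" parameter is missing, is not a number, or is out of range, the array lookup throws. A possible defensive variant is sketched below in plain Java, with the shard array modeled generically so it runs without Hibernate Search on the classpath. ShardSelector and selectShards are hypothetical names, and falling back to all shards is only one possible design choice.

```java
import java.util.Arrays;

// Hypothetical helper, independent of Hibernate Search.
final class ShardSelector {

    // Returns the single shard named by rawIndex, or every shard if rawIndex is unusable.
    static <T> T[] selectShards(T[] allShards, Object rawIndex) {
        if (rawIndex == null) {
            return allShards; // parameter not set: fall back to the full search
        }
        try {
            int index = Integer.parseInt(rawIndex.toString());
            if (index < 0 || index >= allShards.length) {
                return allShards; // out of range: fall back to the full search
            }
            // copyOfRange keeps the array's runtime type, avoiding a generic-array cast
            return Arrays.copyOfRange(allShards, index, index + 1);
        } catch (NumberFormatException e) {
            return allShards; // not a number: fall back to the full search
        }
    }
}
```

If the shard boundary is the only thing keeping one customer's results away from another's, throwing an exception on a bad parameter is probably the safer choice than widening the search.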

Finally, here is the actual search code.

FullTextSession fts = Search.getFullTextSession( s );
QueryParser parser = new QueryParser("id", new StopAnalyzer() );
FullTextQuery fullTextQuery = fts.createFullTextQuery( parser.parse( "body:message" ) );
fullTextQuery.enableFullTextFilter("shard").setParameter("index", 0);

Of course, there are many more ways to shard the cat. For example, the filter could be on customerID, region, etc. Thanks to the Hibernate Search team for incorporating my code!