Django/Haystack: latitude/longitude radius search w/ SOLR

Haystack is a great indexed search framework for Django. Getting started is easy, and it includes many data types and facets out of the box. However, one thing it does not support natively is location-based search. Specifically, I wanted to do radius searching based on latitude and longitude. GIS search is coming, but it could be a little while:

There aren't words to express how badly I want to incorporate geospatial search. However, I'm waiting on Solr 1.5's official implementation, as well as seeing if the Xapian folks land their GIS branch. I haven't pursued the third-party options because it makes the setup more complex and there's no guarantees on compatibility, which increase my support headaches. - Daniel, Haystack Google Group

In the meantime, you can use one of several SOLR extensions, or you can just do it yourself. Here is a very basic, but functional, implementation.

First, I introduce a custom index field "GeoPointField" on my SearchIndex model, mapped to some existing fields in my Django ORM that store latitude and longitude as float values.

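from haystack import indexes, site

# Job is my own Django model; import it from your own app.
# geo2solr is defined further down in this post.


class GeoPointField(indexes.CharField):
    """ Stores a latitude or longitude as a fixed-width, sortable string. """

    def __init__(self, **kwargs):
        kwargs["default"] = "000000000000000"
        super(GeoPointField, self).__init__(**kwargs)

    def convert(self, value):
        # Encode the float from the ORM into the string form kept in the index
        return geo2solr(value)


class JobIndex(indexes.SearchIndex):

    text = indexes.CharField(document=True, use_template=True, template_name="search/index/job_text.txt")
    latitude = GeoPointField(model_attr="location__latitude", null=True)
    longitude = GeoPointField(model_attr="location__longitude", null=True)

    def index_queryset(self):
        return Job.objects.all()

site.register(Job, JobIndex)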

What's the deal with that default of "000000000000000"? Because the field is indexed as a string in SOLR, I have decided to encode lat/long as a fixed-width, zero-padded string between "000000000000000" and "999999999999999". The actual "geo2solr" algorithm for the mapping can be arbitrary, as long as the outputs keep the same relative order when compared as strings, i.e., a > b implies that geo2solr(a) > geo2solr(b).

def geo2solr(lat_or_long):
    """ Converts a floating point latitude or longitude to a string for the SOLR index.
    The string representations need to be str-comparable to each other, i.e.,
    '04235863500' < '05000000000'

    Negative values are handled by adding 180 to everything (longitude is +/- 180).
    Returns None for input that is not a number or is outside +/- 180.

    >>> geo2solr(42.35863500)
    '000022235863500'

    >>> geo2solr(-42.35863500)
    '000013764136500'

    >>> geo2solr('42.35863500')
    '000022235863500'

    >>> geo2solr('-179.35863500')
    '000000064136500'

    >>> geo2solr(0)
    '000018000000000'

    >>> geo2solr(None)

    >>> geo2solr("foobar")

    >>> geo2solr(50000)

    """
    try:
        value = float(lat_or_long)
    except (TypeError, ValueError):
        # None, "foobar", and other non-numeric input
        return None
    if not abs(value) <= 180:
        # Out of range (this form of the check also rejects NaN)
        return None
    # Shift to 0..360, keep 8 decimal places, zero-pad to a fixed width of 15.
    # round() keeps float error (e.g. 22235863499.999996) from truncating down.
    return str(int(round((value + 180) * 100000000))).rjust(15, "0")
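
As a quick sanity check of the ordering requirement, here is a minimal sketch. The name "encode" is a stand-in that mirrors the arithmetic of geo2solr (shift by 180, scale, zero-pad), and "decode" is a hypothetical inverse that is not part of my original code. Both exist only for illustration.

```python
def encode(value):
    """Stand-in for geo2solr: shift by 180, scale by 1e8, zero-pad to 15 chars."""
    return str(int(round((float(value) + 180) * 100000000))).rjust(15, "0")


def decode(encoded):
    """Hypothetical inverse of encode (an assumption, not in the original post)."""
    return int(encoded) / 100000000.0 - 180


samples = [-179.358635, -42.358635, 0, 42.358635, 179.9]
encoded = [encode(v) for v in samples]

# Fixed-width, zero-padded digit strings sort the same way as the numbers
# they encode, so a string range query behaves like a numeric one.
assert encoded == sorted(encoded)

# Decoding recovers the original value to within the 8 stored decimal places.
for value, text in zip(samples, encoded):
    assert abs(decode(text) - value) < 1e-7
```

The fixed width is what makes this work: without the zero padding, "9" would sort after "10" when compared as strings.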

The actual radius search needs two more pieces: a way to calculate the search range in my geo-coded units, and a SearchForm/FacetedSearchForm to perform the search.

import math


def geo_radius(lat, long, miles=50):
    """ Given a latitude and longitude (as floats), returns two offsets in
    degrees, one for latitude and one for longitude, that define a box
    reaching the given distance (in miles) north/south and east/west of
    the location.

    Approximates one degree of latitude as 69 miles and one degree of
    longitude as 69 * cos(latitude) miles. For exact distances, see the
    Haversine formula: http://en.wikipedia.org/wiki/Haversine_formula
    http://blog.fedecarg.com/2009/02/08/geo-proximity-search-the-haversine-equation/

    Note: long is not actually used. This is correct. The longitude offset
    depends only on the latitude.

    >>> geo_radius(53.754842, -2.708077)
    (0.7246376811594203, 1.2256204746052668)

    """
    # Plain floats, matching the doctest above and what geo2solr expects
    return miles / 69.0, miles / abs(math.cos(math.radians(lat)) * 69.0)
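
To see how close the 69-miles-per-degree shortcut gets, here is a rough check against the Haversine formula. This is a sketch under a few assumptions: "haversine_miles", "box_offsets", and the Earth radius constant are my own names and values for illustration, and "box_offsets" just repeats the arithmetic from geo_radius.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def box_offsets(lat, miles=50):
    """Same arithmetic as geo_radius: roughly 69 miles per degree of latitude."""
    return miles / 69.0, miles / abs(math.cos(math.radians(lat)) * 69.0)


lat, lon = 53.754842, -2.708077
lat_off, lon_off = box_offsets(lat)

# Distance from the centre to the northern edge, the eastern edge, and a corner
north = haversine_miles(lat, lon, lat + lat_off, lon)
east = haversine_miles(lat, lon, lat, lon + lon_off)
corner = haversine_miles(lat, lon, lat + lat_off, lon + lon_off)
print(north, east, corner)
```

The edges land close to the requested 50 miles, but the corners are noticeably further out, because the offsets describe a box rather than a circle. If that matters, one option is to run the results through a Haversine check like this one after the index has narrowed them down.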

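from django import forms
from haystack.forms import FacetedSearchForm

# Location is my own model with name, latitude, and longitude fields;
# import it from your own app.


class LocationRadiusSearchForm(FacetedSearchForm):

    q = forms.CharField(required=False)
    location = forms.ChoiceField(required=False)

    def search(self):
        sqs = super(LocationRadiusSearchForm, self).search()
        if not self.is_valid():
            return sqs
        location_name = self.cleaned_data.get("location")
        if not location_name:
            return sqs
        try:
            location = Location.objects.get(name=location_name)
            # Work in plain floats so the arithmetic never mixes float and Decimal
            lat, lng = float(location.latitude), float(location.longitude)
        except (Location.DoesNotExist, Location.MultipleObjectsReturned,
                TypeError, ValueError):
            # Unknown, ambiguous, or un-geocoded location: leave results unfiltered
            return sqs
        lat_offset, long_offset = [float(v) for v in geo_radius(lat, lng)]
        return sqs.filter_and(
            latitude__range=[geo2solr(lat - lat_offset), geo2solr(lat + lat_offset)],
            longitude__range=[geo2solr(lng - long_offset), geo2solr(lng + long_offset)])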

That's it! This will generate SOLR queries like "longitude:[000005666355690 TO 000005849715509] AND latitude:[000021705248731 TO 000021850176268]"
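
To make the shape of that query concrete without a running SOLR instance, here is a small standalone sketch that builds the same kind of string by hand. Haystack builds the real query; "encode" and "box_query" are hypothetical names that only mirror the arithmetic of geo2solr and geo_radius, and the centre point is made up for the example.

```python
import math


def encode(value):
    """Stand-in for geo2solr: shift by 180, scale by 1e8, zero-pad to 15 chars."""
    return str(int(round((float(value) + 180) * 100000000))).rjust(15, "0")


def box_query(lat, lon, miles=50):
    """Build a range query string for the box around (lat, lon).

    Assumes the box stays inside +/- 180 degrees; wrapping around the
    date line is not handled here.
    """
    lat_off = miles / 69.0
    lon_off = miles / abs(math.cos(math.radians(lat)) * 69.0)
    return "longitude:[%s TO %s] AND latitude:[%s TO %s]" % (
        encode(lon - lon_off), encode(lon + lon_off),
        encode(lat - lat_off), encode(lat + lat_off))


query = box_query(37.7749, -122.4194)  # hypothetical centre point
print(query)
```

Each bound is a 15-character string, and the lower bound of each range sorts before the upper bound, which is all the string range query needs.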