Django Haystack: Searching for a term with and without accents

Question

I am implementing a search engine in my Django project using django-haystack. Some of the fields in my models contain French accents, and I would like to find matching records whether the query is typed with or without accents.

I think the best idea is to create a SearchIndex that holds each such field twice: once with accents and once without.

Any idea or hint on this?

Here is the code. Imagine the following model:

    class Cars(models.Model):
        name = models.CharField()

and the following Haystack index:

    class Cars(indexes.SearchIndex):
        name = indexes.CharField(model_attr='name')
        cleaned_name = indexes.CharField(model_attr='name')

        def prepare_cleaned_name(self, obj):
            # Index the accent-free form of the name
            return strip_accents(obj.name)

now, in my index template, I put both fields:

 {{ object.cleaned_name }} {{ object.name }}

So this is some kind of pseudo code, I don’t know if it works, but if you have any ideas on this, let me know!

+4

django encoding search-engine django-haystack

dzen Feb 10 '10 at 22:41

3 answers

Yes, you're on the right track. Sometimes you want to store fields several times with various transformations.

An example of this in my application is that I have two title fields. One is for searching and gets stemmed (the process by which test ~= tester), and the other is for sorting and is left alone (stemming would interfere with the sort order).

This is a similar case.

In my schema.xml, this is handled like so:

    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="title_sort" type="string" indexed="true" stored="true" multiValued="false" />

The string type is what keeps that copy of the title stored as is.

By the way, if you are only stripping accents to make searching easier, this is worth a look: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory

+3

Koobz Feb 11 '10 at 8:08

You should do something like the following:

    class Cars(indexes.SearchIndex):
        name = indexes.CharField(model_attr='name')

        def prepare(self, obj):
            self.prepared_data = super(Cars, self).prepare(obj)
            # Append the accent-free form so both spellings are indexed
            self.prepared_data['name'] += '\n' + strip_accents(self.prepared_data['name'])
            return self.prepared_data

I do not like this solution, though. I would rather know how to set up my search backend to do this for me. I am using Whoosh.

0

semente Sep 28 '12 at 15:09

dzen · Accepted Answer · 2010-02-14T14:05:15+0000

I found a way to index both values from the same field in my model.

First, write a method on your model that returns the ASCII version of the field:

    class Car(models.Model):
        name = models.CharField()

        def ascii_name(self):
            return strip_accents(self.name)

So, in your template used to generate the index, you can do this:

 {{ object.name }} {{ object.ascii_name }}

Then you just need to rebuild your indexes!
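
Note that `strip_accents` is never defined anywhere in this thread. A minimal sketch of such a helper, assuming plain standard-library Unicode normalization is good enough for your data (the sample name and the `document_text` variable are only illustrative, not part of Haystack):

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (e.g. 'é' -> 'e')."""
    # NFKD splits each accented character into a base character
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Mimics what the index template above renders for one object:
# the original name followed by its accent-free form.
name = "Citroën Méhari"
document_text = "{} {}".format(name, strip_accents(name))
print(document_text)
```

Because both spellings end up in the indexed text, a query typed either way can match. Letters that have no Unicode decomposition, such as "ø" or "œ", pass through this helper unchanged, so they would need separate handling if they occur in your data.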
