How to configure Haystack/Elasticsearch to handle contractions and apostrophes at the beginning of words

I'm having a hard time figuring out how to handle apostrophe characters at the beginning or in the middle of words. I can deal with English possessives, but I am also trying to cater to French and handle words like "d'action", where the apostrophe comes near the beginning of the word rather than near the end, as in an English possessive.

Searching through Haystack's auto_query for "d action" returns results, but "d'action" returns nothing. If I query the Elasticsearch _search API directly (_search?q=d%27action), I do get results for "d'action". So I wonder whether this is a Haystack problem.
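For reference, the direct query mentioned above can be built as in the following minimal sketch. The host `localhost:9200` and the index name `haystack` are assumptions for illustration, not values taken from the question.

```python
from urllib.parse import urlencode

# Hypothetical host and index name, for illustration only.
ES_URL = "http://localhost:9200"
INDEX_NAME = "haystack"


def build_search_url(term):
    """Return a URI-search URL with the search term percent-encoded."""
    return f"{ES_URL}/{INDEX_NAME}/_search?{urlencode({'q': term})}"


url = build_search_url("d'action")
print(url)
```

The apostrophe is percent-encoded as `%27`, which matches the query string shown in the question.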

My configuration:

    'settings': {
        "analysis": {
            "char_filter": {
                "quotes": {
                    "type": "mapping",
                    "mappings": [
                        "\\u0091=>\\u0027",
                        "\\u0092=>\\u0027",
                        "\\u2018=>\\u0027",
                        "\\u2019=>\\u0027",
                        "\\u201B=>\\u0027"
                    ]
                }
            },
            "analyzer": {
                "ch_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ['ch_en_possessive_word_delimiter', 'ch_fr_stemmer'],
                    "char_filter": ['html_strip', 'quotes'],
                },
            },
            "filter": {
                "ch_fr_stemmer": {
                    "type": "snowball",
                    "language": "French"
                },
                "ch_en_possessive_word_delimiter": {
                    "type": "word_delimiter",
                    "stem_english_possessive": True
                }
            }
        }
    }
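To make the intent of the "quotes" char filter concrete, here is a rough Python stand-in that maps the same code points to a plain apostrophe. It is only an illustration of the mapping, not how Elasticsearch applies it; the names `QUOTE_VARIANTS` and `normalize_quotes` are made up for this sketch.

```python
# Same code points as the "quotes" char_filter mappings in the configuration.
QUOTE_VARIANTS = ["\u0091", "\u0092", "\u2018", "\u2019", "\u201B"]

# Translation table that maps every variant to a plain apostrophe (U+0027).
QUOTE_TABLE = str.maketrans({ch: "'" for ch in QUOTE_VARIANTS})


def normalize_quotes(text):
    """Replace curly or legacy quote characters with a straight apostrophe."""
    return text.translate(QUOTE_TABLE)


print(normalize_quotes("d\u2019action"))
```

Text that already uses the straight apostrophe passes through unchanged.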

I have also subclassed ElasticsearchSearchBackend and BaseEngine, so I can add the above configuration:

    class ConfigurableESBackend(ElasticsearchSearchBackend):
        # Word reserved by Elasticsearch for special use.
        RESERVED_WORDS = (
            'AND',
            'NOT',
            'OR',
            'TO',
        )

        # Characters reserved by Elasticsearch for special use.
        # The '\\' must come first, so as not to overwrite the other slash replacements.
        RESERVED_CHARACTERS = (
            '\\', '+', '-', '&&', '||', '!', '(', ')', '{', '}',
            '[', ']', '^', '"', '~', '*', '?', ':',
        )

        def setup(self):
            """
            Defers loading until needed.
            """
            # Get the existing mapping & cache it. We'll compare it
            # during the ``update`` & if it doesn't match, we'll put the new
            # mapping.
            try:
                self.existing_mapping = self.conn.get_mapping(index=self.index_name)
            except Exception:
                if not self.silently_fail:
                    raise

            unified_index = haystack.connections[self.connection_alias].get_unified_index()
            self.content_field_name, field_mapping = self.build_schema(unified_index.all_searchfields())
            current_mapping = {
                'modelresult': {
                    'properties': field_mapping,
                    '_boost': {
                        'name': 'boost',
                        'null_value': 1.0
                    }
                }
            }

            if current_mapping != self.existing_mapping:
                try:
                    # Make sure the index is there first.
                    self.conn.create_index(self.index_name, settings.ELASTICSEARCH_INDEX_SETTINGS)
                    self.conn.put_mapping(self.index_name, 'modelresult', mapping=current_mapping)
                    self.existing_mapping = current_mapping
                except Exception:
                    if not self.silently_fail:
                        raise

            self.setup_complete = True


    class CHElasticsearchSearchEngine(BaseEngine):
        backend = ConfigurableESBackend
        query = ElasticsearchSearchQuery
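A custom engine like this is normally selected through Haystack's `HAYSTACK_CONNECTIONS` setting. The sketch below shows roughly what that could look like; the module path `myapp.search_backends`, the URL and the index name are hypothetical, and `ELASTICSEARCH_INDEX_SETTINGS` is presumably a project-defined setting, since the backend above reads it from `settings`.

```python
# settings.py (sketch). The module path "myapp.search_backends" is hypothetical.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "myapp.search_backends.CHElasticsearchSearchEngine",
        "URL": "http://127.0.0.1:9200/",  # assumed local Elasticsearch node
        "INDEX_NAME": "haystack",         # assumed index name
    },
}

# Read by ConfigurableESBackend.setup() via settings.ELASTICSEARCH_INDEX_SETTINGS.
ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            # The char_filter, analyzer and filter definitions
            # from the configuration shown earlier go here.
        },
    },
}
```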
1 answer

OK, so this had nothing to do with the configuration. The problem was in the .txt template that Haystack uses for indexing.

I had:

 {{ object.some_model.name_en }} {{ object.some_model.name_fr }} 

This caused characters such as ' to be converted to HTML entities (&#39;), so the search never found the result. Using the "safe" filter fixed it:

 {{ object.some_model.name_en|safe }} {{ object.some_model.name_fr|safe }} 
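The effect can be illustrated without Django. The sketch below uses the standard library's `html.escape` as a stand-in for template auto-escaping; this is an assumption made for illustration, and the exact entity spelling may differ from what a given Django version emits. The sample value `name_fr` is made up.

```python
import html

# Made-up sample value standing in for object.some_model.name_fr.
name_fr = "plan d'action"

# Stand-in for template auto-escaping: the apostrophe becomes an HTML entity.
escaped = html.escape(name_fr)

# What ends up in the indexed document without and with the "safe" filter.
indexed_without_safe = escaped
indexed_with_safe = name_fr

print(repr(indexed_without_safe))
print(repr(indexed_with_safe))

# The escaped text no longer contains the literal apostrophe that the
# query "d'action" is looking for.
print("'" in indexed_without_safe, "'" in indexed_with_safe)
```

Because the indexed text contains the entity rather than the apostrophe, a query for "d'action" has nothing to match until the escaping is turned off in the indexing template.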

Source: https://habr.com/ru/post/1201827/

