Specifying and Using NGramTokenizer with a C# NEST Client for Elasticsearch

Updated to show a working sample.

I am trying to do a partial search on a collection of usernames in Elasticsearch.

Searching around pointed me in the direction of the nGram tokenizer, but I am not sure about the correct implementation and am not getting any results.

This is the relevant code, extracted from the project I'm working on.

I tried different combinations and search types to no avail.

setup.cs

    var client = new ElasticClient(settings.ConnectionSettings);

    // (Try and) set up the nGram tokenizer.
    var indexSettings = new IndexSettings();

    var customAnalyzer = new CustomAnalyzer();
    customAnalyzer.Tokenizer = "mynGram";
    customAnalyzer.Filter = new List<string> { "lowercase" };

    indexSettings.Analysis.Analyzers.Add("mynGram", customAnalyzer);
    indexSettings.Analysis.Tokenizers.Add("mynGram", new NGramTokenizer
    {
        MaxGram = 10,
        MinGram = 2
    });

    client.CreateIndex(settings.ConnectionSettings.DefaultIndex, indexSettings);
    client.MapFromAttributes<Profile>();

    // Create and index a new profile object.
    var profile = new Profile
    {
        Id = "1",
        Username = "Russell"
    };
    client.IndexAsync(profile);

    // Search for the object.
    var s = new SearchDescriptor<Profile>()
        .Query(t => t.Term(c => c.Username, "russ"));
    var results = client.Search<Profile>(s);

Profile.cs

    public class Profile
    {
        public string Id { get; set; }

        [ElasticProperty(IndexAnalyzer = "mynGram")]
        public string Username { get; set; }
    }

Any advice would be highly appreciated.

1 answer

Take a look at this from the docs on nGram token filters:

    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_ngram_analyzer" : {
                    "tokenizer" : "my_ngram_tokenizer"
                }
            },
            "tokenizer" : {
                "my_ngram_tokenizer" : {
                    "type" : "nGram",
                    "min_gram" : "2",
                    "max_gram" : "3",
                    "token_chars": [ "letter", "digit" ]
                }
            }
        }
    }
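For reference, here is a rough NEST equivalent of that settings block, written as a sketch against the same NEST 1.x types the question's setup.cs already uses (IndexSettings, CustomAnalyzer, NGramTokenizer). The analyzer and tokenizer names are just the ones from the docs example; whether token_chars is exposed in your NEST version is something you would have to check:

```csharp
// Sketch only: assumes the NEST 1.x API shown in the question.
var indexSettings = new IndexSettings();

// The analyzer refers to the tokenizer BY NAME, so the name passed to
// Tokenizers.Add below must match this string exactly.
var analyzer = new CustomAnalyzer
{
    Tokenizer = "my_ngram_tokenizer"
};

indexSettings.Analysis.Analyzers.Add("my_ngram_analyzer", analyzer);
indexSettings.Analysis.Tokenizers.Add("my_ngram_tokenizer", new NGramTokenizer
{
    MinGram = 2,
    MaxGram = 3
});

client.CreateIndex("index_name", indexSettings);
```
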

Some comments

  • You need to point your analyzer at mynGram, otherwise it will not be used. That is how analyzers work: each indexed field has an analyzer, and an analyzer is one tokenizer followed by zero or more token filters. You defined a good nGram tokenizer (mynGram), but customAnalyzer does not actually use it; it falls back to the standard tokenizer. (Basically, you define mynGram but never use it.)

  • You need to tell Elasticsearch to use your customAnalyzer in your mapping: "properties": { "string_field": { "type": "string", "index_analyzer": "customAnalyzer" } }

  • You must change max_gram to a larger number (maybe 10), otherwise a four-letter search will not behave like autocomplete (or may not return anything at all, depending on the search-time analyzer).

  • Use the _analyze API endpoint to test your analyzer. Something like this should work:

    curl -XGET 'http://yourserver.com:9200/index_name/_analyze?analyzer=customAnalyzer' -d 'rlewis'
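To make the min_gram/max_gram point concrete, here is a tiny standalone sketch (plain C#, no Elasticsearch involved, and not Elasticsearch's actual implementation) of the substrings a tokenizer with min_gram = 2 and max_gram = 3 would emit for the lowercased input "russell". Note that "russ" is not among them, so the question's term query for "russ" would find nothing:

```csharp
using System;
using System.Collections.Generic;

class NGramDemo
{
    // Illustrative only: enumerate the n-grams (min..max chars) of a token,
    // the way an nGram tokenizer with min_gram=2, max_gram=3 would.
    static IEnumerable<string> NGrams(string token, int min, int max)
    {
        for (int n = min; n <= max; n++)
            for (int i = 0; i + n <= token.Length; i++)
                yield return token.Substring(i, n);
    }

    static void Main()
    {
        // "Russell" after the lowercase filter:
        Console.WriteLine(string.Join(", ", NGrams("russell", 2, 3)));
        // ru, us, ss, se, el, ll, rus, uss, sse, sel, ell
    }
}
```

With max_gram raised to 10, the 4-gram "russ" (and longer prefixes) would be indexed too, which is what makes the partial search match.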

Good luck


Source: https://habr.com/ru/post/950681/
