What is a good alternative to the output field in the Elasticsearch 5.1 completion suggester?

The first error I ran into while indexing my data in ES 5.1 was caused by my suggestion mapping, which contained an output field:

message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]

So I removed it, but now many of my autocompletions are wrong, because the suggester returns the input that matched instead of a single output value.

After some googling, I found this article from ES, which mentioned the following:

As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now suggestion result entry's text is always the un-analyzed value of the suggestion's input (same as not specifying output while indexing suggestions in pre-5.0 indices).

I found that the original value can be read from the _source field that is returned with each suggestion, but that is not a solution for me, because the key and structure change depending on the source object the suggestion comes from.

I could add an extra "output" field to the source object, but that does not work for me either, because in some cases I have this structure:

    {
      "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
      "synonyms": ["All available colours", "Colors"],
      "autoComplete": [
        {
          "input": [
            "colours available all",
            "available colours all",
            "available all colours",
            "colours all available",
            "all available colours",
            "all colours available"
          ]
        },
        {
          "input": ["colors"]
        }
      ]
    }

In ES 2.4, the structure looked like this:

    {
      "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
      "synonyms": ["All available colours", "Colors"],
      "SmartSynonym": [
        {
          "input": [
            "colours available all",
            "available colours all",
            "available all colours",
            "colours all available",
            "all available colours",
            "all colours available"
          ],
          "output": ["All available colours"]
        },
        {
          "input": ["colors"],
          "output": ["Colors"]
        }
      ]
    }

This was not a problem as long as the output field was present in every autocomplete object.

How can I return the original value in ES 5.1 (for example, "All available colours") when the query is "colours available all", in a simple way and without a lot of manual lookups?

Related question from another user: Output field in autocomplete suggestion


Updated Answer


We ended up dropping the custom plugin from the original answer because it was hard to get it working on Elastic Cloud. Instead, we created separate documents just for autocomplete and removed the autocomplete fields from all our other documents.

The object:

    public class Suggest {
        /*
         * Contains the actual value it needs to return:
         * "iphone 8 plus", "plus iphone 8", "8 plus iphone", ...
         * will all result in "iphone 8 plus", for example
         */
        private String autocompleteOutput;

        /*
         * Contains the field and all the values of that field to autocomplete
         */
        private Map<String, AutoComplete> autoComplete;

        @JsonCreator
        Suggest() {
        }

        public Suggest(String autocompleteOutput, Map<String, AutoComplete> autoComplete) {
            this.autocompleteOutput = autocompleteOutput;
            this.autoComplete = autoComplete;
        }

        public String getAutocompleteOutput() {
            return autocompleteOutput;
        }

        public void setAutocompleteOutput(String autocompleteOutput) {
            this.autocompleteOutput = autocompleteOutput;
        }

        public Map<String, AutoComplete> getAutoComplete() {
            return autoComplete;
        }

        public void setAutoComplete(Map<String, AutoComplete> autoComplete) {
            this.autoComplete = autoComplete;
        }
    }

    public class AutoComplete {
        /*
         * Contains the permutation values from the Lucene filter (see original answer)
         */
        private String[] input;

        @JsonCreator
        AutoComplete() {
        }

        public AutoComplete(String[] input) {
            this.input = input;
        }

        public String[] getInput() {
            return input;
        }
    }

with the following mapping:

    {
      "suggest": {
        "dynamic_templates": [
          {
            "autocomplete": {
              "path_match": "autoComplete.*",
              "match_mapping_type": "*",
              "mapping": {
                "type": "completion",
                "analyzer": "lowercase_keyword_analyzer"
              }
            }
          }
        ],
        "properties": {}
      }
    }
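To make the document shape concrete, here is a minimal sketch, assuming the autocomplete documents are assembled as plain maps before being serialized to JSON. The class SuggestDocSketch, its build method and the field key "synonyms" are hypothetical; only autocompleteOutput and autoComplete.<field>.input come from the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: builds the _source of one separate autocomplete document
 * shaped like the Suggest class, i.e.
 * { "autocompleteOutput": ..., "autoComplete": { "<field>": { "input": [...] } } }.
 */
public class SuggestDocSketch {

    static Map<String, Object> build(String output, String field, List<String> inputs) {
        Map<String, Object> autoCompleteEntry = new LinkedHashMap<>();
        autoCompleteEntry.put("input", new ArrayList<>(inputs));

        // Every key under "autoComplete" matches the "autoComplete.*" dynamic template,
        // so each one is mapped as its own completion field.
        Map<String, Object> autoComplete = new LinkedHashMap<>();
        autoComplete.put(field, autoCompleteEntry);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("autocompleteOutput", output); // display value, read back from _source later
        doc.put("autoComplete", autoComplete);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = build("All available colours", "synonyms",
                Arrays.asList("all available colours", "colours available all"));
        System.out.println(doc);
    }
}
```

Keeping the display value in a fixed top-level field is what removes the dependency on the varying structure of the original source objects.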

This lets us display the autocompleteOutput field from the _source of each suggestion instead of the matched input.
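A minimal sketch of the read side, assuming each suggestion option's _source has already been parsed into a Map (for example by your JSON library). The class and method names are hypothetical, and the deduplication is only an extra precaution in case several documents share the same autocompleteOutput.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper: turns the _source maps of suggestion options into the
 * strings to display, using autocompleteOutput instead of the matched input.
 */
public class SuggestOutputSketch {

    static List<String> displayValues(List<Map<String, Object>> optionSources) {
        Set<String> seen = new LinkedHashSet<>(); // keeps the suggester's ranking order
        for (Map<String, Object> source : optionSources) {
            Object output = source.get("autocompleteOutput");
            if (output != null) { // skip documents without a display value
                seen.add(output.toString());
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> sources = Arrays.asList(
                Collections.<String, Object>singletonMap("autocompleteOutput", "All available colours"),
                Collections.<String, Object>singletonMap("autocompleteOutput", "All available colours"),
                Collections.<String, Object>singletonMap("autocompleteOutput", "Colors"));
        System.out.println(displayValues(sources));
    }
}
```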

Original answer


After some research, I ended up creating a new Elasticsearch 5.1.1 plugin.

Create a Lucene filter

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    import java.io.IOException;
    import java.util.*;

    /**
     * Created by glenn on 13.01.17.
     */
    public class PermutationTokenFilter extends TokenFilter {
        private final CharTermAttribute charTermAtt;
        private final PositionIncrementAttribute posIncrAtt;
        private final OffsetAttribute offsetAtt;
        private Iterator<String> permutations;
        private int origOffset;

        /**
         * Construct a token stream filtering the given input.
         *
         * @param input
         */
        protected PermutationTokenFilter(TokenStream input) {
            super(input);
            this.charTermAtt = addAttribute(CharTermAttribute.class);
            this.posIncrAtt = addAttribute(PositionIncrementAttribute.class);
            this.offsetAtt = addAttribute(OffsetAttribute.class);
        }

        @Override
        public final boolean incrementToken() throws IOException {
            while (true) {
                // see if permutations have been created already
                if (permutations == null) {
                    // see if more tokens are available
                    if (!input.incrementToken()) {
                        return false;
                    } else {
                        // get value
                        String value = String.valueOf(charTermAtt);
                        // permute over buffer value and create iterator
                        permutations = permutation(value).iterator();
                        origOffset = posIncrAtt.getPositionIncrement();
                    }
                }
                // see if there are remaining permutations
                if (permutations.hasNext()) {
                    // reset the attributes to the starting point
                    clearAttributes();
                    // use the next permutation
                    String permutation = permutations.next();
                    // add the permutation to the attributes, replacing the old values;
                    // only the first permutation advances the position, the rest are stacked on it
                    charTermAtt.setEmpty().append(permutation);
                    posIncrAtt.setPositionIncrement(origOffset);
                    offsetAtt.setOffset(0, permutation.length());
                    // remove permutation from iterator
                    permutations.remove();
                    origOffset = 0;
                    return true;
                }
                permutations = null;
            }
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            // drop leftover permutations if the stream is reused before it was fully consumed
            permutations = null;
        }

        /**
         * Changes the order of a multi-value keyword so the completion suggester still knows the original
         * value without tokenizing it, if the user types the words in a different order.
         *
         * @param value unpermuted value, e.g. Yellow Crazy Banana
         * @return permuted values, e.g.
         *         Yellow Crazy Banana,
         *         Yellow Banana Crazy,
         *         Crazy Yellow Banana,
         *         Crazy Banana Yellow,
         *         Banana Crazy Yellow,
         *         Banana Yellow Crazy
         */
        private Set<String> permutation(String value) {
            value = value.trim().replaceAll(" +", " ");
            // Use a set to eliminate semantic duplicates (aab is still aab if you swap the two a's,
            // in case one word occurs multiple times in a single value)
            Set<String> set = new HashSet<String>();
            String[] words = value.split(" ");
            // Termination condition: only 1 permutation for an array of 1 word
            if (words.length == 1) {
                set.add(value);
            } else if (words.length <= 6) {
                // Give each word a chance to be the first in the permuted array
                for (int i = 0; i < words.length; i++) {
                    // Remove the word at index i from the array
                    String pre = "";
                    for (int j = 0; j < i; j++) {
                        pre += words[j] + " ";
                    }
                    String post = " ";
                    for (int j = i + 1; j < words.length; j++) {
                        post += words[j] + " ";
                    }
                    String remaining = (pre + post).trim();
                    // Recurse to find all the permutations of the remaining words
                    for (String permutation : permutation(remaining)) {
                        // Concatenate the first word with the permutations of the remaining words
                        set.add(words[i] + " " + permutation);
                    }
                }
            } else {
                // More than 6 words (n! permutations): index the single words and the full value instead
                Collections.addAll(set, words);
                set.add(value);
            }
            return set;
        }
    }

This filter takes the original input token, for example "All available colours", and emits every word order of it (see the original question).
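The permutation step itself can be illustrated without Lucene. The sketch below is a swap-based recursion, not the author's prefix/suffix string splitting, and it deliberately leaves out the six-word cap from permutation() (n words give n! orderings, so the cap matters in practice). The class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Standalone illustration of the idea behind PermutationTokenFilter.permutation():
 * produce every ordering of the words in a value; the Set removes duplicate
 * orderings caused by repeated words. No word-count cap is applied here.
 */
public class WordPermutationSketch {

    static Set<String> permutations(String value) {
        String[] words = value.trim().split(" +"); // collapse runs of spaces like the filter does
        Set<String> result = new LinkedHashSet<>();
        permute(words, 0, result);
        return result;
    }

    // Fix position k, try every remaining word there, recurse on the rest.
    private static void permute(String[] words, int k, Set<String> out) {
        if (k == words.length) {
            out.add(String.join(" ", words));
            return;
        }
        for (int i = k; i < words.length; i++) {
            swap(words, k, i);
            permute(words, k + 1, out);
            swap(words, k, i); // restore the order before trying the next word
        }
    }

    private static void swap(String[] a, int i, int j) {
        String t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        System.out.println(permutations("All available colours"));
    }
}
```

Case is preserved at this stage; in the analyzer chain of the original answer the lowercase filter runs after the permutation filter.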

Create a factory

    import org.apache.lucene.analysis.TokenStream;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.env.Environment;
    import org.elasticsearch.index.IndexSettings;
    import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;

    /**
     * Created by glenn on 16.01.17.
     */
    public class PermutationTokenFilterFactory extends AbstractTokenFilterFactory {

        public PermutationTokenFilterFactory(IndexSettings indexSettings, Environment environment,
                                             String name, Settings settings) {
            super(indexSettings, name, settings);
        }

        @Override
        public PermutationTokenFilter create(TokenStream input) {
            return new PermutationTokenFilter(input);
        }
    }

This class is needed to expose the filter to the Elasticsearch plugin.

Create an Elasticsearch Plugin

Follow this guide to set up the configuration the Elasticsearch plugin needs.

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>be.smartspoken</groupId>
        <artifactId>permutation-plugin</artifactId>
        <version>5.1.1-SNAPSHOT</version>
        <packaging>jar</packaging>
        <name>Plugin: Permutation</name>
        <description>Permutation plugin for elasticsearch</description>

        <properties>
            <lucene.version>6.3.0</lucene.version>
            <elasticsearch.version>5.1.1</elasticsearch.version>
            <java.version>1.8</java.version>
            <log4j2.version>2.7</log4j2.version>
        </properties>

        <dependencies>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-api</artifactId>
                <version>${log4j2.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.logging.log4j</groupId>
                <artifactId>log4j-core</artifactId>
                <version>${log4j2.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-test-framework</artifactId>
                <version>${lucene.version}</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-core</artifactId>
                <version>${lucene.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.lucene</groupId>
                <artifactId>lucene-analyzers-common</artifactId>
                <version>${lucene.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>org.elasticsearch</groupId>
                <artifactId>elasticsearch</artifactId>
                <version>${elasticsearch.version}</version>
                <scope>provided</scope>
            </dependency>
        </dependencies>

        <build>
            <resources>
                <resource>
                    <directory>src/main/resources</directory>
                    <filtering>false</filtering>
                    <excludes>
                        <exclude>*.properties</exclude>
                    </excludes>
                </resource>
            </resources>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>2.6</version>
                    <configuration>
                        <appendAssemblyId>false</appendAssemblyId>
                        <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                        <descriptors>
                            <descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
                        </descriptors>
                    </configuration>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.3</version>
                    <configuration>
                        <source>${java.version}</source>
                        <target>${java.version}</target>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>

Make sure you use the correct versions of Elasticsearch, Lucene and Log4j 2 in pom.xml, and provide the correct configuration files (such as the assembly descriptor src/main/assemblies/plugin.xml referenced above).

    import be.smartspoken.plugin.permutation.filter.PermutationTokenFilterFactory;
    import org.elasticsearch.index.analysis.TokenFilterFactory;
    import org.elasticsearch.indices.analysis.AnalysisModule;
    import org.elasticsearch.plugins.AnalysisPlugin;
    import org.elasticsearch.plugins.Plugin;

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Created by glenn on 13.01.17.
     */
    public class PermutationPlugin extends Plugin implements AnalysisPlugin {

        @Override
        public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
            Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>();
            extra.put("permutation", PermutationTokenFilterFactory::new);
            return extra;
        }
    }

This registers the factory with the plugin, so the filter becomes available under the name "permutation".

After installing the new plugin, restart Elasticsearch.

Use the plugin

Add a new custom analyzer that mimics the 2.x functionality:

    Settings.builder()
        .put("number_of_shards", 2)
        .loadFromSource(jsonBuilder()
            .startObject()
                .startObject("analysis")
                    .startObject("analyzer")
                        .startObject("permutation_analyzer")
                            .field("tokenizer", "keyword")
                            .field("filter", new String[]{"permutation", "lowercase"})
                        .endObject()
                    .endObject()
                .endObject()
            .endObject().string())
        .loadFromSource(jsonBuilder()
            .startObject()
                .startObject("analysis")
                    .startObject("analyzer")
                        .startObject("lowercase_keyword_analyzer")
                            .field("tokenizer", "keyword")
                            .field("filter", new String[]{"lowercase"})
                        .endObject()
                    .endObject()
                .endObject()
            .endObject().string())
        .build();

Now you only need to assign the custom analyzers to the matching fields in your mapping:

    {
      "my_object": {
        "dynamic_templates": [
          {
            "autocomplete": {
              "path_match": "my.autocomplete.object.path",
              "match_mapping_type": "*",
              "mapping": {
                "type": "completion",
                "analyzer": "permutation_analyzer",              /* custom analyzer */
                "search_analyzer": "lowercase_keyword_analyzer"  /* custom analyzer */
              }
            }
          }
        ],
        "properties": {
          /* your other properties */
        }
      }
    }
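A rough plain-Java simulation (not Elasticsearch code) of why the two analyzers fit together: at index time permutation_analyzer turns one input into every word order, lowercased, while at search time lowercase_keyword_analyzer only lowercases the whole query. The class name is hypothetical, and the startsWith check only approximates the prefix lookup the completion suggester performs on its index structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Approximation of index-time vs. search-time analysis for the completion field:
 * index: keyword tokenizer -> permutation -> lowercase
 * search: keyword tokenizer -> lowercase
 */
public class AnalyzerChainSketch {

    // Every word order of the input, lowercased (what permutation_analyzer would index).
    static List<String> indexForms(String input) {
        List<String> forms = new ArrayList<>();
        permute(input.trim().split(" +"), 0, forms);
        return forms;
    }

    private static void permute(String[] w, int k, List<String> out) {
        if (k == w.length) {
            out.add(String.join(" ", w).toLowerCase(Locale.ROOT));
            return;
        }
        for (int i = k; i < w.length; i++) {
            String t = w[k]; w[k] = w[i]; w[i] = t; // move word i to position k
            permute(w, k + 1, out);
            t = w[k]; w[k] = w[i]; w[i] = t;        // undo the swap
        }
    }

    // The whole query as one lowercased token (what lowercase_keyword_analyzer would produce).
    static String searchForm(String query) {
        return query.toLowerCase(Locale.ROOT);
    }

    // True if the analyzed query is a prefix of any indexed form.
    static boolean suggests(String input, String query) {
        String q = searchForm(query);
        for (String form : indexForms(input)) {
            if (form.startsWith(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(suggests("All available colours", "colours avail"));
        System.out.println(suggests("All available colours", "blue"));
    }
}
```

Because the match is found on a permutation of the same input, the suggestion still belongs to the original document, which is why the stored display value can be recovered.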

This should also improve performance, because the permutations no longer have to be generated on the client before indexing.


Source: https://habr.com/ru/post/1261919/

