Highlight ElasticSearch Autocomplete

I have the following data to index in ElasticSearch.

[Image: table of the sample documents; not reproduced here]

I want to implement autocomplete and highlight the part of each document that matched the query.

These are my index settings:

{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    }
}

Index Analysis

  • Splits the text on word boundaries.
  • Removes punctuation.
  • Lowercases each token.
  • Produces edge n-grams of each token.
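
The four steps above can be sketched in a few lines of Python. This is only an approximation for illustration: the regular expression `\w+` stands in for the standard tokenizer (an assumption, the real tokenizer follows Unicode text segmentation rules), and the function names `edge_ngrams` and `autocomplete_terms` are hypothetical. The `min_gram` and `max_gram` defaults are taken from the settings shown earlier.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_terms(text, min_gram=1, max_gram=15):
    """Approximate the index-time analysis: split, lowercase, edge n-gram."""
    # \w+ is a rough stand-in for the standard tokenizer (assumption);
    # it splits on word boundaries and drops punctuation.
    tokens = re.findall(r"\w+", text)
    terms = []
    for token in tokens:
        terms.extend(edge_ngrams(token.lower(), min_gram, max_gram))
    return terms


print(autocomplete_terms("Software AG"))
# ['s', 'so', 'sof', 'soft', 'softw', 'softwa', 'softwar', 'software', 'a', 'ag']
```

Every one of these prefixes becomes a term in the inverted index, which is why a query for a prefix such as "soft" can match a document containing "Software".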

So, the inverted index is as follows:

[Image: the resulting inverted index; not reproduced here]

This is how I defined the mappings for the name field:

{
    "index_type": {
        "properties": {
            "name": {
                "type":     "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

When I request:

GET http://localhost:9200/index/type/_search

{
    "query": {
        "match": {
            "name": "soft"
        }
    },
    "highlight": {
        "fields" : {
            "name" : {}
        }
    }
}

Search: soft

With the standard analyzer at search time, “soft” is a term that can be found in the inverted index. The search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I would expect the highlighted part to be “soft” rather than the whole word:

{
  "hits": [
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG2"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "is soft ware ok"
      },
      "highlight": {
        "name": [
          "is <em>soft</em> ware ok"
        ]
      }
    }
  ]
}
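
One plausible reading of this output, offered here as an assumption rather than something the post confirms: an edge n-gram token filter emits each prefix with the character offsets of the whole token it was derived from, and the highlighter wraps whatever span those offsets cover. The Python sketch below simulates that behaviour; `index_terms_with_offsets` and `highlight` are hypothetical names, and `\w+` again only approximates the standard tokenizer.

```python
import re


def index_terms_with_offsets(text, min_gram=1, max_gram=15):
    """Yield (term, start, end) for every edge n-gram in `text`.

    start/end are the offsets of the whole source token, not of the
    n-gram itself. That is the assumption this sketch illustrates.
    """
    for match in re.finditer(r"\w+", text):
        token = match.group().lower()
        for n in range(min_gram, min(len(token), max_gram) + 1):
            yield token[:n], match.start(), match.end()


def highlight(text, query_term, pre="<em>", post="</em>"):
    """Wrap every token that produced an n-gram equal to `query_term`."""
    wanted = query_term.lower()
    spans = sorted({(start, end)
                    for term, start, end in index_terms_with_offsets(text)
                    if term == wanted})
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


print(highlight("Op Software AG good software better", "soft"))
# Op <em>Software</em> AG good <em>software</em> better
```

Under that assumption the simulation reproduces the hits above: the term "soft" matches, but the only span available for highlighting is the full word.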

Search: ag software

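With the standard analyzer, "ag software" is split into "ag" and "software", both of which can be found in the inverted index. The search matches documents 1, 3, 4, 5 and 6, which is correct, but I would expect the highlighted parts to be "ag" and "software", not whole words such as "AG2" and "SoftwareRocks":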

{
  "hits": [
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG2</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em> good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    }
  ]
}

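I went through the elasticsearch documentation on highlighting, but could not work out how to get this behaviour. In both cases I would expect only the matched part to be highlighted, not the whole word. Is there a way to highlight only the part of the word that matches the query?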

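As far as I can tell, ElasticSearch does not highlight only the matched part of a word in this setup, so I ended up doing the highlighting myself. I take the results returned by ElasticSearch and wrap the search term in the name in application code (on the server side, before the data is returned).

My solution (in PHP):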

public function search($term)
{
    $params = [
        'index' => $this->getIndexName(),
        'type' => $this->getIndexType(),
        'body' => [
            'query' => [
                'match' => [
                    'name' => $term
                ]
            ]
        ]
    ];

    $results = $this->client->search($params);

    $hits = $results['hits']['hits'];

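    $data = [];

    $wrapBefore = '<strong>';
    $wrapAfter = '</strong>';

    // Escape the term so that regex metacharacters typed by the user are matched literally.
    $pattern = '/(' . preg_quote($term, '/') . ')/i';

    foreach ($hits as $hit) {
        $data[] = [
            $hit['_source']['id'],
            $hit['_source']['name'],
            // Highlight in application code: wrap each occurrence of the term in the name.
            preg_replace($pattern, $wrapBefore . '$1' . $wrapAfter, strip_tags($hit['_source']['name']))
        ];
    }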

    return $data;
}
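
The same post-processing idea can be sketched in Python for a query with several words, such as "ag software". This is an illustrative variant, not the code from the answer: the function name `highlight_prefixes` is hypothetical, and unlike the PHP version it only wraps a query word where it starts a word in the name, which mirrors how the edge n-gram index matches. Query words are assumed to start with a letter or digit.

```python
import re
from html import escape


def highlight_prefixes(name, query, pre="<strong>", post="</strong>"):
    """Wrap each query word where it occurs at the start of a word in `name`."""
    # re.escape keeps regex metacharacters in the query literal.
    words = [re.escape(word) for word in query.split()]
    if not words:
        return escape(name)
    # Longest first, so "software" wins over "soft" if both were typed.
    words.sort(key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(words) + ")", re.IGNORECASE)

    out, last = [], 0
    for match in pattern.finditer(name):
        # Escape the untouched text and the matched text separately,
        # so only our own wrapper tags end up as markup.
        out.append(escape(name[last:match.start()]))
        out.append(pre + escape(match.group()) + post)
        last = match.end()
    out.append(escape(name[last:]))
    return "".join(out)


print(highlight_prefixes("Software AG2", "ag software"))
# <strong>Software</strong> <strong>AG</strong>2
```

Escaping the name piece by piece replaces the `strip_tags` call of the PHP version; either approach keeps markup from the stored name out of the output.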

The result looks like this:

[Image: the highlighted results; not reproduced here]

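Note that this is a workaround applied to the results outside ElasticSearch, not a highlighting feature of ElasticSearch itself.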


For comparison, look at the search on the elastic.co site itself. Watch the xhr requests it sends when you type "att".

url - https://search.elastic.co/suggest?q=att
    {
        "current_page": 1,
        "last_page": 4,
        "total_hits": 49,
        "hits": [
            {
                "tags": [],
                "url": "/elasticon/tour/2016/jp/not-attending",
                "section": "Elasticon",
                "title": "Not <em>Attending</em> - JP"
            },
            {
                "section": "Elasticon",
                "title": "<em>Attending</em> from Training - JP",
                "tags": [],
                "url": "/elasticon/tour/2016/jp/attending-training"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/jp/attending-keynote",
                "title": "<em>Attending</em> from Keynote - JP",
                "section": "Elasticon"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/not-attending",
                "section": "Elasticon",
                "title": "Thank You - Not <em>Attending</em>"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/attending",
                "section": "Elasticon",
                "title": "Thank You - <em>Attending</em>"
            },
            {
                "section": "Blog",
                "title": "What It Like to <em>Attend</em> Elastic Training",
                "tags": [],
                "url": "/blog/what-its-like-to-attend-elastic-training"
            },
            {
                "tags": "Elasticsearch",
                "url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
                "section": "Docs/",
                "title": "Highlighting <em>attachments</em>"
            },
            {
                "title": "<em>attachments</em> » email",
                "section": "Docs/",
                "tags": "Logstash",
                "url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
            },
            {
                "section": "Docs/",
                "title": "Configuring Email <em>Attachments</em> » Actions",
                "tags": "Watcher",
                "url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
            },
            {
                "url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
                "tags": "Watcher",
                "title": "HipChat Action <em>Attributes</em> » Actions",
                "section": "Docs/"
            },
            {
                "title": "Slack Action <em>Attributes</em> » Actions",
                "section": "Docs/",
                "tags": "Watcher",
                "url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
            }
        ],
        "aggs": {
            "sections": [
                {
                    "Elasticon": 5
                },
                {
                    "Blog": 1
                },
                {
                    "Docs/": 43
                }
            ],
            "top_tags": [
                {
                    "XPack": 14
                },
                {
                    "Elasticsearch": 12
                },
                {
                    "Watcher": 9
                },
                {
                    "Logstash": 4
                },
                {
                    "Clients": 3
                },
                {
                    "Shield": 1
                }
            ]
        }
    }

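The query was "att", yet the response highlights whole words such as "Attending", "attachments" and "Attributes". So even there, the highlight covers the full word and not only the typed prefix.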


Source: https://habr.com/ru/post/1660518/

