Indexing a field of comma-separated values in Elasticsearch

I use Nutch to crawl a site and index it in Elasticsearch. My site has meta tags, some of which contain a list of identifiers separated by commas (which I intend to use for search). For instance:

contentTypeIds = "2,5,15" (note: no square brackets).

When ES indexes this, I cannot search for contentTypeIds:5 and find documents whose contentTypeIds contain 5; that query returns only documents whose contentTypeIds value is exactly "5". I want it to also match documents like the one above, where 5 is one of several comma-separated values.

In Solr, this is accomplished by setting multiValued="true" on the contentTypeIds field in schema.xml. I cannot find how to do something similar in ES.
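(For comparison: Elasticsearch has no multiValued flag because every field is implicitly multi-valued; indexing a JSON array of values gives the same effect as Solr's setting. A hedged sketch, assuming the same index/type names used in the answer below, which only works if the source values are already split into an array before indexing:

```
PUT /testindex/yourtype/4
{ "contentTypeIds" : ["2", "5", "15"] }
```

With this shape, a term query for "5" matches the document without any custom analysis.)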

I am new to ES, so I probably missed something. Thank you for your help!

1 answer

Create a custom analyzer that splits the indexed text into tokens on commas.

Then you can search against those tokens. If you do not need relevance scoring, you can use a filter to find matching documents. The example below shows a search using a term filter.

Here is how to do this using the Sense plugin:

```
DELETE testindex

PUT testindex
{
  "index" : {
    "analysis" : {
      "tokenizer" : {
        "comma" : {
          "type" : "pattern",
          "pattern" : ","
        }
      },
      "analyzer" : {
        "comma" : {
          "type" : "custom",
          "tokenizer" : "comma"
        }
      }
    }
  }
}

PUT /testindex/_mapping/yourtype
{
  "properties" : {
    "contentType" : {
      "type" : "string",
      "analyzer" : "comma"
    }
  }
}

PUT /testindex/yourtype/1
{ "contentType" : "1,2,3" }

PUT /testindex/yourtype/2
{ "contentType" : "3,4" }

PUT /testindex/yourtype/3
{ "contentType" : "1,6" }

GET /testindex/_search
{
  "query" : { "match_all" : {} }
}

GET /testindex/_search
{
  "filter" : {
    "term" : { "contentType" : "6" }
  }
}
```
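To check that the custom analyzer tokenizes the way you expect before running searches, you can call the analyze API against the index. A minimal sketch, reusing the testindex and comma analyzer defined above:

```
GET /testindex/_analyze?analyzer=comma&text=1,2,3
```

The response should list three separate tokens (1, 2, 3) rather than one token for the whole string; if it does not, the analyzer settings were not applied.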

Hope this helps.


Source: https://habr.com/ru/post/990024/

