Elasticsearch - comparing a nested field with another field in a document

I need to compare two fields in the same document, where the actual values don't matter. Consider this document:

    _source: {
        id: 123,
        primary_content_type_id: 12,
        content: [
            { id: 4, content_type_id: 1, assigned: true },
            { id: 5, content_type_id: 12, assigned: false }
        ]
    }

I need to find all documents in which the primary content is not assigned. I cannot find a way to compare primary_content_type_id with the nested content.content_type_id to check that they hold the same value. This is what I tried with a script. I don't think I understand scripts well, but this may be a way to solve the problem:

    {
        "filter": {
            "nested": {
                "path": "content",
                "filter": {
                    "bool": {
                        "must": [
                            { "term": { "content.assigned": false } },
                            { "script": { "script": "primary_content_type_id==content.content_type_id" } }
                        ]
                    }
                }
            }
        }
    }

Note that the query works fine if I remove the script filter and replace it with another term filter where content_type_id = 12, and also add another filter where primary_content_type_id = 12. The problem is that I will not know the values of primary_content_type_id or content.content_type_id (and for my use case they do not matter). The only thing that matters is that assigned is false for the content entry whose content_type_id matches primary_content_type_id.
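For reference, the hard-coded variant described above might look roughly like the following. This is a reconstruction, not the exact query that was run. It assumes Elasticsearch 1.x, where the filtered query is available. The primary_content_type_id term sits outside the nested filter because that field belongs to the parent document rather than to the nested objects.

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "primary_content_type_id": 12 } },
            {
              "nested": {
                "path": "content",
                "filter": {
                  "bool": {
                    "must": [
                      { "term": { "content.assigned": false } },
                      { "term": { "content.content_type_id": 12 } }
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
```

The two 12s are independent constants here, so nothing ties the parent value to the nested value. That link is exactly what the script was meant to provide.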

Is it possible to do this check with Elasticsearch?

1 answer

When you run a nested search, you are searching the nested objects without their parent. Unfortunately, there is no hidden join you can apply between nested objects and their parent document.

At least for now, this means you cannot access both the "parent" and the nested document from the same script. You can confirm this by replacing your script with each of the following and checking the result:

    # Parent document field: does not exist here, so this never matches
    "script": {
        "script": "doc['primary_content_type_id'].value == 12"
    }

    # Nested document field: should exist, so this matches
    "script": {
        "script": "doc['content.content_type_id'].value == 12"
    }

You can do this with the least performance impact by mapping content as an object instead of nested. With nested, Elasticsearch indexes each entry as a separate hidden document. This means you will need to re-index, so that each document and its former nested documents are stored as a single document. Given how you are trying to use it, performance probably won't be very different and may even improve, especially since there is no alternative.
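Here is a minimal sketch of the object mapping, assuming Elasticsearch 1.x. It is shown as the body of a `PUT /my_index` request. The index name my_index and the type name my_type are hypothetical, and the field types are guesses based on the sample document. _source stays enabled (the default), which the script below relies on.

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "id": { "type": "integer" },
        "primary_content_type_id": { "type": "integer" },
        "content": {
          "type": "object",
          "properties": {
            "id": { "type": "integer" },
            "content_type_id": { "type": "integer" },
            "assigned": { "type": "boolean" }
          }
        }
      }
    }
  }
}
```

The only change from the nested setup is `"type": "object"` on content. Object is also what Elasticsearch uses for inner objects when no type is given.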

    # This assumes that your default scripting language is Groovy (the default in 1.4).
    # Note 1: "find" loops across all of the values, but it short-circuits
    #         as soon as it finds a match.
    # Note 2: It would be preferable to use doc throughout, but since we need the
    #         arrays (plural!) to be in the _same_ order, we need to parse the
    #         _source. This means that you must _store_ the _source, which is
    #         the default. Parsing the _source happens only on the first touch.
    "script": {
        "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && !it.assigned } != null",
        "_cache": true
    }

I cached the result because nothing in the script is time-dependent (for example, it does not compare dates against now). That makes it safe to cache, which should significantly speed up later searches. Most filters are cached by default, but scripts are one of the few exceptions.

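The script has to check both values (the matching content_type_id and assigned) to find the correct inner object. As a result, it duplicates some of the work done by a term filter on content.assigned, but this is almost unavoidable. Keeping the term filter is still likely to be faster than running the script alone, because it narrows down the documents the script has to evaluate.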
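Putting the pieces together, a complete request might look like the sketch below. It assumes Elasticsearch 1.x with Groovy as the scripting language and content re-indexed as an object rather than nested. The term filter on content.assigned narrows the candidates, and the cached script filter then checks that the unassigned entry is the one matching primary_content_type_id.

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "content.assigned": false } },
            {
              "script": {
                "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && !it.assigned } != null",
                "_cache": true
              }
            }
          ]
        }
      }
    }
  }
}
```

With the sample document from the question, the entry with content_type_id 12 has assigned set to false, so that document should match.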


Source: https://habr.com/ru/post/978483/

