Remove duplicates from search results

We have a list of products that look like this:

{"name": "", "image_url": "", "tags": 0}

The problem is that there are quite a few duplicate products. Because of the sheer size of the data, we can't realistically prevent duplicates from being added to Elasticsearch, so I'm looking for a way to filter them out at query time.

Duplicate products are defined as "products with the same name and image_url fields." In addition, if a product has more than zero "tags", we never want to remove it from the search results!

Any ideas how I can do this?

1 answer

I had a similar problem. There are several possible ways.

1 - . Java, . DB . , . , ES. ( ), ES.

2 - ES. script / , ES. , .
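To make the rule from the question concrete, here is a minimal sketch of it applied as a post-filter on hits that a query has already returned. It is only an illustration, not a built-in Elasticsearch feature: the function name `dedupe_hits` is hypothetical, and it assumes each hit is a plain dict with `name`, `image_url` and `tags` keys, as in the question's example document. It also assumes that an untagged copy appearing after a tagged product with the same name and image_url counts as a duplicate.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def dedupe_hits(hits: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: drop duplicate products from a list of hits.

    Two hits are duplicates when they share the same name and image_url.
    A hit with tags > 0 is never removed. Input order is preserved.
    """
    seen: Set[Tuple[str, str]] = set()
    kept: List[Dict[str, Any]] = []
    for hit in hits:
        key = (hit.get("name", ""), hit.get("image_url", ""))
        if hit.get("tags", 0) > 0:
            # Tagged products always stay in the results.
            kept.append(hit)
            # Assumption: later untagged copies of this product are duplicates.
            seen.add(key)
            continue
        if key in seen:
            # Untagged copy of a product we already kept: skip it.
            continue
        seen.add(key)
        kept.append(hit)
    return kept


hits = [
    {"name": "a", "image_url": "u", "tags": 0},
    {"name": "a", "image_url": "u", "tags": 0},  # untagged duplicate, dropped
    {"name": "a", "image_url": "u", "tags": 2},  # tagged, always kept
    {"name": "b", "image_url": "v", "tags": 0},
]
result = dedupe_hits(hits)
```

Note that filtering after the query means a page of results can come back shorter than the requested size, so the caller may need to fetch extra hits to fill a page.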


Source: https://habr.com/ru/post/1543551/

