You didn't say much about your data structure, but I gather from your question that you have post objects which contain a date field and, presumably, a thread_id field, i.e. some way to identify which thread a post belongs to?
Do you also have a thread object or is your thread_id sufficient?
In any case, your stated goal is to return a list of threads that contain posts in a specific date range. This means you need to group by thread, rather than returning the same thread_id once for every post in the date range.
This grouping can be done using facets.
So the query, in JSON, would look like this:
    curl -XGET 'http://127.0.0.1:9200/posts/post/_search?pretty=1&search_type=count' -d '
    {
       "facets" : {
          "thread_id" : {
             "terms" : {
                "size" : 20,
                "field" : "thread_id"
             }
          }
       },
       "query" : {
          "filtered" : {
             "query" : {
                "text" : {
                   "content" : "any keywords to match"
                }
             },
             "filter" : {
                "numeric_range" : {
                   "date" : {
                      "lt" : "2011-02-01",
                      "gte" : "2011-01-01"
                   }
                }
             }
          }
       }
    }
    '
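The terms facet comes back under facets.thread_id.terms as a list of term/count pairs. Here is a minimal sketch of pulling just the thread_ids out of such a response with standard tools. The extract_thread_ids helper and the sample response are made up for illustration, and it assumes string thread_ids that contain no double quotes:

```shell
# Hypothetical helper: reads a terms-facet response on stdin and prints
# one thread_id per line. Assumes string terms without embedded quotes;
# use a real JSON parser for anything less predictable.
extract_thread_ids() {
  grep -o '"term"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"term"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}

# Made-up sample in the shape of a terms facet result (not real output)
sample_response='{"facets":{"thread_id":{"_type":"terms","missing":0,"total":5,"other":0,"terms":[{"term":"thread-a","count":3},{"term":"thread-b","count":2}]}}}'

printf '%s\n' "$sample_response" | extract_thread_ids
```

In practice you would pipe the output of the curl command into extract_thread_ids instead of the sample string.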
Note:
- I use search_type=count because I don't actually want the posts returned, just the thread_ids
- I've specified that I want the 20 most frequently occurring thread_ids (size: 20). The default would be 10
- I use numeric_range for the date field because dates typically have many distinct values, and the numeric_range filter takes a different approach from the range filter, which makes it more efficient in this situation
- If your thread_ids look like how-to-perform-a-date-range-elasticsearch-query, you can use these values directly. But if you have a separate thread object, you can use the multi-get API to retrieve them
- Your thread_id field should be mapped as { "index": "not_analyzed" } so that the whole value is treated as a single term rather than being analyzed into separate terms
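The mapping and the multi-get lookup could look roughly like this. It is only a sketch that prints the two request bodies: the "date" mapping entry, the mapping_body and mget_body helper names, and the example IDs are assumptions of mine, not something from your question.

```shell
# Mapping body for the post type: thread_id is indexed as one unanalyzed term.
# The "date" entry is an assumption about how your date field is mapped.
mapping_body() {
  cat <<'EOF'
{
  "post": {
    "properties": {
      "thread_id": { "type": "string", "index": "not_analyzed" },
      "date":      { "type": "date" }
    }
  }
}
EOF
}

# Multi-get body: takes thread_ids as arguments and prints {"ids": [...]}.
# Assumes the IDs contain no double quotes or backslashes.
mget_body() {
  mget_ids=""
  for mget_id in "$@"; do
    mget_ids="${mget_ids:+$mget_ids, }\"$mget_id\""
  done
  printf '{ "ids": [%s] }\n' "$mget_ids"
}

mapping_body
mget_body "thread-a" "thread-b"
```

You would send the first body with curl -XPUT to http://127.0.0.1:9200/posts/post/_mapping, and the second with curl -XGET to the _mget endpoint of whatever index and type hold your thread objects (something like threads/thread/_mget, which is a guess on my part). The mapping has to be in place before you index your posts, because the way an existing field is analyzed can't be changed without reindexing.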