Elasticsearch: computing several sums at once

We have many documents in each index (~10,000,000), but each document is very small and contains mostly integer values.

We needed to compute the SUM of every numeric field.

  1. First step. Request the list of available fields via the mapping.

Example:

GET INDEX/TYPE/_mapping
  2. Second step. Build a query with one sum aggregation per field from the mapping.

Example:

GET INDEX/TYPE/_search
{
    // SOME FILTERS TO REDUCE THE NUMBER OF DOCUMENTS
    "size":0,
    "aggs":{  
        "FIELD 1":{  
            "sum":{  
                "field":"FIELD 1"
            }
        },
        "FIELD 2":{  
            "sum":{  
                "field":"FIELD 2"
            }
        },
        // ...
        "FIELD N":{  
            "sum":{  
                "field":"FIELD N"
            }
        }
    }
}
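The two steps above can be sketched in plain Python: pull the numeric field names out of a `_mapping` response, then generate the search body with one sum aggregation per field. The index, type, and field names below are hypothetical placeholders, and the mapping structure assumes the classic `index -> mappings -> type -> properties` layout used by older Elasticsearch versions.

```python
import json

def numeric_fields(mapping, index, doc_type):
    """Extract the names of numeric fields from a _mapping response."""
    props = mapping[index]["mappings"][doc_type]["properties"]
    numeric_types = {"integer", "long", "short", "byte", "double", "float"}
    return sorted(name for name, spec in props.items()
                  if spec.get("type") in numeric_types)

def sum_aggs_body(fields):
    """Build the search body: size 0, one sum aggregation per field."""
    return {
        "size": 0,
        "aggs": {f: {"sum": {"field": f}} for f in fields},
    }

# Hypothetical _mapping response, for illustration only.
mapping = {
    "my_index": {"mappings": {"my_type": {"properties": {
        "field1": {"type": "integer"},
        "field2": {"type": "long"},
        "label":  {"type": "string"},
    }}}}
}

fields = numeric_fields(mapping, "my_index", "my_type")
body = sum_aggs_body(fields)
print(json.dumps(body, indent=2))
```

The string field is filtered out, so only `field1` and `field2` end up in the aggregation body; this is the body whose execution time grows with the number of fields.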

Our problem is that the execution time of the second query grows linearly with the number of fields N.

This is unacceptable, since these are just sums. So we tried to build our own aggregation with a scripted metric (Groovy).

Example: summing 2 fields:

// ...
"aggs": {
    "test": {
        "scripted_metric": {
            "init_script": "_agg['t'] = []",
            "map_script": "_agg.t.add(doc)",
            "combine_script": "res = [:]; res['FIELD 1'] = 0; res['FIELD 2'] = 0; for (t in _agg.t) { res['FIELD 1'] += t['FIELD 1'].value; res['FIELD 2'] += t['FIELD 2'].value; }; return res",
            "reduce_script": "res = [:]; res['FIELD 1'] = 0; res['FIELD 2'] = 0; for (t in _aggs) { res['FIELD 1'] += t['FIELD 1']; res['FIELD 2'] += t['FIELD 2']; }; return res"
        }
    }
}
// ...
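To make the scripted-metric flow easier to follow, here is a plain-Python simulation of what the combine and reduce phases compute: each shard produces a partial map of per-field sums, and the reduce phase merges those partials into final totals. The shard contents and field names are made up for illustration.

```python
def combine_shard(docs, fields):
    """Per-shard phase: sum each field over the docs collected on that shard
    (what the combine_script does with _agg.t)."""
    partial = dict.fromkeys(fields, 0)
    for doc in docs:
        for f in fields:
            partial[f] += doc.get(f, 0)
    return partial

def reduce_partials(partials, fields):
    """Reduce phase: merge the per-shard partial sums (what the
    reduce_script does with _aggs)."""
    total = dict.fromkeys(fields, 0)
    for p in partials:
        for f in fields:
            total[f] += p[f]
    return total

# Two hypothetical shards holding small documents.
shard1 = [{"field1": 1, "field2": 10}, {"field1": 2, "field2": 20}]
shard2 = [{"field1": 3, "field2": 30}]
fields = ["field1", "field2"]

partials = [combine_shard(s, fields) for s in (shard1, shard2)]
totals = reduce_partials(partials, fields)
print(totals)  # {'field1': 6, 'field2': 60}
```

Note that this single scripted metric walks every document once and accumulates all field sums in one pass, which is the motivation for trying it instead of N separate sum aggregations.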


One suggestion from the discussion: check doc_values; the reported effect was on the order of 10-25%.


Source: https://habr.com/ru/post/1606930/

