Get the number of fields in the index

For optimization purposes, I am trying to reduce the total number of fields. Before I do, I want to get an idea of how many fields I actually have. The _stats endpoint does not seem to expose this, and I cannot figure out how the migration tool calculates the number of fields.

Is there any way, via an endpoint or otherwise, to get the total number of fields of a specified index?

+8
6 answers

To improve a little on what the other answer provided: you can get the mapping and then simply count how many times the word type appears in the result. That gives the number of fields, since each field needs a type.

 curl -s -XGET localhost:9200/index/_mapping?pretty | grep type | wc -l 
+20

Val's first answer solves the problem for me as well. But I just wanted to list some corner cases that can lead to misleading numbers.

  1. The mapping has field names that contain the word type.

For instance:

 "content_type" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword" } } }, 

Here grep type matches three times when it should match only twice: it should not match "content_type". This is easy to fix.

Instead of

 curl -s -XGET localhost:9200/index/_mapping?pretty | grep type 

use

 curl -s -XGET localhost:9200/index/_mapping?pretty | grep '"type"' 

to match the quoted key "type" exactly.

  2. The mapping has a field with the exact name "type".

for instance

 "type" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword" } } }, 

This also matches three times instead of twice, and this time

 curl -s -XGET localhost:9200/index/_mapping?pretty | grep '"type"' 

is not going to cut it, because the field name itself is an exact match. We need to skip lines where "type" is part of a field name, whether as a substring or as an exact match. In the pretty-printed output a field name is followed by an opening brace on the same line, so we can add a filter that drops those lines:

 curl -s -XGET localhost:9200/index/_mapping?pretty | grep '"type"' | grep -v "{" 
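
The effect of the three filters can be checked offline. The following is a minimal Python sketch, not part of the original answer: it pretty-prints a made-up mapping fragment that contains both corner cases (the names sample_mapping and count_lines are hypothetical) and counts matching lines the way grep piped into wc -l would.

```python
import json

# Hypothetical mapping fragment with both corner cases:
# a field named "content_type" and a field named exactly "type".
sample_mapping = {
    "content_type": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "type": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
}

# One key per line, roughly like the ?pretty output of the mapping API.
lines = json.dumps(sample_mapping, indent=2).splitlines()


def count_lines(lines, needle, exclude=None):
    """Count lines containing `needle`, dropping lines that contain `exclude`."""
    return sum(
        1
        for line in lines
        if needle in line and (exclude is None or exclude not in line)
    )


naive = count_lines(lines, "type")            # like: grep type
quoted = count_lines(lines, '"type"')         # like: grep '"type"'
filtered = count_lines(lines, '"type"', "{")  # like: grep '"type"' | grep -v "{"
print(naive, quoted, filtered)  # 6 5 4
```

Only the last count equals the four type declarations in the fragment (two fields, each with one keyword sub-field).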

Beyond the two scenarios above, if you want to report these numbers to a monitoring system such as AWS CloudWatch or Graphite, you can use the following code. It calls the API, retrieves the mapping, and recursively searches for the key "type", skipping fuzzy matches and descending into fields whose exact name is "type".

    import sys
    import json
    import requests


    # The following find function is a minor edit of the function posted here
    # https://stackoverflow.com/questions/9807634/find-all-occurrences-of-a-key-in-nested-python-dictionaries-and-lists
    def find(key, value):
        """Yield every non-container value stored under `key`, at any depth."""
        for k, v in value.items():
            if k == key and not isinstance(v, (dict, list)):
                yield v
            elif isinstance(v, dict):
                # Also covers a field literally named "type": descend into it.
                yield from find(key, v)
            elif isinstance(v, list):
                for d in v:
                    if isinstance(d, dict):  # skip scalars inside lists
                        yield from find(key, d)


    def get_index_type_count(es_host):
        """Return {index name: number of "type" declarations in its mapping}."""
        try:
            response = requests.get('https://%s/_mapping/' % es_host)
        except Exception as ex:
            print('Failed to get response - %s' % ex)
            sys.exit(1)

        indices_mapping_data = response.json()
        output = {}
        for index, mapping_data in indices_mapping_data.items():
            output[index] = len(list(find('type', mapping_data)))
        return output


    if __name__ == '__main__':
        print(json.dumps(get_index_type_count(sys.argv[1]), indent=2))

The above code is also posted here as a gist - https://gist.github.com/saurabh-hirani/e8cbc96844307a41ff4bc8aa8ebd7459

+3

You can get this information from the _mapping endpoint (the get mapping API), see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html

The get mapping API allows you to retrieve mapping definitions for an index or index / type.

GET /twitter/_mapping/tweet

With curl: curl [elasticsearch address]/[index]/_mapping?pretty
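
Once the mapping has been retrieved, the fields can also be counted by walking its structure instead of searching the text. The sketch below is an illustration, not an official API: iter_field_paths and sample_properties are hypothetical names, and the function expects the dictionary found under the "properties" key of the mapping, whose exact location in the response depends on the Elasticsearch version.

```python
def iter_field_paths(properties, prefix=""):
    """Yield a dotted path for every entry under a mapping's "properties"."""
    for name, spec in properties.items():
        path = prefix + name
        yield path
        # Object and nested fields keep their children under "properties".
        yield from iter_field_paths(spec.get("properties", {}), path + ".")
        # Multi-fields (e.g. a "keyword" sub-field) live under "fields".
        yield from iter_field_paths(spec.get("fields", {}), path + ".")


# Made-up example: a field literally named "type" and an object field "user".
sample_properties = {
    "type": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "user": {"properties": {"name": {"type": "text"}, "age": {"type": "integer"}}},
}

paths = list(iter_field_paths(sample_properties))
print(len(paths))  # 5
print(paths)
```

An object field such as user usually carries no "type" key of its own, so a text search for "type" would not see it; this walker lists it explicitly, along with multi-fields such as type.keyword.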

+1

This is just a quick way to get a rough count in Kibana without writing a script (I don't believe it is 100% accurate, but it is an easy way to tell whether your dynamic fields are exploding to huge numbers for some reason).

Run this query in the Kibana developer tools.

GET /index_name/_mapping

In the Kibana output, search for all instances of "type" (including the quotation marks). The number of matches is your answer (in this example, 804).


This can be useful if you are scratching your head over why you get the error [remote_transport_exception]

Limit of total fields [1000] in index [index_name] has been exceeded
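
The [1000] in that message is the value of the index.mapping.total_fields.limit setting, which defaults to 1000. As a rough sketch, assuming the usual nested shape of a GET /index_name/_settings response with values returned as strings (the helper total_fields_limit and the sample data are hypothetical), the limit can be read and compared with a field count like this:

```python
DEFAULT_TOTAL_FIELDS_LIMIT = 1000  # default of index.mapping.total_fields.limit


def total_fields_limit(settings_response, index):
    """Read the total-fields limit from a parsed _settings response.

    Falls back to the default when the setting was never set explicitly,
    in which case it is assumed to be absent from the response.
    """
    index_settings = settings_response[index]["settings"].get("index", {})
    limit = index_settings.get("mapping", {}).get("total_fields", {}).get("limit")
    return int(limit) if limit is not None else DEFAULT_TOTAL_FIELDS_LIMIT


# Made-up responses: one index with the limit raised, one with nothing set.
raised = {"index_name": {"settings": {"index": {
    "number_of_shards": "1",
    "mapping": {"total_fields": {"limit": "2000"}},
}}}}
unset = {"index_name": {"settings": {"index": {"number_of_shards": "1"}}}}

field_count = 804  # the count from the Kibana example above
for response in (raised, unset):
    limit = total_fields_limit(response, "index_name")
    print(limit, limit - field_count)  # limit and remaining headroom
```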

+1

A field can have more than one "type". For example:

 "datapath-id": { "fields": { "keyword": { "ignore_above": 256, "type": "keyword" } }, "type": "text" } 

We can ignore the "type" inside the "fields" to get the exact number of fields. One example is:

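    import json


    def myprint(d, field_count):
        """Count (and print) every non-dict value outside "fields" blocks."""
        for k, v in d.items():
            if isinstance(v, dict):
                # Skip multi-field definitions so their "type" is not counted.
                if k != "fields":
                    field_count = myprint(v, field_count)
            else:
                # Note: any scalar mapping parameter is counted here, not only "type".
                print("{0} : {1}".format(k, v))
                field_count += 1
        return field_count


    with open("output/mappings.json") as f:
        d = json.load(f)

    final_field_count = myprint(d, field_count=0)
    print("field count", final_field_count)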
0

You can try this:

 curl -s -XGET "http://localhost:9200/index/_field_caps?fields=*" | jq '.fields|length' 
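
For reference, the same count can be taken in Python from an already parsed _field_caps response. This is a minimal sketch with a made-up response (sample_response and count_fields are hypothetical names); the optional filter that drops names starting with an underscore is only a heuristic for excluding metadata fields such as _id, and it would also drop user fields named that way.

```python
def count_fields(field_caps_response, include_metadata=True):
    """Count the entries under "fields", like jq '.fields|length'."""
    names = list(field_caps_response["fields"])
    if not include_metadata:
        # Heuristic: treat names starting with "_" as metadata fields.
        names = [name for name in names if not name.startswith("_")]
    return len(names)


# Made-up response: one metadata field, one text field and its multi-field.
sample_response = {
    "indices": ["index"],
    "fields": {
        "_id": {"_id": {"type": "_id", "searchable": True, "aggregatable": False}},
        "title": {"text": {"type": "text", "searchable": True, "aggregatable": False}},
        "title.keyword": {
            "keyword": {"type": "keyword", "searchable": True, "aggregatable": True}
        },
    },
}

print(count_fields(sample_response))                          # 3
print(count_fields(sample_response, include_metadata=False))  # 2
```

Multi-fields such as title.keyword appear as separate entries in this response, so they are included in the count.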
0

Source: https://habr.com/ru/post/1259716/

