Effect of using a 100-character string as the _id column in Elasticsearch

I plan to store events in Elasticsearch. It can hold about 100 million events at any given time. To deduplicate events, I plan to create an _id column about 100 characters long by concatenating the following fields: entity_id (a UUID, 37 characters) + event_creation_time (30 characters) + event_type (30 characters).
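
A minimal sketch of how such a composite _id could be assembled, for concreteness. The function name `build_event_id`, the `|` separator, and the UTC ISO-8601 timestamp format are assumptions made for illustration; none of them is specified above. The point is only that the same event always maps to the same string, so re-sending it targets the same document.

```python
from datetime import datetime, timezone


def build_event_id(entity_id: str, created_at: datetime, event_type: str) -> str:
    """Concatenate the three fields into one deterministic _id string.

    Hypothetical helper: the separator and timestamp format are assumptions.
    """
    # Render the timestamp in UTC with a fixed-width format so the same
    # instant always produces the same text.
    ts = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{entity_id}|{ts}|{event_type}"


entity = "123e4567-e89b-12d3-a456-426614174000"  # example UUID, not real data
when = datetime(2016, 1, 15, 10, 30, 0, tzinfo=timezone.utc)

event_id = build_event_id(entity, when, "ORDER_CREATED")
duplicate_id = build_event_id(entity, when, "ORDER_CREATED")
other_id = build_event_id(entity, when, "ORDER_CANCELLED")
```

Here `event_id` and `duplicate_id` are identical, while `other_id` differs because the event type differs.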

This store will serve normal reads and writes along with aggregation queries (no updates or deletes). Could you tell me whether there would be any performance impact or other side effects from using such long _id strings instead of the default identifiers?

Thanks, Harish

1 answer

By default, _id is neither indexed nor stored, so there is no performance concern storage-wise.

Since you will be indexing millions of documents, the only serious performance issue you will run into is during bulk indexing. You need to make sure your _ids follow a consistent, sequential pattern. From the docs:

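  • If you do not have a natural identifier for each document, use Elasticsearch's auto-ID functionality. It is optimized to avoid version lookups, because the auto-generated identifier is unique.
  • If you use your own identifier, try to pick one that is friendly to Lucene. Identifiers with a consistent, sequential pattern, such as UUID-1, compress well; by contrast, essentially random identifiers, such as UUID-4, compress poorly and slow Lucene down.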
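
To get a rough feel for why sequential patterns matter, the sketch below compares how well two sets of same-length identifiers compress. It is only a proxy: zlib stands in for Lucene's own term encoding, which it does not replicate, and the zero-padded counters are an assumed example of a sequential pattern. The helper name `compressed_size` and the sample size are likewise hypothetical.

```python
import uuid
import zlib


def compressed_size(ids):
    """Return the zlib-compressed size, in bytes, of the sorted identifiers."""
    blob = "\n".join(sorted(ids)).encode("ascii")
    return len(zlib.compress(blob))


N = 10_000  # arbitrary sample size for the comparison

# Zero-padded counters: 36 characters each, with long shared prefixes.
sequential_ids = [f"{i:036d}" for i in range(N)]

# Random UUID-4 values: also 36 characters each, but with no shared structure.
random_ids = [str(uuid.uuid4()) for _ in range(N)]

seq_size = compressed_size(sequential_ids)
rnd_size = compressed_size(random_ids)
print(f"sequential: {seq_size} bytes, random: {rnd_size} bytes")
```

The sequential set compresses to a fraction of the random one, which is the same property the quoted guidance relies on.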

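Also, Lucene committer Michael McCandless has written about choosing an _id. IMO, you should be fine as long as your identifiers meet the requirements above.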



Source: https://habr.com/ru/post/1622622/

