What is the recommended setup for an Elasticsearch cluster holding data at TB scale and above?

I currently have several Elasticsearch nodes running on several bare metal machines, holding TB-sized indexes. We are in the process of restructuring our infrastructure, and I'm not sure this is the best approach.

I have looked at Docker, Mesos, and Vagrant as alternatives, but I'm not sure they are even feasible. I see four options (including what I have now):

  • Mesos-Elasticsearch: This project launches Elasticsearch on Mesos. It looks great, but it seems to only let you scale data nodes with small disk sizes, and there do not appear to be dedicated master or client nodes. At the moment the GitHub project is pretty alpha - with the default configuration I got "No route to host" errors and a MasterNotDiscoveredException. Does anyone have experience with this?
  • Docker: I am not very familiar with containers, but Docker Hub has several Elasticsearch images, and Mesos can run Docker containers on top of it. I am concerned about the limited disk space inside each container, since my data is at TB scale, and the data also needs to persist. Is it possible to resize a container's disk, or is there a different setup for Docker containers? (A sketch of the kind of setup I mean is after this list.)
  • Vagrant VMs: I assume each ES node would run in its own virtual machine so that it gets dedicated resources. Are there any significant advantages to this compared to running on bare metal? It also does not seem to be compatible with Mesos.
  • Bare metal: This is the current setup.
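
To make the Docker disk question concrete: the setup I have in mind would mount a large host directory (or a network volume) into the container, so the index data lives outside the container filesystem. A minimal sketch only, assuming the official elasticsearch image from Docker Hub and a made-up host path /data/es:

    # Minimal sketch: mount a large host directory as the container's data path,
    # so index data is not limited by (or lost with) the container filesystem.
    # Image name and host path are placeholders.
    docker run -d --name es-node \
      -p 9200:9200 -p 9300:9300 \
      -v /data/es:/usr/share/elasticsearch/data \
      elasticsearch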

I would like to know which of the four is preferable for a TB-scale Elasticsearch cluster. What are the pros and cons of each option?

+5
2 answers

I am the author of the Apache Mesos Elasticsearch framework. I would advise you to try all of these approaches and choose the one that gives you the best experience. When it comes to performance, make sure you have concrete performance requirements, and then run tests against them. There are other things to consider as well, which I'll touch on below for each of your options.

  • The Elasticsearch framework for Mesos is the most robust of the four options. Elasticsearch nodes run as Mesos tasks; if any task fails (hardware or software failure), it is restarted somewhere else in the Mesos cluster. If you want to add nodes (to improve performance) or remove nodes (to reduce resource usage), it is as simple as sending a single-line curl request to the scheduler (see the sketch after this list). The data is very safe: the default configuration (which can be overridden) replicates all data to all nodes, so the cluster can suffer a catastrophic event, be left with a single node, and still not lose any data. You can also use any Docker volume plugin to write data to an external volume instead, so that if tasks die the data is still held on cloud volumes. There are quite a few more features; check out the website, and the videos on the Container Solutions YouTube channel. We also develop tooling to make development easier, see minimesos.

  • Plain Docker is perfectly reasonable, but you must think about how you would orchestrate your cluster, and about what happens if one or more containers die. Can you tolerate that loss? If so, this might be the best option from a DevOps point of view (i.e. you can replicate and test a cluster that looks like the real thing).

  • Vagrant VMs are the only option I would advise against. They are fine for development, but in production you would take a significant performance hit: you could end up running a full VM stack (Vagrant) inside another virtual machine (the cloud instance). That overhead is unnecessary. Link 1, Link 2.

  • Bare metal is the method officially recommended by Elastic and will likely give you maximum performance for a given hardware configuration. But since these are static deployments, a) a large portion of the machine resources will be wasted (unused RAM/CPU/etc.), b) there is a significant delay (especially in large organisations!) in provisioning new instances (compared to a single API call), and c) if an instance fails, it will not be replaced and nothing gets fixed until someone fixes it (compare that with automatic fault tolerance). If your Elasticsearch requirements are fixed, you don't need DevOps-style flexibility, and you can live with a little downtime, then this is probably the simplest method (but make sure your ES configuration is set up correctly; a minimal example is sketched below).
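
To make the "single-line curl request" point concrete: the scheduler exposes a small HTTP API for resizing the cluster. This is a rough sketch only; the scheduler host, port and exact endpoint path are placeholders, so check the documentation for the framework version you deploy:

    # Hypothetical example: ask the scheduler to resize the cluster to 5 nodes.
    # Scheduler host/port and the endpoint path are placeholders.
    curl -XPUT http://scheduler-host:31100/v1/cluster/elasticsearchNodes \
      -H "Content-Type: application/json" \
      -d '5'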
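
And on the "make sure your ES configuration is set up correctly" point for static bare-metal deployments: on the 1.x/2.x line the settings that matter most are discovery and master quorum. A minimal illustrative elasticsearch.yml, with host names, paths and counts as placeholders:

    # elasticsearch.yml - illustrative values only
    cluster.name: my-cluster
    node.name: es-node-1
    path.data: /data/elasticsearch          # keep indexes on the large disks
    discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2", "es-node-3"]
    discovery.zen.minimum_master_nodes: 2   # (master-eligible / 2) + 1, avoids split brain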

So if it were me, I would consider a Dockerized setup for testing, small POCs, and possibly very small production workloads. For anything bigger than that, I would choose the Mesos Elasticsearch option every time.

+5

My company is facing more or less the same questions, though we are perhaps a step further along in terms of having a POC and so on.

We are currently running a 3-node ES cluster on Mesos 0.27.1 through Marathon with a custom Docker image. We mount host volumes (paths) into the containers, which means you can mount, for example, a Ceph volume on the Mesos slave host. But this is still a largely manual process. The biggest issue is data safety, because by default the data is stored only on the host itself, and you have to watch the behaviour when the application is scaled in Marathon (you must use constraints so that only one ES node is launched per Mesos slave, and so on).
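
For illustration, a stripped-down Marathon app definition along those lines; the image name, paths and resource numbers are placeholders rather than our actual configuration:

    {
      "id": "/elasticsearch",
      "instances": 3,
      "cpus": 1,
      "mem": 4096,
      "container": {
        "type": "DOCKER",
        "docker": { "image": "my-registry/elasticsearch:custom", "network": "HOST" },
        "volumes": [
          {
            "containerPath": "/usr/share/elasticsearch/data",
            "hostPath": "/mnt/ceph/es-data",
            "mode": "RW"
          }
        ]
      },
      "constraints": [["hostname", "UNIQUE"]]
    }

The ["hostname", "UNIQUE"] constraint is what keeps Marathon from putting two ES nodes on the same Mesos slave, and the volume entry is the manual host-path mount described above.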

We also tried the Mesos ES framework mentioned above a few months ago, but at the time we were not satisfied with its state. From what I can see in the docs it has improved significantly over the past months, but some important features were still missing on the Mesos side (support for persistent volumes and for Docker volume drivers, for example)... That is not a problem with the framework itself, though, it is a Mesos limitation.

I will give the Mesos framework another try soon. I especially like the ability to set --externalVolumeDriver, which means we could probably use the Docker RBD volume driver (since we use Ceph)...
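
Purely as a sketch of that idea, launching the scheduler with an external volume driver might look roughly like the following. Only --externalVolumeDriver is taken from the discussion above; the image name, the other flags and the driver name are assumptions from memory and should be checked against the framework documentation for the version you run:

    # Hypothetical scheduler launch; verify image name and flag spellings in the docs.
    docker run -d mesos/elasticsearch-scheduler \
      --zookeeperMesosUrl zk://zk-host:2181/mesos \
      --elasticsearchNodes 3 \
      --externalVolumeDriver rbd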

+1
