Use cases: InfluxDB vs Prometheus

According to the Prometheus webpage, one of the main differences between Prometheus and InfluxDB is the use case: while Prometheus stores time series only, InfluxDB is better geared towards storing individual events. Since a lot of work has been done on the InfluxDB storage engine, I wonder if this is still true.

I want to set up a time series database and, apart from the push/pull model (and possibly a difference in performance), I don't see any big thing that separates the two projects. Can someone explain the difference in use cases?

+45
database influxdb prometheus
Oct 26 '15 at 16:03
4 answers

InfluxDB CEO and developer here. The next version of InfluxDB (0.9.5) will have a new storage engine. With this engine, we will be able to efficiently store either single-event data or regularly sampled series, i.e. irregular and regular time series.

InfluxDB supports int64, float64, bool and string data types, using different compression schemes for each of them. Prometheus only supports float64.

For compression, version 0.9.5 will have compression competitive with Prometheus. In some cases we will see better results, because we vary the compression of timestamps based on what we see. The best-case scenario is a regular series sampled at exact intervals. In those cases, by default, we can compress the timestamps of 1k points as an 8-byte start time, a delta (zigzag encoded) and a count (also zigzag encoded).
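The start/delta/count idea for perfectly regular timestamps can be sketched roughly as follows. This is only an illustration, not InfluxDB's actual on-disk format: the function names are hypothetical, and the use of a variable-length integer (varint) for the zigzag-encoded values is an assumption that the answer does not state.

```python
from __future__ import annotations

import struct


def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return (z >> 1) ^ -(z & 1)


def varint(u: int) -> bytes:
    """Encode an unsigned integer in 7-bit groups (assumed encoding, see lead-in)."""
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)
        u >>= 7
    out.append(u)
    return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_regular(timestamps: list[int]) -> bytes | None:
    """Return start + delta + count, or None if the series is not perfectly regular."""
    if len(timestamps) < 2:
        return None
    delta = timestamps[1] - timestamps[0]
    if any(b - a != delta for a, b in zip(timestamps, timestamps[1:])):
        return None
    return (
        struct.pack(">q", timestamps[0])      # 8-byte start time
        + varint(zigzag(delta))               # zigzag-encoded delta
        + varint(zigzag(len(timestamps)))     # zigzag-encoded count
    )


def decode_regular(blob: bytes) -> list[int]:
    """Rebuild the full timestamp list from the compact form."""
    (start,) = struct.unpack(">q", blob[:8])
    zdelta, pos = read_varint(blob, 8)
    zcount, _ = read_varint(blob, pos)
    delta, count = unzigzag(zdelta), unzigzag(zcount)
    return [start + i * delta for i in range(count)]
```

With 1,000 nanosecond timestamps spaced exactly 10 seconds apart, this sketch stores all of them in 15 bytes. As soon as one interval differs, `encode_regular` returns `None` and a real engine would have to fall back to a more general scheme.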

Depending on the shape of the data, we have seen 2.5 bytes per point on average after compactions.

YMMV based on your timestamps, the data type and the shape of the data. For example, random floats with nanosecond timestamps and large variable deltas would be the worst case.

Variable precision in timestamps is another feature InfluxDB has. It can represent second, millisecond, microsecond or nanosecond scale times. Prometheus is fixed at milliseconds.

Another difference is that writes to InfluxDB are durable once a success response has been sent to the client. Prometheus buffers writes in memory and, by default, flushes them every 5 minutes, which opens a window of potential data loss.

Our hope is that once InfluxDB 0.9.5 is released, it will be a good choice for Prometheus users as a long-term metrics store (in conjunction with Prometheus). I'm pretty sure that support is already in Prometheus, but until the 0.9.5 release drops, it might be a bit rocky. Obviously, we will have to work together and do a bunch of testing, but that is what I'm hoping for.

For metrics ingestion on a single server, I would expect Prometheus to have better performance (although we have done no testing here and have no numbers) because of its more constrained data model and because it does not append writes to disk before writing out the index.

The query languages of the two are very different. I'm not sure what they support that we don't yet, or vice versa, so you will need to dig into the docs of both to see whether one of them can do something that you need. In the long run, our goal is for InfluxDB's query functionality to be a superset of Graphite, RRD, Prometheus and other time series solutions. I say superset because we want to cover those in addition to more analytic functions later on. Obviously, it will take us a while to get there.

Finally, a longer-term goal of InfluxDB is to support high availability and horizontal scalability through clustering. The current clustering implementation is not yet feature complete and is only in alpha. However, we are working on it, and it is a core design goal for the project. Our clustering design is that data is eventually consistent.

As far as I know, Prometheus' approach is to use double writes for HA (so there is no eventual consistency guarantee) and to use federation for horizontal scalability. I'm not sure how querying across federated servers would work.

Within an InfluxDB cluster, you can query across server boundaries without copying all the data over the network. That is because each query is decomposed into a sort of MapReduce job that gets run on the fly.
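To illustrate why a MapReduce-style decomposition avoids shipping raw points between servers, here is a toy sketch of a mean computed across several shards. It is not InfluxDB's implementation; the `Partial` type and both function names are hypothetical, and the shards are plain in-memory lists rather than remote nodes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Partial:
    """Small per-shard summary; this is all that would cross the network."""
    total: float
    count: int


def map_mean(points: list[float]) -> Partial:
    """Map step: runs where the shard lives and reduces it to a sum and a count."""
    return Partial(sum(points), len(points))


def reduce_mean(partials: list[Partial]) -> float | None:
    """Reduce step: combine the per-shard summaries into the final mean."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count if count else None


# Three hypothetical shards, each held by a different server.
shards = [[1.0, 2.0, 3.0], [4.0], []]
result = reduce_mean([map_mean(s) for s in shards])
```

Each server returns two numbers regardless of how many points its shard holds, which is the property the answer describes.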

There is probably more, but that is what I can think of at the moment.

+60
Oct 27 '15 at 23:42

We have got the marketing message from the two companies in the other answers. Now let's ignore it and get back to the sad real world of time series.

Some history

InfluxDB and Prometheus were made to replace the old tools of a past era (RRDtool, Graphite).

InfluxDB is a time series database. Prometheus is a metrics collection and alerting tool, with a storage engine written just for that. (I'm not actually sure that you could [or should] reuse the storage engine for anything else.)

Limitations

Unfortunately, writing a database is a very complex undertaking. The only way both of these tools manage to ship something is by dropping all the hard features related to high availability and clustering.

To put it bluntly, each is a single application running on only a single node.

Prometheus has no goal of supporting clustering and replication whatsoever. The official way to support failover is to "run 2 nodes and send data to both of them." Ouch. (Note that this is seriously the ONLY existing way; it is written countless times in the official documentation.)

InfluxDB talked about clustering for years ... until it was officially canceled in March. Clustering is no longer on the table for InfluxDB. Just forget it. When it is done (supposing it ever is), it will be available only in the Enterprise Edition.

https://influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/

Over the next few years, we hope a well-designed time series database will be created that will handle all the complex database problems: replication, disaster recovery, data security, scalability, backup ...

There is currently no silver bullet.

What to do

Estimate the amount of expected data.

100 metrics * 100 sources * 1 second => 10,000 datapoints per second => 864 mega-datapoints per day.

The nice thing about time series databases is that they use a compact format, they compress well, they aggregate datapoints, and they clean up old data. (In addition, they come with features relevant to time series data.)

Supposing that a datapoint takes 4 bytes, that is only a few gigabytes per day. Fortunately for us, there are systems with 10 cores and 10 TB drives. This could probably run on a single node.
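The estimate above is simple enough to script, so you can plug in your own numbers. A minimal sketch, where the function name and parameters are made up for illustration and 4 bytes per datapoint is the assumption taken from this answer:

```python
SECONDS_PER_DAY = 86_400


def daily_volume(metrics: int, sources: int, interval_s: float, bytes_per_point: float):
    """Return (points per second, points per day, bytes per day)."""
    points_per_second = metrics * sources / interval_s
    points_per_day = points_per_second * SECONDS_PER_DAY
    bytes_per_day = points_per_day * bytes_per_point
    return points_per_second, points_per_day, bytes_per_day


# 100 metrics from each of 100 sources, sampled once per second, 4 bytes each.
pps, ppd, bpd = daily_volume(metrics=100, sources=100, interval_s=1, bytes_per_point=4)
print(f"{pps:,.0f} points/s, {ppd / 1e6:,.0f} M points/day, {bpd / 1e9:.3f} GB/day")
```

For the figures in this answer that works out to 10,000 points per second, 864 million points per day and about 3.5 GB per day before any retention or downsampling is applied.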

An alternative is to use a classic NoSQL database (Cassandra, ElasticSearch or Riak) and then engineer the missing bits in the application. These databases may not be optimized for this kind of storage (or are they? Modern databases are so complex and optimized that you cannot know for sure unless you benchmark them).

You must evaluate the capacity required by your application. Write a proof of concept with these various databases and measure things.

See if it falls within the limitations of InfluxDB. If so, it is probably the best bet. If not, you will have to build your own solution on top of something else.

+19
Jul 16 '16 at 1:53

InfluxDB simply cannot hold a production load (metrics) from 1000 servers. It has some real problems with data ingestion and ends up stalled or hung and unusable. We tried to use it for a while, but once the amount of data reached a certain critical level, it could no longer be used. No memory or CPU upgrade helped. Therefore, our advice from experience is to definitely avoid it: it is not a mature product and has serious architectural design problems. And I'm not even talking about Influx's sudden shift to a commercial model.

Next, we looked into Prometheus and, although it required us to rewrite our queries, it now ingests 4 times more metrics without any problems, compared to what we tried to feed to Influx. And all of that load is handled by a single Prometheus server; it is fast, reliable and dependable. This is our experience running a huge international online store under pretty heavy load.

+10
Nov 23 '16 at 2:54

IIRC, the current implementation of Prometheus is designed around all of the data fitting on a single server. If you have gigantic quantities of data, it may not all fit in Prometheus.

+4
Oct 26 '15 at 16:09


