Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit with an active ecosystem. Kubernetes components expose their metrics in the Prometheus format, and Prometheus is the de facto standard for monitoring across the cloud-native ecosystem. It is a graduated CNCF project written in Go.

  • Collects metrics and stores them as time series, meaning each value is recorded together with the timestamp at which it was collected.
  • Can attach optional key/value pairs, called labels, to each metric.

GitHub: Prometheus

Features

  • Multidimensional data model with time series data identified by metric name and key/value pairs.
  • PromQL, a query language designed to select and aggregate time series across those dimensions.
  • Each server node is autonomous and does not depend on distributed storage.
  • Metrics are collected over HTTP with a pull model: Prometheus goes to the application and scrapes the specified endpoint.
  • Short-lived jobs can push their metrics through an intermediary gateway (the Pushgateway).
  • Targets can be discovered via service discovery or static configuration.
  • Supports several graphing and dashboarding tools, such as Grafana.
  • Alerting rules, with notifications handled by Alertmanager.
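
To make the data model and the pull-based collection more concrete, here is a minimal sketch that uses only the Python standard library. It renders a few samples in the Prometheus text exposition format and serves them at /metrics. The metric name http_requests_total, the label names, and port 8000 are illustrative assumptions, and label values are assumed to need no escaping; a real application would normally use an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory samples. Each (metric name, labels) pair
# identifies exactly one time series.
SAMPLES = {
    ("http_requests_total", (("method", "GET"), ("status", "200"))): 1027.0,
    ("http_requests_total", (("method", "POST"), ("status", "500"))): 3.0,
}


def render_metrics(samples):
    """Render samples as lines of the Prometheus text exposition format."""
    lines = []
    for (name, labels), value in sorted(samples.items()):
        if labels:
            label_text = ",".join(f'{key}="{val}"' for key, val in labels)
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics so that a Prometheus server can scrape it."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving the samples on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape target at port 8000 on that host would let Prometheus pull these two series on every scrape.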

Metrics

Metrics are numerical measurements of data related to elements of your software or infrastructure. Typically, this numerical data is organized along a timeline.

Time series means that changes are recorded over time.

| System Metrics       | Business Metrics                       |
| -------------------- | -------------------------------------- |
| Latency              | Number of users accessing application  |
| Number of requests   | Number of invoices issued              |
| Resource consumption | Purchases of a specific product        |
| Most accessed APIs   | Daily revenue                          |
| Number of errors     |                                        |

For example, the application might slow down when the number of requests is high. If you have the request count metric, you can identify the reason and increase the number of servers to handle the load.
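
As a rough illustration of how a request count becomes a usable signal, the sketch below computes the average requests per second from successive readings of a counter. It is a simplified stand-in for what PromQL's rate() function provides: it handles counter resets but does not extrapolate to the edges of the time window. The function name and the sample values are hypothetical.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples is a list of (timestamp_in_seconds, counter_value) pairs in
    chronological order. Returns None when a rate cannot be computed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example, the process
        # restarted), so the new value is counted from zero.
        increase += curr if curr < prev else curr - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Three scrapes, 15 seconds apart: 300 requests in 30 seconds.
print(per_second_rate([(0, 100), (15, 250), (30, 400)]))  # 10.0
```

If this rate climbs while latency climbs with it, load is a plausible cause of the slowdown.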

Metrics are not logs

| Metrics        | Logs           |
| -------------- | -------------- |
| Numerical data | Textual data   |
| Graphs         | Error messages |
| Aggregations   | Information    |
| Performance    | Searchable     |

Prometheus vs InfluxDB

These are two of the most widely used tools for monitoring with time series data. What is the difference, and when should you use each?

What they have in common:

  • Data compression
  • Multidimensional data (Prometheus uses labels and Influx uses tags)
  • Alert system (Alertmanager in Prometheus and Kapacitor in InfluxDB)
  • Query language to interact with and analyze metrics
  • Both integrate with many other tools. Prometheus slightly more, but the vast majority of what we need exists in both. Integration gaps can be solved with webhooks in both.
  • Large developer community. Prometheus slightly larger.

Differences between them:

  • InfluxDB tends to be used as a time-series database and Prometheus tends to be used more for monitoring purposes.
  • InfluxDB has its own query languages (InfluxQL and Flux) and Prometheus has PromQL.
    • PromQL is simpler and was designed for monitoring, alerting, and graphing. Prometheus infers much of a query's intent, so we don't need to spell out every step.
    • InfluxDB's query languages require more parameters because InfluxDB is a general-purpose time-series database, while Prometheus was developed specifically for monitoring.
  • InfluxDB supports float64, int64, bool, and string values, while Prometheus stores float64 sample values and has only limited support for strings.
  • InfluxDB writes data with timestamps down to nanoseconds and Prometheus in milliseconds.
  • InfluxDB does not periodically pull metrics from the target system; it expects an application to push data to it. Prometheus can pull metrics from the target system.
  • InfluxDB tends to use more memory and CPU than Prometheus.
  • InfluxDB aims to store all types of data for long periods, while Prometheus keeps data for only 15 days by default; this retention period can be changed (the --storage.tsdb.retention.time flag).
  • InfluxDB is configured through API calls, while Prometheus is configured through YAML files, which is easier to manage. With InfluxDB it is harder to keep changes idempotent, because the configuration script can be invoked multiple times.
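
Two of the differences above, tags versus labels and nanosecond versus millisecond timestamps, show up directly in each system's text format. The sketch below formats the same sample as InfluxDB line protocol and as a Prometheus exposition line. The measurement, field, and host names are made up, and values are assumed to need no escaping. In practice, scraped targets usually omit the timestamp and Prometheus assigns the scrape time.

```python
def to_influx_line(measurement, tags, field, value, timestamp_ns):
    """Format one sample as InfluxDB line protocol (nanosecond timestamp)."""
    tag_text = "".join(f",{key}={val}" for key, val in sorted(tags.items()))
    return f"{measurement}{tag_text} {field}={value} {timestamp_ns}"


def to_prometheus_line(metric, labels, value, timestamp_ns):
    """Format one sample in the Prometheus text format (millisecond timestamp)."""
    label_text = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    # Integer division drops everything below one millisecond.
    timestamp_ms = timestamp_ns // 1_000_000
    return f"{metric}{{{label_text}}} {value} {timestamp_ms}"


ts = 1556813561098765432  # hypothetical timestamp in nanoseconds
print(to_influx_line("cpu", {"host": "server01"}, "usage", 0.64, ts))
# cpu,host=server01 usage=0.64 1556813561098765432
print(to_prometheus_line("cpu_usage", {"host": "server01"}, 0.64, ts))
# cpu_usage{host="server01"} 0.64 1556813561098
```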

InfluxDB is a more powerful tool for large data volumes and general use. On the other hand, Prometheus has simpler installation and a larger community.

Either tool will serve you well. But keep in mind: if you just want to monitor something, why use a bazooka to kill a mosquito?

When not to use Prometheus?

Prometheus is built for operational insight, not for exact accounting, so its precision isn't 100%. It samples each target at a fixed scrape interval, which means events that happen between two scrapes are not recorded individually, and a failed scrape leaves a gap in the series. A shorter scrape interval captures more detail, but disk consumption will increase significantly.

If you need 100% precision, such as for per-request billing, Prometheus is not a good choice, as the collected data will likely not be detailed and complete enough. In this case, it would be better to use some other system to collect and analyze the data that needs 100% precision, and Prometheus for the rest of the monitoring.
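
One way to see why periodically collected data cannot be 100% precise is to simulate a scraper. In the sketch below, a hypothetical gauge (for example, a queue length) changes every second but is only sampled every 15 seconds, so a one-second spike between two scrapes never reaches the monitoring system. All values and the interval are made up for illustration.

```python
def scrape(values_per_second, interval_s):
    """Return the values a scraper sees when sampling every interval_s seconds."""
    return values_per_second[::interval_s]


# Hypothetical gauge recorded once per second for one minute.
true_values = [5] * 60
true_values[22] = 90  # a one-second spike at t = 22s

seen = scrape(true_values, 15)  # samples taken at t = 0, 15, 30, 45

print(max(true_values))  # 90
print(max(seen))         # 5
```

For trends and alerting this loss is usually acceptable; for anything that must account for every single event, it is not.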