Common Questions

I want to draw a parallel with more traditional tooling, explore OTel a bit more, and clear up some common doubts.

Speaking about metrics, without logs or traces

When we run a Kubernetes cluster, servers, and hardware infrastructure in general, it is natural to want to monitor them and have a nice Grafana dashboard showing cluster and server health. For this we use Node Exporter, a focused, purpose-built tool developed specifically to export metrics from Linux/Unix operating systems to Prometheus. It collects:

  • System metrics: CPU, memory, disk, network
  • Kernel metrics: interrupts, context switches
  • Hardware metrics: temperature, power
  • Other operating system-specific metrics

But it is not the recommended tool for monitoring an application.

Could we do this with OpenTelemetry? Yes, but Node Exporter is more recommended for cluster monitoring for several reasons:

  • Efficiency: it is lighter, purpose-built for this task, and optimized to collect system metrics with low overhead on the nodes.
  • Simplicity: it is simpler to configure and maintain and requires no instrumentation or additional code.
  • Integration: it integrates natively with Prometheus.
  • Metrics coverage: its predefined metrics cover practically everything you need to monitor at the system level.

The ideal is to use both in a complementary way:

Node Exporter: for cluster and server metrics (CPU, memory, disk, etc.)

OpenTelemetry: for application metrics, traces, and logs
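As a sketch, a single prometheus.yml can scrape both sources side by side. The job names and targets below are hypothetical; 9100 is Node Exporter's default port, and 8889 is a common choice for the Collector's Prometheus exporter:

```yaml
scrape_configs:
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]   # Node Exporter's default port
  - job_name: "otel-collector"
    static_configs:
      - targets: ["otel-collector:8889"]  # Collector's Prometheus exporter
```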

There is also the OpenTelemetry Collector, which can collect operating-system metrics using the hostmetrics receiver. However, there are some important considerations:

  • The OpenTelemetry hostmetrics receiver collects metrics similar to Node Exporter's (CPU, memory, disk, network, etc.), but it needs to be explicitly configured:

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
      load:
      process:
```

However, most people still prefer Node Exporter, even when they already run a Collector on every node in the cluster, because:

  • It is more mature and tested in production.
  • It has a larger community and more ready-made dashboards in Grafana.
  • It is lighter and specific for this function.
  • It natively integrates with the Prometheus ecosystem.

If you want to experiment, you can use the OpenTelemetry Collector with hostmetrics, but be aware that:

  • You will have to do more configuration.
  • You may have fewer metrics available compared to Node Exporter.
  • You will find fewer resources and examples in the community.

Now, speaking of applications rather than the operating system: OpenTelemetry also collects metrics, in addition to logs and traces.

The same Prometheus instance (which handles only metrics) can scrape both Node Exporter and OpenTelemetry metrics. Remember that Prometheus is a time-series database and has nothing to do with logs or traces.

Prometheus:

  • Prometheus does a SCRAPE (scraping) of metrics through /metrics endpoints.
  • It pulls metrics; it does not receive them (push).
  • Both Node Exporter and the OpenTelemetry Collector expose endpoints that Prometheus scrapes.

Node Exporter exposes /metrics ← Prometheus scrapes

OpenTelemetry exposes /metrics ← Prometheus scrapes
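The pull model can be sketched in a few lines of plain Python: an app exposes /metrics in the Prometheus text exposition format, and the scrape is just an HTTP GET (simulated here with urllib instead of a real Prometheus server; the metric name and value are made up for illustration):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A fixed payload in the Prometheus text exposition format.
METRICS_BODY = (
    "# HELP app_requests_total Total requests handled.\n"
    "# TYPE app_requests_total counter\n"
    "app_requests_total 42\n"
)

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = METRICS_BODY.encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):  # silence per-request logging
        pass

# Serve /metrics on an ephemeral port in a background thread.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Scrape" the endpoint the way Prometheus would: a plain HTTP pull.
url = f"http://127.0.0.1:{server.server_port}/metrics"
scraped = urllib.request.urlopen(url).read().decode()
print(scraped)
server.shutdown()
```

The application never initiates contact with the monitoring system; it only answers GETs, which is exactly the inversion that distinguishes pull from push.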

OpenTelemetry routes the different types of data (signals) to the correct places.

The OpenTelemetry Collector is the key piece that handles data routing:

For metrics in Prometheus:

  • The Collector exposes a /metrics endpoint in Prometheus format.
  • Prometheus scrapes this endpoint.
  • The Collector converts data from the OpenTelemetry format to the Prometheus format.

For logs in Loki for example:

  • The Collector actively sends (pushes) logs to Loki.
  • It uses Loki's protocol for sending logs.
  • It can transform and filter logs before sending them.

For traces in Jaeger or GrafanaTempo:

  • The Collector sends (pushes) traces via gRPC or HTTP.
  • It supports different trace protocols (OTLP, Jaeger, Zipkin).
  • It can sample and process traces.
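Sampling in the Collector can be done, for example, with the probabilistic_sampler processor. This sketch keeps roughly 10% of traces (the percentage is an arbitrary example) and would still need to be added to the traces pipeline:

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 10
```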

Collector configuration example:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  loki:
    endpoint: "http://loki:3100/loki/api/v1/push"
  jaeger:
    endpoint: "jaeger:14250"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
```

Your application sends all data (metrics, logs, traces) to the Collector using the OTLP protocol, and the Collector takes care of distributing it to the correct services using the appropriate protocols.
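In practice, the OTLP exporter in the application is often configured through the SDK's standard environment variables rather than in code. A sketch, where the service name and Collector address are assumptions but the OTEL_* variable names themselves are standard:

```shell
# Hypothetical service name and Collector address; 4317 is the
# conventional OTLP/gRPC port on the Collector.
export OTEL_SERVICE_NAME="my-app"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
```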

If you are using the OpenTelemetry Exporter directly in the application without the Collector, you need to configure each exporter separately in your application:

For metrics in Prometheus, you need to configure the PrometheusExporter in the application, which will expose a /metrics endpoint for Prometheus to scrape.

For logs in Loki, you need to configure a LokiExporter that pushes directly to Loki (you need to configure Loki's URL and authentication).

For traces in Jaeger, you have to configure the JaegerExporter, which sends traces directly to Jaeger, requiring you to point it at Jaeger's endpoint.

Just an illustration, here in Python, of what a quick configuration might look like.

```python
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configuring metrics for Prometheus
metric_reader = PrometheusMetricReader()
provider = MeterProvider(metric_readers=[metric_reader])

# Configuring traces for Jaeger
jaeger_exporter = JaegerExporter(
    agent_host_name="localhost",
    agent_port=6831,
)
trace_provider = TracerProvider()
# An exporter must be wrapped in a span processor before being
# registered on the provider.
trace_provider.add_span_processor(BatchSpanProcessor(jaeger_exporter))
```

The main disadvantage of not using the Collector is having to configure each exporter inside the application, which adds complexity to the code, reduces flexibility when these services change, and creates more direct connections between the applications and the various backend services.

That's why the Collector is recommended in production environments, as it centralizes this complexity.