
Grafana Agent and OpenTelemetry Operator

It's interesting that I only came across this after getting comfortable with OpenTelemetry itself.

In Kubernetes, the OpenTelemetry Operator can automatically inject instrumentation into applications using an init container, and it can spin up a pre-configured collector as a sidecar, much like Istio injects its proxy.

The collector is configured through OpenTelemetry-specific custom resources. It's a very interesting approach.

However, if Grafana Agent can function as a collector, do we still need the sidecar? Sort of: Grafana Agent can replace the collector, but it lacks the part that automatically injects instrumentation.

OpenTelemetry Operator and Grafana Agent serve different purposes, even though there are some overlaps:

OpenTelemetry Operator:

  • Performs automatic instrumentation of applications
  • Automatically configures sidecars
  • Manages the instrumentation lifecycle
  • Facilitates configuration via CRDs (Custom Resources)

Grafana Agent:

  • Focus on collecting and sending data to Grafana Cloud
  • Can receive OTLP data, but doesn't do instrumentation
  • Doesn't have the capability to inject instrumentation

So, even when using Grafana Agent, you might still want the OpenTelemetry Operator for:

  • Automatic instrumentation of applications
  • Management via Kubernetes native (CRDs)
  • Consistent configuration across applications

A possible architecture would be:

App + Auto-instrumentation (via Operator) → Grafana Agent → Grafana Cloud

The Operator takes care of instrumentation, while the Agent takes care of sending to Grafana Cloud.
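To make that split concrete, here is a minimal sketch of how an application opts into the Operator's auto-instrumentation with a pod annotation while the Agent remains responsible for shipping. The Deployment name and image are hypothetical; the annotation is the Operator's standard injection mechanism (here for a Java app).

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # hypothetical application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # ask the OpenTelemetry Operator to inject Java auto-instrumentation
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: my-app
          image: my-app:latest  # hypothetical image
```

The Operator rewrites the pod spec at admission time, so no changes to the application image are needed.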

For example, we could configure Grafana Agent to receive all types of telemetry:

apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
spec:
  flow: |
    // OTLP receiver for traces and logs
    otelcol.receiver.otlp "default" {
      grpc {
        endpoint = "0.0.0.0:4317"
      }
      http {
        endpoint = "0.0.0.0:4318"
      }
      output {
        traces = [otelcol.exporter.otlp.tempo.input]
        logs   = [otelcol.exporter.loki.default.input]
      }
    }

    // Shared basic-auth credentials for Grafana Cloud
    otelcol.auth.basic "grafana_cloud" {
      username = "your-username"
      password = "your-api-key"
    }

    // Exporter for Tempo (traces)
    otelcol.exporter.otlp "tempo" {
      client {
        endpoint = "tempo-prod-XX.grafana.net:443"
        auth     = otelcol.auth.basic.grafana_cloud.handler
      }
    }

    // Convert OTLP logs and hand them to loki.write
    otelcol.exporter.loki "default" {
      forward_to = [loki.write.default.receiver]
    }

    // Exporter for Loki (logs)
    loki.write "default" {
      endpoint {
        url = "https://logs-prod-XX.grafana.net/loki/api/v1/push"
        basic_auth {
          username = "your-username"
          password = "your-api-key"
        }
      }
    }

    // Kubernetes service discovery
    discovery.kubernetes "pods" {
      role = "pod"
    }

    // Remote write for Prometheus (metrics)
    prometheus.remote_write "default" {
      endpoint {
        url = "https://prometheus-prod-XX.grafana.net/api/prom/push"
        basic_auth {
          username = "your-username"
          password = "your-api-key"
        }
      }
    }

    prometheus.scrape "default" {
      targets    = discovery.kubernetes.pods.targets
      forward_to = [prometheus.remote_write.default.receiver]
      clustering {
        enabled = true
      }
    }

Then we could simply configure the Operator's Instrumentation resource like this:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: app-instrumentation
spec:
  exporter:
    endpoint: http://grafana-agent:4317  # for traces and logs (OTLP gRPC)
  metrics:
    enable: true
  prometheusExporter:
    port: 8080  # port that exposes /metrics
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"  # sample 100% of traces

Application (instrumented by Operator)
├── /metrics endpoint → Grafana Agent scrape → Prometheus Cloud
├── traces (OTLP) ────→ Grafana Agent ───────→ Tempo Cloud
└── logs (OTLP) ──────→ Grafana Agent ───────→ Loki Cloud

We configure Grafana Agent to scrape every application in the cluster for metrics.

Nothing prevents you from skipping this setup and deploying a dedicated collector that sends data to your Grafana stack instead. The Operator even has a custom resource just for running the collector as a sidecar, but since Grafana Agent does the same job, using it lets us avoid many extra containers in the pods.
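For reference, a minimal sketch of that sidecar approach: the Operator's OpenTelemetryCollector resource with mode: sidecar, which pods opt into via the sidecar.opentelemetry.io/inject annotation. The endpoint and pipeline here are placeholders for illustration.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar
spec:
  mode: sidecar  # injected into annotated pods instead of running standalone
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    exporters:
      otlp:
        endpoint: tempo-prod-XX.grafana.net:443  # placeholder endpoint
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp]
```

Pods then enable the sidecar by adding the annotation sidecar.opentelemetry.io/inject: "true" to their template.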

This only makes sense if you're using Grafana; if your stack is different, don't worry about any of this.