Instrumentation vs Collection
Instrumentation is the process of generating telemetry data in your code. It's like "installing sensors" in your application that will create traces, metrics and logs.
There are two main types of instrumentation:
- Automatic: using libraries that instrument common frameworks and libraries for you. OTel has been gaining this automation gradually, language by language. Today it is available for .NET, Go, Java, JavaScript, PHP, and Python. See https://opentelemetry.io/docs/getting-started/ops/ if your project's language is covered and you want to start quickly.
- Manual: adding specific code to generate telemetry in the important parts of your application. Here there are libraries available for many languages. Check https://opentelemetry.io/docs/getting-started/dev/ for more information.
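To make manual instrumentation concrete, here is a minimal conceptual sketch in Python. It is not the real OTel API (the real one looks similar, e.g. `tracer.start_as_current_span("name")` from `opentelemetry-api`); it just shows the idea of wrapping important code paths in named spans that record timing, with `spans`, `start_span`, and `handle_checkout` being illustrative names:

```python
# Conceptual sketch of manual instrumentation (NOT the real OTel API):
# a "tracer" that records named spans with their durations, the way you
# would wrap important operations in your application.
import time
from contextlib import contextmanager

spans = []  # collected telemetry; a real SDK would hand this to an exporter

@contextmanager
def start_span(name):
    start = time.monotonic()
    try:
        yield
    finally:
        # the span is recorded when the wrapped block finishes
        spans.append({"name": name, "duration_s": time.monotonic() - start})

def handle_checkout():
    with start_span("checkout"):        # instrument the important code path
        with start_span("charge_card"):  # nested span for a sub-operation
            time.sleep(0.01)             # pretend to do work

handle_checkout()
print([s["name"] for s in spans])  # inner span finishes (and is recorded) first
```

With a real SDK, the recorded spans would be handed to an exporter or a Collector instead of sitting in a list.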
Collection is the process of capturing this generated data and sending it to a destination. It's performed by the OpenTelemetry Collector, which receives telemetry data from instrumented applications, processes it (it can filter, transform, and aggregate), and exports it to observability backends (like Jaeger, Prometheus, etc.).
The Collector functions as a central intermediary (a Hub) that receives data from multiple sources, standardizes the format and forwards to one or more destinations.
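As a concrete illustration of that hub role, a minimal Collector configuration might look like the sketch below. The receiver, processor, and exporter names (`otlp`, `batch`, `prometheus`) are real Collector components, but the endpoints and the `jaeger` hostname are placeholder assumptions:

```yaml
receivers:
  otlp:                      # receive data from instrumented applications
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:                     # group data before exporting (aggregation)
exporters:
  otlp/jaeger:               # forward traces to a Jaeger backend
    endpoint: jaeger:4317    # assumed hostname
    tls:
      insecure: true
  prometheus:                # expose metrics for Prometheus to scrape
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Each pipeline is exactly the receive → process → export flow described above, one per signal type.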
Instrumentation GENERATES the data (it's where data is born); Collection PROCESSES and FORWARDS it (it's how data travels).
The Collector is a separate component from the application. It can be an independent binary, a container running as a sidecar, a service in the operating system, etc. The Collector was designed to be an independent component and to function as a centralized service that can receive data from multiple applications on the same host.
If you need something lighter and integrated into the application, OTel offers the "exporter" concept, which can send data directly to your observability backend. An exporter is far more limited than the Collector. Ideally you always point to a Collector, but in some specific architectures using an exporter directly can make sense.
Collector as Sidecar vs Centralized Collector
This is worth explaining, because you might be assuming you need one huge Collector waiting for every application to send signals to it.
Sidecar Pattern
We use this architecture when we need high performance or strict security, when each application has specific processing needs, and when cluster resources are not a limitation.
Pod
├── Application Container
└── Collector Container (sidecar)
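A minimal sketch of this pod layout in Kubernetes is shown below. The image name and the `otel-sidecar-config` ConfigMap are hypothetical; `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard SDK environment variable for pointing telemetry at a destination:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-otel-sidecar
spec:
  containers:
    - name: app                        # your instrumented application
      image: my-registry/my-app:1.0    # hypothetical image
      env:
        # the SDK sends telemetry to the sidecar over localhost
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://localhost:4317
    - name: otel-collector             # the sidecar Collector
      image: otel/opentelemetry-collector:latest
      args: ["--config=/etc/otel/config.yaml"]
      volumeMounts:
        - name: otel-config
          mountPath: /etc/otel
  volumes:
    - name: otel-config
      configMap:
        name: otel-sidecar-config      # hypothetical ConfigMap holding the Collector config
```

Because both containers share the pod's network namespace, the application talks to the Collector over localhost, which is where the latency and security advantages below come from.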
Sidecar Advantages
- Isolation
  - Each application has its own collector
  - Failures don't affect other applications
  - Easier to debug problems
- Performance
  - Local communication (within the same pod)
  - Lower latency
  - Less network traffic
- Security
  - Data doesn't travel through the network before being processed
  - Easier to control access
Centralized Collector
We go for this architecture when we need:
- Resource economy
- Similar applications
- Standardized configuration
- Simplicity of maintenance and updates
Kubernetes Cluster
├── Critical Pod (with sidecar)
├── Normal Pod (without sidecar)
└── Central Collector
Keep in mind that a configuration change in a centralized Collector affects all applications at once.
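One common way to run the central Collector is as a plain Deployment exposed through a Service, sketched below with illustrative names; applications would then point their SDKs at the Service (e.g. `http://otel-collector:4317`) instead of localhost:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 2                          # scale the shared Collector independently
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector               # applications send telemetry to this name
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
```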
Nothing prevents us from combining both into a hybrid. One collector per node? Per namespace? There is no right or wrong; it depends on many factors. But if I had to give you one tip: a sidecar never fails!