Instrumentation vs Collection
Instrumentation is the process of generating telemetry data in your code. It's like "installing sensors" in your application that will create traces, metrics and logs.
There are two main types of instrumentation:
- Automatic: using libraries that instrument common frameworks and libraries for you. OTel has been gaining this automation gradually, language by language. Today it is available for .NET, Go, Java, JavaScript, PHP, and Python. See https://opentelemetry.io/docs/getting-started/ops/ if your project's language is covered and you want to start quickly.
- Manual: adding specific code to generate telemetry in the important parts of your application. Here there are libraries available for many languages. Check https://opentelemetry.io/docs/getting-started/dev/ for more information.
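To make manual instrumentation concrete, here is a minimal conceptual sketch in Python. It is not the real OTel API (the real one looks similar, e.g. `tracer.start_as_current_span("name")` from `opentelemetry-api`); it just shows the idea of wrapping important code paths in named spans that record timing, with `spans`, `start_span`, and `handle_checkout` being illustrative names:

```python
# Conceptual sketch of manual instrumentation (NOT the real OTel API):
# a "tracer" that records named spans with their durations, the way you
# would wrap important operations in your application.
import time
from contextlib import contextmanager

spans = []  # collected telemetry; a real SDK would hand this to an exporter

@contextmanager
def start_span(name):
    start = time.monotonic()
    try:
        yield
    finally:
        # the span is recorded when the wrapped block finishes
        spans.append({"name": name, "duration_s": time.monotonic() - start})

def handle_checkout():
    with start_span("checkout"):        # instrument the important code path
        with start_span("charge_card"):  # nested span for a sub-operation
            time.sleep(0.01)             # pretend to do work

handle_checkout()
print([s["name"] for s in spans])  # inner span finishes (and is recorded) first
```

With a real SDK, the recorded spans would be handed to an exporter or a Collector instead of sitting in a list.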
Collection is the process of capturing this generated data and sending it to a destination. It's performed by the OpenTelemetry Collector, which receives telemetry data from instrumented applications, processes it (it can filter, transform, and aggregate), and exports it to observability backends (like Jaeger, Prometheus, etc.).
The Collector functions as a central intermediary (a Hub) that receives data from multiple sources, standardizes the format and forwards to one or more destinations.
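As a concrete illustration of that hub role, a minimal Collector configuration might look like the sketch below. The receiver, processor, and exporter names (`otlp`, `batch`, `prometheus`) are real Collector components, but the endpoints and the `jaeger` hostname are placeholder assumptions:

```yaml
receivers:
  otlp:                      # receive data from instrumented applications
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:                     # group data before exporting (aggregation)
exporters:
  otlp/jaeger:               # forward traces to a Jaeger backend
    endpoint: jaeger:4317    # assumed hostname
    tls:
      insecure: true
  prometheus:                # expose metrics for Prometheus to scrape
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Each pipeline is exactly the receive → process → export flow described above, one per signal type.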
Instrumentation GENERATES the data (it's where data is born); Collection PROCESSES and FORWARDS it (it's how data travels).
The Collector is a separate component from the application. It can be an independent binary, a container running as a sidecar, a service in the operating system, etc. The Collector was designed to be an independent component and to function as a centralized service that can receive data from multiple applications on the same host.
If you need something lighter and integrated into the application, OTel offers the "exporter" concept, which can send data directly to your observability backend. An exporter is far more limited than the Collector. Ideally you always point to a Collector, but in some specific architectures using an exporter directly can make sense.
Collector as Sidecar vs Centralized Collector
This is worth explaining, because you might be assuming you need one huge Collector waiting for every application to send signals to it.
Sidecar Pattern
We use this architecture when we need high performance or strict security, when each application has specific processing needs, and when cluster resources are not a limitation.
Pod
├── Application Container
└── Collector Container (sidecar)
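A minimal sketch of this pod layout in Kubernetes is shown below. The image name and the `otel-sidecar-config` ConfigMap are hypothetical; `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard SDK environment variable for pointing telemetry at a destination:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-otel-sidecar
spec:
  containers:
    - name: app                        # your instrumented application
      image: my-registry/my-app:1.0    # hypothetical image
      env:
        # the SDK sends telemetry to the sidecar over localhost
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://localhost:4317
    - name: otel-collector             # the sidecar Collector
      image: otel/opentelemetry-collector:latest
      args: ["--config=/etc/otel/config.yaml"]
      volumeMounts:
        - name: otel-config
          mountPath: /etc/otel
  volumes:
    - name: otel-config
      configMap:
        name: otel-sidecar-config      # hypothetical ConfigMap holding the Collector config
```

Because both containers share the pod's network namespace, the application talks to the Collector over localhost, which is where the latency and security advantages below come from.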
Sidecar Advantages
- Isolation
  - Each application has its own collector
  - Failures don't affect other applications
  - Easier to debug problems
- Performance
  - Local communication (within the same pod)
  - Lower latency
  - Less network traffic
- Security
  - Data doesn't travel through the network before being processed
  - Easier to control access
Centralized Collector
We go for this architecture when we need:
- Resource economy
- Similar applications
- Standardized configuration
- Simplicity of maintenance and updates
Kubernetes Cluster
├── Critical Pod (with sidecar)
├── Normal Pod (without sidecar)
└── Central Collector
Keep in mind that a configuration change in a centralized Collector affects all applications at once.
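One common way to run the central Collector is as a plain Deployment exposed through a Service, sketched below with illustrative names; applications would then point their SDKs at the Service (e.g. `http://otel-collector:4317`) instead of localhost:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 2                          # scale the shared Collector independently
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector:latest
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector               # applications send telemetry to this name
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-grpc
      port: 4317
```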
Nothing prevents us from combining both into a hybrid. One collector per node? Per namespace? There is no right or wrong; it depends on many factors. But if I had to give you one tip: a sidecar never fails!