Initial Concepts
What is event streaming?
Imagine event streaming as your digital body's nervous system. Just like your nerves capture signals from all over your body and send them to your brain for processing, event streaming captures information from various sources in real time.
What is it in practice?
It's a way to collect, move, and analyze data as it happens:
- Collection: Captures information in real time from:
  - Databases
  - Sensors and devices
  - Applications and websites
  - Cloud services
  - etc.
- Transport: Moves this data immediately (like a digital conveyor belt)
- Processing: Analyzes or reacts to this data instantly or later.
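The collect, transport, and process steps above can be sketched as a toy pipeline. This is an illustrative simulation with hypothetical names (`Event`, `collect`, `process_all`), not a real streaming system; a simple in-memory queue stands in for the "digital conveyor belt":

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class Event:
    source: str    # where the event was captured (database, sensor, app...)
    payload: dict  # the data itself

transport = Queue()  # the "digital conveyor belt"

def collect(source: str, payload: dict) -> None:
    """Collection: capture an event and put it on the transport layer."""
    transport.put(Event(source, payload))

def process_all() -> list:
    """Processing: react to every event currently on the belt."""
    results = []
    while not transport.empty():
        event = transport.get()
        results.append(f"{event.source}: {event.payload}")
    return results

collect("sensor", {"temperature": 21.5})
collect("website", {"clicked": "buy"})
print(process_all())
```

In a real system the queue would be a durable, distributed log (which is exactly the gap Kafka fills), and processing would happen continuously rather than in one batch.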
Why is it important?
- Always-on businesses: Enables companies to operate and respond 24/7
- Automation: Software can react to other software automatically
- Fast decisions: Provides information at the right moment for immediate action
Practical example:
When you use a rideshare app:
- The driver's app continuously sends their location (events)
- The streaming system processes these events
- Your app receives real-time updates about the driver's location
- The system can automatically react (routing, pricing, etc.)
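The rideshare flow above boils down to folding a stream of location events into current state. A minimal sketch (all names and coordinates are hypothetical) of how a backend might reduce the driver-location stream to the latest known position per driver:

```python
def latest_positions(events: list) -> dict:
    """Fold a time-ordered stream of location events into the
    latest known position per driver."""
    positions = {}
    for event in events:  # events arrive in time order
        positions[event["driver_id"]] = (event["lat"], event["lon"])
    return positions

stream = [
    {"driver_id": "d1", "lat": 40.0, "lon": -3.7},
    {"driver_id": "d2", "lat": 41.4, "lon": 2.2},
    {"driver_id": "d1", "lat": 40.1, "lon": -3.6},  # d1 moved
]
print(latest_positions(stream))  # d1's newer position overwrites the older one
```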
Event streaming keeps all these pieces of information flowing constantly, so everything works smoothly and stays in sync.
Apache Kafka is a distributed event streaming platform. It combines three main capabilities to implement end-to-end event streaming use cases with a single battle-tested solution:
- Publishing (producer) and consuming (consumer) event streams, including continuous import/export of data from other systems.
- Durable and reliable storage of event streams for as long as desired.
- Processing event streams as they occur or retrospectively.
All this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant, and secure manner.
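These three capabilities can be illustrated with a minimal in-memory sketch. The `Log` class and its methods here are hypothetical simplifications, standing in for what Kafka does durably and at scale: publish events, store them in order, and read them immediately or retrospectively:

```python
class Log:
    """An append-only log: events are stored in order, never overwritten."""

    def __init__(self) -> None:
        self._events = []

    def produce(self, value: bytes) -> int:
        """Publish: append an event and return its offset."""
        self._events.append(value)
        return len(self._events) - 1

    def consume(self, offset: int = 0):
        """Read everything from a given offset: from 0 for the full
        history (retrospective), or from the end for only new events."""
        return self._events[offset:]

log = Log()
log.produce(b"order-created")
log.produce(b"order-paid")
print(log.consume(0))  # retrospective: the full history
print(log.consume(1))  # only events from offset 1 onward
```

Because the log is append-only and offsets are stable, the same events can be read again at any time, which is what makes retrospective processing possible.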
Advantages of using Kafka
- Data is published only once.
- Interested consumers can subscribe and consume what interests them.
- Producers and consumers are decoupled, able to work at different paces.
- Consumers can read data more than once.
- If a producer becomes unavailable, consumers can keep reading the data already published.
- High availability and capacity with clustering and partitioning capabilities.
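Partitioning is what makes the last point work: each event key is mapped deterministically to one partition, so events for the same key stay in order while the overall load spreads across the cluster. A simplified sketch follows; real Kafka clients hash the key bytes with murmur2, but `zlib.crc32` is used here just to illustrate the idea:

```python
import zlib

NUM_PARTITIONS = 3  # a topic is split into a fixed number of partitions

def partition_for(key: str) -> int:
    """Map a key deterministically onto one of the partitions."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Events that share a key always land on the same partition,
# so their relative order is preserved.
print(partition_for("driver-42") == partition_for("driver-42"))  # True
```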
Kafka scales horizontally (by adding machines). If it could only be scaled vertically (by using bigger machines), problems would arise such as:
- Downtime required for upgrades.
- A low ceiling on how far it can scale.
- Complex configuration requirements.
- Compatibility issues.
What to cover in this study?
Development
Kafka client libraries exist for practically every major language.
One of the main players developing Kafka, and even offering it as a managed service, is Confluent.
Confluent's documentation includes a list of client libraries; look up your desired language there, along with code examples for usage.
DevOps
This is the main focus of our study:
- Understand Kafka's architecture and provision it in the best possible way.
- Understand the limits and what configuration would be ideal for each scenario.
- Manage topics and access permissions through RBAC.
- Provide the tools developers need to do their work easily and objectively, including the ability to debug the payloads they produce and consume.
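On access permissions: role-based RBAC is a Confluent Platform feature, while Apache Kafka itself ships with ACLs managed by the built-in `kafka-acls.sh` tool. As a sketch of what granting a consumer read access might look like (broker address, topic, group, and principal names are all placeholders):

```shell
# Grant the principal "User:orders-app" read access to the "orders" topic
# and to its consumer group (all names here are placeholders).
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add \
  --allow-principal User:orders-app \
  --operation Read \
  --topic orders \
  --group orders-app
```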