What is Apache Kafka Used For? A Practitioner's Guide to Real-World Kafka
You're building something that needs to move data. Fast. Reliably. At scale.
Maybe it's a fraud detection system that needs to process 50,000 transactions per second. Maybe it's a logistics platform tracking 10 million GPS pings daily. Or maybe you're just tired of watching your microservices fall over when one downstream service goes down.
I've been there. In 2018, we were building a real-time inventory system at a retail company I consulted for. We tried RabbitMQ first. Then Redis streams. Both broke at around 30,000 events per second with our data shape. Kafka handled 120,000 without blinking.
That's what Apache Kafka is used for: moving data at scale, reliably, between systems that don't know about each other.
Let me show you what that actually looks like in practice.
The 30-Second Definition (Skip This If You Know Kafka Basics)
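Apache Kafka is a distributed event streaming platform. Originally built at LinkedIn and open-sourced in 2011, it is now a top-level Apache Software Foundation project. Confluent, the company founded by Kafka's original creators, is a major contributor and commercial vendor, but it doesn't run the project. Kafka is written in Scala and Java.
At its core: producers append events to topics, which are immutable, append-only logs. Consumers read from those topics. The events persist on disk, so consumers can replay history. Messages don't disappear after being read, and that's the killer feature.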
Kafka isn't a queue. Most people think it is. They're wrong. Queues delete messages after consumption. Kafka keeps them. This changes everything about how you design systems.
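To make the queue-versus-log difference concrete, here is a toy sketch in plain Python. It is not how Kafka is implemented: the `Topic` class, its method names, and the consumer group names are all made up for illustration, and it models a single partition held in memory.

```python
from collections import deque


class Topic:
    """Toy single-partition log: reading never removes a record."""

    def __init__(self):
        self.records = []   # append-only list of events
        self.offsets = {}   # consumer group -> next offset to read

    def append(self, event):
        self.records.append(event)
        return len(self.records) - 1   # offset of the new record

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        self.offsets[group] = offset   # rewind to replay history


events = ["order_created", "order_paid", "order_shipped"]

# A queue hands each message out once, then it is gone.
queue = deque(events)
first_reader = [queue.popleft() for _ in range(3)]
print(len(queue))  # 0

# A log lets every group read everything, and rewind at will.
topic = Topic()
for event in events:
    topic.append(event)

billing = topic.poll("billing")
analytics = topic.poll("analytics")
topic.seek("billing", 0)
replayed = topic.poll("billing")
print(billing == analytics == replayed)  # True
```

Each group tracks its own offset, which is why adding a new consumer never takes data away from an existing one.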
The Real Use Cases (Where Kafka Shines)
Real-Time Data Pipelines
This is the most common thing I see. Companies like Uber, Netflix, and Pinterest use Kafka as the backbone for moving data between hundreds of microservices.
Here's a pattern we use at SIVARO for a fintech client:
python
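# Producer example (Python with confluent-kafka)
import json

from confluent_kafka import Producer

conf = {
    'bootstrap.servers': 'kafka1:9092,kafka2:9092',
    'acks': 'all',   # wait for all in-sync replicas before acknowledging
    'retries': 3,
}
producer = Producer(conf)


def on_delivery(err, msg):
    # Runs during flush()/poll() once the broker accepts or rejects the event
    if err is not None:
        print(f'Delivery failed: {err}')


# Send transaction events
transaction = {
    'user_id': 'u_78912',
    'amount': 149.99,
    'timestamp': 1705000000,
    'merchant': 'Acme Corp',
}
# Keying by user_id keeps each user's events in order on one partition
producer.produce('transactions_raw',
                 key=str(transaction['user_id']),
                 value=json.dumps(transaction),
                 on_delivery=on_delivery)
producer.flush()   # block until outstanding events are delivered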
Why Kafka over direct HTTP calls? Because when downstream fraud detection takes 2 seconds per request, you don't want your checkout flow blocking. Kafka decouples production from consumption. The checkout service writes to Kafka and returns instantly. Fraud detection reads at its own pace.
We tested this in production. Direct HTTP calls added 180ms p99 latency to checkout. Kafka added 2ms. The fraud team got their data within 300ms anyway. Win-win.
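For completeness, here is a rough sketch of what the consuming side of that decoupling might look like with confluent-kafka. The `fraud-detection` group id, the `score_transaction` rule, and its 1,000 threshold are placeholders I made up for illustration, not the client's actual logic.

```python
import json


def score_transaction(tx, limit=1000.0):
    """Hypothetical stand-in for a real (and much slower) fraud model."""
    return "review" if tx["amount"] > limit else "ok"


def run_fraud_consumer(bootstrap="kafka1:9092,kafka2:9092"):
    # Imported here so the scoring logic above can be tested without a broker.
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": bootstrap,
        "group.id": "fraud-detection",
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,   # commit only after an event is scored
    })
    consumer.subscribe(["transactions_raw"])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue   # nothing new yet
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            tx = json.loads(msg.value())
            print(tx["user_id"], score_transaction(tx))
            # Scoring can take as long as it needs; checkout never waits on it.
            consumer.commit(message=msg, asynchronous=False)
    finally:
        consumer.close()
```

Calling `run_fraud_consumer()` starts the loop. Committing after scoring gives at-least-once processing, so the scoring step should tolerate seeing the same transaction twice.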
Event Sourcing and CQRS
Here's where Kafka changes how you think about state.
Most applications store current state in a database. You update a row, transaction commits, done. But you lose history. You can't ask "what was the state three hours ago?" without complex CDC setups.
Kafka's log-based storage makes event sourcing natural. Every state change is an event. The current state is just the aggregate of all events.
We built a shipment tracking system using this pattern. Each status change — "picked up", "in transit", "out for delivery", "delivered" — is an event. New services can read the entire history and build their own view.
java
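// Java consumer rebuilding shipment state from events
import java.time.Duration;
import java.util.*;
import org.apache.kafka.clients.consumer.*;

Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092");
props.put("group.id", "shipment-state-builder");
// State lives in memory only, so start from the oldest event and never
// commit offsets: every restart replays the full history.
props.put("auto.offset.reset", "earliest");
props.put("enable.auto.commit", "false");
props.put("key.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");
// Placeholder class name: supply your own Deserializer<ShipmentEvent>
props.put("value.deserializer", "com.example.ShipmentEventDeserializer");

KafkaConsumer<String, ShipmentEvent> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("shipment_events"));

Map<String, ShipmentState> currentState = new HashMap<>();
while (true) {
    ConsumerRecords<String, ShipmentEvent> records =
        consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, ShipmentEvent> record : records) {
        ShipmentEvent event = record.value();
        // Fold each event into that shipment's current state
        currentState
            .computeIfAbsent(event.shipmentId, id -> new ShipmentState())
            .apply(event);
    }
}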
This pattern saved us when a new stakeholder wanted "shipment velocity by carrier" — data we hadn't captured explicitly. Because we had event history, we replayed the last 6 months of events, computed velocity, and had the answer in 2 hours instead of 2 weeks.
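Here is a small sketch of that kind of replay: deriving a metric nobody planned for from events that were already stored. The event fields, carrier names, and timestamps are invented for illustration, and I use average transit time as a stand-in for "velocity".

```python
from collections import defaultdict

# Hypothetical event history, in the order it was written to the topic.
events = [
    {"shipment_id": "s1", "carrier": "FastShip", "status": "picked up", "ts": 0},
    {"shipment_id": "s2", "carrier": "SlowShip", "status": "picked up", "ts": 0},
    {"shipment_id": "s1", "carrier": "FastShip", "status": "delivered", "ts": 36_000},
    {"shipment_id": "s2", "carrier": "SlowShip", "status": "delivered", "ts": 90_000},
    {"shipment_id": "s3", "carrier": "FastShip", "status": "picked up", "ts": 10_000},
    {"shipment_id": "s3", "carrier": "FastShip", "status": "delivered", "ts": 82_000},
]


def transit_hours_by_carrier(history):
    """Replay the log once and average pickup-to-delivery time per carrier."""
    picked_up = {}                    # shipment_id -> pickup timestamp (seconds)
    durations = defaultdict(list)     # carrier -> list of transit times (hours)
    for event in history:
        sid = event["shipment_id"]
        if event["status"] == "picked up":
            picked_up[sid] = event["ts"]
        elif event["status"] == "delivered" and sid in picked_up:
            seconds = event["ts"] - picked_up.pop(sid)
            durations[event["carrier"]].append(seconds / 3600)
    return {carrier: sum(d) / len(d) for carrier, d in durations.items()}


print(transit_hours_by_carrier(events))
# {'FastShip': 15.0, 'SlowShip': 25.0}
```

Nothing about the original producers had to change. The new view is just another fold over the same history.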
Log Aggregation and Monitoring
Every microservice generates logs. Every server has metrics. Every application has traces.
Before Kafka? Teams set up Elasticsearch directly, hammering it with writes. Elasticsearch would fall over under load. Or they'd use syslog, losing data when the destination was down.
Kafka as a log buffer changes this.
We run this at SIVARO for our own infrastructure monitoring:
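yaml
# Filebeat -> Kafka -> Logstash -> Elasticsearch
# filebeat.yml configuration
filebeat.inputs:
  - type: filestream
    id: sivaro-logs   # any unique name; each filestream input needs its own id
    paths:
      - /var/log/sivaro/*.log

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]
  topic: "sivaro-logs"
  required_acks: 1   # leader-only ack; use -1 to wait for all in-sync replicas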
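The key insight: Kafka absorbs traffic spikes. When Elasticsearch goes down for maintenance (which it does, regularly), logs accumulate in Kafka. When Elasticsearch comes back, Logstash resumes from its last committed offset and catches up. No data is lost, provided the topic's retention outlasts the outage.
We process about 500GB of logs daily (roughly 6MB/s averaged over the day) through a 3-broker cluster. The brokers handle 80MB/s of sustained write throughput without breaking a sweat, so there is plenty of headroom for spikes.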
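How long Kafka can cover for a downstream outage comes down to simple arithmetic. A back-of-the-envelope sketch follows: the 500GB/day and 3 brokers are from above, but the 1TB of usable disk per broker and the replication factor of 3 are assumptions I picked for illustration, not our actual settings.

```python
def buffer_hours(usable_disk_gb_per_broker, brokers, replication_factor,
                 ingest_gb_per_day):
    """Hours of logs the cluster can hold before old data has to be dropped."""
    # Every byte is stored replication_factor times across the cluster.
    logical_capacity_gb = usable_disk_gb_per_broker * brokers / replication_factor
    return logical_capacity_gb * 24 / ingest_gb_per_day


# Average ingest rate implied by 500GB per day (decimal units)
avg_mb_per_s = 500 * 1000 / 86_400
print(round(avg_mb_per_s, 1))   # 5.8

# Assumed: 1TB usable per broker, replication factor 3
print(buffer_hours(1000, 3, 3, 500))   # 48.0
```

Under those assumptions the cluster could ride out about two days of Elasticsearch downtime. The topic's retention settings have to allow for that window too.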
Stream Processing
This is the advanced use case. Not just moving data, but transforming it in flight.
Kafka Streams (the library) and ksqlDB (the SQL layer) let you process events as they arrive. No batch jobs. No lambda architecture. Just continuous computation.
Here's a real example: a ride-sharing client wanted surge pricing calculated in real-time. Every ride request, every driver location update — process them immediately to compute demand/supply ratios.
sql
-- ksqlDB stream for real-time surge pricing
CREATE STREAM ride_requests (
    user_id VARCHAR,
    pickup_zone VARCHAR,
    request_time BIGINT
) WITH (KAFKA_TOPIC='ride_requests', VALUE_FORMAT='JSON');

CREATE TABLE surge_pricing AS
    SELECT
        pickup_zone,
        COUNT(*) AS request_count,
        TIMESTAMPTOSTRING(WINDOWSTART, 'HH:mm:ss') AS window_start
    FROM ride_requests
    WINDOW TUMBLING (SIZE 5 MINUTES)
    GROUP BY pickup_zone
    EMIT CHANGES;
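The old approach: batch jobs every 15 minutes. Drivers would see outdated pricing. Customers would complain. With ksqlDB, which runs on Kafka Streams under the hood, each window's count updates continuously as requests arrive, so pricing reacts in under 2 seconds.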
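If tumbling windows are new to you, this plain-Python sketch shows roughly what that aggregation computes. It works on a finished list instead of a live stream, and the sample requests are made up. The real ksqlDB table updates incrementally as each event arrives.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000   # 5-minute tumbling windows


def tumbling_counts(requests, window_ms=WINDOW_MS):
    """Count requests per (zone, window start); windows never overlap."""
    counts = Counter()
    for request in requests:
        ts = request["request_time"]
        window_start = ts - ts % window_ms   # snap down to the window boundary
        counts[(request["pickup_zone"], window_start)] += 1
    return dict(counts)


requests = [
    {"user_id": "u1", "pickup_zone": "downtown", "request_time": 10_000},
    {"user_id": "u2", "pickup_zone": "downtown", "request_time": 299_999},
    {"user_id": "u3", "pickup_zone": "downtown", "request_time": 300_000},
    {"user_id": "u4", "pickup_zone": "airport", "request_time": 20_000},
]
print(tumbling_counts(requests))
# {('downtown', 0): 2, ('downtown', 300000): 1, ('airport', 0): 1}
```

The request at 299,999ms lands in the first window and the one at 300,000ms starts the next. That boundary behavior is what "tumbling" means.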
Database Change Data Capture (CDC)
This is my favorite use case. Most people don't know Kafka can watch your database and stream changes.
Tools like Debezium (open source, Apache 2.0 license) connect to database transaction logs (binlog for MySQL, WAL for Postgres) and publish changes to Kafka.
Why? Because your monolithic database doesn't know about your microservices. When an order is placed in the monolith, 6 different services need to know. CDC makes that reliable.
A simplified Debezium change event for a newly created customer row. The op field is "c" for create, "u" for update, and "d" for delete. Here before is null because a newly created row has no previous state:
json
{
  "payload": {
    "before": null,
    "after": {
      "id": 1234,
      "name": "Jane Smith",
      "email": "jane@example.com",
      "updated_at": 1705000000
    },
    "source": {
      "db": "customers_db",
      "table": "customers"
    },
    "op": "c"
  }
}
We used this at a healthcare company to sync patient data between a legacy Oracle system and a new cloud-based analytics platform. The Oracle team had no interest in adding Kafka producers. CDC meant we didn't need their cooperation. We just read the transaction logs.
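On the consuming side, applying these events is mostly an upsert-or-delete switch on the op field. A minimal sketch, assuming the simplified payload shape shown above and a primary key column named id. The in-memory dict stands in for whatever store the downstream service actually uses.

```python
def apply_change(replica, event):
    """Apply one Debezium-style change event to an in-memory copy of the table."""
    payload = event["payload"]
    op = payload["op"]
    if op in ("c", "u", "r"):           # create, update, snapshot read
        row = payload["after"]
        replica[row["id"]] = row        # upsert keyed on the primary key
    elif op == "d":                     # delete: only "before" is populated
        replica.pop(payload["before"]["id"], None)
    return replica


jane = {"id": 1234, "name": "Jane Smith", "email": "jane@example.com"}
jane_v2 = {**jane, "email": "jane.smith@example.com"}
changes = [
    {"payload": {"before": None, "after": jane, "op": "c"}},
    {"payload": {"before": jane, "after": jane_v2, "op": "u"}},
]

replica = {}
for change in changes:
    apply_change(replica, change)
print(replica[1234]["email"])   # jane.smith@example.com

apply_change(replica, {"payload": {"before": jane_v2, "after": None, "op": "d"}})
print(replica)   # {}
```

Because creates and updates are both upserts, replaying the same event twice leaves the replica unchanged, which matters when delivery is at-least-once.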
What Kafka is NOT Good For
Most people think Kafka is a message queue. They're wrong because it's designed for persistent event streams, not transient messages.
Don't use Kafka for:
- Request-reply patterns (use gRPC or HTTP)
- Small workloads that need per-message acknowledgment, routing, or redelivery (use Pulsar or RabbitMQ)
- Latency budgets well under 10ms end to end (each Kafka hop adds 2-5ms of overhead)
- Task queues (use Celery or Bull)
At first I thought this was a technical preference — turns out it's a cost issue. Running Kafka for 100 messages per minute is expensive overkill. The operational overhead of managing ZooKeeper (or KRaft now), broker configuration, and consumer group rebalancing isn't worth it for small workloads.
Trade-offs Nobody Talks About
Kafka is operationally expensive. Not just cloud costs. The expertise needed to tune it properly is rare. I've seen teams spend 3 months debugging consumer lag because they didn't understand partition assignment.
Consumer rebalancing sucks. When a consumer dies or a new one joins, the classic eager rebalance protocol pauses every consumer in that group while partitions are reassigned. That pause can last 30 seconds. If you're processing 100K events/sec, that's 3 million events of lag. We've mitigated this with static group membership (a fixed group.instance.id per consumer), but it's a hack.
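A sketch of the backlog arithmetic and of a consumer configuration that uses static membership, written as a confluent-kafka style config dict. The broker address, group name, and the 60-second session timeout are illustrative assumptions. The cooperative-sticky assignor is an extra option I'm including for comparison, not something from our setup.

```python
def rebalance_backlog(events_per_sec, pause_seconds):
    """Events that pile up while the whole group is paused."""
    return events_per_sec * pause_seconds


print(rebalance_backlog(100_000, 30))   # 3000000


def consumer_config(instance_id):
    """Consumer settings that make restarts less disruptive (values are examples)."""
    return {
        "bootstrap.servers": "kafka:9092",
        "group.id": "shipment-state-builder",
        # Static membership: a restarted consumer that rejoins with the same id
        # within the session timeout gets its old partitions back, no rebalance.
        "group.instance.id": instance_id,
        "session.timeout.ms": 60000,
        # Optional: incremental rebalancing moves only the partitions that
        # need to move instead of stopping the whole group.
        "partition.assignment.strategy": "cooperative-sticky",
    }


print(consumer_config("worker-1")["group.instance.id"])   # worker-1
```

The instance id has to be stable across restarts and unique within the group. A pod or host name is a common choice.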
Message size limits. Default max message size is 1MB. You can increase it, but then network buffers fill up, compression becomes less effective, and broker performance degrades. We learned this the hard way trying to push 10MB images through Kafka. Don't. Use S3 and pass references.
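"Use S3 and pass references" is sometimes called the claim-check pattern. A minimal sketch follows, with a dict standing in for S3 so it runs anywhere. The bucket name and message fields are made up for illustration.

```python
import hashlib
import json

object_store = {}   # stand-in for S3: object key -> bytes


def to_claim_check(blob, bucket="shipment-images"):
    """Store the large payload externally and return a small Kafka message."""
    digest = hashlib.sha256(blob).hexdigest()
    key = f"{bucket}/{digest}"
    object_store[key] = blob
    reference = {"ref": key, "size": len(blob), "sha256": digest}
    return json.dumps(reference).encode()


def from_claim_check(message):
    """Resolve the reference and verify that the payload is the one produced."""
    reference = json.loads(message)
    blob = object_store[reference["ref"]]
    if hashlib.sha256(blob).hexdigest() != reference["sha256"]:
        raise ValueError("payload does not match its reference")
    return blob


image = b"\x89PNG" + bytes(10 * 1024 * 1024)   # roughly 10MB of fake image data
message = to_claim_check(image)
print(len(message) < 1024)                 # True
print(from_claim_check(message) == image)  # True
```

The Kafka message stays a few hundred bytes no matter how large the image is, so the 1MB default never comes into play.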
When Should You NOT Use Kafka?
I'll be blunt. Most teams don't need Kafka.
If you're processing fewer than 10,000 messages per second, and you don't need replay, and you don't have multiple downstream consumers, use RabbitMQ or Redis. You'll save months of operational headache.
Kafka adds complexity. It's a distributed system with 15+ configuration parameters that matter. One wrong "min.insync.replicas" setting and you lose data silently: leave it at its default of 1, and a write acknowledged under acks=all may exist on a single broker. Lose that broker and the data is gone without any error.
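A small sanity check along these lines can catch the worst durability misconfigurations before they reach production. This is a sketch of the rules of thumb I'd apply, not an exhaustive audit. The function name and the wording of the warnings are my own.

```python
def durability_risks(acks, replication_factor, min_insync_replicas):
    """Return warnings for a producer/topic durability setup (empty list = OK)."""
    risks = []
    if str(acks) not in ("all", "-1"):
        risks.append("acks is not 'all': a write can be acknowledged "
                     "before it is replicated")
    if min_insync_replicas < 2:
        risks.append("min.insync.replicas < 2: an acknowledged write "
                     "may exist on one broker only")
    if replication_factor > 1 and min_insync_replicas >= replication_factor:
        risks.append("min.insync.replicas >= replication factor: "
                     "one broker outage blocks acks=all writes")
    return risks


print(durability_risks("all", 3, 2))      # []
print(len(durability_risks(1, 3, 1)))     # 2
for warning in durability_risks("all", 3, 3):
    print(warning)
```

The last case shows the opposite failure mode: setting min.insync.replicas equal to the replication factor protects data but turns any single broker outage into a write outage.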
The threshold I use: if you don't have at least two of these requirements, Kafka is overkill:
- Multiple independent consumers for the same data
- Need to replay historical events
- Throughput > 50,000 messages/second
- Exactly-once semantics across systems
- 24/7 uptime with zero data loss
Common Questions (FAQ)
What is Apache Kafka used for in microservices?
Decoupling services. Service A writes events to Kafka. Services B, C, and D read from Kafka independently. If B goes down, A doesn't know or care. When B comes back, it catches up from where it left off. This saved us when a payment service crashed for 4 hours — the upstream order service kept running, and the payment service replayed 2 million events when it recovered.
Can Kafka replace a database?
No. But it can complement one. Kafka is for event streaming, not querying. You can't do ad-hoc SQL queries on Kafka topics efficiently. You need a separate query layer (ksqlDB helps, but it's not PostgreSQL).
How is Kafka different from RabbitMQ?
Three fundamental differences:
- Persistence. RabbitMQ deletes messages after acknowledgment. Kafka keeps messages based on retention policy (time or size).
- Ordering. RabbitMQ doesn't guarantee order across multiple consumers. Kafka guarantees order within a partition.
- Throughput. In our benchmark on comparable hardware, Kafka handled roughly 8x the throughput of RabbitMQ: RabbitMQ topped out at 150K messages/sec per node. Kafka hit 1.2M.
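The ordering point is worth a quick illustration: order holds within a partition, and a record's key decides its partition. In this sketch crc32 stands in for the client's real key hash (the Java client uses murmur2), and the keys and statuses are invented.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Stand-in hash for illustration; same key always maps to same partition.
    return zlib.crc32(key.encode()) % num_partitions


produced = [
    ("u_1", "created"), ("u_2", "created"),
    ("u_1", "paid"), ("u_2", "paid"),
    ("u_1", "shipped"),
]

partitions = defaultdict(list)   # partition number -> records in append order
for key, status in produced:
    partitions[partition_for(key)].append((key, status))


def statuses_for(key):
    """Read one key's events back from the single partition that holds them."""
    return [s for k, s in partitions[partition_for(key)] if k == key]


print(statuses_for("u_1"))   # ['created', 'paid', 'shipped']
print(statuses_for("u_2"))   # ['created', 'paid']
```

Events for different keys may interleave differently across partitions, but each key's own sequence is preserved. Choose the key to match the entity whose order matters.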
What is Apache Kafka used for in data engineering?
Data ingestion pipelines. Traditionally, data engineers used batch ETL jobs (run every night, process yesterday's data). Kafka enables streaming ETL — process data as it arrives, enrich it, send it to data warehouses or lakes with sub-second latency.
Does Kafka support exactly-once semantics?
Yes, since Kafka 0.11 (2017). It combines idempotent producers, transactions that commit output records and consumer offsets atomically, and consumers that read with isolation.level=read_committed. Writes to systems outside Kafka still need idempotent sinks. It's complex to configure correctly. We run exactly-once for financial transactions at one client. It works, but debugging issues is painful: you need to understand the transaction protocol deeply.
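For a feel of the moving parts, here is a sketch of a consume-transform-produce loop using confluent-kafka's transactional API. Topic names, the group id, and the transactional.id are placeholders, and this is a simplified outline, not our production code. It opens one transaction per message, which is easy to follow but slow; real pipelines batch.

```python
def exactly_once_configs(bootstrap="kafka:9092"):
    """Producer and consumer settings for a transactional pipeline (example values)."""
    producer_conf = {
        "bootstrap.servers": bootstrap,
        "transactional.id": "payments-enricher-1",   # stable per producer instance
        "enable.idempotence": True,
    }
    consumer_conf = {
        "bootstrap.servers": bootstrap,
        "group.id": "payments-enricher",
        "isolation.level": "read_committed",   # skip records from aborted transactions
        "enable.auto.commit": False,           # offsets are committed by the transaction
        "auto.offset.reset": "earliest",
    }
    return producer_conf, consumer_conf


def run(transform, in_topic="payments_raw", out_topic="payments_enriched"):
    # Imported here so the config above can be inspected without a broker.
    from confluent_kafka import Consumer, Producer

    producer_conf, consumer_conf = exactly_once_configs()
    consumer = Consumer(consumer_conf)
    producer = Producer(producer_conf)
    consumer.subscribe([in_topic])
    producer.init_transactions()
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        producer.begin_transaction()
        try:
            producer.produce(out_topic, key=msg.key(), value=transform(msg.value()))
            # The output record and the input offset commit succeed or fail together.
            producer.send_offsets_to_transaction(
                consumer.position(consumer.assignment()),
                consumer.consumer_group_metadata(),
            )
            producer.commit_transaction()
        except Exception:
            producer.abort_transaction()
            raise   # restart and resume from the last committed offsets
```

Calling `run(lambda value: value)` would copy records between the two topics exactly once, as far as Kafka itself is concerned. Anything the transform does outside Kafka is not covered by the transaction.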
What's the smallest Kafka cluster you'd run in production?
3 brokers minimum. With a replication factor of 3 and min.insync.replicas=2, the cluster tolerates one broker failure and still accepts acks=all writes. Anything smaller and you lose the resilience Kafka is known for.
How much does Kafka cost to run?
Real numbers from our 2024 infrastructure: A 3-broker cluster on AWS (m5.2xlarge instances) with 3TB of EBS storage and a ZooKeeper ensemble of 3 small instances runs about $2,500/month in compute and storage. Add 20% for networking and monitoring, which brings it to roughly $3,000/month. That's before any license costs if you use Confluent Enterprise.
The Bottom Line
What is Apache Kafka used for? It's the backbone for systems that need to move data at scale, reliably, with replay capability. It's not a silver bullet. It's not for every problem. But when you have multiple systems that need to consume the same events, when you can't afford data loss, when throughput matters — Kafka is the best tool for the job.
Start small. Don't migrate your entire architecture to Kafka overnight. Pick one integration, run it for a month, measure the pain. If the pain of running Kafka is less than the pain of your current approach, scale it out.
That's how we built systems processing 200K events per second. One topic at a time.
Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.