What Is Kafka Apache Used For? A Practitioner’s Guide to Real-World Streaming

You're building a system that needs to handle data in motion. Maybe it's clickstream events from a million users. Maybe it's IoT sensor readings from a factory floor. Maybe it's financial transactions that can't lose a single record.

Someone tells you: "Use Kafka."

But what is Kafka Apache used for, actually? Not the marketing version. The real one. The one where you're debugging consumer lag at 2 AM after a deploy went sideways.

I've been running Kafka in production since 2017. At SIVARO, we've built data infrastructure for clients processing 200K events per second. I've watched Kafka save companies. I've watched it nearly kill them (misconfigured compaction, I'm looking at you).

Here's the truth: Kafka is a distributed commit log that lets you decouple data producers from consumers. That's it. But what you can do with that simple idea changes how you build software.

In this guide, you'll learn:

What Kafka actually is (skip the hype, read the architecture)
The 4 patterns where Kafka beats everything else
Where Kafka sucks (and what to use instead)
Real code examples from production systems
Answers to the questions people actually ask me

The One Sentence Definition

Apache Kafka is a distributed streaming platform that acts as a high-throughput, fault-tolerant commit log.

If that sounds like "database but weird" — you're not wrong. Kafka doesn't store data the way PostgreSQL does. It appends records to immutable logs and lets consumers decide their position. You can replay data from last Tuesday if you need to. Try that with RabbitMQ.

The core abstraction is dead simple:

Producer writes records to a topic
Consumer reads records from a topic
Broker is the server that stores and serves the data
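To make the log-and-offset idea concrete, here is a minimal in-memory sketch. It is not how Kafka is implemented (there is no partitioning, replication, or disk), and the `TopicLog` class and its method names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy append-only log: records are only ever appended, never modified."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        """Producer side: append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Consumer side: read from any offset without removing anything."""
        return self.records[offset:offset + max_records]


log = TopicLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Two consumers track their own positions independently.
dashboard_offset = 2   # nearly caught up
warehouse_offset = 0   # replaying from the beginning

print(log.read(dashboard_offset))  # ['purchase']
print(log.read(warehouse_offset))  # ['signup', 'login', 'purchase']
```

Reading never deletes anything, which is why a slow consumer and a replaying consumer can share the same log.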

But Kafka's power comes from what happens when you scale that across 20 machines with replication. That's where "what is kafka apache used for?" stops being a theoretical question and starts being a business advantage.

The 4 Patterns Kafka Solves (That Nothing Else Does as Well)

Real-Time Data Pipelines

Most people think Kafka is "fast messaging." They're wrong. Kafka is a buffer between systems that produce data and systems that consume it.

Consider this: You have a web app generating 50GB of clickstream data daily. You need that data in Snowflake for analytics, in Elasticsearch for search, and in a real-time dashboard for ops.

Without Kafka? Each consumer hits your web servers directly. When Snowflake runs its nightly load, your API latency spikes. When Elasticsearch reindexes, your database falls over.

With Kafka? Producers write to one topic. Consumers read independently. Each consumer tracks its own offset. Snowflake can lag by 6 hours. The dashboard needs sub-second latency. Doesn't matter — Kafka handles it.
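Each consumer group's distance from the end of the log is its lag, and that is the number to watch when a consumer falls behind. Here is a rough sketch of the arithmetic, using made-up offsets and hypothetical group names. In a real cluster these values come from the broker and from each group's committed offsets.

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Lag per partition = log end offset minus last committed offset."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}


# Hypothetical end offsets for a 3-partition topic.
end_offsets = {0: 1_000, 1: 1_200, 2: 900}

# Each group commits its own offsets; groups never interfere with each other.
committed_by_group = {
    "dashboard": {0: 1_000, 1: 1_198, 2: 900},     # near real time
    "snowflake-loader": {0: 400, 1: 500, 2: 300},  # far behind, and that's fine
}

for group, committed in committed_by_group.items():
    lag = consumer_lag(end_offsets, committed)
    print(group, "total lag:", sum(lag.values()))
```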

I helped a fintech company replace a spaghetti of point-to-point integrations with Kafka. They went from 14 separate data pipelines to 3 Kafka topics. Their ops team stopped getting paged at 3 AM.

Event Sourcing and CQRS

Event sourcing means storing state as a sequence of events, not as the current value. Kafka is literally built for this.

Your bank account balance isn't a single number. It's a series of deposits, withdrawals, and transfers. The balance is derived from replaying those events. Kafka stores those events. Consumers compute the current state.

The pattern looks like this:

```python
# Producer: record an event
from kafka import KafkaProducer
import json

producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Order placed event
event = {
    "event_type": "order_placed",
    "order_id": "ORD-12345",
    "user_id": "USR-678",
    "items": [{"sku": "WIDGET-A", "qty": 2}],
    "timestamp": "2024-01-15T10:30:00Z"
}

# Keying by order ID keeps all events for one order in the same partition
producer.send('order_events', key=b'ORD-12345', value=event)
producer.flush()
```

And the consumer computes the current state:

```python
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer('order_events',
                         bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest',
                         value_deserializer=lambda m: json.loads(m.decode('utf-8')))

order_state = {}
for message in consumer:
    event = message.value
    order_id = event['order_id']
    # setdefault avoids a KeyError if a later event shows up before 'order_placed' was seen
    order = order_state.setdefault(order_id, {'status': None, 'items': []})

    if event['event_type'] == 'order_placed':
        order.update(status='placed', items=event['items'])
    elif event['event_type'] == 'payment_received':
        order['status'] = 'paid'
    elif event['event_type'] == 'order_shipped':
        order['status'] = 'shipped'

    print(f"Current state: {order_state}")
```

The killer feature? You can rebuild state from scratch by replaying all events. No database backup needed. No "how did this record get corrupted" debugging. Just replay the log.
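Replay is just a fold over the event list. Here is a minimal sketch, independent of any Kafka client, using the same event types as above. The `apply_event` and `rebuild_state` helpers are hypothetical names for illustration.

```python
def apply_event(state: dict, event: dict) -> dict:
    """Apply one event to the state in place and return the state."""
    order = state.setdefault(event["order_id"], {"status": None, "items": []})
    if event["event_type"] == "order_placed":
        order["status"] = "placed"
        order["items"] = event["items"]
    elif event["event_type"] == "payment_received":
        order["status"] = "paid"
    elif event["event_type"] == "order_shipped":
        order["status"] = "shipped"
    return state


def rebuild_state(events) -> dict:
    """Rebuild current state from scratch by replaying every event in order."""
    state = {}
    for event in events:
        apply_event(state, event)
    return state


events = [
    {"event_type": "order_placed", "order_id": "ORD-1",
     "items": [{"sku": "WIDGET-A", "qty": 2}]},
    {"event_type": "payment_received", "order_id": "ORD-1"},
    {"event_type": "order_placed", "order_id": "ORD-2", "items": []},
    {"event_type": "order_shipped", "order_id": "ORD-1"},
]

state = rebuild_state(events)
print(state["ORD-1"]["status"])  # shipped
print(state["ORD-2"]["status"])  # placed
```

Because the function is deterministic, replaying the same events always yields the same state, which is what makes the log a usable source of truth.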

Stream Processing

Here's where Kafka gets interesting. You don't just move data — you transform it in motion.

Kafka Streams lets you write Java code that processes records as they arrive. No Spark. No Flink. Just your application logic running inside your service.

I built a fraud detection system using Kafka Streams. We were processing 30K transactions per second. The rules were simple: flag any transaction where a user's spending rate exceeded 3x their 7-day average in under 5 minutes.

```java
// Kafka Streams topology for fraud detection
StreamsBuilder builder = new StreamsBuilder();

KStream<String, Transaction> transactions = builder.stream("transactions",
    Consumed.with(Serdes.String(), transactionSerde));

// Windowed aggregation: count transactions per user in 5-minute windows
// (use TimeWindows.of(...) on clients older than Kafka 3.0)
KTable<Windowed<String>, Long> transactionCounts = transactions
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count();

// Per-user baseline, keyed by user ID
KTable<String, Double> historicalAverages = builder.table("user_avg_spend",
    Consumed.with(Serdes.String(), Serdes.Double()));

transactionCounts.toStream()
    // Re-key from Windowed<String> to the plain user ID so the keys match the table
    .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count))
    // Stream-table join; the three-argument joiner (ValueJoinerWithKey) also receives the key.
    // Simplified: the windowed count stands in for the spending rate described above.
    .leftJoin(historicalAverages,
        (userId, count, avgSpend) ->
            (avgSpend != null && count > avgSpend * 3)
                ? new FraudAlert(userId, count, avgSpend)
                : null,
        Joined.with(Serdes.String(), Serdes.Long(), Serdes.Double()))
    .filter((userId, alert) -> alert != null)
    .to("fraud_alerts", Produced.with(Serdes.String(), fraudAlertSerde));
```

That's production code. 200 lines total. Deployed as a single JAR. No cluster to manage.
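The rule itself is easy to reason about outside of Kafka Streams. Below is a small sketch of one reading of it: spend per minute over a sliding 5-minute window compared against 3x a per-minute baseline. The `SpendRateDetector` class is hypothetical, the baseline is assumed to be precomputed from the 7-day history, and the window here slides, whereas the topology above uses fixed 5-minute windows.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60
THRESHOLD = 3.0


class SpendRateDetector:
    """Flags a user whose spend rate over the last 5 minutes exceeds 3x a baseline."""

    def __init__(self, baseline_per_minute: float):
        self.baseline_per_minute = baseline_per_minute
        self.window = deque()      # (timestamp_seconds, amount) pairs
        self.window_total = 0.0

    def observe(self, timestamp: float, amount: float) -> bool:
        """Record a transaction; return True if it should be flagged."""
        self.window.append((timestamp, amount))
        self.window_total += amount
        # Evict transactions that have fallen out of the 5-minute window.
        while self.window and self.window[0][0] <= timestamp - WINDOW_SECONDS:
            _, old_amount = self.window.popleft()
            self.window_total -= old_amount
        rate_per_minute = self.window_total / (WINDOW_SECONDS / 60)
        return rate_per_minute > THRESHOLD * self.baseline_per_minute


detector = SpendRateDetector(baseline_per_minute=2.0)
transactions = [(0, 10.0), (60, 15.0), (120, 10.0), (500, 5.0)]
flags = [detector.observe(ts, amount) for ts, amount in transactions]
print(flags)  # [False, False, True, False]
```

The third transaction pushes the window total to 35, or 7 per minute, which is above the 6-per-minute threshold. By the fourth, the earlier transactions have aged out.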

Decoupling Microservices

This is the most common use case I see, and also the most abused.

Microservice A needs to tell Microservice B that something happened. Kafka sits between them. A publishes an event. B consumes it. Neither knows the other exists.

But here's the thing most tutorials don't tell you: Kafka adds latency. If you need sub-5ms response times for a user request, Kafka is the wrong choice. Use gRPC or HTTP for that. Kafka is for async workflows.

Where Kafka shines: when Microservice B is down. A publishes the event anyway. B catches up when it recovers. That's the "durable" part of durable messaging.

What Kafka Absolutely Sucks At

I'm not going to sell you a fairy tale. Kafka has sharp edges.

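Small message throughput. Kafka works best with messages between roughly 1KB and 1MB (the default broker limit is about 1MB). If you're sending 4-byte messages at 1M/sec, per-record framing and request overhead outweigh the payload unless you batch aggressively. Use NATS or Redis for that.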

Exactly-once semantics are expensive. Kafka supports exactly-once processing, but it requires idempotent producers and transactions. Your throughput can drop by around 40%, depending on batch size and how often you commit transactions. For most use cases, at-least-once plus deduplication in the consumer is cheaper and simpler.
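Here is what "at-least-once plus deduplication" can look like in its simplest form. This sketch assumes every event carries a unique `event_id` (a hypothetical field the producer would have to add) and keeps the seen-ID set in memory. A production version would need to persist it, ideally in the same transaction as the side effect.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers recently seen event IDs so redelivered events are skipped."""

    def __init__(self, max_ids: int = 100_000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def is_new(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


def handle(events, dedup, balances):
    """Apply each payment once, even if it is delivered more than once."""
    for event in events:
        if dedup.is_new(event["event_id"]):
            account = event["account"]
            balances[account] = balances.get(account, 0) + event["amount"]
    return balances


delivered = [
    {"event_id": "e1", "account": "ACC-1", "amount": 100},
    {"event_id": "e2", "account": "ACC-1", "amount": 50},
    {"event_id": "e1", "account": "ACC-1", "amount": 100},  # redelivery after a retry
]

balances = handle(delivered, Deduplicator(), {})
print(balances)  # {'ACC-1': 150}
```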

Operational complexity. A 3-node Kafka cluster isn't hard. A 20-node cluster with 50 topics, 200 partitions, and cross-datacenter replication? That's a full-time job. I've seen teams burn 6 months just tuning Kafka configs.

Real-time latency under 10ms. Kafka can do it. But you'll pay in hardware. We tested 3ms p99 latency with SSDs, tuned OS buffers, and dedicated network interfaces. Most people don't need that. If you do, look at Apache Pulsar.

How We Actually Deployed Kafka at SIVARO

I'll walk you through a real deployment we did for a logistics company in 2023.

The problem: They had 5,000 delivery trucks sending GPS coordinates every 30 seconds. Their monolith couldn't handle the ingestion rate. They needed real-time tracking for customers and historical data for route optimization.

The solution: 6 Kafka brokers on c5.4xlarge instances. 4 topics:

gps_raw — all GPS data, 1-day retention
gps_enriched — GPS data joined with driver info, 7-day retention
delivery_events — status changes (picked up, delivered, exception), infinite retention
alerts — geofence violations, late deliveries, 30-day retention

The config that mattered:

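```properties
# server.properties: the broker settings we set explicitly
num.partitions=24
default.replication.factor=3
min.insync.replicas=2
# The three values below match Kafka's defaults; we pinned them so they stay visible in review.
# (replication.factor is not a broker setting; default.replication.factor above covers it.)
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
```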

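Combined with acks=all on the producer, min.insync.replicas=2 guarantees that at least 2 replicas acknowledge every write. Without acks=all, the setting has no effect on what the producer waits for. We lost one broker in week 3. Zero data loss. Zero downtime.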
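The arithmetic behind that setting is worth spelling out. The sketch below models only the acceptance rule for acks=all writes, with hypothetical function names. It says nothing about leader election or how replicas rejoin.

```python
def write_accepted(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """With acks=all, a write is rejected when the in-sync replica set
    is smaller than min.insync.replicas."""
    return in_sync_replicas >= min_insync_replicas


def tolerated_replica_losses(replication_factor: int, min_insync_replicas: int) -> int:
    """How many replicas of a partition can be lost while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas


print(tolerated_replica_losses(3, 2))   # 1
print(write_accepted(in_sync_replicas=2, min_insync_replicas=2))  # True
print(write_accepted(in_sync_replicas=1, min_insync_replicas=2))  # False
```

With a replication factor of 3 and min.insync.replicas=2, one broker can fail without interrupting writes. A second failure makes producers see errors instead of silently writing to a single copy.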

The failure we avoided: They originally wanted Kafka Connect for everything. Bad idea for high-throughput GPS. Kafka Source/Sink connectors are fine for 1K events/sec. At 30K/sec, you write your own producer. We did. Simpler, faster, debuggable.

What Is Kafka Apache Used For? — The FAQ People Actually Ask

Can Kafka replace a database?

No. Kafka is not a database. It's a log. You can use it for event sourcing (storing events), but you still need a queryable state store. Kafka Streams has state stores (RocksDB-based), but they're not made for ad-hoc queries. Use PostgreSQL or MongoDB for that.

How much data can Kafka handle?

LinkedIn, where Kafka was created, has publicly reported trillions of messages per day across thousands of brokers. That's the high end, not a ceiling. For 99% of teams, 100K messages/sec across 6 brokers is more realistic. Your bottleneck will be your network bandwidth and disk I/O long before Kafka gives up.
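A back-of-the-envelope calculation shows why network and disk are the limits. The sketch below assumes 1,000-byte messages, a replication factor of 3, three consumer groups, and partitions spread evenly across brokers. All of those are illustrative assumptions, and it ignores compression and protocol overhead.

```python
def per_broker_load_mb_s(msgs_per_sec, avg_msg_bytes, replication_factor,
                         brokers, consumer_groups):
    """Rough per-broker load in MB/s, assuming an even spread of partitions."""
    ingress = msgs_per_sec * avg_msg_bytes              # bytes/sec from producers
    replication = ingress * (replication_factor - 1)    # copies fetched by followers
    fan_out = ingress * consumer_groups                 # each group reads everything

    def to_mb(total_bytes):
        return total_bytes / brokers / 1e6

    return {
        "disk_write": to_mb(ingress + replication),   # every replica persists the data
        "network_in": to_mb(ingress + replication),   # produce traffic plus replica copies received
        "network_out": to_mb(replication + fan_out),  # replica copies served plus consumer reads
    }


load = per_broker_load_mb_s(100_000, 1_000, 3, 6, 3)
for name, mb_s in load.items():
    print(f"{name}: {mb_s:.1f} MB/s")
```

Under these assumptions each broker writes about 50 MB/s to disk and sends about 83 MB/s, which is roughly two-thirds of a 1 Gbit/s link (125 MB/s) before any headroom for catch-up reads.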

What's the difference between Kafka and RabbitMQ?

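RabbitMQ is a message broker. Kafka is a distributed log. With a classic RabbitMQ queue, each message goes to one consumer and is removed once acknowledged; fan-out means binding several queues to an exchange. Kafka keeps messages for the retention period and lets multiple consumer groups read and replay them independently. RabbitMQ is better for request-response patterns. Kafka is better for event streaming and replay.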

Should I use Kafka for real-time analytics?

Yes, but you'll layer something on top. Kafka itself doesn't analyze data — it stores and streams it. You pair it with Kafka Streams, Flink, Spark Streaming, or ksqlDB. We use ksqlDB for simple filtering and aggregation. Flink for complex windowed joins.

How do I handle schema evolution?

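Use Avro with Schema Registry. You define a schema for each event type. Schema Registry checks every new schema version against a compatibility mode. Producers write Avro-encoded bytes. Consumers decode them. The default mode, BACKWARD, guarantees that consumers on the new schema can still read events written with the old one. If old consumers must keep reading new events, set the subject to FORWARD or FULL. Either way, add fields with defaults and deprecate old ones gradually.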

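```python
# Avro producer with Schema Registry (legacy AvroProducer API in confluent-kafka)
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# Value schema for the example; in a real project, load it from an .avsc file
value_schema = avro.loads("""
{"type": "record", "name": "Order", "fields": [
  {"name": "order_id", "type": "string"},
  {"name": "amount", "type": "double"},
  {"name": "status", "type": "string"}]}
""")

avro_producer = AvroProducer({
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': 'http://localhost:8081'
}, default_value_schema=value_schema)

value = {"order_id": "ORD-123", "amount": 250.00, "status": "pending"}
avro_producer.produce(topic='orders', value=value)
avro_producer.flush()  # block until the message is delivered
```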
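The core of the BACKWARD rule can be sketched without a registry. BACKWARD means a reader on the new schema can decode data written with the old one, so any field added in the new schema needs a default. The check below is deliberately simplified: it ignores type changes, aliases, and unions, and `missing_defaults` is a hypothetical helper, not part of any Schema Registry client.

```python
def missing_defaults(old_fields: dict, new_fields: dict) -> list:
    """Return the added fields that would break BACKWARD compatibility.

    Old records do not contain fields added later, so a reader using the
    new schema can only fill them in if the schema declares a default.
    """
    return [
        name for name, spec in new_fields.items()
        if name not in old_fields and "default" not in spec
    ]


old = {
    "order_id": {"type": "string"},
    "amount": {"type": "double"},
    "status": {"type": "string"},
}

new_ok = {**old, "currency": {"type": "string", "default": "USD"}}
new_bad = {**old, "currency": {"type": "string"}}

print(missing_defaults(old, new_ok))   # []
print(missing_defaults(old, new_bad))  # ['currency']
```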

What's the proper way to handle failures?

Three things:

Producer retries: Set retries=5, acks=all, and enable.idempotence=true (idempotence requires acks=all). Your producer will retry on transient failures without duplicating records.
Consumer resilience: Use enable.auto.commit=false. Commit offsets manually after processing. If your consumer crashes, it re-processes the last batch. Duplicate-tolerant design.
Broker resiliency: Run at least 3 brokers. Set min.insync.replicas=2. Never use acks=0 in production (unless you enjoy losing data).
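The commit-after-processing behaviour is easier to see in a simulation than in client code. The sketch below uses a plain list as the log and an integer as the committed offset. `FlakyProcessor` and `consume` are hypothetical names, and the "crash" is just an exception.

```python
class FlakyProcessor:
    """Records what it processed and fails once on a chosen record."""

    def __init__(self, fail_on):
        self.fail_on = fail_on
        self.failed_once = False
        self.processed = []

    def __call__(self, record):
        if record == self.fail_on and not self.failed_once:
            self.failed_once = True
            raise RuntimeError("simulated crash")
        self.processed.append(record)


def consume(log, committed, process, batch_size=3):
    """Process batches, committing only after a whole batch succeeds.

    Returns the last committed offset. On failure, the offset of the
    current batch is never committed, so a restart re-reads that batch.
    """
    while committed < len(log):
        batch = log[committed:committed + batch_size]
        try:
            for record in batch:
                process(record)
        except RuntimeError:
            return committed          # crash before commit
        committed += len(batch)       # manual commit after processing
    return committed


log = ["a", "b", "c", "d", "e", "f"]
processor = FlakyProcessor(fail_on="e")

offset = consume(log, 0, processor)        # crashes in the second batch
print(offset)                              # 3
offset = consume(log, offset, processor)   # restart from the committed offset
print(offset)                              # 6
print(processor.processed)                 # ['a', 'b', 'c', 'd', 'd', 'e', 'f']
```

Record "d" is processed twice. That duplicate is the price of at-least-once delivery, and it is why the processing step has to tolerate repeats.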

Is Kafka free?

Kafka itself is open source (Apache 2.0 license), and that includes Kafka Connect. Confluent adds components around it: Schema Registry is free to use under the Confluent Community License, while features such as RBAC and Confluent's multi-region replication are commercial. If you're running fewer than 10 brokers, Apache Kafka plus the community-licensed components is fine. At scale, you'll want the commercial features. Or you can roll your own; many teams do.

The Architecture Decision: When Kafka Is Wrong

Let me save you pain. Here's when you should NOT use Kafka:

1. Request-response with sub-10ms requirements. Kafka adds 2-5ms minimum. Use gRPC or Redis Queue.

2. You have 3 engineers. Kafka takes a team to run. Use managed services: Confluent Cloud, Redpanda Serverless, or AWS MSK. Your time is better spent on product.

3. You need FIFO queues. Kafka guarantees order within a partition, not across partitions. If you need strict global order with retries, use RabbitMQ or Pulsar.

4. Your data volume is under 100GB/day. Kafka's operational overhead isn't worth it for small data. Use Postgres LISTEN/NOTIFY or NATS.

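5. You need true pub/sub with wildcard routing. Kafka topics are flat. A consumer can subscribe to topic names by regex, but the broker has no hierarchical routing keys, so you can't route on patterns like orders.*.us.east the way a topic exchange does. MQTT or Pulsar handle that natively.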

The Future of Kafka (2024 and Beyond)

Three trends I'm watching:

KRaft mode. Kafka without ZooKeeper. Apache Kafka marked it production-ready for new clusters in release 3.3 (late 2022), and Confluent shipped it in production in 2023. We migrated one cluster. It simplified ops significantly. No more ZooKeeper quorum to manage. But it's still maturing. Don't run KRaft on your largest cluster yet.

Redpanda. A Kafka-compatible system written in C++. Removes the JVM overhead. We tested it for a client needing 200K events/sec on 3 nodes. Redpanda delivered with half the hardware. But it's not Kafka — the ecosystem (connectors, monitoring) is smaller.

Serverless Kafka. Confluent Cloud's serverless tier auto-scales to zero. We use it for dev environments. The pricing is unpredictable for production (you pay per partition-hour and per GB transferred). Good for variable workloads. Bad for steady-state high throughput.

What I've Learned After 7 Years

I've deployed Kafka for clients in finance, logistics, and e-commerce. Here's what matters:

Start simple. One topic. One consumer group. Hand-rolled producer and consumer. No Kafka Streams. No Connect. No Schema Registry. Get the data moving. Then layer on complexity.

Monitor aggressively. Burrow for consumer lag. Kafka Exporter for broker metrics. Custom dashboards for partition imbalance. You will hit problems. Know about them before your users do.

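Plan for partition growth. Partitions are the unit of parallelism. 10 partitions = 10 active consumers max per consumer group. Start with more partitions than you think you need (2x your expected consumer count). You can't shrink partitions. You can only add more, and adding them changes which partition a key hashes to: existing data is not moved, so per-key ordering breaks across the change unless you manually rebalance.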
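To see why adding partitions disturbs keyed data, here is a small sketch of hash partitioning. It uses CRC32 as a stand-in hash purely for illustration (Kafka's default partitioner uses murmur2), and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Stand-in key-hash partitioner: hash the key, take it modulo the partition count."""
    return zlib.crc32(key) % num_partitions


keys = [f"USR-{i}".encode() for i in range(1000)]

before = {k: partition_for(k, 12) for k in keys}
after = {k: partition_for(k, 24) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys map to a different partition after 12 -> 24")
```

Any key that moves has its older events in one partition and its newer events in another, so a consumer can no longer rely on seeing that key's events in order.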

Assume nothing about message ordering. Yes, Kafka guarantees order within a partition. But what happens when your application re-publishes a failed message? The retry lands at the end of the partition. Producer-level retries can reorder records too if idempotence is off and max.in.flight.requests.per.connection is above 1. Your downstream system sees events out of order. Build idempotent consumers that handle out-of-order events.
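One common way to tolerate late or repeated events is to version them and ignore anything stale. The sketch below assumes the producer stamps each event with a per-order, monotonically increasing `version`. That field is hypothetical; Kafka does not add it for you.

```python
def apply_if_newer(state: dict, event: dict) -> bool:
    """Apply an event only if its version is newer than what we already hold."""
    current = state.get(event["order_id"])
    if current is not None and event["version"] <= current["version"]:
        return False  # stale or duplicate: ignore
    state[event["order_id"]] = {"status": event["status"], "version": event["version"]}
    return True


events = [
    {"order_id": "ORD-1", "version": 1, "status": "placed"},
    {"order_id": "ORD-1", "version": 3, "status": "shipped"},
    {"order_id": "ORD-1", "version": 2, "status": "paid"},     # late retry
    {"order_id": "ORD-1", "version": 3, "status": "shipped"},  # duplicate
]

state = {}
applied = [apply_if_newer(state, e) for e in events]
print(applied)                   # [True, True, False, False]
print(state["ORD-1"]["status"])  # shipped
```

This is last-writer-wins, which suits a status projection. If every event must take effect (a ledger, for example), buffer and reorder by version instead of dropping the late ones.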

Use schemas from day one. I can't stress this enough. Once you have 5 microservices producing events to the same topic, changing the event format becomes a nightmare. Put Schema Registry in your architecture from the first deploy.

The Bottom Line

What is Kafka Apache used for? It's the backbone of modern event-driven systems. It handles data-in-motion at scale better than almost anything else. But it's not magic. It's a tool with sharp trade-offs.

Use Kafka when:

You need multiple consumers to read the same data independently
You need to replay historical data
You're building event sourcing or CQRS
You have high throughput (10K+ events/sec) and need durability

Don't use Kafka when:

You need sub-5ms latency
You have a tiny team and small data
You need strict FIFO with retries
Your problem is actually a database problem (store the data, query it later)

If you decide to use it, start small. Use a managed service if you can. Monitor obsessively. And always, always have a plan for dealing with consumer lag at 2 AM.

Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.