Is Netflix Using Kubernetes? The Real Story Behind Their Infrastructure
I remember sitting in a conference room in Bangalore in 2019, convincing a skeptical CTO that Kubernetes wasn't just hype. His first question: "If Kubernetes is so great, why isn't Netflix using it?"
At the time, that was the right question. Because the answer reveals something fundamental about how you should think about infrastructure decisions — not just for Netflix, but for your company.
Here's the short answer: Netflix is absolutely using Kubernetes. But not the way you think. Not the way most companies are using it. And their journey tells you everything about when Kubernetes makes sense — and when it doesn't.
The Myth: Netflix Doesn't Need Kubernetes
Let me kill the most common misconception first.
Most people point to Netflix's open-source stack — Eureka, Hystrix, Zuul, their homegrown service mesh — and conclude they built everything from scratch. People hear "Netflix uses its own container orchestration" and assume they rejected Kubernetes entirely.
That was true. For a while.
Here's what actually happened: Netflix started running Kubernetes in production around 2017-2018. By 2020, they were publicly discussing their migration at KubeCon. Their 2021 tech blog documented running thousands of Kubernetes nodes across multiple regions.
The confusion comes from one critical distinction: Netflix uses Kubernetes, but Kubernetes doesn't run their core streaming workload.
That separation matters. A lot.
"[What exactly is Kubernetes used for?](/articles/what-exactly-is-kubernetes-used-for-heres-the-real-answer)" In Netflix's case, it's used for everything around the core streaming pipeline: their content management systems, internal tools, machine learning workloads, CI/CD pipelines, and a thousand other services. Just not the real-time video delivery.
What Netflix Actually Runs on Kubernetes
| Workload Type | Kubernetes? | Reason |
|---|---|---|
| Video streaming | No | Too tightly coupled to hardware optimizations |
| Content encoding | No | GPU/NIC affinity requirements |
| ML training | Yes | Dynamic resource allocation |
| Internal APIs | Yes | Standard microservices pattern |
| CI/CD infrastructure | Yes | Auto-scaling build agents |
| Data pipelines | Yes | Spiky batch workloads |
| Studio tools | Yes | Variable demand patterns |
This table tells you more about Kubernetes than any architecture diagram.
The workloads they put on Kubernetes share a pattern: they're stateless, they have variable demand, and they benefit from auto-scaling. The workloads they kept off Kubernetes share the opposite pattern: they're stateful, they require hardware-specific tuning, and they operate at insane scale where every millisecond of scheduling overhead hurts.
That's the lesson. Not "Netflix uses Kubernetes" or "Netflix doesn't use Kubernetes." It's "Netflix uses Kubernetes for the workloads where Kubernetes adds value."
Most companies get this backwards. They try to shove everything into Kubernetes and wonder why it hurts.
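To make that pattern concrete, here is a minimal sketch of the screening rule the table implies. The `Workload` fields and the `fits_kubernetes` helper are hypothetical names for illustration, not anything Netflix has published, and the two example workloads are simplified stand-ins for rows in the table.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    stateless: bool
    variable_demand: bool
    needs_hardware_tuning: bool


def fits_kubernetes(w: Workload) -> bool:
    """Rough screen: stateless, bursty, and not tied to specific hardware."""
    return w.stateless and w.variable_demand and not w.needs_hardware_tuning


# Illustrative examples only; the flags are assumptions, not measured facts.
ci_agents = Workload("ci-build-agents", stateless=True,
                     variable_demand=True, needs_hardware_tuning=False)
video_edge = Workload("video-streaming", stateless=False,
                      variable_demand=True, needs_hardware_tuning=True)

print(fits_kubernetes(ci_agents))   # True
print(fits_kubernetes(video_edge))  # False
```

Treat it as a first-pass filter, not a verdict. Real workloads rarely score cleanly on all three axes, which is why the per-workload reasoning in the table matters more than any single rule.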
The Technical Reality: Why Netflix Can't Run Everything on Kubernetes
Let me get specific about the "why."
By some estimates, Netflix accounts for around 15% of global downstream internet traffic. When you're moving that much data, the kernel's network stack becomes your enemy. Every context switch, every system call, every extra layer of abstraction costs real money and real latency.
Here's the problem with vanilla Kubernetes networking: it adds overhead.
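```yaml
# Typical Kubernetes service mesh sidecar
# Every packet goes through iptables rules
# Every request hits an Envoy proxy
# Network latency increases by 1-3ms
apiVersion: apps/v1
kind: Deployment
metadata:
  name: streaming-node
spec:
  replicas: 10000 # Netflix scale
  selector:
    matchLabels:
      app: streaming-node
  template:
    metadata:
      labels:
        app: streaming-node
    spec:
      containers:
        - name: video-server
          image: netflix/video-server
          resources:
            # Requests default to these limits (Guaranteed QoS). Real NUMA
            # pinning also needs the kubelet's static CPU Manager and
            # Topology Manager policies; limits alone do not pin anything.
            limits:
              hugepages-2Mi: 512Mi
              cpu: 4
              memory: 16Gi
```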
At Netflix's scale, 1ms of added latency per request is catastrophic. They can't tolerate it. So their streaming tier runs outside Kubernetes, on dedicated bare-metal servers (Netflix's Open Connect appliances), with a network path tuned to the NIC hardware and as few software layers in between as possible.
Kubernetes currently can't do that. The abstraction layers that make Kubernetes convenient are the same layers that make it impossible to achieve Netflix's streaming performance.
And here's a contrarian take: I don't think Kubernetes needs to solve this. The people who complain that Kubernetes can't handle their high-throughput, low-latency workloads are usually wrong — because those workloads probably shouldn't be on Kubernetes anyway.
What Exactly Is Kubernetes Used For? (A Netflix-Tested Answer)
If Netflix's streaming service doesn't use Kubernetes, what does? Their answer reveals the actual value proposition.
1. Batch Processing at Scale
Netflix processes massive amounts of video metadata, recommendation features, and A/B test data. These are classic batch workloads — they need to spin up 500 pods, process data for 20 minutes, and disappear.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: recompute-features
spec:
  completions: 100
  parallelism: 50
  template:
    spec:
      containers:
        - name: worker
          image: netflix/feature-computer
          env:
            - name: BATCH_SIZE
              value: "10000"
      restartPolicy: Never
```
That's the Kubernetes sweet spot. You can't do this easily with EC2 instances directly. You'd need ASGs, lifecycle hooks, custom scripts. Kubernetes gives you this for free.
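A quick way to reason about `completions` and `parallelism` in the Job above: Kubernetes keeps at most `parallelism` pods running until `completions` pods have succeeded. The sketch below estimates how many back-to-back waves that takes, under the simplifying assumption that every pod runs for the same length of time. The 20-minute figure is borrowed from the example above, and `job_waves` and `estimated_minutes` are illustrative helpers, not Kubernetes APIs.

```python
import math


def job_waves(completions: int, parallelism: int) -> int:
    """Minimum number of pod 'waves' if every pod takes equally long."""
    if completions <= 0 or parallelism <= 0:
        raise ValueError("completions and parallelism must be positive")
    return math.ceil(completions / parallelism)


def estimated_minutes(completions: int, parallelism: int,
                      minutes_per_pod: float) -> float:
    """Idealized wall-clock time: ignores scheduling delay, image pulls, retries."""
    return job_waves(completions, parallelism) * minutes_per_pod


print(job_waves(100, 50))              # 2
print(estimated_minutes(100, 50, 20))  # 40
```

Real runs will be slower than this lower bound, since pods rarely finish in lockstep and the cluster may not have room for all 50 at once.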
2. Internal Developer Platforms
Netflix has thousands of engineers. Each team wants to deploy their own microservices. Before Kubernetes, every team had their own deployment scripts, their own CI pipelines, their own monitoring setup. It was chaos.
Kubernetes gave them a single API for deployment. Their internal platform — called Titus internally, now being migrated to Kubernetes — standardizes how services get deployed, configured, and monitored.
"I didn't need kubernetes, and you probably don't either" — that Hacker News post is right for a 10-person startup. For Netflix with 10,000+ microservices? The coordination cost of not having Kubernetes was higher than the complexity of adopting it.
3. Machine Learning Infrastructure
ML training is notoriously fickle. One job needs 16 GPUs for 3 days. Another needs 8 GPUs for 2 hours. You can't provision that with fixed infrastructure.
Kubernetes + Volcano scheduler (or similar batch scheduling extensions) handles this perfectly. Netflix uses this pattern for their recommendation model training, content similarity calculations, and A/B evaluation pipelines.
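One feature batch schedulers like Volcano add is gang scheduling: a job starts only when its full resource request can be granted at once, so a 16-GPU job never sits half-started holding 8 GPUs. The sketch below shows that all-or-nothing idea in plain Python. It is not Volcano's API, the `admit_jobs` helper and job names are hypothetical, and the GPU counts come from the example above.

```python
from typing import List, Tuple


def admit_jobs(free_gpus: int,
               queue: List[Tuple[str, int]]) -> Tuple[List[str], int]:
    """All-or-nothing admission in queue order.

    A job starts only if its full GPU request fits in what is still free;
    otherwise it is skipped rather than given a partial allocation.
    """
    started = []
    for name, gpus_needed in queue:
        if gpus_needed <= free_gpus:
            started.append(name)
            free_gpus -= gpus_needed
    return started, free_gpus


queue = [("recs-training", 16), ("similarity", 8), ("ab-eval", 8)]
started, left = admit_jobs(free_gpus=24, queue=queue)
print(started, left)  # ['recs-training', 'similarity'] 0
```

A production scheduler also handles priorities, preemption, and fairness between teams; this only illustrates why partial allocations are avoided.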
Is Kubernetes the Same as AWS?
No. This question comes up constantly, and the confusion is understandable.
AWS provides infrastructure (compute, storage, networking). Kubernetes provides orchestration on top of that infrastructure.
Think of it this way:
AWS gives you servers. Kubernetes gives you a way to manage applications across those servers.
Netflix uses AWS as their cloud provider. They use Kubernetes as their application platform. The two aren't competing — Kubernetes runs on top of EC2.
Here's what that looks like in practice:
```yaml
# AWS EKS cluster declaration
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: netflix-cluster
  region: us-east-1
  version: "1.28"
nodeGroups:
  - name: standard-workers
    instanceType: r5.4xlarge
    desiredCapacity: 100
    # Auto-scaling via AWS
    minSize: 50
    maxSize: 500
```
Kubernetes manages the containers. AWS manages the servers. They're complementary, not the same.
The SIVARO Take: When Should You Use Kubernetes?
I've been building data infrastructure for 6 years. Here's what I've learned about the Kubernetes decision.
You probably don't need Kubernetes if:
- You have fewer than 10 microservices
- Your team is under 5 people
- You don't have any on-call rotation
- Your traffic is predictable and seasonal
You definitely need Kubernetes if:
- You have 50+ services with different scaling requirements
- You need to isolate noisy neighbors across teams
- You're doing frequent deployments (multiple times per day per service)
- Your infrastructure costs are growing faster than your revenue
The "why people hate Kubernetes" crowd isn't entirely wrong: the complexity is real. Setting up a production-grade cluster with networking, security, monitoring, and CI/CD integration is genuinely hard. I've seen teams burn 3 months on this.
But here's the thing: that complexity doesn't go away with other solutions. It just moves around. Without Kubernetes, you'll build custom deployment pipelines, manual health check scripts, and fragile autoscaling logic. You'll have the same complexity, just implemented worse and documented nowhere.
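If it helps, the two checklists above can be turned into a rough tally. This is a sketch under my own assumptions: the `Team` fields and `kubernetes_signals` are hypothetical names, only the easily quantified bullets are encoded, and the thresholds are the ones from the lists.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Team:
    services: int
    engineers: int
    has_on_call: bool
    deploys_per_service_per_day: float


def kubernetes_signals(t: Team) -> Tuple[int, int]:
    """Return (signals in favor, signals against) from the checklists."""
    against = sum([
        t.services < 10,
        t.engineers < 5,
        not t.has_on_call,
    ])
    in_favor = sum([
        t.services >= 50,
        t.deploys_per_service_per_day >= 2,  # "multiple times per day"
    ])
    return in_favor, against


startup = Team(services=4, engineers=3, has_on_call=False,
               deploys_per_service_per_day=0.5)
platform = Team(services=120, engineers=40, has_on_call=True,
                deploys_per_service_per_day=5)

print(kubernetes_signals(startup))   # (0, 3)
print(kubernetes_signals(platform))  # (2, 0)
```

The tally is a conversation starter, not a decision. Noisy-neighbor isolation and cost growth are left out because they don't reduce to a single number.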
Practical Architecture: How to Migrate Like Netflix
Netflix didn't move everything to Kubernetes overnight. They followed a pattern that you should copy:
Phase 1: New Workloads First
Start with greenfield services. No migration risk. No existing monitoring to break. Just new services on Kubernetes.
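```python
# Netflix-style migration strategy (illustrative sketch, not Netflix's code)
from dataclasses import dataclass
from datetime import date


@dataclass
class Service:
    name: str
    created: date
    is_stateless: bool
    criticality: int     # lower means less critical
    access_pattern: str  # e.g. "consistent" or "spiky"


def migration_phases(services, cutoff=date(2023, 1, 1)):
    # Phase 1: only new services
    phase1 = [s for s in services if s.created > cutoff]
    rest = [s for s in services if s.created <= cutoff]
    # Phase 2: stateless, non-critical services
    phase2 = [s for s in rest if s.is_stateless and s.criticality < 3]
    # Phase 3: stateful services with stable access patterns
    phase3 = [s for s in rest
              if not s.is_stateless and s.access_pattern == "consistent"]
    return phase1, phase2, phase3
```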
Phase 2: Route Traffic Gradually
Use service meshes or API gateways to route traffic incrementally. Start with 1% of traffic to the Kubernetes version. Monitor for 2 weeks. Ramp up.
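In practice the mesh or gateway handles the split through weighted routes, but the underlying idea is simple enough to sketch. The example below hashes a request ID into a bucket so the same request always lands on the same side, then sends a configurable percentage to the Kubernetes version. The `route` function, the request IDs, and every ramp step after 1% are assumptions for illustration.

```python
import hashlib


def route(request_id: str, k8s_percent: float) -> str:
    """Deterministically send roughly k8s_percent% of requests to the new stack."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return "kubernetes" if bucket < k8s_percent * 100 else "legacy"


# Hypothetical ramp: hold each step until the monitoring window looks clean.
RAMP_STEPS = [1, 5, 25, 50, 100]

sample = [f"req-{i}" for i in range(100_000)]
share = sum(route(r, RAMP_STEPS[0]) == "kubernetes" for r in sample) / len(sample)
print(f"{share:.2%} of sample requests routed to Kubernetes")
```

Hashing on a stable key (request, user, or session ID) keeps a given caller on one side of the split, which makes regressions easier to trace than random per-request routing.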
Phase 3: Keep the Hard Stuff Separate
Some workloads should never move to Kubernetes. That's fine. Netflix still runs their streaming tier on bare metal. You probably don't have their scale, but you might have your own "hard stuff" — databases, real-time processing, GPU workloads.
The Real Cost Discussion
Everyone talks about Kubernetes being free and open-source. Technically true. Practically misleading.
Here's what you actually pay for:
| Cost Component | Monthly Estimate (100-node cluster) |
|---|---|
| EC2 instances | $15,000 - $30,000 |
| Control plane | $0 - $2,000 |
| Storage (EBS/EFS) | $2,000 - $5,000 |
| Load balancers | $1,000 - $3,000 |
| Monitoring | $500 - $2,000 |
| Infrastructure total | $18,500 - $42,000 |
| Engineering time | $30,000 - $80,000 |
The infrastructure costs are real. The engineering time costs are higher. I've seen teams where Kubernetes was technically saving money on compute but costing more in developer time to maintain.
The honest calculation: Kubernetes saves money at scale (500+ cores) but costs money below that. The break-even point varies by team skill and existing tooling.
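The table's totals are easy to sanity-check, and adding the engineering row shows where the money actually goes. The figures below are copied from the table; the only thing added is the arithmetic.

```python
# Monthly ranges (low, high) in USD, copied from the table above.
infrastructure = {
    "EC2 instances": (15_000, 30_000),
    "Control plane": (0, 2_000),
    "Storage (EBS/EFS)": (2_000, 5_000),
    "Load balancers": (1_000, 3_000),
    "Monitoring": (500, 2_000),
}
engineering_time = (30_000, 80_000)

infra_low = sum(low for low, _ in infrastructure.values())
infra_high = sum(high for _, high in infrastructure.values())
total_low = infra_low + engineering_time[0]
total_high = infra_high + engineering_time[1]

print(infra_low, infra_high)  # 18500 42000
print(total_low, total_high)  # 48500 122000

# Share of the all-in bill that is people rather than machines.
print(round(engineering_time[0] / total_low, 2),
      round(engineering_time[1] / total_high, 2))  # 0.62 0.66
```

At both ends of the range, engineering time is roughly two thirds of the all-in monthly cost, which is the point of the paragraph above.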
"We're Leaving Kubernetes" — But Should You?
That Ona article, "We're leaving Kubernetes," is worth reading. It makes valid points: complexity, learning curve, operational overhead.
Here's what they left out: Ona had 3 engineers managing their infrastructure. Three. For a full Kubernetes cluster with monitoring, CI/CD, networking, and security. That's understaffed by about 2 people.
"If you have a small team, Kubernetes will hurt you." That's true. But so will any complex system. The real question isn't "should I use Kubernetes?" but "should I have a complex infrastructure at all?"
Most companies don't need 95% of what Kubernetes offers. They need consistent deployments, basic health checks, and some auto-scaling. You can get that from Heroku, Railway, or a simple ECS setup. Don't let the cool factor trick you into over-engineering.
The Future: What Netflix Is Building Next
Netflix is now working on what comes after Kubernetes. Their current focus is on reducing the abstraction penalty — making Kubernetes-aware workloads that match bare-metal performance.
Key areas they're investing in:
- Custom CNI plugins that bypass iptables
- NUMA-aware scheduling for GPU workloads
- Memory and compute isolation without VM overhead
- Serverless on Kubernetes (internal platform, not Knative)
The pattern is clear: Kubernetes as the control plane, with specialized runtimes for performance-critical workloads.
FAQ
Is Netflix using Kubernetes in 2024?
Yes, extensively. They run thousands of nodes across multiple regions for their internal services, ML infrastructure, and CI/CD pipelines. Their core streaming workload still runs on custom infrastructure.
What exactly is Kubernetes used for at Netflix?
Internal microservices, batch processing, ML training pipelines, A/B testing infrastructure, content management systems, and developer platforms. Anything that benefits from dynamic scaling and doesn't need sub-millisecond latency.
Is Kubernetes the same as AWS?
No. AWS is a cloud provider that gives you virtual servers, storage, and networking. Kubernetes is a container orchestrator that runs on top of cloud infrastructure (including AWS). They solve different problems.
Can I run Netflix-style infrastructure at my startup on Kubernetes?
Probably not. And you shouldn't try. Netflix's infrastructure was built over 15 years by teams of hundreds. Your startup needs something simpler. Use managed services. Don't build infrastructure you don't need.
Is Kubernetes dying because of serverless?
No. Kubernetes and serverless are converging. EKS on Fargate, AKS virtual nodes (backed by Azure Container Instances), and GKE Autopilot all run Kubernetes workloads without you managing nodes. The control plane abstraction is becoming the standard, not going away.
What should I use instead of Kubernetes for a small team?
- Single service? Fly.io or Railway
- A few microservices? AWS ECS Fargate or Google Cloud Run
- Need container orchestration but hate Kubernetes? Nomad by HashiCorp
- Growing fast and planning to scale? Kubernetes. Learn the pain now, it only gets harder later.
Why do people hate Kubernetes?
The complexity. The YAML. The networking. The fact that a simple issue can cascade into a cluster-wide outage. The learning curve is real and the operational burden is significant. But for some problems, it's still the best tool.
Final Thoughts
I started building on Kubernetes in 2018. At first, I thought it was a branding problem: people hated the complexity but loved the results. Turns out it was a matching problem. Kubernetes solves specific problems for specific scales. Use it for the right reasons.
"Is Netflix using Kubernetes?" Yes, they are. But more importantly, they know where to use it and where not to. That discernment is the real skill. Kubernetes is a tool, not a religion.
Your goal shouldn't be to run your entire infrastructure on Kubernetes. It should be to run the right parts on Kubernetes. Everything else can stay where it performs best. Kubernetes doesn't need to be your streaming platform. It just needs to be your foundation.
Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.