Netflix Runs on Kubernetes. Here's the Real Story.

By Nishaant Dixit

Netflix uses Kubernetes. Extensively. But if you're here to copy their architecture, stop. You'll burn your infrastructure budget in a week.

I'm Nishaant Dixit, founder of SIVARO. I've spent the last 6 years building data infrastructure and production AI systems. When clients ask me "Is Netflix using Kubernetes?", they're usually asking: "Should I be using Kubernetes too?"

The answer is complicated. Let me show you what Netflix actually does — and what it means for your stack.

What Netflix Actually Does with Kubernetes

Netflix started running containers in production around 2015-2016. But here's the part nobody talks about: they did it on their own custom infrastructure. They're not on EKS or GKE. They built their own platform called Titus.

Titus is Netflix's container management platform. It was originally built on Apache Mesos and later reworked to run on Kubernetes, and it sits on top of AWS EC2 instances. Netflix manages everything: the control plane, the networking, the storage, the scheduling.

Most people think "Is Netflix using Kubernetes?" means "Are they clicking 'Create Cluster' in the AWS console?" No. They're not.

They didn't take stock Kubernetes off the shelf. They extended it and built their own scheduling layer that understands their specific workloads. That's a year-long engineering effort for a team of 20+ senior engineers.

Here's what Netflix uses Kubernetes for:

  • Stateless microservices (the easy stuff)
  • Batch processing jobs
  • Machine learning model serving
  • CI/CD pipelines

Here's what they don't use it for:

  • Database workloads (they run on EC2 directly)
  • CDN infrastructure (Open Connect appliances)
  • Real-time streaming pipelines (custom-built)

Netflix launches about 2 million containers per week across their fleet. That's a staggering number. But they got there by building tooling that took years of development.

What Does Kubernetes Actually Do? (A Straight Answer)

This is the question I get most from clients. And most explanations are terrible.

So here's what Kubernetes actually does: it schedules containers onto machines and keeps them running.

That's it. That's the core function.

If you have 10 machines running 50 containers, Kubernetes ensures:

  1. If a container crashes, it gets restarted
  2. If a machine dies, its containers are rescheduled onto other machines
  3. If you need 10 copies of your API, it maintains exactly 10
  4. Network traffic gets routed to healthy containers

The Kubernetes documentation calls this "automated container deployment, scaling, and management." That's technically correct but misses the point.

The real value isn't the containers. It's the abstraction. You tell Kubernetes "I want 3 copies of my API with 2 CPU each" and it figures out which machines have capacity. You don't care about individual machines anymore.

But here's the catch, the same one the post "I Didn't Need Kubernetes, and You Probably Don't Either" describes: that abstraction costs complexity. You're now managing a control plane, etcd, networking plugins, storage plugins, and a dozen other moving parts.
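
To make the "3 copies of my API with 2 CPU each" request concrete, here is a minimal sketch of how it might be declared. The image name, the /healthz path, port 8080, and the memory figure are placeholders for illustration, not values from a real deployment. The two probes are what let Kubernetes restart a hung container and keep traffic away from one that isn't ready.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 3                 # "I want 3 copies"
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
      - name: api
        image: mycompany/my-api:v1.0.0   # placeholder image
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "2"          # "with 2 CPU each"; the scheduler finds a node with room
            memory: "512Mi"   # assumed value
        livenessProbe:        # repeated failures get the container restarted
          httpGet:
            path: /healthz    # assumed endpoint
            port: 8080
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:       # failures remove the pod from Service endpoints
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5
```

Nothing in this manifest names a machine. That is the abstraction, and everything needed to honor it is the complexity.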

Is Kubernetes the Same as AWS?

No. And if you think they're the same, you're about to make a painful hiring mistake.

I've had engineers tell me "I know Kubernetes because I use ECS." That's like saying you know how to fly a plane because you've driven a bus.

Here's the breakdown:

AWS provides:

  • Virtual machines (EC2)
  • Managed databases (RDS)
  • Object storage (S3)
  • Load balancers (ALB/NLB)

Kubernetes provides:

  • Container scheduling
  • Service discovery
  • Configuration management
  • Scaling policies

The confusion comes because AWS offers EKS (Elastic Kubernetes Service) and ECS (Elastic Container Service). They're different products.

ECS is Amazon's proprietary container orchestrator. It's simpler, less flexible, and tightly integrated with AWS. Its control plane runs only in AWS: ECS Anywhere can attach your own on-premises machines, but you can't take ECS to another cloud.

Kubernetes is the standard. Red Hat defines it as "a portable, extensible, open-source platform for managing containerized workloads." Portable means it runs on AWS, GCP, Azure, your colo rack, or your laptop.

But "portable" is misleading. Your Kubernetes manifests might be portable. Your storage, networking, and monitoring probably aren't.

What Exactly Is Kubernetes Used For? (Real Workloads)

Let me give you concrete examples from actual deployments, not marketing slides.

Use Case 1: Microservices API Gateway

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: gateway
        image: mycompany/api-gateway:v2.1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway-service
spec:
  selector:
    app: api-gateway
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer

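This is bread-and-butter Kubernetes: three replicas of a gateway service, each with resource requests and limits. The Service of type LoadBalancer asks the cloud provider for an external load balancer that spreads traffic across the three pods.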

Use Case 2: Batch Processing Job

apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report
spec:
  ttlSecondsAfterFinished: 86400
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: report-generator
        image: mycompany/report:v3.0
        command: ["python", "generate_report.py", "--date", "2024-01-15"]
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
      restartPolicy: Never

This runs a report generation job. It runs to completion once, retrying up to 3 times on failure (backoffLimit), then exits. ttlSecondsAfterFinished deletes the finished Job and its pods after 24 hours.
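
The Job above runs once with a hard-coded date. If the report really is nightly, a CronJob is the usual way to schedule it. The sketch below is one way that might look, under a few assumptions: the 02:00 schedule is arbitrary, and the image is assumed to ship a shell and a `date` command so the date can be computed at run time.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  # 02:00 daily, in the controller manager's time zone unless spec.timeZone is set
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid        # skip a run if the previous one is still going
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 3
      ttlSecondsAfterFinished: 86400
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: report-generator
            image: mycompany/report:v3.0
            # Assumes the image has sh and date; computes today's date at run time
            command:
            - sh
            - -c
            - python generate_report.py --date "$(date +%Y-%m-%d)"
            resources:
              requests:
                memory: "4Gi"
                cpu: "2"
```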

Use Case 3: Stateful Database (Use With Caution)

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-cluster
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
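      containers:
      - name: postgres
        image: postgres:15
        env:
        # The official image will not start without a superuser password.
        # "postgres-credentials" is a placeholder Secret you must create.
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: password
        # Keep data in a subdirectory so a non-empty volume root
        # (e.g. lost+found) does not block initdb.
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data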
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

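This is where Kubernetes gets dangerous. Running Postgres in Kubernetes requires persistent volumes, proper backup strategies, and careful handling of node failures. Note that replicas: 3 here gives you three independent Postgres instances, each with its own volume; replication between them is something you (or an operator) must configure, and the manifest assumes a headless Service named postgres already exists. I've seen teams lose data because their PVC got deleted during a cluster upgrade.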

Do it only if:

  • You have dedicated Kubernetes storage engineers
  • Your databases are small (< 100GB)
  • You can afford occasional downtime during upgrades

Don't do it if:

  • You're running production databases > 1TB
  • You don't have Kubernetes storage expertise
  • You need point-in-time recovery guarantees

Netflix doesn't run their critical databases in Kubernetes. They run them on EC2. Smart.
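
One partial guard against the deleted-PVC failure described above is a StorageClass whose reclaim policy keeps the underlying disk when a claim is removed. This is a sketch, not a backup strategy. The class name is made up, and the provisioner shown assumes the AWS EBS CSI driver is installed; substitute whatever provisioner your cluster uses.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: postgres-retain              # hypothetical name
provisioner: ebs.csi.aws.com         # assumes the AWS EBS CSI driver
reclaimPolicy: Retain                # keep the PersistentVolume and disk if the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # create the disk in the zone where the pod lands
allowVolumeExpansion: true
```

The StatefulSet's volumeClaimTemplates would then set storageClassName: postgres-retain. With Retain, a deleted claim leaves the PersistentVolume in a Released state for an administrator to recover or clean up by hand, so you trade automatic cleanup for a chance to get the data back.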

Why Are People Moving Away from Kubernetes?

I've seen this trend accelerating. Ona publicly announced they're leaving Kubernetes. Dave from Hacker News made the point that most teams don't need it.

Here's what I've observed in my consulting work:

The cost curve is brutal at small scale.

Running a 3-node Kubernetes cluster costs about $300/month in AWS just for the control plane and base nodes. But you could run the same 3 containers on a $50/month EC2 instance with Docker Compose.

The operational complexity is invisible until something breaks.

Your control plane goes down? Now your deployments fail. Your etcd backup corrupted? You lose cluster state. Your nodes become unhealthy? Good luck debugging that.

The Kubernetes haters guide makes an excellent point: most people's frustration with Kubernetes isn't about the tool itself — it's about their team not having the operational maturity to handle distributed systems.

Here's who should move away from Kubernetes:

  1. Startups with < 10 engineers. You don't have someone to wake up at 3 AM when etcd goes down.
  2. Teams running < 50 containers. Docker Compose or AWS ECS is simpler and cheaper.
  3. Static workloads. If your traffic doesn't spike, you don't need Kubernetes scaling.

The question you should ask isn't "does this company use Kubernetes?" It's "what complexity did they embrace that I can avoid?"

The Infrastructure Maturity Spectrum

Based on what I've seen from 30+ client engagements, here's the spectrum:

Level 0: Single server

  • Running Docker directly
  • Manual deploys via SSH
  • Works for prototypes and small SaaS

Level 1: Docker Compose

  • Multiple containers on one machine
  • Basic networking between services
  • Good for teams of 1-5

Level 2: Managed container service

  • AWS ECS, Google Cloud Run, Heroku
  • No control plane management
  • Limited flexibility
  • Good for teams of 2-10

Level 3: Managed Kubernetes

  • EKS, GKE, AKS
  • Provider manages control plane
  • You manage node groups, workloads, monitoring
  • Good for teams of 5-20 with Kubernetes skills

Level 4: Self-managed Kubernetes

  • You run the control plane
  • Multi-cloud or on-premises
  • Maximum flexibility, maximum complexity
  • Netflix at Titus scale

Netflix is Level 4+. Most companies should aim for Level 2-3.

When Kubernetes Actually Makes Sense

I'll give you four scenarios where Kubernetes is worth the pain:

Scenario 1: You have traffic that spikes 10x in minutes.
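Think e-commerce on Black Friday. With a Horizontal Pod Autoscaler, Kubernetes can scale from 10 pods to 100 in under 2 minutes, provided your nodes have spare capacity or a node autoscaler can add it quickly. Your load balancer automatically distributes traffic to the new pods. When the spike ends, the extra pods and nodes scale back down and your costs return to baseline.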
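
As a rough sketch of the 10-to-100 scenario, a HorizontalPodAutoscaler for the api-gateway Deployment from Use Case 1 might look like this. It assumes metrics-server is installed and that the Deployment sets CPU requests (utilization is measured against them). The 70% target and the scale-up policy are illustrative values, not tuned recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 10
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # assumed threshold
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100                 # allow doubling the replica count...
        periodSeconds: 30          # ...every 30 seconds
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before shrinking
```

The HPA only adds pods. If the nodes are full, the new pods sit in Pending until a node autoscaler adds capacity, and that is usually the slow part of a spike.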

Scenario 2: You need multi-cloud or hybrid deployment.
Some clients run Kubernetes across AWS and on-premises because of data sovereignty regulations. Kubernetes provides a uniform API across both environments. But don't underestimate the networking and storage complexity.

Scenario 3: Your team already knows Kubernetes.
If you hire 3 senior Kubernetes engineers, you should probably use Kubernetes. Fighting the tool creates morale problems. Google Cloud's Kubernetes docs are excellent, but they assume you know what you're doing.

Scenario 4: You're running 500+ microservices.
At this scale, manual management becomes impossible. Kubernetes provides consistent deployment, service discovery, and monitoring. Netflix is at this scale.

The Anti-Patterns I See Most Often

After helping clients clean up Kubernetes messes, here are the mistakes that keep happening:

Antipattern 1: Kubernetes for a single monolithic app.
I've seen startups run one API server, one database, and one frontend in Kubernetes. They had 9 nodes, 3 people managing the cluster, and spent 20 hours a month on cluster maintenance. They could have used a $20/month VPS.

Antipattern 2: Ignoring resource limits.

apiVersion: v1
kind: Pod
metadata:
  name: hungry-pod
spec:
  containers:
  - name: nginx
    image: nginx
    # No resources specified!

Without resource limits, one container can starve the entire node. I've debugged production outages caused by a single container consuming all node memory. Always set requests and limits.
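
If you can't trust every manifest to set them, a LimitRange can apply defaults to any container in a namespace that omits them. A minimal sketch follows; the namespace name and the specific values are assumptions to adapt to your workloads.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources
  namespace: production        # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:            # applied when a container sets no requests
      cpu: 250m
      memory: 256Mi
    default:                   # applied when a container sets no limits
      cpu: 500m
      memory: 512Mi
```

With this in place, a pod like hungry-pod created in that namespace would be admitted with these requests and limits instead of running unbounded.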

Antipattern 3: Overprovisioning for "scalability".
Someone configures 50 pod replicas because they read that Netflix does that. They burn $10,000/month on idle infrastructure. Test with 3 replicas first. Scale up when you see actual traffic.

Antipattern 4: No chaos engineering for state.
You test stateless workloads: API servers, workers, frontends. But your database StatefulSet has never been tested through a node failure. You don't know if your PVCs will survive a cluster restart. Test it.

Practical Advice for Teams Considering Kubernetes

Step 1: Run a "no-Kubernetes" evaluation first.
Deploy your app on a single $20/month server with Docker Compose. Measure your traffic, resource usage, and scaling needs. If that works, you don't need Kubernetes.
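
For a sense of how small that evaluation setup can be, here is a sketch of a docker-compose.yml for an API plus a database on one server. The image name, the port mapping, and the resource caps are placeholders; the password is read from an environment variable or .env file that you supply.

```yaml
# docker-compose.yml
services:
  api:
    image: mycompany/api-gateway:v2.1.0   # placeholder image
    ports:
      - "80:8080"
    restart: unless-stopped               # restart on crash or reboot
    cpus: 1.0
    mem_limit: 512m
    depends_on:
      - db
  db:
    image: postgres:15
    restart: unless-stopped
    environment:
      # Fails fast with this message if the variable is not set
      POSTGRES_PASSWORD: "${POSTGRES_PASSWORD:?set POSTGRES_PASSWORD in .env}"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```

Start it with `docker compose up -d` and watch `docker stats` for a week or two. Those numbers tell you whether you have a scaling problem at all.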

Step 2: If you need Kubernetes, start with managed options.
Use GKE (Google's managed Kubernetes) or EKS (AWS). Let the provider handle the control plane. You handle the node groups.

Step 3: Build a "clusterless" pattern first.
Deploy your app to Cloud Run or App Runner. These auto-scale without Kubernetes complexity. Most startups should start here.
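
As an illustration of how little there is to manage, a Cloud Run service can be described in a single Knative-style YAML file and applied with `gcloud run services replace service.yaml`. This is a sketch: the image path uses placeholder REGION, PROJECT, and REPO segments, and the scaling bounds and concurrency are assumed values.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: api-gateway
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"    # scale to zero when idle
        autoscaling.knative.dev/maxScale: "20"   # assumed upper bound
    spec:
      containerConcurrency: 80                   # requests per instance before scaling out
      containers:
      - image: REGION-docker.pkg.dev/PROJECT/REPO/api-gateway:v2.1.0   # placeholder path
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
```

There are no nodes, no control plane, and no etcd anywhere in that file.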

Step 4: Only move to self-managed Kubernetes when you've exceeded the managed tier.
Netflix, Spotify, and Uber run their own. You probably don't need to.

The Real Cost of Kubernetes

Let's be honest about costs:

Infrastructure cost: A 3-node cluster costs $150-400/month for small workloads. This is more than a $50/month VPS but less than $2000/month for a bare-metal server.

Operational cost: Expect 20-40 hours/month of Kubernetes work for a small team. That's $4000-8000/month in engineering time. Most people don't account for this.

Training cost: A senior engineer needs 2-3 months to become productive with Kubernetes. That's $20,000-40,000 in salary before they deliver value.

Migration cost: Moving from Docker Compose to Kubernetes takes 2-6 months for a small team. During that time, you're not building product features.

The total cost for a small team to adopt Kubernetes is typically $50,000-100,000 in the first year. That's real money. Use it wisely.

When I Recommend Kubernetes

In my consulting work at SIVARO, I recommend Kubernetes when:

  1. The client has traffic spikes > 5x normal
  2. They have 3+ engineers who already know Kubernetes
  3. They're running 100+ containers
  4. They need multi-cloud or hybrid deployment
  5. Their stateless services are growing faster than their ops team

Otherwise, I recommend starting with:

  • Docker Compose (for prototypes)
  • AWS ECS or Google Cloud Run (for production)
  • Managed Kubernetes (EKS/GKE) when they hit scaling limits

Netflix uses Kubernetes because they have the scale, the engineers, and the operational maturity. Most companies don't.

If you're still asking "Is Netflix using Kubernetes?", the real question should be "What do they gain from it that I could gain with less complexity?"

The answer is usually: not much.

Build your product. Validate your market. Grow your team. Then consider Kubernetes.


FAQ: Quick Answers to Common Questions

Q: Is Netflix fully on Kubernetes?
A: No. Netflix runs Titus (their Kubernetes-based platform) for stateless services, batch jobs, and similar container workloads. Their CDN, streaming engines, and critical databases run on custom infrastructure.

Q: What does Kubernetes actually do that Docker doesn't?
A: Docker runs containers on one machine. Kubernetes runs them across many machines with networking, scaling, and self-healing. If one machine dies, Kubernetes moves the containers.

Q: Is Kubernetes the same as AWS?
A: No. AWS is a cloud provider. Kubernetes is a container orchestrator. AWS offers EKS (Kubernetes service) and ECS (proprietary container service). They're different tools for different problems.

Q: What exactly is Kubernetes used for in production?
A: Running containerized applications at scale: API servers, batch jobs, CI/CD pipelines, ML model serving. It handles deployment, scaling, service networking, and self-healing.

Q: Why are people moving away from Kubernetes?
A: Operational complexity, high infrastructure costs at small scale, and the need for specialized engineering talent. Most teams don't need it.

Q: Should my startup use Kubernetes?
A: Probably not. Start with Docker Compose or a managed platform like Cloud Run. Move to Kubernetes when you have 50+ containers and the engineering team to support it.

Q: What's the biggest mistake teams make with Kubernetes?
A: Adopting it before they need it. The operational overhead kills productivity. Build product, validate market, then scale infrastructure.


Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.
