What Does Kubernetes Do Exactly? A Practitioner's Guide

Look, I spent two years ignoring Kubernetes. Thought it was overengineered. Another Google brainchild that solves problems you don't have. Then we hit 50 microservices at SIVARO, and suddenly I understood why every engineering leader I respected was migrating. So let me tell you what Kubernetes actually does—not the marketing fluff, but the real mechanical sympathy.

Kubernetes is an orchestration platform for containerized applications. It automates deployment, scaling, and operations of application containers across clusters of hosts. But that's like saying a car "transports people." The real question is what does Kubernetes do exactly when you push that first deployment to production at 3 AM?

The One Problem Kubernetes Actually Solves

Most people think Kubernetes is about scaling. They're wrong.

The scaling conversation is a distraction. The real problem Kubernetes solves is infrastructure drift. When you have 10 servers manually configured, they start diverging within hours. Package versions drift. Firewall rules decay. One server gets a hotfix that another doesn't. Kubernetes eliminates that by making infrastructure declarative—you define what you want, and it makes it so.

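At SIVARO, we had a production incident in 2022 where a developer SSH'd into a box to debug, accidentally left a port open, and we got hit with a cryptominer. That kind of hand-made drift is far harder to create on Kubernetes, because nobody configures workloads by hand on a box. Controllers watch the cluster continuously and reconcile actual state against the desired state you declared, so any deviation in a managed object gets corrected. (Changes made directly on a node's operating system sit outside that loop, which is one more reason to lock down SSH.) It's not just automation. It's an immune system for your infrastructure.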
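
To make the "desired state versus actual state" idea concrete, here is a minimal sketch of a reconcile step in Python. It is a toy model, not Kubernetes code: the `Action` type, the pod names, and the `rogue-debug` entry are all made up for illustration, and real controllers react to watch events from the API server rather than comparing plain sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create" or "delete"
    name: str


def reconcile(desired: set[str], actual: set[str]) -> list[Action]:
    """Return the actions needed to make `actual` match `desired`."""
    to_create = sorted(desired - actual)
    to_delete = sorted(actual - desired)
    return [Action("create", n) for n in to_create] + [Action("delete", n) for n in to_delete]


def apply_actions(actual: set[str], actions: list[Action]) -> set[str]:
    """Apply the actions to a copy of the actual state and return the result."""
    result = set(actual)
    for action in actions:
        if action.kind == "create":
            result.add(action.name)
        else:
            result.discard(action.name)
    return result


# Hypothetical state: one pod is missing and one unexpected pod exists.
desired = {"web-0", "web-1", "web-2"}
actual = {"web-0", "web-2", "rogue-debug"}

actions = reconcile(desired, actual)
converged = apply_actions(actual, actions)
print(actions)
print(converged == desired)
```

Running the same step again on the converged state yields no actions. That idempotence is the property that makes the loop safe to run over and over.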

The Control Plane: Where the Magic Happens

Kubernetes runs on at least one control plane node, historically called a master node (preferably three for production). That control plane contains:

kube-apiserver: The front door. Every CLI command, every pod query, every scaling request hits this REST API first.
etcd: The cluster's brain. A distributed key-value store that holds all state. Lose etcd without a backup, and you lose the cluster.
kube-scheduler: Decides which node runs which pod. Factors: resource requirements, affinity rules, current load.
kube-controller-manager: Runs background loops. One controller ensures replica pods stay running. Another manages endpoints. Another handles node failures.

When you type kubectl apply -f deployment.yaml, here's what happens:

API server validates your manifest and stores your intent in etcd
Deployment and ReplicaSet controllers create the pod objects
Scheduler assigns each unscheduled pod to a suitable node
The kubelet on each node pulls container images and starts the containers
Any probes you configured start running (every 10 seconds by default)

For a simple deployment whose images are already cached on the node, the whole process can finish in a couple of seconds; a cold image pull takes longer. That's what Kubernetes does exactly: it translates your YAML into running containers with zero manual intervention.
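
The scheduling step is the least obvious part of that flow, so here is a rough Python sketch of the filter-then-score idea. It is a deliberate simplification under my own assumptions: the node names and capacities are invented, and the scoring rule (prefer the node with the most headroom left) stands in for the real scheduler's much richer set of plugins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int   # unallocated CPU in millicores
    free_mem_mi: int  # unallocated memory in MiB


def pick_node(nodes: list[Node], cpu_m: int, mem_mi: int) -> Optional[str]:
    # Filter: keep only nodes with enough unallocated capacity for the pod's requests.
    feasible = [n for n in nodes if n.free_cpu_m >= cpu_m and n.free_mem_mi >= mem_mi]
    if not feasible:
        return None  # in a real cluster the pod would stay Pending
    # Score: crude headroom measure (mixes units on purpose to keep the sketch short).
    best = max(feasible, key=lambda n: (n.free_cpu_m - cpu_m) + (n.free_mem_mi - mem_mi))
    return best.name


# Hypothetical three-node cluster.
nodes = [
    Node("node-a", free_cpu_m=400, free_mem_mi=2048),
    Node("node-b", free_cpu_m=1500, free_mem_mi=1024),
    Node("node-c", free_cpu_m=2000, free_mem_mi=4096),
]

print(pick_node(nodes, cpu_m=500, mem_mi=512))   # node-c
print(pick_node(nodes, cpu_m=4000, mem_mi=512))  # None
```

The second call shows why oversized requests matter: if no node passes the filter, nothing gets scheduled at all.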

Pods: The Atomic Unit You Can't Ignore

A pod is one or more containers that share networking and storage. You don't deploy containers directly. You deploy pods.

Here's a concrete example from our data pipeline:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-ingestor
  labels:
    app: ingestor
    tier: processing
spec:
  containers:
  - name: main-worker
    image: sivaro/ingestor:2.4.1
    ports:
    - containerPort: 8080
    env:
    - name: DB_HOST
      value: "postgres-cluster"
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1"
  - name: sidecar-logger
    image: sivaro/log-shipper:1.0.3

That sidecar pattern? We use it everywhere. The main worker processes data; the sidecar handles log rotation and shipping to S3. They share the same network namespace (the sidecar can reach the worker at localhost:8080 for health checks) and, through a shared volume mounted into both containers (the volume definition is not shown in the manifest above), the same log files. Without Kubernetes managing that lifecycle, you'd need supervisord or manual setup on each host.

The key insight: pods are ephemeral. They die. Kubernetes is designed around that assumption. You don't fix broken pods—you replace them. This forces you to build stateless, disposable services. That sounds painful. It is. But the reliability gains are real. Our uptime went from 99.5% to 99.97% after adopting this pattern.

Deployments: Your Production Safety Net

An unmanaged pod is a dead pod. You never run pods directly in production. You run Deployments.

A Deployment manages a ReplicaSet, which manages pods. It gives you:

Rolling updates: Replace pods in small batches, with zero downtime as long as your readiness probes are accurate
Rollbacks: One command reverts to the last working version
Scaling: Change replicas instantly
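
The rolling update guarantee comes down to two settings, maxUnavailable and maxSurge, which appear in the manifest that follows. Here is a small Python sketch of the arithmetic, assuming the documented rounding rules (percentages round down for maxUnavailable and up for maxSurge). The function names are mine, not part of any Kubernetes client.

```python
import math


def resolve(value, replicas: int, round_up: bool) -> int:
    """Resolve an absolute count or a percentage string such as "25%"."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas: int, max_unavailable, max_surge) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    surge = resolve(max_surge, replicas, round_up=True)
    return replicas - unavailable, replicas + surge


print(rollout_bounds(4, 1, 1))            # (3, 5)
print(rollout_bounds(10, "25%", "25%"))   # (8, 13)
```

With 4 replicas and both settings at 1, at least 3 pods keep serving and at most 5 exist at once. That upper bound is worth checking against your node capacity before a deploy.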

Here's a deployment we use for our model-serving infrastructure:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      containers:
      - name: model-server
        image: sivaro/llm-serve:3.1.2
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 15
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 30

That readiness probe? It took us 3 production outages to get right. If the probe fails, Kubernetes stops sending traffic to that pod. Set it too aggressive, and pods get marked unready during normal startup. Set it too lenient, and users hit 503s on broken instances.

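The rule we follow: readiness probes check dependencies (database connectivity, model loading). Liveness probes check process health (is the process still responsive, or is it hung?). A failed readiness probe only takes the pod out of rotation; a failed liveness probe restarts the container. Mixing them up causes cascading failures: put a database check in a liveness probe, and one database blip restarts every pod at once.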
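
To show the difference in consequences, here is a toy Python model of how probe results turn into actions. It assumes the default failureThreshold of 3 consecutive failures and ignores details like successThreshold and timeouts; the function names are illustrative only.

```python
def threshold_tripped(results: list[bool], failure_threshold: int = 3) -> bool:
    """True once `failure_threshold` probe failures happen in a row."""
    streak = 0
    for ok in results:
        streak = 0 if ok else streak + 1
        if streak >= failure_threshold:
            return True
    return False


def outcome(probe: str, results: list[bool]) -> str:
    """Map a sequence of probe results to what Kubernetes would do (simplified)."""
    if not threshold_tripped(results):
        return "no action"
    if probe == "readiness":
        return "removed from Service endpoints"
    return "container restarted"


flaky_dependency = [True, False, False, False]
print(outcome("readiness", flaky_dependency))  # removed from Service endpoints
print(outcome("liveness", flaky_dependency))   # container restarted
print(outcome("liveness", [False, False, True, False]))  # no action
```

The same failure sequence produces a gentle outcome for readiness and a disruptive one for liveness, which is why dependency checks belong in the readiness probe.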

Services: How Your Pods Actually Get Traffic

Pods come and go. Their IP addresses change. Services provide a stable endpoint.

A Service is an abstraction that selects pods via labels and load-balances traffic to them.

yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-service
spec:
  selector:
    app: inference
  ports:
  - port: 80
    targetPort: 8000
  type: ClusterIP

ClusterIP is the default—internal cluster networking. For external traffic, you'd use NodePort (exposes on each node's IP) or LoadBalancer (provisions a cloud load balancer).
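
Label selection is the whole trick behind a Service, so here is a minimal Python sketch of it. The pod names and extra labels are hypothetical, and the real endpoint tracking also considers readiness, which this sketch leaves out.

```python
def select(pods: dict[str, dict[str, str]], selector: dict[str, str]) -> list[str]:
    """Return names of pods whose labels contain every selector key/value pair."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# Hypothetical pods in the namespace.
pods = {
    "inference-api-aaaaa": {"app": "inference", "pod-template-hash": "7d9f"},
    "inference-api-bbbbb": {"app": "inference", "pod-template-hash": "7d9f"},
    "frontend-ccccc": {"app": "frontend"},
}

backends = select(pods, {"app": "inference"})
print(backends)  # ['inference-api-aaaaa', 'inference-api-bbbbb']
```

Extra labels on a pod don't matter. The pod only has to carry every label the selector asks for, so a typo in either place silently gives you a Service with zero backends.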

We use a Layer 7 ingress controller (nginx-ingress in our case) for SSL termination and routing. The ingress resource looks like:

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  tls:
  - hosts:
    - api.sivaro.com
    secretName: tls-secret
  rules:
  - host: api.sivaro.com
    http:
      paths:
      - path: /v1/predict
        pathType: Prefix
        backend:
          service:
            name: inference-service
            port:
              number: 80

Without Services, you'd hardcode pod IPs. That works until a node fails. Then your entire routing table becomes stale. Every Service also gets a name in the cluster's internal DNS (CoreDNS by default), so you reference services by name within the cluster.
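
For reference, the DNS name of a Service follows a fixed pattern. This small Python helper sketches it, assuming the common default cluster domain of cluster.local and the default namespace; both can differ in your cluster.

```python
def service_fqdn(service: str, namespace: str = "default", cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_fqdn("inference-service"))
# inference-service.default.svc.cluster.local
print(service_fqdn("postgres-cluster"))
# postgres-cluster.default.svc.cluster.local
```

Inside the same namespace the short name (inference-service) resolves too. The full name is what you need when calling across namespaces.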

ConfigMaps and Secrets: Don't Hardcode Anything

Your code shouldn't know where it runs. Use ConfigMaps for non-sensitive config and Secrets for credentials.

Here's how we structure our database configuration:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DB_HOST: "postgres-cluster.default.svc.cluster.local"
  DB_PORT: "5432"
  LOG_LEVEL: "info"
  BATCH_SIZE: "1000"

Then inject it as environment variables or mount it as files:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-processor
spec:
  containers:
  - name: processor
    image: sivaro/processor:2.0
    envFrom:
    - configMapRef:
        name: app-config
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-creds
          key: password
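
On the application side, the container just reads environment variables. Here is a minimal Python sketch of a settings loader for the keys defined above. The `Settings` class and `load_settings` function are my own illustration, not code from our services, and the fallback defaults are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    log_level: str
    batch_size: int
    db_password: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment; fail fast if a required key is missing."""
    return Settings(
        db_host=env["DB_HOST"],
        db_port=int(env["DB_PORT"]),
        log_level=env.get("LOG_LEVEL", "info"),         # assumed default
        batch_size=int(env.get("BATCH_SIZE", "1000")),  # assumed default
        db_password=env["DB_PASSWORD"],
    )


# Usage with an explicit mapping, so the example runs outside a cluster too.
example = load_settings({
    "DB_HOST": "postgres-cluster.default.svc.cluster.local",
    "DB_PORT": "5432",
    "DB_PASSWORD": "not-a-real-password",
})
print(example.db_host, example.db_port, example.batch_size)
```

Every ConfigMap value arrives as a string, so the conversion to int has to happen in your code. A KeyError at startup is a feature here: the pod fails immediately instead of running half-configured.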

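A mistake I made early on: putting secrets directly into ConfigMaps. ConfigMaps are stored unencrypted and aren't meant for sensitive data. Use Secrets instead, but know what they are: the values are only base64-encoded, which is encoding, not encryption, and they are not encrypted at rest unless you enable encryption at rest for etcd. Even better, use external secret stores like HashiCorp Vault or AWS Secrets Manager, surfaced to pods through the Secrets Store CSI Driver.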
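
If the base64 point sounds abstract, this short Python snippet shows why it offers no protection. The password is obviously made up.

```python
import base64

plaintext = "s3cr3t-password"  # hypothetical credential

# What ends up in the Secret's data field: an encoding, reversible by anyone.
encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

# No key is needed to get the original value back.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == plaintext)  # True
```

Anyone who can read the Secret object can read the credential. Access control (RBAC) and encryption at rest are what actually protect it.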

Autoscaling: When You Actually Need More

Most teams set up autoscaling on day one. They shouldn't. Let me explain.

Autoscaling introduces instability. If your application has cold starts (model loading, database connection pooling), scaling up triggers a cascade of failures. The new pod tries to connect, fails because the DB connection pool is saturated, gets marked unhealthy, the load balancer retries, more connections fail—you get the picture.

We learned this the hard way during a Black Friday event in 2023. Our autoscaler kicked in 4 minutes too late, then overcompensated, then the new pods couldn't connect to the database because we hit connection limits. We had to manually intervene.

Now we use:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-api
  minReplicas: 4
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60

The stabilization windows prevent flapping. The scale-down is deliberately slower than scale-up. You'd rather over-provision temporarily than thrash your cluster.
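
The core of the autoscaler's decision is a single ratio. Below is a Python sketch of the documented formula, desired = ceil(current * currentMetric / targetMetric), with the default 10% tolerance and the min/max clamp. It leaves out the stabilization windows and rate policies from the manifest above, and the utilization numbers are invented.

```python
import math


def desired_replicas(current: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Compute the replica count for one metric, clamped to the configured bounds."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: do nothing
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical readings for the two metrics in the manifest (targets 70% CPU, 80% memory).
by_cpu = desired_replicas(4, current_util=105, target_util=70, min_replicas=4, max_replicas=20)
by_memory = desired_replicas(4, current_util=76, target_util=80, min_replicas=4, max_replicas=20)

# With several metrics, the autoscaler uses the largest proposal.
print(by_cpu, by_memory, max(by_cpu, by_memory))  # 6 4 6
```

Note that utilization is measured against the pod's resource requests, so a badly chosen request skews every scaling decision.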

But honestly? For most services, fixed replicas work fine. Autoscaling is a tool for unpredictable loads (like user-facing APIs after a feature launch). Batch processing jobs don't need it.

Storage: Persistent Data in an Ephemeral World

Container filesystems are ephemeral by default: anything written inside a container is lost when it is replaced. But databases, caches, and file storage need persistence. Kubernetes handles this through PersistentVolumeClaims (PVCs) and StorageClasses.

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-store
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: ssd-premium

Mount it in a pod:

yaml
spec:
  containers:
  - name: postgres
    image: postgres:16
    volumeMounts:
    - mountPath: /var/lib/postgresql/data
      name: pgdata
  volumes:
  - name: pgdata
    persistentVolumeClaim:
      claimName: data-store

Kubernetes doesn't manage your storage hardware. It abstracts it. The storage backend (cloud volumes, NFS, Ceph, whatever) is handled by the CSI driver. Key thing: if a pod gets rescheduled to a different node, the volume behind the PVC follows it as long as the backend can attach it there. ReadWriteOnce volumes (typical block storage) are mounted read-write by one node at a time, so they are detached and reattached on the new node; ReadWriteMany volumes (NFS, for example) can be mounted by many nodes at once.

We run stateful applications (PostgreSQL, Redis) on Kubernetes with StatefulSets (not Deployments). StatefulSets give each pod a stable identity, ordered scaling, and (via volumeClaimTemplates) its own PVC. Without them, your database pods get a new random name every time they are replaced, and replication breaks.
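
"Stable identity" is easier to see written out. This Python sketch shows the naming scheme a StatefulSet uses, assuming a StatefulSet named postgres behind a headless Service named postgres-headless (both names are hypothetical) and the default cluster domain.

```python
def pod_names(statefulset: str, replicas: int) -> list[str]:
    """StatefulSet pods are named <statefulset>-<ordinal>, starting at 0."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pod_dns(statefulset: str, ordinal: int, headless_service: str,
            namespace: str = "default", cluster_domain: str = "cluster.local") -> str:
    """Per-pod DNS name provided through the governing headless Service."""
    return f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"


def scale_down_order(statefulset: str, replicas: int) -> list[str]:
    """Pods are removed from the highest ordinal downward."""
    return list(reversed(pod_names(statefulset, replicas)))


print(pod_names("postgres", 3))         # ['postgres-0', 'postgres-1', 'postgres-2']
print(pod_dns("postgres", 0, "postgres-headless"))
print(scale_down_order("postgres", 3))  # ['postgres-2', 'postgres-1', 'postgres-0']
```

Because postgres-0 is always postgres-0, replicas can be configured to follow it by name, which is exactly what a Deployment's random pod suffixes make impossible.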

Networking: The Part That Breaks Most Often

Kubernetes networking is confusing because it's abstracted. The Container Network Interface (CNI) plugin handles actual wiring. Popular choices: Calico (policy-heavy), Flannel (simple overlay), Cilium (eBPF-based, fast).

Each pod gets a unique IP on the cluster network. Pods can reach each other directly (assuming network policies allow it). The CNI handles routing across nodes.

Here's what broke for us: our Calico policies were too restrictive. We blocked ICMP, which broke DNS lookups on one node. Took us 6 hours to debug. Lesson: test network policies in isolation before applying cluster-wide.

A minimal network policy:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-internal
spec:
  podSelector:
    matchLabels:
      app: inference
  policyTypes:
    - Ingress
  ingress:
    - from:
      - podSelector:
          matchLabels:
            app: frontend
      ports:
        - port: 8000

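This allows traffic only from pods labeled app: frontend to the inference pods on port 8000. All other inbound traffic to those pods is dropped; outbound traffic is unaffected because the policy only covers Ingress. Policies are enforced by the CNI plugin, so they only take effect if yours supports them (Calico and Cilium do; plain Flannel does not). It's not firewall-level security, but it's a huge improvement over "allow all."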
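
Here is how I reason about that policy, written as a small Python check. It models only this one policy with plain dicts (not the real API schema) and assumes no other policy selects the destination pod; the batch label is invented for the example.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if the labels contain every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


# Simplified mirror of the policy above.
POLICY = {
    "pod_selector": {"app": "inference"},
    "from_selector": {"app": "frontend"},
    "ports": {8000},
}


def ingress_allowed(dest_labels: dict[str, str], source_labels: dict[str, str],
                    port: int, policy: dict = POLICY) -> bool:
    if not matches(policy["pod_selector"], dest_labels):
        # The policy does not select this pod, so it does not restrict it.
        return True
    return matches(policy["from_selector"], source_labels) and port in policy["ports"]


inference = {"app": "inference"}
print(ingress_allowed(inference, {"app": "frontend"}, 8000))  # True
print(ingress_allowed(inference, {"app": "batch"}, 8000))     # False
print(ingress_allowed(inference, {"app": "frontend"}, 9090))  # False
```

The first branch is the part people forget: a pod that no policy selects accepts everything. Policies are allow-lists that only apply once a pod is selected.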

The Hard Truths Nobody Tells You

Kubernetes isn't simple. It's not a silver bullet. Here's what the hype gets wrong:

Complexity cost is real. A 3-node cluster with ingress, monitoring, logging, and CI/CD integration takes weeks to set up correctly. Teams that don't have dedicated DevOps struggle.

Debugging is harder. When a pod fails, you check pod logs, then events, then kubelet logs, then node metrics. The distributed nature makes root cause analysis painful.

Resource management isn't automatic. You still have to specify CPU and memory requests/limits. Misjudge them, and pods get OOM-killed or CPU-throttled.

Upgrading is risky. Minor version upgrades of Kubernetes itself can break your workloads. We got burned on a 1.23 to 1.24 upgrade that deprecated a CRD we depended on.
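
On the resource management point above: a quick way to sanity-check requests is to convert the quantity strings into numbers and add them up. This Python sketch handles only the suffixes used in this article (m for CPU; Ki, Mi, Gi for memory), not the full Kubernetes quantity grammar, and the replica count is just an example.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


_MEMORY_FACTORS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "512Mi" or "1Gi" to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


# Requests from the data-ingestor pod earlier in the article, times 4 replicas (assumed).
replicas = 4
total_cpu_m = replicas * parse_cpu("500m")
total_mem_bytes = replicas * parse_memory("512Mi")

print(total_cpu_m)                    # 2000
print(total_mem_bytes // 1024 ** 3)   # 2
```

Four replicas at those requests need 2 full cores and 2 GiB of schedulable capacity before anything else runs. Doing this sum per node pool catches most "why is my pod Pending" surprises early.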

But—and this is the important part—these problems are solvable. The alternative (manual server management, configuration drift, no rollback capability) is worse at scale. For small projects (under 10 services), Kubernetes is overkill. Docker Compose or a simple VM setup suffices. For anything larger, the operational benefits justify the complexity.

FAQ: What Does Kubernetes Do Exactly?

Q: Is Kubernetes only for cloud deployments?
No. You can run it on bare metal, on-prem VMs, even on Raspberry Pi clusters. The abstraction layer works regardless of infrastructure. We have a 5-node on-prem cluster for our data pipeline that never touches cloud.

Q: Does Kubernetes handle monitoring and logging?
Not natively. It's infrastructure only. You typically add Prometheus + Grafana for monitoring, and a logging stack (ELK, Loki, or similar) for logs. Kubernetes provides the plumbing (the metrics APIs, container stdout/stderr captured on each node) but not the tools themselves.

Q: Can I run databases on Kubernetes?
Yes, but carefully. Stateful applications (databases, queues) require StatefulSets, PVCs, and careful backup strategies. Teams do run production PostgreSQL on Kubernetes, so it's possible, but don't attempt it without dedicated operations support.

Q: What does Kubernetes do exactly for scaling?
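It scales horizontally by adding/removing pod replicas (based on CPU, memory, or custom metrics) through the Horizontal Pod Autoscaler, and vertically by adjusting pod resource requests if you run the Vertical Pod Autoscaler. A cluster autoscaler is a separate piece that adds or removes nodes when pods don't fit. The scheduler distributes work across available capacity.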

Q: Is Kubernetes the same as Docker?
No. Docker manages containers on a single host. Kubernetes manages containers across a cluster. Docker is a container runtime and toolchain; Kubernetes is the orchestrator. Kubernetes talks to runtimes through the Container Runtime Interface, so clusters today usually run containerd or CRI-O rather than Docker Engine.

Q: How long does it take to learn Kubernetes properly?
For basic deployments: 2-3 weeks. For production readiness (security, networking, monitoring, troubleshooting): 6 months. Most teams underestimate the learning curve by 3x.

Q: What's the cheapest way to try Kubernetes?
Minikube on your laptop (free), k3s on a $5/month VPS, or Azure's AKS/GCP's GKE free tier. Avoid EKS for learning: the control plane alone costs about $73/month ($0.10/hour at the time of writing), which kills experimentation.

Final Thoughts

Kubernetes doesn't solve problems you don't have. But when you're managing 50+ services across multiple environments, with zero-downtime deploys and regulatory compliance requirements, it shifts the conversation from "how do we keep servers running" to "how do we improve our product."

The real answer to "what does Kubernetes do exactly?" is: it converts your application logic from a static artifact into a dynamic, self-healing system. It's not magic. It's careful orchestration of containers, networking, storage, and configuration into a coherent whole.

We're still learning. Every incident teaches us something new. Last week it was a pod eviction due to node pressure. Week before, a misconfigured readiness probe. But every time, Kubernetes brought the system back to the desired state without a manual restart.

That's the actual value. The quiet, unglamorous work of maintaining state across failure.

Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.