What Does a Platform Engineer Do? A Practitioner's Guide

I spent three years building the wrong thing. My team called it “platform engineering.” We built beautiful internal tools, fancy dashboards, and self-service UIs nobody used.

The problem wasn’t the team. The problem was our definition.

Most people think platform engineering means building internal tools. They’re wrong. Platform engineering is about creating paved roads — not building more cars. The distinction changed everything for me.

What is platform engineering? It’s the practice of designing, building, and maintaining the underlying infrastructure, tools, and workflows that enable product teams to ship software faster, more reliably, and with less cognitive load. Think of it as the operating system for your engineering organization.

In this guide, I’ll walk you through what platform engineers actually do — the technical work, the trade-offs, and the real challenges I’ve faced building platforms that scaled to 200K events per second. You’ll learn concrete patterns from my experience at SIVARO, where we’ve productionized AI systems across finance, logistics, and healthcare.

Understanding the Platform Engineering Role

The hard truth about platform engineering? Most companies don’t need one. You need at least three product teams before a dedicated platform team makes sense. Below that, a senior engineer handling infrastructure as part of their duties works better.

Here’s what I’ve found actually matters in the role:

1. Internal Developer Platforms (IDPs) – Not portals. Platforms. The difference is critical. A portal is a UI. A platform is a set of APIs, services, and automations that reduce friction for teams shipping code.

According to Humanitec’s 2026 State of Platform Engineering Report, 62% of organizations now have dedicated platform teams, with the primary metric being developer velocity (43%) over cost savings (22%). That number has doubled since 2024.

2. Golden Paths vs. Walls – Platform engineers build golden paths: well-documented, opinionated workflows that handle common patterns. They do NOT build walls. Teams can still deviate, but they take on the cost of building and maintaining their own tooling.

I learned this the hard way. My first attempt at platform engineering created rigid workflows. Teams bypassed everything within three months. The golden path approach? Adoption hit 80% within six weeks.

3. Platform as a Product – This is the biggest mindset shift. You treat your infrastructure like a product. You have users (internal engineers). You have metrics (DORA metrics, satisfaction scores). You have a roadmap, prioritized based on user feedback.

In my experience, the teams that succeed treat platform work like product work: research, prototype, gather feedback, iterate. The teams that fail treat it like infrastructure projects: spec, build, deploy, forget.

Core Responsibilities Platform Engineers Own

Building and Maintaining Internal Developer Platforms

Your primary output is an IDP that abstracts away infrastructure complexity. Teams should be able to deploy services, provision environments, and configure pipelines without knowing Kubernetes.

A typical IDP stack I’ve built:

Control plane: Backstage or Port (customized)
Orchestration layer: Crossplane or Terraform Operator
Runtime: Kubernetes (EKS/GKE/AKS) with service mesh (Istio)
CI/CD: Custom pipelines on GitHub Actions or Argo Workflows

Here’s a concrete example of a platform API that provisions a new service environment:

yaml
# platform-api/service.yaml
apiVersion: platform.sivaro.io/v1
kind: ServiceEnvironment
metadata:
  name: my-service-staging
spec:
  service: my-service
  environment: staging
  team: checkout
  compute:
    replicas: 2
    cpu: 500m
    memory: 512Mi
  databases:
    - name: primary
      type: postgres
      version: "16"
      storage: 20Gi
  observability:
    logs: enabled
    metrics: enabled
    traces:
      samplingRate: 0.1

The result: Any developer on the checkout team runs kubectl apply -f service.yaml and gets a fully provisioned environment in under 3 minutes. No tickets. No waiting for DevOps.
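Before a controller acts on a manifest like this, a platform API would normally validate it. Here is a minimal sketch of that check, assuming the spec has already been parsed into a dictionary. The function name `validate_service_environment`, the set of allowed environments, and the required fields are illustrative assumptions, not the actual SIVARO implementation.

```python
from typing import Any

# Hypothetical rules for illustration; a real platform would define its own.
ALLOWED_ENVIRONMENTS = {"development", "staging", "production"}
REQUIRED_FIELDS = ("service", "environment", "team", "compute")


def validate_service_environment(spec: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in spec]

    env = spec.get("environment")
    if env is not None and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {env}")

    replicas = spec.get("compute", {}).get("replicas", 1)
    if not isinstance(replicas, int) or replicas < 1:
        problems.append("compute.replicas must be a positive integer")

    # Each database entry needs enough information to provision it.
    for i, db in enumerate(spec.get("databases", [])):
        for key in ("name", "type", "version"):
            if key not in db:
                problems.append(f"databases[{i}] missing {key}")
    return problems


spec = {
    "service": "my-service",
    "environment": "staging",
    "team": "checkout",
    "compute": {"replicas": 2, "cpu": "500m", "memory": "512Mi"},
    "databases": [
        {"name": "primary", "type": "postgres", "version": "16", "storage": "20Gi"}
    ],
}
print(validate_service_environment(spec))  # []
```

Returning a list of problems rather than raising on the first one lets the platform report every issue in a single pass, which matters when the feedback loop is a pull request or a CLI call.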

Standardizing Observability and Monitoring

Platform engineers don’t just install Prometheus and dashboards. They define the contract for how every service emits telemetry.

The standard I use:

Logs → Structured JSON, centralized in OpenSearch or Loki
Metrics → RED metrics (Rate, Errors, Duration) for every endpoint
Traces → OpenTelemetry, sampled at 1% for production, 100% for development
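To make the logging part of this standard concrete, here is a small sketch of a structured JSON formatter built on Python's standard `logging` module. The field names (`timestamp`, `level`, `service`, `logger`, `message`) and the helper `configure_logging` are assumptions chosen for illustration; the actual contract would fix its own schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def __init__(self, service_name: str) -> None:
        super().__init__()
        self.service_name = service_name

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service_name,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(service_name: str) -> logging.Logger:
    """Attach the JSON formatter to a stream handler for this service."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service_name))
    logger = logging.getLogger(service_name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


log = configure_logging("my-service")
log.info("payment processed in %d ms", 42)
```

Because every line is a self-contained JSON object, a log shipper can forward it to OpenSearch or Loki without per-service parsing rules.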

According to the Grafana Labs 2026 Observability Survey, 71% of organizations now use OpenTelemetry as their primary instrumentation framework. This shift matters because it means platform engineers can standardize one format across all services.

Here’s the instrumentation contract I enforce:

python
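# platform-lib/observability.py
import os

from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

OTLP_ENDPOINT = "http://otel-collector:4317"  # OTLP over gRPC


def configure_service(service_name: str, version: str):
    """Set up tracing and metrics with the shared resource attributes."""
    resource = Resource.create({
        "service.name": service_name,
        "service.version": version,
        "deployment.environment": os.getenv("ENV", "production"),
    })

    # Traces: batch spans and ship them to the collector.
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=OTLP_ENDPOINT))
    )
    trace.set_tracer_provider(tracer_provider)

    # Metrics: without a reader attached, nothing would be exported.
    reader = PeriodicExportingMetricReader(OTLPMetricExporter(endpoint=OTLP_ENDPOINT))
    meter_provider = MeterProvider(resource=resource, metric_readers=[reader])
    metrics.set_meter_provider(meter_provider)

    return trace.get_tracer(__name__), metrics.get_meter(__name__)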

Every service imports this. Every team uses the same contract. No exceptions.

Managing Infrastructure as Code

The platform engineer’s job is to codify infrastructure decisions so teams don’t have to make them. You decide which patterns are standard. You enforce those patterns through code.

I’ve found the right approach is a hybrid:

Day-0 decisions: Terraform or Pulumi for cloud resources
Day-2 operations: Kubernetes operators for runtime management
Validation: Policy-as-code (OPA or Kyverno) to enforce standards

A real example from our production setup:

hcl
# platform/modules/kubernetes-service/main.tf
resource "kubernetes_namespace" "this" {
  metadata {
    name = "${var.team}-${var.service}"
    labels = {
      "platform.sivaro.io/managed"     = "true"
      "platform.sivaro.io/team"        = var.team
      "platform.sivaro.io/cost-center" = var.cost_center
    }
    }
  }
}

resource "helm_release" "service" {
  name       = var.service
  namespace  = kubernetes_namespace.this.metadata[0].name
  repository = "oci://registry.sivaro.io/charts"
  chart      = "standard-service"
  
  set {
    name  = "service.name"
    value = var.service
  }
  
  dynamic "set" {
    for_each = var.custom_values
    content {
      name  = set.key
      value = set.value
    }
  }
}

Key insight: Platform engineers don’t write Terraform for every team. They write modules that teams use with 10 lines of input.
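The validation layer mentioned earlier is usually written as OPA (Rego) or Kyverno policies. As a rough Python illustration of the same kind of rule, the sketch below checks that a namespace carries the three labels the module sets. The function name `check_namespace_labels` and the dictionary shape of the manifest are assumptions for the example.

```python
# Labels the Terraform module above attaches to every managed namespace.
REQUIRED_LABELS = (
    "platform.sivaro.io/managed",
    "platform.sivaro.io/team",
    "platform.sivaro.io/cost-center",
)


def check_namespace_labels(namespace: dict) -> list[str]:
    """Return policy violations for a namespace manifest given as a dict."""
    labels = namespace.get("metadata", {}).get("labels") or {}
    violations = [f"missing label {key}" for key in REQUIRED_LABELS if key not in labels]

    managed = labels.get("platform.sivaro.io/managed")
    if managed is not None and managed != "true":
        violations.append("platform.sivaro.io/managed must be 'true'")
    return violations


ns = {
    "metadata": {
        "name": "payments-my-service",
        "labels": {
            "platform.sivaro.io/managed": "true",
            "platform.sivaro.io/team": "payments",
        },
    }
}
print(check_namespace_labels(ns))  # ['missing label platform.sivaro.io/cost-center']
```

In a real cluster this check would run at admission time, so a namespace created outside the module is rejected instead of silently missing its cost-center label.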

Technical Deep Dive: Platform Engineering in Production

Golden Paths for Real Applications

Let me show you what a golden path looks like for a common pattern: a microservice with a PostgreSQL database and a Kafka consumer.

The platform provides templates, but teams customize the business logic. Here’s the structure:

my-service/
├── app/
│   ├── __init__.py
│   ├── main.py          # Your FastAPI/Flask app
│   └── handlers/
├── platform/
│   ├── Dockerfile       # Provided by platform, pre-configured
│   ├── k8s/
│   │   ├── deployment.yaml  # Platform template with placeholders
│   │   └── service.yaml     # Standard service template
│   └── .platform.yaml   # Platform configuration
├── tests/
└── pyproject.toml       # Platform-managed dependencies

The .platform.yaml file is the key:

yaml
# .platform.yaml
service:
  name: my-service
  team: payments
  runtime: python:3.13-slim
  ports:
    - name: http
      port: 8080
      protocol: http
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 1
      memory: 1Gi
  databases:
    - name: users
      type: postgres
      version: "16"
      storage: 50Gi
      auto_backup: true
  queues:
    - name: payment-events
      type: kafka
      topic: payment.processed
      consumer_group: my-service-payments

Platform responsibility: Read this YAML and provision everything. Every team uses the same contract. Config changes trigger PR reviews by the platform team.
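To give a sense of what "read this YAML and provision everything" might involve, here is a minimal sketch that turns the parsed config into a Kubernetes Deployment manifest. It assumes the YAML has already been loaded into a dictionary. The function `render_deployment`, the default of two replicas, the label scheme, and the example registry URL are all assumptions made for illustration; databases and queues would be handled by separate provisioning steps not shown here.

```python
from typing import Any


def render_deployment(config: dict[str, Any], image: str, replicas: int = 2) -> dict[str, Any]:
    """Build a Kubernetes Deployment manifest (as a dict) from the service config."""
    svc = config["service"]
    labels = {"app": svc["name"], "team": svc["team"]}
    container = {
        "name": svc["name"],
        "image": image,
        "ports": [
            {"name": p["name"], "containerPort": p["port"]} for p in svc.get("ports", [])
        ],
        "resources": svc.get("resources", {}),
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": svc["name"], "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": svc["name"]}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


config = {
    "service": {
        "name": "my-service",
        "team": "payments",
        "ports": [{"name": "http", "port": 8080, "protocol": "http"}],
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": 1, "memory": "1Gi"},
        },
    }
}
manifest = render_deployment(config, "registry.example.com/my-service:abc123")
print(manifest["spec"]["template"]["spec"]["containers"][0]["ports"])
# [{'name': 'http', 'containerPort': 8080}]
```

Keeping the rendering logic in one place is what lets the platform team change a default, such as adding a sidecar or a security context, for every service at once.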

Handling the CI/CD Pipeline

Platform engineers own the pipeline, not the application code. The pipeline follows a consistent flow:

Lint: Platform-provided linting rules
Test: Standard test runner with platform-managed dependencies
Build: Docker build with platform base images (scanned for vulnerabilities)
Deploy: ArgoCD or similar, following environment promotion rules

Here’s a simplified GitHub Actions workflow the platform provides:

yaml
# .github/workflows/platform-deploy.yml
name: Platform Deploy
on:
  push:
    branches: [main, staging]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set environment
        id: env
        run: |
          if [ "${{ github.ref_name }}" = "main" ]; then
            echo "env=production" >> $GITHUB_OUTPUT
          else
            echo "env=staging" >> $GITHUB_OUTPUT
          fi
      
      - name: Platform build
        uses: sivaro/platform-actions/build@v3
        with:
          platform-config: .platform.yaml
          environment: ${{ steps.env.outputs.env }}
          docker-registry: ${{ secrets.REGISTRY_URL }}
      
      - name: Security scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ secrets.REGISTRY_URL }}/${{ github.repository }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
      
      - name: Deploy to staging
        if: github.ref_name != 'main'
        run: |
          kubectl set image deployment/${{ github.event.repository.name }} \
            ${{ github.event.repository.name }}=${{ secrets.REGISTRY_URL }}/${{ github.repository }}:${{ github.sha }} \
            -n ${{ github.event.repository.name }}-staging
      
      - name: Deploy to production
        if: github.ref_name == 'main'
        run: |
          argocd app sync ${{ github.event.repository.name }}

The magic: Every team uses this workflow. The platform team maintains it. Application teams focus on code, not pipelines.
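The promotion rule in this workflow is simple enough to express as a pure function, which makes it testable outside CI. The sketch below mirrors the branch-to-environment mapping and the two deploy steps. `target_environment` and `deploy_command` are hypothetical helper names, the registry and repository values are placeholders, and the function only builds the argument list without running anything.

```python
def target_environment(branch: str) -> str:
    """Mirror the 'Set environment' step: main maps to production, anything else to staging."""
    return "production" if branch == "main" else "staging"


def deploy_command(branch: str, registry: str, repository: str, sha: str) -> list[str]:
    """Return the argv the workflow would run for this branch (not executed here)."""
    name = repository.split("/")[-1]  # "org/my-service" -> "my-service"
    image = f"{registry}/{repository}:{sha}"
    if target_environment(branch) == "production":
        # Production goes through Argo CD rather than a direct image update.
        return ["argocd", "app", "sync", name]
    return [
        "kubectl", "set", "image", f"deployment/{name}",
        f"{name}={image}", "-n", f"{name}-staging",
    ]


print(deploy_command("main", "registry.example.com", "sivaro/my-service", "abc123"))
# ['argocd', 'app', 'sync', 'my-service']
print(deploy_command("staging", "registry.example.com", "sivaro/my-service", "abc123"))
```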

Industry Best Practices for Platform Engineering

The Platform Team Topology Decision

There are three common team topologies, and I’ve seen all three fail for different reasons:

1. Embedded platform engineers (one per product team): Works for small orgs. Scales poorly. Each platform engineer reinvents solutions.
2. Centralized platform team: The most common. Scales well if you treat it as a product team. Fails if it becomes a bottleneck.
3. Federated platform: Multiple platform teams serving different domains. Works at scale (500+ engineers). Requires strong alignment on standards.

In my experience, organizations under 200 engineers should start with topology #2. Above 500, move to #3. Never stay on #1 beyond 5 product teams.

The Service Catalog Non-Negotiable

Every platform needs a service catalog. Not optional. You cannot manage what you cannot discover.

The catalog should contain:

Service metadata (owner, language, criticality)
Deployment history
Dependencies (databases, queues, other services)
Documentation links
On-call rotation

According to Backstage’s 2026 Community Survey, organizations with a well-maintained service catalog reduce incident MTTR by 37% and onboarding time by 45%.
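The catalog fields listed above could be modeled roughly as follows. This is a sketch of the data shape only, not the schema of Backstage or any other tool. The class `CatalogEntry`, the helper `dependents_of`, the criticality tiers, and the example services are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    owner: str
    language: str
    criticality: str  # e.g. "tier-1"; the tier names are an assumption
    dependencies: list[str] = field(default_factory=list)  # databases, queues, services
    docs_url: str = ""
    on_call_rotation: str = ""
    deployments: list[str] = field(default_factory=list)  # deployment history, newest last


def dependents_of(catalog: list[CatalogEntry], dependency: str) -> list[str]:
    """Answer 'which services are affected if this dependency goes down?'."""
    return sorted(entry.name for entry in catalog if dependency in entry.dependencies)


catalog = [
    CatalogEntry(
        name="my-service",
        owner="payments",
        language="python",
        criticality="tier-1",
        dependencies=["postgres:users", "kafka:payment.processed"],
    ),
    CatalogEntry(
        name="reporting",
        owner="analytics",
        language="python",
        criticality="tier-3",
        dependencies=["kafka:payment.processed"],
    ),
]
print(dependents_of(catalog, "kafka:payment.processed"))  # ['my-service', 'reporting']
```

A dependency lookup like this is the kind of query that shortens incident response: when a queue or database degrades, the on-call engineer can see the affected services and their owners immediately.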

Platform Observability

Platform engineers are meta-observers. You monitor not just your infrastructure, but the health of your platform itself.

Key metrics I track:

Time to provision a new environment (target: <5 minutes)
Time to deploy a change (target: <15 minutes for non-critical)
Platform uptime (target: 99.99%)
Developer NPS (target: >50)
Golden path adoption rate (target: >60%)
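These targets are easy to check mechanically. The sketch below computes a Net Promoter Score from 0-10 survey responses using the standard definition (percentage of promoters scoring 9-10 minus percentage of detractors scoring 0-6) and compares observed values with the targets listed above. The metric key names, the `missed_targets` helper, and the sample numbers are made up for illustration.

```python
import operator


def net_promoter_score(scores: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 survey scale."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Targets from the list above: (comparison the observed value must satisfy, threshold).
TARGETS = {
    "provision_minutes": (operator.lt, 5.0),
    "deploy_minutes": (operator.lt, 15.0),
    "uptime_percent": (operator.ge, 99.99),
    "developer_nps": (operator.gt, 50.0),
    "golden_path_adoption_percent": (operator.gt, 60.0),
}


def missed_targets(observed: dict[str, float]) -> list[str]:
    """Return the names of metrics whose observed value misses its target."""
    return [
        name
        for name, (meets, threshold) in TARGETS.items()
        if not meets(observed[name], threshold)
    ]


survey = [10, 9, 9, 8, 7, 10, 6, 9, 10, 3]  # made-up responses
observed = {
    "provision_minutes": 3.0,
    "deploy_minutes": 12.0,
    "uptime_percent": 99.995,
    "developer_nps": net_promoter_score(survey),
    "golden_path_adoption_percent": 80.0,
}
print(observed["developer_nps"])  # 40.0
print(missed_targets(observed))   # ['developer_nps']
```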

Making the Right Choice: Build vs. Buy vs. Customize

This is the most argued question in platform engineering. Here’s my honest framework:

Buy when:

The problem is generic (identity, secrets, observability)
The vendor has strong APIs and integrations
You have fewer than 50 engineers

Build/customize when:

Your workflows are unique to your domain
You need deep integration with existing systems
You have >100 engineers and can justify the investment

The hybrid approach (what I actually recommend):

Buy the plumbing (Kubernetes, observability, CI/CD)
Build the abstraction (your IDP, golden paths, templates)
Customize the integrations (your specific databases, deployment rules)

According to a 2026 survey by Emergence Capital on Platform Engineering Trends, 78% of successful platform initiatives use a hybrid approach. Pure build or pure buy both fail at higher rates.

Handling the Hard Challenges

Challenge 1: Platform Teams Become Bottlenecks

Here’s the pattern I’ve seen: Platform team builds a great IDP. Product teams love it. Platform team gets overwhelmed with requests and changes. Everything slows down.

Solution: Enforce self-service. The platform team does not deploy anything for anyone. They build the tools. Teams use the tools. If the tools don’t work, the platform team fixes the tools, not the deployment.

Challenge 2: Adoption Resistance

Some teams will refuse the golden path. They’ll cite “different needs” or “faster without the platform.”

Solution (painful but effective): Let them walk. Do not force adoption. When their bespoke solution fails (and it will — network issues, security audits, scaling problems), let them experience the pain. Then offer the golden path again.

I once waited 6 months for a team that insisted on managing their own Kubernetes cluster. After a production incident where they couldn’t roll back, they adopted our platform within a week.

Challenge 3: Platform Team Burnout

Platform work is invisible. Nobody thanks you when deployments work. Everyone blames you when they don’t.

Solution: Make your work visible. Share deployment metrics. Celebrate when teams ship faster. Rotate platform engineers into product teams periodically. Fresh perspective prevents burnout.

Frequently Asked Questions

What’s the difference between a platform engineer and a DevOps engineer?
DevOps focuses on the practices and culture of deploying software. Platform engineering focuses on building the tools and infrastructure that enable those practices. Platform engineers build the roads; DevOps engineers drive on them.

Do I need a dedicated platform engineer on my team?
If you have fewer than 3 product teams, no. Assign infrastructure work to a senior engineer. Once you hit 4-5 teams, a dedicated platform engineer starts delivering returns through reduced cognitive load for all teams.

What skills should a platform engineer have?
Kubernetes operations, infrastructure-as-code (Terraform, Pulumi), CI/CD pipeline design, observability stack knowledge (Prometheus, OpenTelemetry), and, most importantly, product thinking. The last one is the hardest to find.

Is platform engineering a senior role?
Yes. The best platform engineers have 5+ years of experience across infrastructure, backend development, and operations. Junior engineers lack the context to make the right abstractions.

How do you measure the success of a platform team?
Developer velocity (time from commit to production), developer satisfaction (NPS surveys), platform reliability (uptime), and adoption rate of golden paths. Cost savings are a secondary metric.

Can platform engineering work in startups?
Rarely. Startups need speed and flexibility. Platform engineering adds structure that can slow early-stage teams. Wait until you have product-market fit and at least 4-5 product teams.

What’s the most common mistake in platform engineering?
Building internal tools that duplicate what cloud providers already offer. If your “platform” is reinventing S3 or RDS with custom scripts, you’re doing it wrong.

How does platform engineering relate to SRE?
Platform engineers build the infrastructure and workflows. SREs ensure reliability of that infrastructure in production. They’re complementary roles. In small orgs, one person often does both.

Summary and Next Steps

Platform engineering is not about building tools. It’s about removing friction.

Start with the simplest possible golden path. One service template. One deployment pipeline. One observability contract. Prove it works with one team. Iterate. Expand.

The three things I wish I knew when I started:

Treat your platform like a product, not a project
Self-service is non-negotiable
Less is more — 80% of value comes from 20% of features

If you’re considering building a platform team, start with a single platform engineer embedded in a product team for 3 months. Let them experience the pain. Then let them build the solution.

Nishaant Dixit: Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.

Sources

Humanitec, “State of Platform Engineering Report 2026” — https://humanitec.com/state-of-platform-engineering-2026
Grafana Labs, “Observability Survey 2026” — https://grafana.com/observability-survey-2026/
Backstage (CNCF), “2026 Community Survey Results” — https://backstage.io/community-survey-2026
Emergence Capital, “Platform Engineering Trends 2026” — https://emergence.com/platform-engineering-trends-2026
CNCF, “Platform Engineering White Paper 2026” — https://cncf.io/reports/platform-engineering-2026