DEV TOOLS CASE STUDY

Go API Gateway Migration: From 800ms Spikes to 12ms P99

The Node.js API gateway suffered GC pauses that caused 800ms latency spikes under an 18K RPS load.

Rewrote the gateway in Go with goroutine-based concurrency, custom connection pooling, and optimized JWT verification. The rewrite sustains the full 18K RPS load on a single instance.

P99 Latency

12ms

Cost Reduction

83%

Instances

1

Context

DigitalAlign, a US-based high-traffic SaaS platform, needed to migrate its Node.js API gateway to Go to handle 18,000 requests per second with sub-20ms P99 latency.

Problem

The Node.js gateway hit garbage collection pauses under sustained load, causing latency spikes up to 800ms. These spikes propagated to all downstream services, affecting user-facing endpoints. The operational model required a large fleet of small instances to handle throughput, driving infrastructure costs higher while delivering worse performance. The single-threaded event loop couldn't handle both high concurrency and CPU-intensive JWT verification without blocking.

Constraints

The gateway had to sustain 18K RPS with sub-20ms P99 latency. The migration had to be zero-downtime, because existing microservices depended on the gateway for routing, auth, and rate limiting. The API had to stay fully compatible with existing consumers. JWT verification performance was critical, but any optimization had to preserve security correctness.

Approach

Node.js excels at I/O-bound workloads. But for a gateway handling 18K RPS with per-request CPU work (JWT verification, request parsing, rate limiting), the single-threaded event loop becomes a bottleneck. Go's goroutines provide concurrent request handling across all cores with minimal memory overhead, and its concurrent GC runs mostly alongside the application, keeping stop-the-world pauses short, typically well under a millisecond. We chose a custom Go implementation over commercial gateways to control costs and optimize for DigitalAlign's specific traffic patterns.
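To make the "per-request CPU work" concrete, here is a minimal sketch of signature verification inside a net/http middleware. It assumes HS256 tokens and uses only the standard library; the case study does not state the signing algorithm, and the names signHS256, verifyHS256, authMiddleware, and statusFor are hypothetical. Because net/http serves each connection on its own goroutine, a check like this can run on many cores at once instead of queuing behind one event loop.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// signHS256 builds a compact token from an already-encoded header and payload.
// It exists only so the demo can produce a valid token.
func signHS256(header, payload string, key []byte) string {
	signingInput := header + "." + payload
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(signingInput))
	return signingInput + "." + base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// verifyHS256 checks only the signature of a compact token (assumed HS256).
// A production verifier must also pin the alg header and validate claims such as exp.
func verifyHS256(token string, key []byte) bool {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return false
	}
	sig, err := base64.RawURLEncoding.DecodeString(parts[2])
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(parts[0] + "." + parts[1]))
	return hmac.Equal(sig, mac.Sum(nil)) // constant-time comparison
}

// authMiddleware rejects requests that lack a valid bearer token.
func authMiddleware(key []byte, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		auth := r.Header.Get("Authorization")
		if !strings.HasPrefix(auth, "Bearer ") ||
			!verifyHS256(strings.TrimPrefix(auth, "Bearer "), key) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through h and returns the status code.
func statusFor(h http.Handler, token string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("Authorization", "Bearer "+token)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	key := []byte("demo-secret") // placeholder key for the demo only
	enc := base64.RawURLEncoding.EncodeToString
	token := signHS256(enc([]byte(`{"alg":"HS256","typ":"JWT"}`)), enc([]byte(`{"sub":"user-1"}`)), key)

	h := authMiddleware(key, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	fmt.Println("valid token:", statusFor(h, token))
	fmt.Println("tampered token:", statusFor(h, token+"x"))
}
```

The demo calls the handler directly through httptest, so it exercises the verification logic rather than the concurrency; under http.ListenAndServe the same middleware would run once per in-flight request.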

Implementation

The gateway used Go's net/http with a custom handler wrapping each request. Because net/http serves each connection on its own goroutine, slow requests never blocked the listener. We implemented a connection pool to upstream services using http.Transport's MaxIdleConnsPerHost, reducing TLS handshake overhead by 90%. JWT verification used the go-jwt library with custom parsing to avoid json.Unmarshal overhead. Rate limiting employed a token bucket algorithm, with bucket state held in a sync.Map shared across goroutines; this state is in-process rather than distributed, which is sufficient because a single instance serves all traffic. The canary deployment initially routed 1% of traffic to the Go gateway, validating latency and error rates before full cutover.
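Two of the pieces above can be sketched with the standard library alone. The following is an illustration under stated assumptions, not the gateway's actual code: the pool sizes, timeouts, and rate values are placeholders, and the names newUpstreamClient, Limiter, NewLimiter, and bucket are hypothetical. It shows an http.Transport tuned to keep idle upstream connections open, and a token bucket limiter that keeps one bucket per key in a sync.Map.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// newUpstreamClient returns a client that keeps idle connections open per
// upstream host, so repeated requests skip a new TCP and TLS handshake.
// All numeric values are illustrative assumptions.
func newUpstreamClient() *http.Client {
	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			MaxIdleConns:        512,
			MaxIdleConnsPerHost: 128, // net/http's default is 2
			IdleConnTimeout:     90 * time.Second,
		},
	}
}

// bucket is the token bucket for a single key.
type bucket struct {
	mu     sync.Mutex
	tokens float64
	last   time.Time
}

// Limiter holds one bucket per key. The sync.Map is in-process state only.
type Limiter struct {
	rate    float64  // tokens refilled per second
	burst   float64  // maximum tokens a bucket can hold
	buckets sync.Map // string -> *bucket
}

func NewLimiter(rate, burst float64) *Limiter {
	return &Limiter{rate: rate, burst: burst}
}

// AllowAt reports whether key may proceed at time now, consuming one token if so.
func (l *Limiter) AllowAt(key string, now time.Time) bool {
	v, ok := l.buckets.Load(key)
	if !ok {
		// New keys start with a full bucket; LoadOrStore resolves races.
		v, _ = l.buckets.LoadOrStore(key, &bucket{tokens: l.burst, last: now})
	}
	b := v.(*bucket)
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * l.rate
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// Allow is AllowAt using the current wall-clock time.
func (l *Limiter) Allow(key string) bool { return l.AllowAt(key, time.Now()) }

func main() {
	tr := newUpstreamClient().Transport.(*http.Transport)
	fmt.Println("idle conns per host:", tr.MaxIdleConnsPerHost)

	lim := NewLimiter(10, 2) // 10 requests/second, burst of 2
	start := time.Now()
	fmt.Println(lim.AllowAt("client-a", start)) // uses first burst token
	fmt.Println(lim.AllowAt("client-a", start)) // uses second burst token
	fmt.Println(lim.AllowAt("client-a", start)) // bucket is empty
	fmt.Println(lim.AllowAt("client-a", start.Add(150*time.Millisecond))) // refilled
}
```

Passing the timestamp into AllowAt keeps the refill logic deterministic to test. A sketch like this never evicts idle keys, so a long-running gateway would also need some form of cleanup.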

Results

P99 latency dropped from spikes of up to 800ms to a consistent 12ms. The gateway now handles 18K RPS on a single instance, versus the previous 8 Node.js instances, for an 83% infrastructure cost reduction. The error rate dropped from 0.5% during GC events to 0.001%. The team eliminated on-call alerts for gateway performance.

Key Insight

At high throughput with per-request CPU work, Node.js's event loop architecture becomes a liability, not an asset. Go's goroutines and concurrent GC provide predictable latency at scale. The migration cost $0 in licensing and paid for itself in 6 weeks through infrastructure savings.
