INFRASTRUCTURE CASE STUDY

ClickHouse Migration for Real-Time Analytics at Scale

PostgreSQL queries took 2-5 seconds, making real-time dashboards unusable during peak load.

Migrated to ClickHouse with Kafka streaming. Implemented partitioned tables, materialized views, and optimized aggregation queries for sub-second analytics.

P95 Query Latency

250ms

Cost Reduction

35%

Events/Day

200M

Context

Floqer, a Canada-based CRM data enrichment SaaS company, needed to migrate from PostgreSQL to ClickHouse to handle 200 million daily events with interactive analytics requirements.

Problem

PostgreSQL couldn't sustain analytical workloads alongside transactional traffic. During peak hours, dashboard queries stretched to 2-5 seconds—unacceptable for a product where sales teams relied on real-time pipeline visibility. The relational model and B-tree indexes were optimized for point lookups, not aggregation scans across millions of rows. CPU saturation on the primary replica caused write stalls, affecting both analytics and core CRM operations.

Constraints

Zero downtime migration. Budget required cost reduction, not increase. The in-house team had no ClickHouse experience—they needed a solution they could operate long-term. Query patterns were diverse—ranging from single-customer lookups to company-wide aggregations—requiring a flexible data model.

Approach

ClickHouse isn't just faster PostgreSQL—it's a fundamentally different architecture. We evaluated Snowflake, BigQuery, and ClickHouse before choosing ClickHouse for its columnar compression and predictable sub-second analytical queries at this data volume. The critical decision was partitioning strategy: we partitioned by event_date with a custom hash on customer_id to co-locate related data while enabling parallel scan.

Implementation

A three-node ClickHouse cluster with replicated MergeTree tables ensured high availability. Tables were partitioned by event_date with a secondary sort key on customer_id for common query patterns. Materialized views pre-aggregated common metrics—daily active users, conversion funnels, pipeline velocity—reducing runtime aggregation costs by 80%. Kafka consumed events from the existing pipeline, with a custom schema enforcement layer preventing malformed data from entering ClickHouse. Queries were rewritten to leverage ClickHouse's vectorized execution: pre-where clauses to prune columns early, final modifier for deduplication, and sample() for approximate queries where exactness wasn't required.

Results

P95 query latency dropped from 2-5 seconds to 250ms—delivering true real-time analytics. The team now runs 50+ concurrent dashboard queries without degradation. Infrastructure costs dropped 35% despite 3x throughput increase, because ClickHouse's compression (6-8x for event data) reduced storage requirements dramatically. Peak event volume handled: 200M events/day without tuning.

Key Insight

The 10x performance gain came from three architectural decisions: (1) partitioning by time with customer_id as the sort key, (2) materialized views for common aggregations, and (3) rewriting queries for ClickHouse's vectorized engine rather than PostgreSQL patterns. Most ClickHouse migrations fail because teams port SQL verbatim without adapting to columnar execution.

Related Projects