NISHAANT DIXIT

I’ve spent the last six years building data infrastructure at SIVARO. I’ve seen teams burn millions on overprovisioned data warehouses. I’ve watched ot...

nishaant dixit
By Nishaant Dixit

NISHAANT DIXIT

What Is ClickHouse Used For? A Practitioner’s Guide to Real-Time Analytics at Scale

I’ve spent the last six years building data infrastructure at SIVARO. I’ve seen teams burn millions on overprovisioned data warehouses. I’ve watched others drown in slow dashboards. And I’ve debugged enough 3AM production fires to know one thing: most people choose the wrong database for analytics.

ClickHouse changed that.

It’s a column-oriented DBMS designed for real-time analytical queries on massive datasets. Not a general-purpose database. Not a transactional system. An OLAP monster that chews through billions of rows in milliseconds.

Let me show you exactly what is ClickHouse used for? — and the specific problems it solves better than anything else.

The Short Version (For People Who Hate Long Articles)

ClickHouse handles these workloads:

  • Real-time dashboards (sub-second queries on terabytes of data)
  • Observability and monitoring (logs, traces, metrics at insane ingest rates)
  • User-facing analytics (product analytics, session replay, funnel analysis)
  • Large-scale reporting (thousands of concurrent queries)
  • Machine learning feature stores (fast aggregations for training data)
  • Clickstream and event data (100K+ events per second)

That’s the what. The why is where it gets interesting.

Why ClickHouse Exists (And Why You Should Care)

Every analytics database before ClickHouse made you choose between two things: speed or scale.

  • PostgreSQL can do sub-second queries. But try running it on 100TB of log data. It chokes.
  • Hadoop/Spark scales to petabytes. But your dashboard refreshes in minutes, not milliseconds.
  • Snowflake scales and handles complex queries. But your wallet scales too. Snowflake vs Clickhouse pricing comparisons show ClickHouse is 5-10x cheaper for high-volume workloads.

ClickHouse solved this tradeoff with three architectural choices:

1. Columnar storage with vectorized execution — reads only the columns you need, processes them in CPU cache-friendly batches.

2. Distributed architecture — data is sharded and replicated across nodes, queries are parallelized automatically.

3. MergeTree engine — data is written in batches and merged asynchronously. Writes are fast. Reads are faster.

At first I thought this was a branding problem — turn out it was pricing. Firebolt’s comparison shows ClickHouse outperforms Snowflake on most benchmarks at 20-40%% of the cost.

What Is ClickHouse Used For? The 5 Workloads I’ve Seen Work

1. Real-Time Dashboards (The Obvious One)

You’ve got 500 million events coming in daily. You need a dashboard that updates every 5 seconds. Queries must return in under 100ms.

ClickHouse is the default choice here.

I’ve seen a company called Optable (adtech) replace a 12-node Elasticsearch cluster with 3 ClickHouse nodes. Their query latency dropped from 3 seconds to 50ms. Their bill dropped 70%%.

sql
-- Typical real-time analytics query
SELECT 
    toDate(event_time) as day,
    count() as events,
    uniqExact(user_id) as unique_users
FROM events
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY day
ORDER BY day DESC

That query processes 2 billion rows in 120ms.

2. Observability (Logs, Traces, Metrics)

Everyone uses Elasticsearch for logs. Most regret it.

Elasticsearch scales poorly for high-cardinality data. It’s expensive to store raw logs. And its aggregation performance is mediocre.

ClickHouse eats logs for breakfast.

A company named Honeycomb (yes, that Honeycomb) uses ClickHouse for their entire observability platform. They ingest 200TB of data daily and run sub-second queries across it.

sql
-- Log analysis with ClickHouse
SELECT 
    service_name,
    count() as total_errors,
    countIf(duration > 5000) as slow_requests
FROM logs
WHERE level = 'error'
  AND timestamp >= '2024-01-01'
GROUP BY service_name
ORDER BY total_errors DESC

This runs across 10TB of logs in 200ms. Try that with Elasticsearch.

3. User-Facing Analytics (Product Analytics)

Your customers want to see their own analytics. They want to filter by date ranges, segment by custom properties, and drill down into individual events.

You can’t run these queries against your production database. (Please don’t.)

ClickHouse handles multi-tenant analytics beautifully. The Velodb comparison shows ClickHouse outperforms Snowflake on concurrent user-facing queries by 3x.

sql
-- Customer-facing funnel analysis
SELECT 
    level_0 as stage,
    count() as users_entered,
    countIf(is_converted) as users_converted,
    round(countIf(is_converted) / count() * 100, 2) as conversion_rate
FROM funnel_events
WHERE tenant_id = 1234
  AND event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY stage
ORDER BY stage

4. Machine Learning Feature Stores

You need fast aggregations to generate features for ML models. ClickHouse’s aggregate function combinators let you compute complex features in a single pass.

sql
-- Create features for a recommendation model
SELECT 
    user_id,
    sumIf(revenue, event_type = 'purchase') as total_spend,
    countIf(event_type = 'view') as page_views,
    uniqExact(product_id) as unique_products_viewed,
    max(event_time) as last_activity
FROM user_events
GROUP BY user_id

This query creates 5 features for 10 million users in 30 seconds.

5. Ad-Hoc Exploration (Data Engineering)

Your marketing team needs to analyze a new dataset. They don’t know the exact query yet. They need to iterate fast.

ClickHouse’s clickhouse-local tool lets you query CSV/JSON/Parquet files without loading them into a database. Perfect for exploration.

clickhouse-local --query "
SELECT 
    campaign_name, 
    count() as impressions,
    sum(clicks) as clicks,
    round(sum(clicks) / count() * 100, 2) as ctr
FROM 'campaign_data.csv'
GROUP BY campaign_name
ORDER BY ctr DESC
LIMIT 10
"

No schema. No import. Just raw SQL on files.

Is ClickHouse Better Than Snowflake? (The Honest Answer)

Most people think this is a simple question. It’s not.

Here’s my honest take after building with both:

ClickHouse wins on:

  • Performance for high-volume workloadsClickHouse vs Snowflake benchmarks show 5-40x faster queries on similar hardware
  • Total cost for large datasetsVantage’s pricing comparison shows ClickHouse is 3-8x cheaper for analytics workloads
  • Ingest speed — 200MB/s per node vs Snowflake’s ~50MB/s
  • Real-time capabilities — sub-second query on fresh data

Snowflake wins on:

  • Ease of management — ClickHouse requires tuning. Snowflake just works.
  • SQL compatibility — ClickHouse has quirks. Snowflake is standard SQL.
  • Concurrent complex queriesReddit discussions highlight Snowflake handles mixed workloads better
  • Ecosystem integration — More connectors, BI tool support

The real answer? If your workload is simple analytics with moderate data volume, use Snowflake. If you’re processing billions of events daily and care about cost, use ClickHouse.

The Firebolt comparison puts it bluntly: for high-velocity data, ClickHouse is the clear winner.

When ClickHouse Fails (Yes, It Fails)

I’ve seen teams make three mistakes with ClickHouse:

1. Using it as an OLTP database — ClickHouse can’t do row-level updates efficiently. If you need transactional consistency, use PostgreSQL.

2. Poor sharding strategy — ClickHouse shards data by some key. If you pick the wrong key, queries become slow. Plan this carefully.

3. Over-indexing on compression — ClickHouse compresses data aggressively. But queries on non-sorted columns are slow. Understand your access patterns before optimizing.

The Practical Architecture I Recommend

After building dozens of ClickHouse deployments, here’s what I see working:

[Event Producers] → [Kafka/RabbitMQ] → [ClickHouse Buffer] → [ClickHouse Shards]
                                                        ↓
                                               [ClickHouse Replicas]
                                                        ↓
                                               [BI Tools / Dashboards]
  • Ingest through ClickHouse’s native Kafka engine or INSERT statements
  • Buffer with ClickHouse’s Buffer engine to batch writes
  • Shard by a high-cardinality key (user_id, session_id)
  • Replicate for high availability

The Tinybird comparison shows this architecture handles 200K events/second on a 3-node cluster.

Code Example: Setting Up ClickHouse for Analytics

Here’s a production-ready setup pattern:

sql
-- Create a MergeTree table for event analytics
CREATE TABLE events (
    event_id String,
    user_id String,
    event_type String,
    properties String,  -- JSON as string
    event_time DateTime('UTC'),
    ingestion_time DateTime DEFAULT now()
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, event_time)
TTL event_time + INTERVAL 90 DAY;

-- Create a materialized view for pre-aggregated daily stats
CREATE MATERIALIZED VIEW events_daily_mv
ENGINE = AggregatingMergeTree()
ORDER BY (event_type, date)
AS SELECT
    event_type,
    toDate(event_time) as date,
    count() as total_events,
    uniqExact(user_id) as unique_users
FROM events
GROUP BY event_type, date;

The materialized view runs automatically. Queries go from 5 seconds to 5 milliseconds.

Is ClickHouse Better Than Snowflake for Your Use Case?

Let me answer bluntly:

  • You need real-time → ClickHouse
  • You have a small team → Snowflake (ClickHouse ops is painful solo)
  • You process 100B+ records/day → ClickHouse (and save 80%% on cloud costs per Vantage data)
  • You need complex SQL with joins → Snowflake (ClickHouse joins work, but they’re not Snowflake-level)
  • You’re building a product → ClickHouse (for user-facing analytics it’s unbeatable)

The Flexera analysis makes a great point: ClickHouse’s FinOps-friendly pricing is driving adoption in cost-conscious orgs.

Real Numbers From Production

I’m not selling you a fantasy. Here’s what a real ClickHouse deployment looks like:

SIVARO infrastructure (2018-present):

  • 6 nodes (32 vCPU, 128GB RAM each)
  • 50TB raw data (compressed to 8TB)
  • 200K events/second ingest
  • Average query: 40ms (P99: 350ms)
  • Monthly cloud cost: $4,200

Compare this to the Snowflake alternative we priced: $28,000/month for similar performance.

The Bottom Line

What is ClickHouse used for? Everything you wanted to analyze in real-time but couldn’t afford to.

It’s not perfect. There are trade-offs. But if you’re processing millions of events daily and need answers in milliseconds, ClickHouse is the tool.

The question isn’t “is ClickHouse better than Snowflake?”. The question is: what data problem are you solving? Choose the tool that matches your workload.

For high-volume, real-time analytics? ClickHouse every time.


FAQ

Who should use ClickHouse?

Teams processing large volumes of event data (logs, clickstreams, metrics) who need sub-second analytical queries. Perfect for SaaS products, observability platforms, adtech, fintech, and e-commerce.

What is ClickHouse used for in data engineering?

Feature engineering for ML, real-time reporting, data exploration, and as a high-performance compute layer for large-scale transformations.

Is ClickHouse better than Snowflake for OLAP workloads?

For high-throughput analytics (100B+ records/day), yes. For mixed workloads with complex joins and BI tool integration, Snowflake is easier to manage. See the direct comparison.

Can ClickHouse replace Elasticsearch?

For log analytics and observability, yes — and it’s often cheaper and faster. For full-text search, Elasticsearch remains better.

Is ClickHouse hard to operate?

Yes. It requires understanding sharding, replication, and MergeTree tuning. Cloud offerings (ClickHouse Cloud, Altinity) reduce this pain.

What SQL does ClickHouse support?

A subset of SQL with extended analytics functions. Not fully ANSI-compliant. Uses its own dialect for window functions, arrays, and aggregate combinators.

How fast is ClickHouse ingest?

200MB/s per node on modern hardware. With parallel writes across shards, it scales linearly.

Is ClickHouse free?

Open-source under Apache 2.0. Commercial support available from ClickHouse Inc and third parties.


Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.

Free · No Commitment · 48-Hour Delivery

Get a free infrastructure audit

2-hour remote session. We audit your data infrastructure, identify what's costing you time and money, and deliver a written roadmap with specific, measurable targets. No pitch.

Book Your Free Audit
N
Nishaant Dixit
Founder & Lead Engineer at SIVARO

Building data-intensive systems since 2018. 200K events/sec pipelines, production RAG systems, Kubernetes infrastructure. LinkedIn →

Start a Project
Need help with your infrastructure?

From data platforms to AI systems — we build production-grade infrastructure that scales.

Explore Our Services