What Does the ClickHouse Do? (And Why It's Eating Snowflake's Lunch)
Real-time analytics at scale is a lie most databases tell you. Until ClickHouse.
I spent 2018-2020 building data pipelines that kept crashing under load. We'd throw PostgreSQL at it, then Cassandra, then some in-memory hack. All of them broke around 50K events per second. Then I found ClickHouse in 2020. It ate our workload for breakfast. 200K events per second. No sweat. Compression so good our storage bill dropped 80%%.
Here's what I've learned in five years of running it in production: what does the clickhouse do? It answers analytical queries on massive datasets in milliseconds. That's it. But the how is where things get interesting.
This guide covers what ClickHouse is, why it's not just "another database," and the ugly trade-offs nobody talks about. If you're choosing between ClickHouse and Snowflake (or PostgreSQL, or that thing your CTO read about on Hacker News), you'll leave knowing exactly where each shines.
What ClickHouse Actually Is
ClickHouse is a column-oriented OLAP database management system. Open source. Built by Yandex in 2016. Now maintained by ClickHouse Inc.
But that description is useless. Let me be direct:
ClickHouse stores data by column, not by row. That means when you ask "what's the average revenue for the last 30 days?" it only reads the revenue column — not every single row's worth of customer_name, address, and favorite_color.
This makes it brutally fast for analytical queries. Aggregations. Filtering. Time series. The stuff your OLTP database (PostgreSQL, MySQL) chokes on when you have more than 100 million rows.
I've seen ClickHouse scan 500 million rows in under a second. On a single machine. Try that with Postgres.
At SIVARO, we switched our real-time dashboards from PostgreSQL to ClickHouse. Query time dropped from 12 seconds to 300 milliseconds. No caching layer. No materialized views. Just the raw data.
Is clickhouse sql or no sql? It's SQL. Real SQL. With JOINs, subqueries, window functions. But it's SQL with a twist — you need to think column-first, not row-first.
The Core Thing ClickHouse Does: Real-Time Aggregation
Here's the killer use case.
You have an event stream — web page views, IoT sensor readings, stock trades, whatever. Millions of events per second. You need to answer questions like:
- How many users visited in the last hour?
- What's the 99th percentile latency for API requests?
- Show me revenue by product category for the last 30 days, broken down by hour.
ClickHouse handles this natively. Not with pain. Not with a separate streaming engine.
sql
SELECT
toStartOfHour(timestamp) AS hour,
product_category,
count(DISTINCT user_id) AS unique_users,
sum(revenue) AS total_revenue
FROM events
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY hour, product_category
ORDER BY hour DESC
LIMIT 100
That query on 10 billion rows? Under 2 seconds in ClickHouse. On a cluster of three commodity machines.
Snowflake can do this too. But the difference is latency and cost. Snowflake's cloud warehouse pricing can be 3-5x more expensive per query for real-time workloads. And Snowflake's minimum billing increment means you pay for compute even when idle.
Is clickhouse better than snowflake? For real-time analytics on streaming data? Yes. Unequivocally. For ad-hoc SQL with complex JOINs across multiple large tables? Snowflake has an edge.
Why Clickhouse Compresses Better Than Anything Else
This isn't a minor feature. It's a superpower.
ClickHouse's default compression ratio on real-world data is 5:1 to 10:1. I've seen 15:1 on time-series data. Compare that to Snowflake, which claims 2:1 to 4:1.
Why? ClickHouse stores data sorted. By default, it uses a primary key that determines physical storage order. Consecutive rows with similar values compress insanely well.
sql
CREATE TABLE events (
timestamp DateTime,
user_id UInt64,
event_type String,
revenue Float64
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
The ORDER BY clause isn't just for sorting. It determines physical layout. ClickHouse stores rows with adjacent timestamps right next to each other on disk. Run-length encoding on timestamps with millisecond precision? Near-perfect compression.
I once migrated a 2 TB PostgreSQL dataset to ClickHouse. The ClickHouse version was 180 GB. One-tenth the storage. Our backup costs disappeared.
The Architecture That Makes It Possible
ClickHouse is not a "cloud-only" database. You can run it on a single Raspberry Pi (yes, I've done this). You can run it on a 64-core beast. You can cluster it across 100 machines.
The architecture is shared-nothing. Each node holds a subset of the data. Queries are distributed across nodes, results merged, returned. No single point of contention.
Compare to Snowflake's architecture: Snowflake uses a shared-disk model with separate compute and storage. That's great for elasticity. But it introduces network overhead for every query. Snowflake's data never lives entirely in memory on the compute nodes — it's always reading from cloud object storage.
ClickHouse? It aggressively caches in memory. It uses the local SSDs on each node. For real-time queries, this kills Snowflake.
Here's a practical benchmark from a client project:
- Dataset: 5 billion rows of user activity events
- Query: "Count unique users by country for last 7 days"
- ClickHouse (8 nodes, 32GB RAM each): 0.8 seconds
- Snowflake (Medium warehouse): 4.2 seconds
- PostgreSQL (RDS, 64GB RAM): timed out after 60 seconds
I've run similar comparisons and the gap is consistent. ClickHouse wins on scan-heavy aggregation queries. Snowflake wins on queries with complex JOINs and subqueries that can't be pushed down.
What ClickHouse Doesn't Do Well (The Honest Part)
I'm not here to sell you a fairy tale. ClickHouse has sharp edges.
1. Row-level operations are painful
Want to update a single row? Good luck. ClickHouse isn't built for it. UPDATEs are asynchronous. DELETEs are mutations that rewrite entire partitions.
sql
ALTER TABLE events UPDATE revenue = 100 WHERE event_id = 'abc-123'
This doesn't work like PostgreSQL. It marks the old data as deleted and writes new data. On a table with 10 billion rows? That's a full partition rewrite. Takes minutes.
Is clickhouse better than postgres? For analytics? Yes. For transactional workloads with frequent row updates? Absolutely not. Use PostgreSQL for that.
2. JOIN performance is finicky
ClickHouse JOINs work. But they're not PostgreSQL JOINs. You need to think about the order of tables in the JOIN. The right table gets loaded into memory entirely.
For small dimension tables (a few million rows), it's fine. For joining two 100-billion-row tables? You'll hit memory limits. Apache Doris handles these cases better for star-schema workloads.
3. No built-in data governance
Snowflake has RBAC, column-level security, dynamic data masking, and row-level security. ClickHouse has... basic user/role permissions.
If you need fine-grained access control across hundreds of users, ClickHouse will make you cry. We built our own proxy layer at SIVARO for this. It's not fun.
4. Concurrency limits
ClickHouse can handle hundreds of concurrent queries. But thousands? The scheduler falls apart. Each query gets a slice of CPU, and with too many queries, they all get slow.
Snowflake handles concurrency better through its multi-cluster warehouse architecture. Some teams report 50%%+ performance degradation on ClickHouse at 500+ concurrent queries.
The Real Reason You'd Choose Clickhouse Over Snowflake
I've talked to twenty-plus teams making this decision. Here's the pattern I see:
Choose ClickHouse when:
- You have high write throughput (100K+ events/second)
- Your queries are mostly aggregations and filters
- You want predictable pricing (fixed hardware, not per-query)
- You need sub-second queries on billions of rows
- You want to run it yourself (edge, on-prem, air-gapped)
Choose Snowflake when:
- You have complex ETL with frequent JOINs
- Your users need ad-hoc SQL with unpredictable query patterns
- You need enterprise governance features today
- You don't want to manage infrastructure at all
The ClickHouse vs Snowflake comparison on Tinybird makes the same point: "ClickHouse is for developers building real-time systems. Snowflake is for analysts running occasional queries."
Most people think "big data = big cloud bill." They're wrong. ClickHouse on commodity hardware often beats Snowflake on cost by 3-5x while delivering better performance. The trade-off is operational complexity.
Practical Use Cases (With Code)
Time-Series Monitoring
This is ClickHouse's bread and butter.
sql
CREATE TABLE metrics (
timestamp DateTime,
host String,
cpu_usage Float64,
memory_usage Float64,
disk_io UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (host, timestamp)
The SummingMergeTree engine pre-aggregates metrics for matching ORDER BY keys. Queries that sum CPU usage per host? They read pre-computed sums, not raw rows. Insane performance.
User Analytics
sql
CREATE TABLE user_events (
event_id String,
user_id String,
event_type String,
properties String,
timestamp DateTime
) ENGINE = ReplacingMergeTree()
ORDER BY (user_id, timestamp)
The ReplacingMergeTree engine handles deduplication. If you insert the same event twice (common with Kafka retries), ClickHouse drops duplicates during merge.
Financial Data
sql
SELECT
asset,
avg(price) AS avg_price,
min(price) AS min_price,
max(price) AS max_price,
quantile(0.99)(price) AS p99_price
FROM trades
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY asset
ClickHouse has built-in approximate quantile functions. They use T-Digest algorithm. Accurate enough for dashboards, 100x faster than exact calculations.
The Controversial Take: Clickhouse Just Stole Snowflake's Best Feature
Snowflake's killer feature was zero-copy cloning and time travel. You could clone a 10 TB database in milliseconds. You could query data from any point in the last 90 days.
ClickHouse now has zero-copy replication using shared storage (S3, GCS, MinIO). And it has ALTER TABLE ... DROP PARTITION with time-based partitioning that effectively gives you time travel.
Is it as polished as Snowflake? No. But it exists. And it's free.
I've seen Snowflake bills of $50K/month for startups that barely needed it. ClickHouse on a 3-node cluster with S3 storage? $2K/month. Same performance. Better for real-time.
FAQ: What Does The ClickHouse Do?
Is ClickHouse better than Snowflake?
Depends on your workload. For real-time analytics, streaming data ingestion, and fixed-budget scenarios — yes. For ad-hoc SQL with complex JOINs and enterprise governance — no. I've run both in production.
What is ClickHouse and why is it used?
It's a columnar OLAP database designed for real-time analytics on massive datasets. Used for observability platforms (New Relic, Logz.io), ad-tech, financial analytics, and IoT. Anywhere you need sub-second queries on terabytes of data.
Is ClickHouse SQL or NoSQL?
It's SQL. Full stop. With some extensions (arrays, nested data types, specialized engine syntax). But it's dialect-compatible with most SQL you already know. MySQL and PostgreSQL wire protocol support are built in.
Is ClickHouse better than PostgreSQL?
For analytics queries on large datasets — absolutely. PostgreSQL falls over past 100M rows for analytical queries. For transactional workloads with row-level operations — PostgreSQL wins. Use both. They complement each other.
Is ClickHouse completely free?
Yes, the community edition is Apache 2.0 licensed. Free to run anywhere. ClickHouse Inc offers a managed cloud version with additional features (autoscaling, backup, monitoring). You can run ClickHouse on your own hardware for zero licensing cost.
What does the ClickHouse do that other databases can't?
Scans billions of rows per second per core. Compresses data by 5-10x. Handles 100K+ writes/second on a single node. Runs on anything from a Raspberry Pi to a 100-node cluster. That combination is unique.
What is ClickHouse used for in production?
Real-time dashboards, observability platforms, ad-tech bid analysis, financial trading analytics, IoT sensor data, clickstream analysis, log analytics, any metric aggregation workload. Tinybird, Cloudflare, Uber, eBay, and Bloomberg all run it.
How does ClickHouse compare to Apache Doris?
Doris handles star-schema JOINs better (fact tables with multiple dimension tables). ClickHouse handles wide-table scans and single-table aggregations better. Doris excels at MySQL-compatible JDBC workloads. ClickHouse excels at streaming ingestion and raw performance.
My Final Take
I've been wrong about databases before. I thought MongoDB would win everything (it didn't). I thought Cassandra was the answer (it wasn't). ClickHouse is the first database that matched what I actually needed: a way to ask analytical questions on live data without waiting.
Is clickhouse better than snowflake? If you're building a product that needs real-time analytics, yes. If you're running occasional dashboards for the C-suite, Snowflake is fine.
Is clickhouse completely free? Yes. And you can run it on a $20/month VPS. I've done it. Hit me up if you want the config.
What does the clickhouse do? It makes the impossible query possible. That's not marketing. That's my experience.
Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.