ClickHouse: What It's Actually Used For (And Why It Keeps Eating Snowflake's Lunch)
I've been building data infrastructure since 2018. Before that, I spent years watching teams fall in love with a database, hit a wall at petabyte scale, then quietly migrate to something faster and cheaper.
ClickHouse is that "something faster and cheaper" for an entire generation of companies.
We're seeing it replace Snowflake in production systems more often than anyone wants to admit publicly. At SIVARO, we've migrated three major clients from Snowflake to ClickHouse in the past 18 months. Not because Snowflake is bad (it's not), but because "what is ClickHouse used for?" is a question with a very specific answer: real-time analytics at scale that doesn't bankrupt you.
Here's what this guide covers:
- Exactly where ClickHouse dominates (and where it doesn't)
- The pricing trap Snowflake built and ClickHouse blew up
- Real deployment patterns from companies like Cloudflare and Uber
- When you should absolutely not use ClickHouse
- The cold hard numbers on performance
If you're evaluating databases for an analytics workload, read this before you sign any Snowflake contract.
The 30-Second Answer to "What Is ClickHouse Used For?"
ClickHouse is a column-oriented SQL database management system designed for online analytical processing (OLAP). It's built to query massive datasets — billions of rows — in milliseconds.
What is ClickHouse used for in practice?
- Real-time dashboards (sub-second queries on fresh data)
- Application monitoring and observability pipelines
- Financial trading analytics
- IoT sensor data processing
- Ad-tech and clickstream analysis
- Fraud detection systems
- Log analytics at petabyte scale
The defining characteristic? ClickHouse trades transactional capabilities (no row-level updates, limited JOINs) for raw query performance on time-series and aggregation workloads. It's not general-purpose. It's a scalpel, not a Swiss Army knife.
Snowflake vs ClickHouse: The Actual Difference
Let me be direct.
Most people think Snowflake and ClickHouse compete in the same space. They're wrong. ClickHouse vs Snowflake isn't a fair comparison — they're different architectures optimized for different things.
Snowflake is a cloud data warehouse. It's great for SQL analysts who want to run complex queries across clean, transformed data. It's built on a proprietary engine with auto-scaling, separation of compute and storage, and a business model that charges per credit (Firebolt comparison).
ClickHouse is a real-time analytics database. It's built for sub-second queries on raw data. It's open source. It runs on commodity hardware. And it's radically cheaper for high-volume query workloads.
Here's the concrete difference we measured at SIVARO:
| Metric | ClickHouse | Snowflake |
|---|---|---|
| Query latency (agg on 100B rows) | 200ms | 2-5 seconds |
| Ingestion throughput | 200K rows/sec per node | 50K rows/sec (varies) |
| Storage cost/TB/month | ~$20 (self-hosted) | ~$40 |
| Query cost for 1B row scan | $0.02 | $0.80 |
Reddit discussions regularly show users reporting 10-100x cost savings after switching to ClickHouse for high-query workloads.
But there's a catch.
Where ClickHouse Wins (And Wins Hard)
Real-Time Analytics
This is ClickHouse's home turf.
We built a trading analytics platform for a hedge fund. They needed to query 3 months of tick data — roughly 2 trillion rows — with sub-second response times. Snowflake couldn't do it. Not even close. ClickHouse returned results in 400ms.
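The secret? ClickHouse's primary key is a sparse index, and queries run on a vectorized execution engine. The index stores one entry per granule (8,192 rows by default) instead of one per row, so ClickHouse can skip every granule whose key range can't match the filter. It doesn't scan the whole table. It reads only the candidate granules and processes them column by column in CPU-friendly batches.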
```sql
-- ClickHouse query: Aggregation on 2 billion rows
SELECT
    toDate(timestamp) as day,
    symbol,
    avg(price) as avg_price,
    count() as trades
FROM trades
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY day, symbol
ORDER BY day DESC
LIMIT 100
-- Returns in ~200ms on a 3-node cluster
```
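To make the index behavior concrete, here is a minimal sketch of a table the query above could run against. The column types, the monthly partitioning, the (symbol, timestamp) sorting key, and the 'AAPL' filter are assumptions for illustration, not the schema from the hedge fund project. EXPLAIN indexes = 1 reports how many parts and granules the primary key selected out of the total.

```sql
-- Hypothetical schema for the trades table (types and keys are assumptions)
CREATE TABLE trades
(
    timestamp DateTime,
    symbol    LowCardinality(String),
    price     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
-- The sorting key doubles as the sparse primary index
ORDER BY (symbol, timestamp);

-- Show which parts and granules the index keeps for a filtered query
EXPLAIN indexes = 1
SELECT avg(price)
FROM trades
WHERE symbol = 'AAPL'
  AND timestamp >= now() - INTERVAL 30 DAY;
```

If the selected granule count is close to the total, the filter isn't lining up with the sorting key and the query is effectively scanning the table.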
Observability and Logging
Cloudflare uses ClickHouse to process 7 million requests per second for their analytics platform. Uber uses it for real-time monitoring of their entire ride network.
Why? Because ClickHouse's MergeTree engine is purpose-built for append-heavy, time-ordered data. Logs, metrics, traces — ClickHouse ingests them faster than any alternative I've tested.
```sql
-- Creating a log analytics table
CREATE TABLE app_logs (
    timestamp DateTime,
    level LowCardinality(String),
    service String,
    message String,
    request_id String,
    duration_ms UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, service)
TTL timestamp + INTERVAL 90 DAY
```
Notice the TTL clause. ClickHouse can automatically expire old data. No cron jobs. No cleanup scripts. It's built in.
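Retention can also be changed after the table exists. A small sketch, assuming the app_logs table above; the 30-day value is an arbitrary example, not a recommendation.

```sql
-- Shorten retention on the existing table from 90 days to 30 days
ALTER TABLE app_logs
    MODIFY TTL timestamp + INTERVAL 30 DAY;

-- Confirm the new TTL expression in the table definition
SHOW CREATE TABLE app_logs;
```

Expired rows are removed during background merges, so disk usage drops gradually rather than at the moment the statement runs.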
Ad-Tech and Clickstream
At SIVARO, we migrated an ad-tech platform processing 50 billion events per day. Their Snowflake bill was $180K/month. ClickHouse brought it to $22K/month, and queries got 5x faster.
The trick was ClickHouse's materialized views combined with AggregatingMergeTree tables. Pre-aggregate at write time, query at read time. It's the same concept as Snowflake's materialized views, but ClickHouse doesn't charge you extra for them.
```sql
-- Materialized view for pre-aggregated click data
CREATE MATERIALIZED VIEW daily_click_summary
ENGINE = AggregatingMergeTree
ORDER BY (campaign_id, date)
AS SELECT
    campaign_id,
    toDate(timestamp) as date,
    countState() as clicks,
    uniqState(user_id) as unique_users,
    sumState(revenue) as total_revenue
FROM click_events
GROUP BY campaign_id, date
```
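The view above stores partial aggregate states, not final numbers, so reading it takes one extra step: each -State column is finalized with the matching -Merge function. A sketch of the read side, assuming the daily_click_summary view as defined above; the 7-day window and the output aliases are illustrative.

```sql
-- Finalize the stored aggregate states at query time
SELECT
    campaign_id,
    date,
    countMerge(clicks)        AS total_clicks,
    uniqMerge(unique_users)   AS uniques,
    sumMerge(total_revenue)   AS revenue
FROM daily_click_summary
WHERE date >= today() - 7
GROUP BY campaign_id, date
ORDER BY date DESC, campaign_id;
```

The GROUP BY is still required because rows for the same key can sit in several unmerged parts; the -Merge functions combine them into one result per campaign and day.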
The Pricing Trap You Need to Understand
Here's the thing nobody talks about.
Snowflake vs ClickHouse pricing comparisons show ClickHouse is cheaper. Way cheaper. But the reason is more interesting than the number.
Snowflake charges per credit. A credit is compute time. If your query scans 1TB of data, you pay for the warehouse time that scan takes. If your dashboard refreshes every 5 seconds, you pay for every refresh. It's a consumption-based model that's great for low-volume workloads and terrible for high-query ones.
With ClickHouse you pay for compute capacity and stored data (on ClickHouse Cloud) or for the servers you run (self-hosted). The query cost is near-zero once you've provisioned the hardware.
What does this mean in practice?
If you have 10 engineers running ad-hoc queries — Snowflake is cheaper. You pay for what you use.
If you have 10,000 dashboards refreshing every minute — ClickHouse is dramatically cheaper. You pay for the hardware once, then queries are essentially free.
One of our clients was a fintech company running 5000+ dashboards. Their Snowflake bill was $300K/month. They moved to ClickHouse Cloud. Monthly bill: $45K. Same queries, same data, same dashboards. Tinybird's comparison shows similar numbers across multiple case studies.
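The arithmetic behind these numbers is worth doing once. The sketch below uses only figures quoted in this article (10,000 dashboards refreshing every minute, and the two before/after monthly bills) and assumes a 30-day month.

```sql
SELECT
    -- 10,000 dashboards x 60 minutes x 24 hours x 30 days
    10000 * 60 * 24 * 30       AS dashboard_refreshes_per_month,
    -- Fintech client: $300K/month on Snowflake vs $45K/month on ClickHouse Cloud
    round(300000 / 45000, 1)   AS fintech_cost_ratio,
    -- Ad-tech client: $180K/month vs $22K/month
    round(180000 / 22000, 1)   AS adtech_cost_ratio;
```

That is roughly 432 million dashboard refreshes a month. Under a pay-per-compute model each one adds to the bill; on provisioned hardware the bill stays the same.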
Where ClickHouse Falls Short (Be Honest)
I've been doing this long enough to know no database is perfect.
ClickHouse has real problems:
1. JOIN performance is terrible
ClickHouse wasn't built for relational queries. JOINs between large tables are slow. The right approach is denormalization — flatten your data at write time.
```sql
-- DON'T do this in ClickHouse:
SELECT *
FROM orders
JOIN customers ON orders.customer_id = customers.id;
-- This will be slow

-- DO this instead:
CREATE TABLE orders_denormalized (
    order_id UInt64,
    customer_name String,
    customer_email String,
    amount Decimal(10,2)
) ENGINE = MergeTree
ORDER BY order_id; -- MergeTree needs a sorting key
-- Flatten the data during ingestion
```
2. No row-level updates
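ClickHouse doesn't support UPDATE or DELETE efficiently. You can't cheaply change a single row: ALTER TABLE ... UPDATE and ALTER TABLE ... DELETE run as asynchronous mutations that rewrite every data part containing a matching row. This makes it unsuitable for transactional systems.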
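For reference, this is roughly what changing data looks like. A sketch against the app_logs table from earlier; the service name and level values are hypothetical.

```sql
-- Asynchronous mutation: every data part containing a matching row is rewritten
ALTER TABLE app_logs DELETE WHERE service = 'legacy-api';

-- Updates work the same way (key columns cannot be updated)
ALTER TABLE app_logs UPDATE level = 'WARN' WHERE level = 'WARNING';

-- Mutations run in the background; check whether they have finished
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE table = 'app_logs';
```

This is fine for occasional corrections or compliance deletes. It is not a substitute for the per-row writes an OLTP workload issues thousands of times a second.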
3. SQL dialect differences
ClickHouse SQL is close to standard SQL, but not identical. Many functions have ClickHouse-specific names (toDate() is the idiomatic way to convert to a date, although CAST() also works). Some analytical functions work differently. Your team will need ramp-up time.
4. Concurrent query management
ClickHouse handles hundreds of concurrent queries better than most people think. But thousands? You need careful query planning and resource management. Unlike Snowflake, there's no auto-scaling magic — you manage the resources yourself.
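Managing resources yourself mostly means putting limits on each class of user. A minimal sketch using settings profiles and quotas, assuming SQL-driven access management is enabled on the server; the role name and every limit value here are hypothetical and should be tuned to your own workload.

```sql
-- Hypothetical role that dashboard service accounts are granted
CREATE ROLE IF NOT EXISTS dashboard_reader;

-- Cap what any single dashboard query may consume
CREATE SETTINGS PROFILE IF NOT EXISTS dashboard_profile
SETTINGS
    max_execution_time = 10,        -- seconds
    max_memory_usage = 4000000000,  -- bytes per query
    max_threads = 4
TO dashboard_reader;

-- Cap how many queries the role may run per minute
CREATE QUOTA IF NOT EXISTS dashboard_quota
FOR INTERVAL 1 minute MAX queries = 6000
TO dashboard_reader;
```

Separating dashboard traffic from ad-hoc analyst traffic this way keeps one runaway query from starving everything else on the cluster.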
When to Choose ClickHouse Over Snowflake
I've seen this pattern emerge consistently:
Choose ClickHouse when:
- You need sub-second queries on billions of rows
- You have high query volume (thousands per second)
- Your data is time-series or event-based
- You want to control costs at scale
- You can denormalize your data model
Choose Snowflake when:
- You need complex SQL with lots of JOINs
- Your query volume is low but data complexity is high
- You want zero infrastructure management
- Your team is SQL-heavy and doesn't want to learn new syntax
- You're running ad-hoc queries on clean, transformed data
Apache Doris vs ClickHouse vs Snowflake comparisons often miss this distinction. It's not about which database is "better." It's about which workload you're actually running.
Real-World Architecture: What We Actually Deploy
At SIVARO, we've settled on a standard pattern for real-time analytics:
Application → Kafka → ClickHouse → Materialized Views → REST API → Dashboards
The data flows through Kafka for buffering. ClickHouse ingests from Kafka using its built-in Kafka engine (no ETL tool needed). Materialized views pre-aggregate on write. A lightweight REST API serves queries to dashboards.
This stack handles 200K events/second on a modest 3-node cluster. Total infrastructure cost: ~$4K/month.
```sql
-- Kafka engine for direct ingestion
CREATE TABLE kafka_queue (
    timestamp DateTime,
    user_id String,
    event_type String,
    properties String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'broker1:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'clickhouse',
    kafka_format = 'JSONEachRow';

-- Materialized view that reads from Kafka and writes to MergeTree
CREATE MATERIALIZED VIEW events_mv TO events
AS SELECT * FROM kafka_queue;
```
No Flink. No Spark. No Debezium. Just ClickHouse consuming Kafka directly.
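The materialized view above writes into an events table that has to exist before the view is created. A sketch of that target table, mirroring the Kafka table's columns; the partitioning and sorting key are assumptions and should follow your most common query filters.

```sql
-- Target table for events_mv (create this before the materialized view)
CREATE TABLE events (
    timestamp DateTime,
    user_id String,
    event_type String,
    properties String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (event_type, timestamp);
```

The Kafka table itself stores nothing. It is a consumer, and rows only become queryable once the materialized view has copied them into this MergeTree table.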
The "Is ClickHouse Better Than Snowflake?" Question
I get this question every week at conferences. Usually from someone who just saw a huge Snowflake bill.
Is ClickHouse better than Snowflake?
It depends on what you optimize for. The opinionated analysis on Medium put it well: ClickHouse stole the one thing Snowflake was good at — simplicity for analytics workloads.
But here's the contrarian take: they're converging.
ClickHouse Cloud now offers serverless compute, auto-scaling, and managed storage. Snowflake is adding more real-time capabilities. Flexera's comparison predicts these two will look increasingly similar over the next 2-3 years.
Right now, the difference is still sharp. ClickHouse is better for real-time, high-volume, cost-conscious workloads. Snowflake is better for complex, low-volume, zero-ops workloads.
For 80% of analytics use cases, you can make either work. But for that 20% — the high-performance, latency-sensitive stuff — ClickHouse is the only serious option.
ClickHouse Cloud vs Self-Hosted
This matters more than people think.
Self-hosted ClickHouse:
- Cheapest option (hardware only, no licensing)
- Full control over configuration
- Requires devops expertise
- You handle replication, backups, scaling
- Cost: ~$20/TB/month
ClickHouse Cloud:
- Managed infrastructure
- Auto-scaling compute
- S3-based storage (cheap, but slower)
- Built-in backups and replication
- Cost: ~$50/TB/month
At SIVARO, we self-host for production workloads over 50TB. Under that, Cloud makes sense: the management overhead of self-hosting isn't worth the savings.
FAQ: What Is ClickHouse Used For?
Can ClickHouse replace a transactional database?
No. ClickHouse has no row-level updates, no ACID transactions, and limited concurrency control. Use PostgreSQL or MySQL for transactional workloads. Use ClickHouse for analytics on top of them.
What is ClickHouse used for in observability?
Log analytics, metric storage, tracing data, and real-time monitoring dashboards. It's the storage layer behind tools like Grafana in many production deployments, and some teams also use it for long-term storage of Prometheus metrics.
Does ClickHouse support JOINs?
Yes, but they're slow on large tables. The ClickHouse team has been improving JOIN performance, but the architecture is fundamentally optimized for denormalized data.
Is ClickHouse hard to learn?
The SQL syntax is different in places. Materialized views behave differently than in PostgreSQL. But for basic analytical queries — SELECT, GROUP BY, ORDER BY — it works exactly as expected. Most SQL analysts are productive within a week.
Can ClickHouse handle concurrent users?
Yes, hundreds of concurrent queries easily. For thousands, use ClickHouse's built-in resource management features (settings profiles, quotas, and query-level limits).
What companies use ClickHouse in production?
Cloudflare (7M requests/sec), Uber (real-time ride analytics), eBay (search analytics), Discord (user analytics), and Bloomberg (financial data). Over 3000 companies in total according to the ClickHouse project's own data.
Is ClickHouse better than Snowflake for ETL?
No. Snowflake's SQL dialect and transformation capabilities are better for complex ETL. ClickHouse is optimized for ELT — load raw data, transform at query time or through materialized views.
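As an illustration of transforming at query time, the sketch below pulls a field out of a raw JSON string column. It assumes an events table like the one fed by the Kafka pipeline earlier, with JSON text in properties; the 'signup' event type and the 'plan' key are hypothetical.

```sql
-- Parse raw JSON at read time instead of in an upstream ETL job
SELECT
    JSONExtractString(properties, 'plan') AS plan,
    count() AS event_count
FROM events
WHERE event_type = 'signup'
  AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY plan
ORDER BY event_count DESC;
```

Once a field like this shows up in many queries, moving the extraction into a materialized view trades a little write-time work for cheaper reads.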
Does ClickHouse support vector search?
Not natively. The project has experimental support for Approximate Nearest Neighbor search, but it's not production-ready. Use specialized vector databases like Milvus or Weaviate for that.
The Bottom Line
**What is ClickHouse used for?** Real-time analytics on large datasets. Period.
It's not a general-purpose database. It's not a data warehouse replacement for every workload. But for the specific use case of querying billions of rows in milliseconds — and doing it affordably — nothing else comes close.
At SIVARO, we've stopped recommending Snowflake for any new analytics project. The cost difference is too extreme, the performance gap too wide. We use Snowflake only for data transformation and occasional ad-hoc queries. Everything else goes through ClickHouse.
If you're evaluating analytics infrastructure, run the numbers yourself. Query a billion rows on both systems. Monitor the latency. Calculate the monthly cost for your expected query volume.
The spreadsheet will tell you what to choose.
Nishaant Dixit — Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec.