We migrated 425M rows to ClickHouse and achieved 8x storage compression and up to 15x faster analytical scans versus our prior QuestDB setup.
723M+
Rows stored
8x
Storage compression vs QuestDB
3.75-5.6x
90-day scan speedup
8 sec
Full-history scan (was timeout)
CHAPTER 01
Regime detection, signal scanning, and cross-asset correlation all share the same query pattern: read every row for a set of symbols over a multi-year window, compute aggregations, return a result. On QuestDB with 425M rows, a backtestable universe scan over 2,378 minute-resolution symbols at a 90-day lookback hit 45 seconds wall time on a 16-core, 64GB server. QuestDB's SQL engine is single-threaded on complex aggregations. A QuestDB query over a partitioned table cannot parallelize across partitions at query time. For ingest it was excellent; for analytical scans it hit the same wall as Postgres but for a different reason.
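The pattern itself is simple. A representative 90-day scan, sketched in ClickHouse SQL against the bars_1m table described in the next chapter (the close column and the specific aggregates are illustrative, not the production signal logic):

SELECT
    symbol,
    avg(close)       AS mean_close,
    stddevPop(close) AS vol,
    count()          AS bars
FROM bars_1m
WHERE ts >= now() - INTERVAL 90 DAY
  AND symbol IN ('SYM_A', 'SYM_B')  -- in production: the full 2,378-symbol universe
GROUP BY symbol;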
TimescaleDB's columnar compression extension helped, but it requires the Timescale license for the compression modes that deliver meaningful gains on float-heavy columns. Under the free license, compression ratios for OHLCV data peaked at roughly 2x over raw, still leaving a 90+ GB footprint for the eventual 425M-row dataset.
CHAPTER 02
We moved the entire analytical layer to ClickHouse 26.3.9.8 with 43 tables organized around data type rather than data source. The schema design principle was: minimize the number of columns a typical query needs to read. Tables are not normalized in the relational sense. The bars_1d table carries symbol, exchange, and source as LowCardinality strings on every row. In ClickHouse, LowCardinality encoding replaces each value with a reference into a small per-part dictionary, roughly 1 to 2 bytes per row, so the denormalization is effectively free.
The primary key design follows ClickHouse's compound sort key model. For bars_1m the ORDER BY is (symbol, source, ts) with PARTITION BY toYYYYMM(ts). Scans for a specific symbol's recent history therefore hit a single monthly partition, and within that partition the data is sorted by symbol first and timestamp second, so reads for one symbol are sequential.
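A minimal sketch of that layout; the sort key and partition expression are as described above, the OHLCV columns beyond them are assumptions from the schema, and the per-column CODECs are deferred to the next chapter:

CREATE TABLE bars_1m
(
    symbol LowCardinality(String),
    source LowCardinality(String),
    ts     DateTime,
    open   Float64,
    high   Float64,
    low    Float64,
    close  Float64,
    volume Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (symbol, source, ts);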
We considered Druid for the query layer. Druid's segment model handles pre-aggregated metrics well but requires a separate ingestion pipeline, a ZooKeeper cluster, and a non-trivial operational footprint. We also considered DuckDB as an embedded analytics engine. DuckDB is excellent for single-process analytical workloads but lacks a concurrent-write story for live ingest. ClickHouse handles both simultaneously.
ARCHITECTURE OVERVIEW
SOURCES → TRANSFORM (validate + dedup) → STORE (partitioned) → QUERY (Query API + cache)
CHAPTER 03
We configured explicit per-column CODECs rather than accepting defaults. The timestamp column uses DoubleDelta plus ZSTD(3), exploiting the near-constant 1-minute delta between consecutive timestamps for roughly 8:1 compression on that column alone. Price columns use Gorilla(8) plus ZSTD(3); Gorilla stores XOR deltas between adjacent floats, and price series are highly autocorrelated, which is exactly the pattern Gorilla exploits. Combined with ZSTD(3) block compression, storage per price column runs roughly 2 to 3 bytes per value on average rather than 8 bytes uncompressed.
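Expressed in DDL against the bars_1m sketch above, the codec choices look roughly like this (shown as an ALTER for brevity; in the real schema they sit in the CREATE TABLE column definitions):

ALTER TABLE bars_1m
    MODIFY COLUMN ts    DateTime CODEC(DoubleDelta, ZSTD(3)),
    MODIFY COLUMN open  Float64  CODEC(Gorilla(8), ZSTD(3)),
    MODIFY COLUMN high  Float64  CODEC(Gorilla(8), ZSTD(3)),
    MODIFY COLUMN low   Float64  CODEC(Gorilla(8), ZSTD(3)),
    MODIFY COLUMN close Float64  CODEC(Gorilla(8), ZSTD(3));

A codec change applied this way only takes effect on newly written or merged parts, which is the same reason Decision 01 below notes that the extra compression is not recoverable after the fact without a full table rewrite.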
Port assignment caused four hours of confusion during initial setup. ClickHouse's default TCP port is 9000. QuestDB's default HTTP port is also 9000. Running both on the same host required moving ClickHouse's native TCP to port 9100 and documenting this in every Rust client configuration.
The async insert configuration took two iterations. The default async insert behavior collects rows in a 200ms window, which introduced unacceptable latency for the live-signal path. We ran signal-path writes on the synchronous path and bulk ingest on the async path with a 10MB buffer and 100ms max delay.
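A sketch of what the bulk-ingest path looks like with those settings applied inline per insert; the values row is a placeholder, and wait_for_async_insert = 0 is this sketch's choice rather than something stated above (the same settings can also live in a user profile):

INSERT INTO bars_1m
SETTINGS
    async_insert = 1,
    wait_for_async_insert = 0,             -- acknowledge on buffer write, not on flush (sketch assumption)
    async_insert_max_data_size = 10485760, -- 10 MB buffer
    async_insert_busy_timeout_ms = 100     -- flush at most 100 ms after the first buffered row
VALUES ('SYM', 'SRC', now(), 1.0, 1.0, 1.0, 1.0, 0);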
TECH STACK
ClickHouse 26.3.9.8 · Rust 1.84 (clickhouse-rs 0.11) · systemd · Linux kernel 5.15
CHAPTER 04
All results were measured on the same 16-core, 64GB server used for the prior QuestDB setup. The 90-day backtestable scan across 2,378 symbols dropped from 45 seconds to 8 to 12 seconds. Storage for 723M rows came in at 14.19 GB versus approximately 115 GB estimated for an equivalent row-store. Dashboard aggregation p95 latency dropped from 400-800ms to 90-200ms. The full-history signal scan across 425M rows previously timed out at 120 seconds and now completes in 8 seconds.
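Numbers like the 14.19 GB figure come straight out of ClickHouse's system tables; a sketch of the kind of query that produces them (scoping to the current database is an assumption about the deployment):

SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS on_disk,
    formatReadableSize(sum(data_uncompressed_bytes)) AS raw,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;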
Post-migration, the on-call burden from data layer incidents dropped to zero. QuestDB's garbage partition problem, which caused 10-core CPU saturation on every restart, has no equivalent in ClickHouse because partitions are directory-level constructs managed by the merge tree without named partition metadata. No partition metadata corruption has occurred in six weeks of operation.
CHAPTER 05
DECISION · 01
Chose per-column CODEC specifications over table-level defaults. The tradeoff: schema definitions are verbose and require updating when columns are added. What it gave us: Gorilla encoding on price columns cut storage by 40% beyond what LZ4 alone achieved. That additional compression is not recoverable after the fact without a full table rewrite.
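The per-column effect is visible in system.columns; a sketch of the kind of check that backs a figure like the 40% claim (not the exact measurement we ran):

SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'bars_1m'
ORDER BY data_compressed_bytes DESC;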
DECISION · 02
Chose to bind ClickHouse to localhost only (127.0.0.1). No external IP binding. API access is proxied through the Next.js API layer with Redis caching and per-user rate limiting. The tradeoff: we cannot run ClickHouse queries directly from remote tools without an SSH tunnel. What it gave us: the database is not exposed to the internet at any port regardless of firewall misconfiguration.
DECISION · 03
The OPTIMIZE TABLE FINAL procedure for dedup-heavy tables is disruptive to write availability. Moving those tables to an insert-side deduplication strategy using Bloom filter sketches in Rust before writes reach ClickHouse would eliminate the need for post-hoc FINAL runs.
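For reference, the disruptive step in question has this shape; the table name is hypothetical:

-- rewrites all parts of the table to collapse duplicates; competes with live inserts while it runs
OPTIMIZE TABLE signals_dedup FINAL;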