723M+ rows across 43 tables. Migrated from QuestDB in April 2026. 5-10x better compression, sub-second range queries on minute bars.
HOW WE USE IT
ClickHouse 26.3 is the primary analytical database across the entire Avo stack. As of May 2026, the database holds 723M+ rows across 43 tables covering OHLCV bars, exchange ticks, macro data, regime signals, novelty events, funding rates, and DeFi TVL snapshots.
The migration from QuestDB happened in April 2026. QuestDB was the original choice for its fast ILP ingest, but it struggled with the analytical query patterns that the intelligence layer needed: wide range scans over 270M minute bars, multi-symbol aggregations, and conditional argMax queries for "latest value per symbol." ClickHouse handles these in 20-100ms versus QuestDB's 100-500ms. The 5-10x compression improvement (from Gorilla codecs on OHLCV columns plus ZSTD(3) final layer) reduced storage from a projected 200GB+ to 14.19GB for 723M rows.
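For a sense of the second pattern, here is a minimal sketch of a multi-symbol range aggregation over minute bars; the bars_1m column names are assumed, not the exact production schema:

    -- Hypothetical shape of a wide range scan with multi-symbol aggregation:
    -- 30 days of minute bars rolled up to hourly VWAP per symbol.
    SELECT
        symbol,
        toStartOfHour(ts) AS hour,
        sum(close * volume) / sum(volume) AS vwap
    FROM bars_1m
    WHERE ts >= now() - INTERVAL 30 DAY
    GROUP BY symbol, hour
    ORDER BY symbol, hour;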
Compression codecs are configured per column type: DoubleDelta for timestamps (monotonic increments leave near-zero entropy to encode), Gorilla for OHLCV floats (adjacent values are strongly correlated), LowCardinality for symbol strings (heavy repetition across a fixed universe), and ZSTD(3) as the outer compression layer on all columns.
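Put together, the codec stack looks roughly like this in DDL; a sketch assuming an OHLCV schema on bars_1m (column names are illustrative, not the exact production table):

    CREATE TABLE bars_1m
    (
        -- DoubleDelta: monotonic timestamps compress to almost nothing.
        ts     DateTime CODEC(DoubleDelta, ZSTD(3)),
        -- LowCardinality: dictionary-encodes a small fixed symbol universe.
        symbol LowCardinality(String) CODEC(ZSTD(3)),
        -- Gorilla: XOR-based float codec, exploits adjacent-value correlation.
        open   Float64 CODEC(Gorilla, ZSTD(3)),
        high   Float64 CODEC(Gorilla, ZSTD(3)),
        low    Float64 CODEC(Gorilla, ZSTD(3)),
        close  Float64 CODEC(Gorilla, ZSTD(3)),
        volume Float64 CODEC(Gorilla, ZSTD(3))
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(ts)
    ORDER BY (symbol, ts);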
Example workflow: adding a new time-series table for a client's analytics product.
1. Define the table with ENGINE = MergeTree() ORDER BY (symbol, toStartOfMinute(ts)), PARTITION BY toYYYYMM(ts).
2. Choose codecs per column: DoubleDelta for the timestamp column, Gorilla for any float metrics, LowCardinality(String) for symbol.
3. Set TTL to ts + INTERVAL 1 YEAR if the client only needs rolling history.
4. Write the ingest path: a Rust binary batches rows into 1,000-row blocks and inserts via the HTTP interface (POST /). Avoid individual-row inserts; they create small parts that ClickHouse must merge constantly.
5. Add a nightly audit query that checks MAX(ts) and MIN(ts) per symbol and alerts if any symbol's freshest row is older than 25 hours.
6. Test the primary query pattern (SELECT argMax(close, ts) FROM bars GROUP BY symbol) against the table with EXPLAIN, and verify the query reads from the primary index rather than doing a full scan. Both steps are sketched below the list.
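Steps 5 and 6 as SQL sketches, with client_bars as a hypothetical stand-in for the new table:

    -- Step 5: nightly freshness audit; flags symbols whose newest row
    -- is older than 25 hours.
    SELECT
        symbol,
        max(ts) AS freshest,
        min(ts) AS oldest
    FROM client_bars
    GROUP BY symbol
    HAVING freshest < now() - INTERVAL 25 HOUR;

    -- Step 6: confirm the hot query uses the primary index, not a full scan.
    EXPLAIN indexes = 1
    SELECT symbol, argMax(close, ts) AS latest_close
    FROM client_bars
    GROUP BY symbol;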
TTL is configured on bars_1m at ts + INTERVAL 2 YEAR. This prevents unbounded storage growth while keeping the full 2-year lookback needed for regime training. The bars_1d table has no TTL because daily bars are tiny (10M rows) and serve as the permanent historical record.
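The retention rule itself is a one-line TTL clause; a sketch assuming the ts timestamp column (the same expression can be declared at CREATE TABLE time):

    -- Rolling 2-year retention on minute bars; expired parts are
    -- dropped during background merges.
    ALTER TABLE bars_1m MODIFY TTL ts + INTERVAL 2 YEAR;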
Two corruption incidents shaped the ops process. First, 1,315 rows in bars_1d and 4.3M rows in weather_data had timestamps set to year 2299, caused by an epoch-seconds versus epoch-milliseconds bug in an early ingest script. These were found by a nightly audit query and deleted. Second, seven other tables had valid data with 1970-epoch timestamps, which required backfilling from source rather than deletion. The data validation engine (argus-validate, ReplacingMergeTree) now quarantines any row with a timestamp more than 1 year in the future before it reaches the main tables.
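A sketch of the audit shape that catches far-future timestamps, assuming a ts column on both tables:

    -- Count rows with timestamps more than 1 year in the future,
    -- the same threshold the validation engine now quarantines on.
    SELECT 'bars_1d' AS tbl, count() AS bad_rows
    FROM bars_1d
    WHERE ts > now() + INTERVAL 1 YEAR
    UNION ALL
    SELECT 'weather_data', count()
    FROM weather_data
    WHERE ts > now() + INTERVAL 1 YEAR;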
ClickHouse is bound to 127.0.0.1 only; all external queries are proxied through the Next.js API layer. Server-wide memory is capped at 32GB and per-query memory at 16GB.
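Both caps can be inspected from SQL; a sketch (the per-query limit is a user-level setting, while the server-wide cap is set in config.xml and surfaced through system.server_settings):

    -- Per-query memory cap (user/profile setting):
    SELECT name, value FROM system.settings
    WHERE name = 'max_memory_usage';

    -- Server-wide cap (from config.xml):
    SELECT name, value FROM system.server_settings
    WHERE name = 'max_server_memory_usage';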
Production numbers
723M+ Total rows
43 Tables
14.19 GB Storage (723M rows)
2-5x Query speedup vs QuestDB
We built a 723M-row market data pipeline ingesting from 10 exchanges simultaneously at under 50ms tick-to-storage latency.
723M+ Total rows stored
Read case study →
We migrated 425M rows to ClickHouse and achieved 8x storage compression and 15x faster analytical scans versus our prior QuestDB setup.
723M+ Rows stored
Read case study →
We replaced a Python fan-in that dropped ticks under load with a Rust multi-task aggregator handling 80,000 ticks per second across 10 exchanges at 3.1% CPU.
80K tick/s Peak throughput
Read case study →
We migrated 425M rows across 43 tables from a CPU-saturating QuestDB deployment to ClickHouse in 6.5 days with zero data loss.
425M+ Rows migrated
Read case study →
We built a zero-cost downloader collecting 11,706 equity symbols across 19+ global exchanges, replacing $8,000 to $22,000 per month in vendor licensing.
11,706 Total symbols collected
Read case study →
We built a revision-aware FRED pipeline tracking 63 macro series with 90-day lookback windows, growing coverage from 32 to 63 series in one sprint.
63 FRED series tracked (from 32)
Read case study →
Start a project
Most projects ship in under two weeks. Start with a free 30-minute discovery call.
Start a project →