723M+ rows across 43 tables. Migrated from QuestDB in April 2026. 5-10x better compression, sub-second range queries on minute bars.
HOW WE USE IT
ClickHouse 26.3 is the primary analytical database across the entire Avo stack. As of May 2026, the database holds 723M+ rows across 43 tables covering OHLCV bars, exchange ticks, macro data, regime signals, novelty events, funding rates, and DeFi TVL snapshots.
The migration from QuestDB happened in April 2026. QuestDB was the original choice for its fast ILP ingest, but it struggled with the analytical query patterns that the intelligence layer needed: wide range scans over 270M minute bars, multi-symbol aggregations, and conditional argMax queries for "latest value per symbol." ClickHouse handles these in 20-100ms versus QuestDB's 100-500ms. The 5-10x compression improvement (from Gorilla codecs on OHLCV columns plus ZSTD(3) final layer) reduced storage from a projected 200GB+ to 14.19GB for 723M rows.
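For a sense of the second pattern, here is a minimal sketch of a multi-symbol range aggregation over minute bars; the bars_1m column names are assumed, not the exact production schema:

    -- Hypothetical shape of a wide range scan with multi-symbol aggregation:
    -- 30 days of minute bars rolled up to hourly VWAP per symbol.
    SELECT
        symbol,
        toStartOfHour(ts) AS hour,
        sum(close * volume) / sum(volume) AS vwap
    FROM bars_1m
    WHERE ts >= now() - INTERVAL 30 DAY
    GROUP BY symbol, hour
    ORDER BY symbol, hour;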
Compression codecs are configured per column type: DoubleDelta for timestamps (monotonic increments leave near-zero entropy to encode), Gorilla for OHLCV floats (adjacent values are strongly correlated), LowCardinality for symbol strings (heavy repetition across a fixed universe), and ZSTD(3) as the outer compression layer on all columns.
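Put together, the codec stack looks roughly like this in DDL; a sketch assuming an OHLCV schema on bars_1m (column names are illustrative, not the exact production table):

    CREATE TABLE bars_1m
    (
        -- DoubleDelta: monotonic timestamps compress to almost nothing.
        ts     DateTime CODEC(DoubleDelta, ZSTD(3)),
        -- LowCardinality: dictionary-encodes a small fixed symbol universe.
        symbol LowCardinality(String) CODEC(ZSTD(3)),
        -- Gorilla: XOR-based float codec, exploits adjacent-value correlation.
        open   Float64 CODEC(Gorilla, ZSTD(3)),
        high   Float64 CODEC(Gorilla, ZSTD(3)),
        low    Float64 CODEC(Gorilla, ZSTD(3)),
        close  Float64 CODEC(Gorilla, ZSTD(3)),
        volume Float64 CODEC(Gorilla, ZSTD(3))
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(ts)
    ORDER BY (symbol, ts);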
Example workflow: adding a new time-series table for a client's analytics product.
1. Define the table with ENGINE = MergeTree() ORDER BY (symbol, toStartOfMinute(ts)), PARTITION BY toYYYYMM(ts).
2. Choose codecs per column: DoubleDelta for the timestamp column, Gorilla for any float metrics, LowCardinality(String) for symbol.
3. Set TTL to ts + INTERVAL 1 YEAR if the client only needs rolling history.
4. Write the ingest path: a Rust binary batches rows into 1,000-row blocks and inserts via the HTTP interface (POST /). Avoid individual-row inserts; they create small parts that ClickHouse must merge constantly.
5. Add a nightly audit query that checks MAX(ts) and MIN(ts) per symbol and alerts if any symbol's freshest row is older than 25 hours.
6. Test the primary query pattern (SELECT argMax(close, ts) FROM bars GROUP BY symbol) against the table with EXPLAIN, and verify the query reads from the primary index rather than doing a full scan. Both steps are sketched below the list.
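Steps 5 and 6 as SQL sketches, with client_bars as a hypothetical stand-in for the new table:

    -- Step 5: nightly freshness audit; flags symbols whose newest row
    -- is older than 25 hours.
    SELECT
        symbol,
        max(ts) AS freshest,
        min(ts) AS oldest
    FROM client_bars
    GROUP BY symbol
    HAVING freshest < now() - INTERVAL 25 HOUR;

    -- Step 6: confirm the hot query uses the primary index, not a full scan.
    EXPLAIN indexes = 1
    SELECT symbol, argMax(close, ts) AS latest_close
    FROM client_bars
    GROUP BY symbol;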
TTL is configured on bars_1m at ts + INTERVAL 2 YEAR. This prevents unbounded storage growth while keeping the full 2-year lookback needed for regime training. The bars_1d table has no TTL because daily bars are tiny (10M rows) and serve as the permanent historical record.
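The retention rule itself is a one-line TTL clause; a sketch assuming the ts timestamp column (the same expression can be declared at CREATE TABLE time):

    -- Rolling 2-year retention on minute bars; expired parts are
    -- dropped during background merges.
    ALTER TABLE bars_1m MODIFY TTL ts + INTERVAL 2 YEAR;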
Two corruption incidents shaped the ops process. First, 1,315 rows in bars_1d and 4.3M rows in weather_data had timestamps set to year 2299, caused by an epoch-seconds versus epoch-milliseconds bug in an early ingest script. These were found by a nightly audit query and deleted. Second, seven other tables had valid data with 1970-epoch timestamps, which required backfilling from source rather than deletion. The data validation engine (argus-validate, ReplacingMergeTree) now quarantines any row with a timestamp more than 1 year in the future before it reaches the main tables.
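A sketch of the audit shape that catches far-future timestamps, assuming a ts column on both tables:

    -- Count rows with timestamps more than 1 year in the future,
    -- the same threshold the validation engine now quarantines on.
    SELECT 'bars_1d' AS tbl, count() AS bad_rows
    FROM bars_1d
    WHERE ts > now() + INTERVAL 1 YEAR
    UNION ALL
    SELECT 'weather_data', count()
    FROM weather_data
    WHERE ts > now() + INTERVAL 1 YEAR;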
ClickHouse is bound to 127.0.0.1 only; all external queries are proxied through the Next.js API layer. Server-wide memory is capped at 32GB and per-query memory at 16GB.
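Both caps can be inspected from SQL; a sketch (the per-query limit is a user-level setting, while the server-wide cap is set in config.xml and surfaced through system.server_settings):

    -- Per-query memory cap (user/profile setting):
    SELECT name, value FROM system.settings
    WHERE name = 'max_memory_usage';

    -- Server-wide cap (from config.xml):
    SELECT name, value FROM system.server_settings
    WHERE name = 'max_server_memory_usage';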
Production numbers
723M+ Total rows
43 Tables
14.19 GB Storage (723M rows)
2-5x Query speedup vs QuestDB
We built a 723M-row market data pipeline ingesting from 10 exchanges simultaneously at under 50ms tick-to-storage latency.
723M+ Total rows stored
Read case study →
We migrated 425M rows to ClickHouse and achieved 8x storage compression and 15x faster analytical scans versus our prior QuestDB setup.
723M+ Rows stored
Read case study →
We replaced a Python fan-in that dropped ticks under load with a Rust multi-task aggregator handling 80,000 ticks per second across 10 exchanges at 3.1% CPU.
80K tick/s Peak throughput
Read case study →
We migrated 425M rows across 43 tables from a CPU-saturating QuestDB deployment to ClickHouse in 6.5 days with zero data loss.
425M+ Rows migrated
Read case study →
We built a zero-cost downloader collecting 11,706 equity symbols across 19+ global exchanges, replacing $8,000 to $22,000 per month in vendor licensing.
11,706 Total symbols collected
Read case study →
We built a revision-aware FRED pipeline tracking 63 macro series with 90-day lookback windows, growing coverage from 32 to 63 series in one sprint.
63 FRED series tracked (from 32)
Read case study →
Start a project
Most projects ship in under two weeks. Start with a free 30-minute discovery call.
Start a project →