QuestDB got us to 425 million rows of OHLCV data across 10,752 daily symbols and 2,378 minute-bar symbols. Then it started showing cracks. The migration to ClickHouse took three weeks, and we came close to permanently losing 2,810 partitions before we built the validation gate that caught the corruption.
Why We Left QuestDB
QuestDB was a deliberate early choice. Write throughput is excellent, the SQL dialect is clean, and it handles time-series ingest patterns well out of the box. For the first few months, it did exactly what we needed.
The problems arrived in three batches. First, analytical query performance. Cross-symbol aggregations that Avo's signal pipeline needed to run continuously were getting slow as row counts crossed 200 million. Queries that should have completed in under 200ms were creeping into 1-2 seconds. For a real-time signal system, that is the difference between a live product and a batch product.
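For context, the aggregations in question look roughly like this (a simplified sketch in ClickHouse dialect, since that is where the later examples live; the 20-day window and the ranking are illustrative, not our production signals):

SELECT
    symbol,
    avg(close) AS avg_close_20d,
    max(high) - min(low) AS range_20d
FROM bars_1d
WHERE ts >= now() - INTERVAL 20 DAY
GROUP BY symbol
ORDER BY range_20d DESC
LIMIT 50;

Scanning a recent window for every symbol at once is exactly the workload that got slower as the table grew.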
Second, compression. QuestDB's columnar compression is respectable but not exceptional. For OHLCV data with high structural regularity (monotonic timestamps, correlated price fields, volume distributions following known patterns), the compression ratio we were seeing left significant disk space on the table compared to what ClickHouse benchmarks suggested was achievable.
Third, and most critically: a WAL gotcha that cost us 2,810 partitions during a staging test run. More on that shortly.
The WAL Gotcha That Almost Ended Us
QuestDB's WAL (write-ahead log) tables have a behavior that is not prominently documented: executing a standard DROP TABLE on an active WAL table silently loses partition data. The table drops successfully with no error. The data is simply gone.
Our initial migration plan involved a CTAS (Create Table As Select) pattern: read from QuestDB, write to ClickHouse, then drop the QuestDB source tables to reclaim disk space. When we ran the first batch in a staging environment that mirrored production, the DROP commands appeared to succeed. The subsequent row count validation showed a discrepancy of 2,810 partitions. The data was gone.
The fix is FORCE DROP PARTITION instead of DROP TABLE on WAL-enabled tables. We also rewrote our decommission procedure:
-- DO NOT: DROP TABLE bars_1d;
-- WAL tables: use partition-level removal instead
-- Step 1: Verify ClickHouse has the data
SELECT count() FROM clickhouse.bars_1d WHERE ts >= '2020-01-01';
-- Step 2: Cross-check QuestDB source count
SELECT count() FROM questdb.bars_1d WHERE ts >= '2020-01-01';
-- Step 3: Only proceed if counts match within tolerance
-- Step 4: Remove partitions explicitly (never DROP TABLE on WAL)
ALTER TABLE bars_1d DROP PARTITION WHERE ts < dateadd('y', -3, now());

After discovering this, we added a hard rule to our migration runbook: no CTAS+DROP patterns on any WAL-enabled QuestDB table. QuestDB was kept as a 72-hour fallback until ClickHouse was fully validated.
ClickHouse Schema Design
We provisioned 43 ClickHouse tables across the migration. The two highest-volume tables are bars_1d and bars_1m. The schema design decisions that mattered most:
CREATE TABLE bars_1d
(
symbol LowCardinality(String),
ts DateTime,
open Float32,
high Float32,
low Float32,
close Float32,
volume Float64,
source LowCardinality(String)
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (symbol, ts)
SETTINGS index_granularity = 8192;

Key decisions: LowCardinality(String) for the symbol and source fields. Symbol is repeated across millions of rows but has bounded cardinality (roughly 60,000 distinct values). LowCardinality encoding stores a dictionary lookup instead of the raw string, reducing storage significantly and improving GROUP BY performance.
ReplacingMergeTree instead of plain MergeTree handles the deduplication case: if a re-ingest of historical data produces duplicate rows for the same (symbol, ts) pair, ClickHouse deduplicates during background merges. This made re-ingesting historical gaps safe without explicit dedup logic in our Rust ingest binaries.
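One caveat worth knowing: ReplacingMergeTree deduplication is eventually consistent, because it only happens when background merges run. If a query needs guaranteed-collapsed results before a merge has happened, standard ClickHouse constructs cover it (a sketch; the symbol and partition values are illustrative):

-- Apply dedup semantics at read time (slower, but always collapsed):
SELECT * FROM bars_1d FINAL WHERE symbol = 'AAPL';

-- Or force an immediate merge of one partition (expensive; use sparingly):
OPTIMIZE TABLE bars_1d PARTITION 202401 FINAL;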
ORDER BY (symbol, ts) is the most important query performance decision. Our signal pipeline's most common access pattern is: fetch all bars for a specific symbol over a time range. With symbol as the leading sort key, that query is a single contiguous column scan rather than a scattered seek across time partitions.
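Concretely, the hot-path query has this shape (symbol and dates illustrative); with symbol leading the sort key, ClickHouse skips straight to the relevant granules instead of scanning every part:

SELECT ts, open, high, low, close, volume
FROM bars_1d
WHERE symbol = 'AAPL'
  AND ts >= '2023-01-01' AND ts < '2024-01-01'
ORDER BY ts;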
For bars_1m, we added a TTL to keep storage bounded:
ALTER TABLE bars_1m MODIFY TTL ts + INTERVAL 2 YEAR;
ClickHouse enforces TTLs during background merges. No scheduled jobs, no partition management scripts. The 2-year TTL on minute bars keeps storage bounded while retaining the historical depth needed for backtesting and model training.
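Both the TTL and the compression numbers are easy to sanity-check from the standard system.parts table (a generic ClickHouse query, nothing specific to our setup):

SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS on_disk,
    formatReadableSize(sum(data_uncompressed_bytes)) AS raw,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio,
    min(partition) AS oldest_partition
FROM system.parts
WHERE active AND table IN ('bars_1d', 'bars_1m')
GROUP BY table;

If the TTL is doing its job, oldest_partition for bars_1m should never be more than about two years old.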
The Rust Ingest Architecture
Our ingest layer runs 23 Rust binaries. Each binary is responsible for a specific data source: Yahoo Finance daily bars, OKX tick data, Binance OHLCV, Kraken order book snapshots, FRED macroeconomic series, and so on. The migration strategy was to point each binary at ClickHouse instead of QuestDB, then backfill historical data for each source independently.
The ClickHouse HTTP interface with async inserts made this straightforward. Our Rust binaries use the clickhouse crate with batched inserts:
use clickhouse::Client;
use clickhouse::Row;
use serde::Serialize;
#[derive(Row, Serialize, Clone)]
struct OhlcvBar {
symbol: String,
ts: u32, // Unix timestamp in seconds (maps to the DateTime column)
open: f32,
high: f32,
low: f32,
close: f32,
volume: f64,
source: String,
}
/// Stream one batch into bars_1d as a single INSERT; rows are written
/// to the request body incrementally and the statement completes on end().
async fn insert_batch(
client: &Client,
bars: Vec<OhlcvBar>,
) -> Result<(), clickhouse::error::Error> {
let mut insert = client.insert("bars_1d")?;
for bar in bars {
insert.write(&bar).await?;
}
insert.end().await
}

Each binary accumulates rows in memory (typically 5,000-50,000 rows per batch depending on the source), then flushes in a single insert. ClickHouse async inserts buffer incoming data and write to the MergeTree engine in the background, reducing write amplification at high ingest rates.
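The async insert behavior is a ClickHouse server setting rather than anything special in the Rust code. At the SQL level the equivalent looks like this (a minimal sketch; the row values are made up, and whether to wait for the buffer flush is a latency-versus-durability choice):

-- async_insert buffers incoming rows server-side; wait_for_async_insert = 0
-- acknowledges the insert before the buffer is flushed to a part.
INSERT INTO bars_1d
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES ('AAPL', now(), 190.1, 191.0, 189.5, 190.7, 1200000, 'yahoo');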
The Validation Gate
After the WAL incident in staging, we built a validation gate that ran after every migration batch, before the next batch started. The gate checked five conditions (a SQL sketch follows the list):
- Row count parity: ClickHouse count within 0.01% of the QuestDB source count
- Symbol coverage: the same set of distinct symbols present in both systems
- Date range coverage: min(ts) and max(ts) match for each symbol
- Null check: no null values in price or volume columns in ClickHouse
- Corruption check: no rows with ts outside the expected range (this is the check that caught the year-2299 corruption, described next)
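A sketch of the ClickHouse side of those checks; each had a mirror query against QuestDB, and a small harness compared the outputs (simplified here, and the year-2100 horizon matches the corruption story below):

-- 1. Row count parity (compared against the same count from QuestDB)
SELECT count() FROM bars_1d;

-- 2 & 3. Symbol coverage and per-symbol date range, in one pass
SELECT symbol, min(ts) AS first_bar, max(ts) AS last_bar
FROM bars_1d
GROUP BY symbol;

-- 4. Null check; our ClickHouse columns are non-nullable, so in practice
--    this reduces to a NaN scan over the float columns
SELECT count() FROM bars_1d
WHERE isNaN(open) OR isNaN(close) OR isNaN(volume);

-- 5. Corruption check: no timestamps beyond a sane horizon
SELECT count() FROM bars_1d WHERE ts >= toDateTime('2100-01-01 00:00:00');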
The year-2299 corruption: during the migration, we discovered 1,315 rows in bars_1d and 4,349,487 rows in a separate weather_data table with timestamps set to year 2299. These were ingest bugs from much earlier in the system's life where a timestamp conversion error had assigned the wrong epoch value. The validation gate caught these because the expected max(ts) for any financial data is well before year 2100. We deleted all corrupt rows and kept QuestDB running as a cross-reference source for 72 hours to confirm no valid data was lost.
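In ClickHouse, that cleanup is expressible as a mutation; a sketch against bars_1d (ALTER ... DELETE runs asynchronously, so the rows disappear shortly after the statement is accepted):

-- Locate the corrupt rows and see which ingest sources produced them:
SELECT source, count() AS bad_rows
FROM bars_1d
WHERE toYear(ts) >= 2100
GROUP BY source;

-- Remove them:
ALTER TABLE bars_1d DELETE WHERE toYear(ts) >= 2100;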
Results
- 425 million rows migrated with zero data loss (0 row count discrepancy at final validation)
- 14.19 GB total compressed storage vs an estimated 70-140 GB in a row-oriented store
- Query latency on full-universe aggregations: p95 dropped from ~1,800ms on QuestDB to ~90ms on ClickHouse
- 43 ClickHouse tables provisioned covering all Argus data domains
- Migration duration: 3 weeks including validation gate development
- QuestDB retired after a 72-hour parallel observation period with zero discrepancies
What We Would Do Differently
Run the validation gate from day one, not as a reaction to losing staging data. The cost of building the gate upfront is a few hours of engineering. The cost of not having it is potentially days of re-ingesting data from upstream sources that have their own rate limits and availability constraints.
We would also start ClickHouse schema design earlier and resist the temptation to mirror the QuestDB schema exactly. The optimal ClickHouse schema for our access patterns is meaningfully different from a naive port of the QuestDB schema. Specifically: the LowCardinality encoding on symbol, the ReplacingMergeTree engine choice, and the (symbol, ts) ORDER BY required deliberate design rather than a direct translation.
Finally: keep the source database running longer than you think you need to. 72 hours felt excessive. In retrospect, it was the right call. We found two minor discrepancies in that window that we would have missed with a faster cutover.
Need a data pipeline built right?
We have shipped production-grade data infrastructure for financial and SaaS systems. If you are planning a migration or building from scratch, we can help you avoid the mistakes we made.
Start a Project →