When we started building Avo, we needed a database that could handle serious time-series workloads: tens of millions of OHLCV bars arriving daily across 56,000+ symbols, continuous querying from signal pipelines, and sub-second response times for a live web platform. The wrong choice here would mean either paying for vastly over-provisioned hardware or watching query times creep into the seconds as data grew.
We started with QuestDB. We evaluated TimescaleDB seriously. We ended up on ClickHouse. Here is the full story, with real numbers from our deployment.
Starting with QuestDB
QuestDB was the first database we ran in production. It has excellent write throughput, a clean SQL dialect, and good documentation for time-series use cases. For the first few months, it worked well. We were ingesting millions of rows per day across our initial universe of symbols.
The problems surfaced as data volume scaled. QuestDB's WAL (write-ahead log) tables introduced a critical operational gotcha: executing DROP TABLE on an active WAL table silently loses partition data. We discovered this the hard way during a migration attempt, losing a batch of partitions before isolating the root cause. The fix required FORCE DROP PARTITION instead, a behavior that is not prominently documented.
More structurally, QuestDB's compression was adequate but not exceptional, and complex analytical queries (multi-join aggregations across the full symbol universe) became noticeably slow as we crossed 200 million rows. We needed something built specifically for analytical query performance at scale.
Why We Evaluated TimescaleDB
TimescaleDB was the obvious next candidate. It sits on top of PostgreSQL, which means the operational tooling is mature, the SQL dialect is familiar, and ecosystem support is broad. For a team that already knows Postgres, the learning curve is minimal.
The core mechanism is hypertables: automatic time-based partitioning of rows across chunks, with compression applied to older chunks. This works well for standard time-series retrieval: give me the last 30 days of bars for ticker X, sorted by timestamp.
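For context, here is a minimal sketch of that setup, assuming a simple bars table; the table and column names are illustrative rather than our actual schema:

```sql
-- Illustrative hypertable setup (not our production schema)
CREATE TABLE bars (
    symbol  TEXT             NOT NULL,
    ts      TIMESTAMPTZ      NOT NULL,
    open    DOUBLE PRECISION,
    high    DOUBLE PRECISION,
    low     DOUBLE PRECISION,
    close   DOUBLE PRECISION,
    volume  BIGINT
);

-- Partition the table into time-based chunks
SELECT create_hypertable('bars', 'ts');

-- Compress chunks once they age out of the hot window
ALTER TABLE bars SET (timescaledb.compress, timescaledb.compress_segmentby = 'symbol');
SELECT add_compression_policy('bars', INTERVAL '30 days');
```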
Where TimescaleDB showed its limits for our specific use case was in analytical aggregations that span the full symbol universe. Queries like “find all symbols with volume spike ratio above 3.0 in the past 4 hours, grouped by sector, with rolling 20-day average for context” require scanning and aggregating across enormous row counts. TimescaleDB is row-oriented at its core: it inherits PostgreSQL's MVCC architecture, which is optimized for transactional workloads, not analytical ones.
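To make the shape of that query class concrete, here is a ClickHouse-flavored sketch; the table names (bars_1m, symbols), the sector join, and the spike definition are illustrative simplifications, not our production query:

```sql
-- Illustrative "volume spike" scan across the full symbol universe
SELECT
    s.sector,
    v.symbol,
    v.last_4h_volume / nullIf(v.avg_4h_volume, 0) AS spike_ratio
FROM
(
    SELECT
        symbol,
        -- volume in the most recent 4-hour window
        sumIf(volume, ts >= now() - INTERVAL 4 HOUR) AS last_4h_volume,
        -- crude 20-day baseline: total volume spread over ~120 four-hour windows
        sum(volume) / 120 AS avg_4h_volume
    FROM bars_1m
    WHERE ts >= now() - INTERVAL 20 DAY
    GROUP BY symbol
) AS v
INNER JOIN symbols AS s USING (symbol)
WHERE v.last_4h_volume / nullIf(v.avg_4h_volume, 0) > 3.0
ORDER BY s.sector, spike_ratio DESC;
```

Whatever the exact definition, the point is the access pattern: every symbol, a wide time window, and a handful of columns out of the whole schema.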
In our benchmarks on a dataset of 500 million rows, the analytical queries we needed to run continuously were 3-5x slower on TimescaleDB than on ClickHouse. At that scale of slowdown, real-time signal detection becomes batch detection, and that is not the product we are building.
Why ClickHouse Won
ClickHouse is a column-oriented analytical database. Instead of storing rows contiguously (as PostgreSQL and most traditional databases do), it stores each column as a separate on-disk structure. For analytical queries that touch only a few columns out of a wide schema (which describes most of what Avo does), this means dramatically less I/O, dramatically better CPU cache utilization, and dramatically better compression.
The numbers from our production deployment:
- 2.5 billion rows in bars_1d and bars_1m tables
- 14.19 GB total compressed storage, compared to an estimated 70-140 GB uncompressed row-store equivalent
- Sub-100ms query times on full-universe aggregations (p95 across our API endpoints)
- LZ4HC compression codec delivering a 5-10x compression ratio on OHLCV data
- MergeTree engine with custom sort keys (symbol, ts) enabling efficient range scans
The compression efficiency is the most practically impactful number. OHLCV data has natural structure that column-oriented compression exploits aggressively: open prices are correlated with previous close prices, timestamps increment monotonically, volumes follow recognizable distributions. LZ4HC applied to columnar data compresses far more effectively than the same algorithm applied to row-interleaved data.
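To make the storage layout concrete, here is a minimal sketch of a MergeTree table in the spirit of our bars tables; the exact column set, codecs, and partitioning are illustrative, not our production DDL:

```sql
-- Illustrative MergeTree definition with per-column codecs
CREATE TABLE bars_1d
(
    symbol  LowCardinality(String),
    ts      DateTime CODEC(Delta, LZ4HC),  -- monotonic timestamps delta-encode well
    open    Float64  CODEC(LZ4HC),
    high    Float64  CODEC(LZ4HC),
    low     Float64  CODEC(LZ4HC),
    close   Float64  CODEC(LZ4HC),
    volume  UInt64   CODEC(Delta, LZ4HC)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (symbol, ts);
```

Stacking Delta encoding in front of LZ4HC on the timestamp and volume columns exploits exactly the structure described above: values that change little from row to row compress down to almost nothing.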
MergeTree vs Hypertables
The MergeTree table engine is the core of ClickHouse storage. Data is written in sorted parts and merged in the background according to the table's primary sort key. For our bars tables, the sort key is (symbol, ts), enabling efficient lookups by symbol, efficient time-range scans, and efficient full-universe aggregations in either order.
TimescaleDB hypertables partition exclusively on time. This is sensible for many time-series workloads but creates a mismatch for analytical queries where the primary access pattern is by symbol, not by time range. Fetching all bars for a specific symbol over a long time horizon requires scanning across many time partitions in TimescaleDB. In ClickHouse with a (symbol, ts) sort key, that query becomes a single contiguous column scan.
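Under a (symbol, ts) sort key, that single-symbol, long-horizon query is about as simple as it gets (names illustrative):

```sql
-- Reads one contiguous range of the sorted data for the symbol,
-- rather than touching every time partition
SELECT ts, open, high, low, close, volume
FROM bars_1d
WHERE symbol = 'AAPL'
  AND ts >= now() - INTERVAL 5 YEAR
ORDER BY ts;
```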
ClickHouse also supports ReplacingMergeTree and AggregatingMergeTree engine variants for deduplication and incremental aggregation, features we use in several of our derived tables. TimescaleDB has materialized views for aggregation, but they are less flexible and more expensive to maintain at our update frequency.
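As a rough sketch of the deduplication pattern, assuming a hypothetical derived table with a version column (not our actual schema):

```sql
-- Illustrative deduplicating table: rows sharing (symbol, ts) are collapsed
-- during background merges, keeping the row with the highest version
CREATE TABLE bars_1m_dedup
(
    symbol   LowCardinality(String),
    ts       DateTime,
    open     Float64,
    high     Float64,
    low      Float64,
    close    Float64,
    volume   UInt64,
    version  UInt64
)
ENGINE = ReplacingMergeTree(version)
ORDER BY (symbol, ts);
```

Deduplication happens eventually, at merge time; reading with SELECT ... FINAL forces it at query time at some extra cost.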
The Honest Trade-offs
ClickHouse is harder to operate than TimescaleDB. Specifically:
- No built-in high availability on a single node. Replication requires explicit shard/replica configuration, which adds operational complexity.
- UPDATE and DELETE operations are expensive by design. ClickHouse is not suited for transactional workloads requiring frequent row-level mutations.
- The SQL dialect diverges from standard SQL in several places. Some queries require ClickHouse-specific syntax that is not portable.
- Debugging slow queries requires familiarity with MergeTree internals: concepts like granules, marks files, and part merges are essential operational knowledge (a short example of the tooling follows this list).
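Two standard ClickHouse starting points when a query is slower than expected; the table names here are illustrative:

```sql
-- Show how the primary key index is used and how many granules/parts the
-- query actually reads
EXPLAIN indexes = 1
SELECT count()
FROM bars_1m
WHERE symbol = 'AAPL'
  AND ts >= now() - INTERVAL 7 DAY;

-- Inspect active parts per table: too many small parts usually means
-- merges are falling behind
SELECT
    table,
    count()                                   AS parts,
    sum(rows)                                 AS rows,
    formatReadableSize(sum(bytes_on_disk))    AS disk
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table;
```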
For Avo's use case (append-heavy time-series data, analytical query patterns, no requirement for row-level mutation of historical bars), these trade-offs are acceptable. TimescaleDB's operational simplicity comes at a query performance cost that would have required significantly more hardware to compensate for.
Cost at Scale
This is where the compression advantage compounds into real cost savings. At 14.19 GB of compressed storage for 2.5 billion rows, our storage cost on a single Hetzner dedicated server is effectively negligible: the compute cost of running the analytics engine far exceeds the storage cost.
A comparable row-oriented database storing the same data uncompressed would require 70-140 GB, roughly a 5-10x difference. On managed database services (AWS RDS, Google Cloud SQL), that difference translates directly to storage billing. On self-hosted hardware, it is the difference between a 500 GB SSD and a 4 TB SSD, a meaningful cost difference at scale.
We also configured TTL (time-to-live) on our highest-volume table: bars_1m has a 2-year TTL. ClickHouse enforces TTLs automatically during background merges, keeping storage bounded without manual partition management.
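The TTL itself is a one-line table setting; a sketch of the configuration described above (column name illustrative):

```sql
-- Drop rows older than two years; expired data is removed as the
-- affected parts are merged, so enforcement is eventual, not immediate
ALTER TABLE bars_1m
    MODIFY TTL ts + INTERVAL 2 YEAR DELETE;
```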
The Verdict
For pure time-series retrieval workloads (IoT sensor data, application metrics, log aggregation), TimescaleDB is an excellent choice. The operational simplicity of building on PostgreSQL, the familiar SQL dialect, and solid time-based partitioning make it a strong default for teams already in the Postgres ecosystem.
For analytical workloads at scale (full-universe aggregations, cross-asset signal detection, low-latency queries across billions of rows), ClickHouse is in a different class. The columnar storage model, MergeTree engine flexibility, and LZ4HC compression compound into performance characteristics that row-oriented databases simply cannot match without substantially more hardware.
We made the harder operational choice. It was the right one.