Kafka evaluated for the signal transport layer. Redis Streams chosen for Avo's own pipeline. Avo builds Kafka integrations for clients who already run it.
HOW WE USE IT
Apache Kafka is the dominant event streaming platform in the enterprise. It handles durable, ordered, replayable event logs across multiple consumer groups with strong delivery guarantees. Any client running significant event volume (10,000+ events per second sustained), operating on AWS MSK or Confluent Cloud, or needing cross-datacenter replication is a Kafka shop. Avo knows the platform well enough to build on top of it and to make the honest call on when it is the right choice versus cheaper alternatives.
For Avo's own signal transport pipeline, we evaluated Kafka and chose Redis Streams instead. The reason is operational surface area. Kafka requires a broker cluster (minimum 3 nodes for replication), ZooKeeper or KRaft for coordination, and ongoing tuning of partition count, replication factor, and consumer group lag. For a single-server pipeline running under 1,000 events per minute, that is engineering overhead with no throughput benefit. Redis Streams gives us at-least-once delivery, consumer groups, offset tracking, and replay from a service that is already running for caching and rate limiting. We run zero additional infrastructure for it.
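The delivery guarantees described above (consumer groups, at-least-once delivery, explicit acknowledgement, replay) map onto a handful of Redis commands. A sketch of the pattern as a redis-cli session; the stream, group, and consumer names are hypothetical:

```
# Producer appends an event to the stream (Redis assigns the entry ID)
XADD signals:events * kind regime payload "<json>"

# One-time setup: create a consumer group that reads from the stream start
XGROUP CREATE signals:events pipeline 0

# Consumer claims up to 100 new entries on behalf of the group
XREADGROUP GROUP pipeline worker-1 COUNT 100 STREAMS signals:events >

# Acknowledge an entry only after processing succeeds
XACK signals:events pipeline 1700000000000-0

# Entries read but never acked stay pending and can be inspected or replayed
XPENDING signals:events pipeline
```

Entries that a crashed worker read but never acknowledged remain in the group's pending list, which is what gives the at-least-once guarantee without any additional infrastructure.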
For client work, Avo builds Kafka integrations when the client already operates a Kafka cluster or when their event volume makes Redis Streams the wrong choice. The most common scenario is a client whose backend team already runs Kafka for application events (user signups, order placements, clickstream), and they want to wire a new service into that existing bus.
Example workflow: connecting a new analytics microservice to a client's existing Kafka cluster.

1. Add the kafka-rust crate (or rdkafka via librdkafka bindings) to the service Cargo.toml.
2. Configure the consumer with group.id set to the new service name, auto.offset.reset = earliest for initial replay of historical events, and enable.auto.commit = false for explicit offset control.
3. Consume from the topic in a Tokio task. On each batch, process the events and write results to ClickHouse.
4. Only commit offsets after the ClickHouse insert acknowledges. If the insert fails, the next restart reprocesses from the last committed offset.
5. Store the consumer group offset in Kafka (not externally) to avoid the dual-write consistency problem.
6. Monitor consumer lag via the kafka-lag-exporter sidecar. Alert if lag exceeds 5 minutes of throughput equivalent.
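The commit-after-write discipline in steps 4 and 5 is the part worth getting exactly right. A minimal std-only Rust sketch of the offset logic, with `MockConsumer` and `MockSink` as hypothetical stand-ins for the rdkafka consumer and the ClickHouse client:

```rust
// Commit-after-write sketch: the sink write is acknowledged before the
// offset advances, so a failed insert leaves the offset untouched and the
// batch is replayed on restart (at-least-once delivery).

struct MockConsumer {
    committed: u64, // last committed offset, stored in Kafka in production
}

struct MockSink {
    fail: bool, // simulate a ClickHouse insert failure
    rows: Vec<String>,
}

impl MockSink {
    fn insert(&mut self, batch: &[String]) -> Result<(), String> {
        if self.fail {
            return Err("insert failed".into());
        }
        self.rows.extend_from_slice(batch);
        Ok(())
    }
}

/// Write the batch first; commit the offset only after the write succeeds.
fn process_batch(
    consumer: &mut MockConsumer,
    sink: &mut MockSink,
    batch: &[String],
    next_offset: u64,
) -> Result<(), String> {
    sink.insert(batch)?;              // 1. durable write, may fail
    consumer.committed = next_offset; // 2. commit only after the ack
    Ok(())
}

fn main() {
    let mut consumer = MockConsumer { committed: 0 };

    let mut ok_sink = MockSink { fail: false, rows: Vec::new() };
    process_batch(&mut consumer, &mut ok_sink, &["e1".into(), "e2".into()], 2).unwrap();
    assert_eq!(consumer.committed, 2);

    // A failed insert must not advance the offset.
    let mut bad_sink = MockSink { fail: true, rows: Vec::new() };
    assert!(process_batch(&mut consumer, &mut bad_sink, &["e3".into()], 3).is_err());
    assert_eq!(consumer.committed, 2);
}
```

Reversing the two steps (commit, then write) silently drops the batch on an insert failure; keeping the offset inside Kafka, as step 5 says, avoids having to make this two-step sequence atomic across systems.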
Tradeoffs to think through honestly. Kafka's exactly-once semantics require idempotent producers and transactional consumers. Implementing this correctly adds 2 to 3 weeks of engineering. If the client can tolerate at-least-once processing (most analytical pipelines can, with deduplication at write time), skip the transaction overhead entirely. Schema evolution is the other consistent pain point: without a schema registry enforcing compatibility, a producer adding a required field breaks all consumers immediately. Avo defaults to optional fields and envelope versioning to avoid this. The operational cost of a Kafka cluster (patching, broker restarts, partition rebalancing) is real and should be priced into any engagement that requires Avo to own the infrastructure.
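Write-time deduplication, the cheap alternative to Kafka transactions mentioned above, can be as simple as keying inserts on a stable event ID. A std-only Rust sketch; the `DedupWriter` type and event shape are illustrative, not a real API:

```rust
use std::collections::HashSet;

/// At-least-once delivery means the same event can arrive twice after a
/// replay. Dropping duplicates by a stable event ID at write time gives
/// effectively-once results without transactional consumers.
struct DedupWriter {
    // In production this would be a ReplacingMergeTree sort key in
    // ClickHouse or a TTL'd set, not an unbounded in-memory HashSet.
    seen: HashSet<String>,
    written: Vec<String>,
}

impl DedupWriter {
    fn new() -> Self {
        DedupWriter { seen: HashSet::new(), written: Vec::new() }
    }

    /// Returns true if the event was written, false if it was a duplicate.
    fn write(&mut self, event_id: &str, payload: &str) -> bool {
        // HashSet::insert returns false when the ID was already present.
        if !self.seen.insert(event_id.to_string()) {
            return false; // replayed event, already stored
        }
        self.written.push(payload.to_string());
        true
    }
}

fn main() {
    let mut w = DedupWriter::new();
    assert!(w.write("evt-1", "first"));
    assert!(!w.write("evt-1", "first")); // replay after restart: dropped
    assert!(w.write("evt-2", "second"));
    assert_eq!(w.written.len(), 2);
}
```

The same idea applies regardless of store: pick an ID that survives replay (producer-assigned, not consume-time generated) and make the write idempotent on it.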
Production numbers
Min event rate to justify Kafka: >10K/sec
Avo internal transport: Redis Streams
Client-side use case: existing Kafka clusters
Offset commit pattern: after write, not before
Argus generated regime signals, novelty anomalies, and trend-following calls from a 1,400-feature engine built on AVX2 SIMD.
Macro endpoint: 3.2x faster (1,476ms → 465ms warm)
We built a 723M-row market data pipeline ingesting 10 exchanges simultaneously at under 50ms tick-to-storage latency.
Total rows stored: 723M+
Start a project
Most projects ship in under two weeks. Start with a free 30-minute discovery call.