We debugged 65 compounding bugs across seven subsystems of a live trading engine, fixed a score overflow that silently blocked all dark_matter_rs signals, and cut Redis memory from 11.8GB to 7.15GB.
65 · Bugs fixed in one session
22% · Signal filter pass rate (was ~0%)
-4.65GB · Redis memory (11.8GB → 7.15GB)
100% · IB Gateway uptime (7-day)
CHAPTER 01
Apex was a multi-strategy algorithmic trading engine that aggregated signals from six research accounts and routed the best 100 to 300 trades per day through a single production account backed by IB Gateway. The system generated signals from ten sources. The initial state after the April 2026 build was infrastructure-grade A, trading-grade F. All 150 services ran without errors. But the paper account was down 12%, and win rate across 11,895 trades stood at 10.2%. Only 1.1% of trades hit their take-profit target. All exits were time-based stop-outs.
The problem was not infrastructure. The problem was a cascade of 65+ bugs across the signal pipeline, position manager, learning subsystems, and regime gate that individually appeared minor but compounded into a system that consistently entered on noise and exited before signal.
CHAPTER 02
The April 2026 debugging session fixed 65 bugs across seven subsystems. Three categories produced the most P&L damage.
Score overflow in dark_matter_rs: the Rust scoring kernel accumulated float64 values without clamping. Under specific market conditions, the accumulator overflowed the intended [0.0, 1.0] range and produced scores in the millions. The downstream signal filter treated any score outside [0.0, 1.0] as invalid and discarded all dark_matter_rs output. The fix was two lines: clamp(0.0, 1.0) before writing to Redis, plus a direction-field correction that had been mapping bullish signals to the wrong side.
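The producer-side fix can be sketched in a few lines. This is an illustrative Python rendering, not the actual Rust kernel code; the function names and the "LONG"/"SHORT" labels are assumptions.

```python
# Hypothetical sketch of the dark_matter_rs fix described above: clamp every
# accumulated score into the [0.0, 1.0] contract range before it is written
# to Redis, and correct the bullish/bearish direction mapping.

def clamp_score(raw: float) -> float:
    """Force an accumulated score back into the [0.0, 1.0] contract range."""
    return max(0.0, min(1.0, raw))

def map_direction(is_bullish: bool) -> str:
    """Corrected mapping: bullish signals go long, bearish go short."""
    return "LONG" if is_bullish else "SHORT"
```

With the clamp at the producer, a runaway accumulator now degrades to a saturated score of 1.0 instead of a million-valued score the downstream filter silently discards.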
Position key mismatch between allocator and position manager: the Master Allocator wrote order intents to Redis using keys formatted as apex:orders:{strategy}:{symbol}. The position manager read position state using keys formatted as apex:positions:{symbol}:{strategy}. When a fill confirmation arrived, the position manager could not find the originating order. Every trade ran to the time-based stop rather than the profit target. The fix standardized all keys to apex:fund:{type}:{symbol}.
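A standardized scheme like this is cheapest to enforce with a single shared key builder that both the allocator and the position manager import. A minimal sketch, assuming the apex:fund:{type}:{symbol} format from the fix (the function name is illustrative):

```python
# One key builder shared by allocator and position manager: if both sides call
# this function, the two key formats can never drift apart again.

def fund_key(record_type: str, symbol: str) -> str:
    """Build the standardized Redis key, e.g. apex:fund:orders:AAPL."""
    return f"apex:fund:{record_type}:{symbol}"
```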
Counterfactual simulator stuck for 2 days: the process was blocked on a SCAN_BATCH Redis operation iterating over 500,000+ keys created per hour by a logging bug. Each SCAN_BATCH cycle completed in 45 minutes rather than 4 seconds. The fix redesigned SCAN_BATCH to use a bounded ring buffer capping stored counterfactuals at 10,000 entries.
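The bounded ring buffer can be sketched with a standard-library deque. This is an illustrative reduction of the fix, not the production code; the entry shape and names are assumptions.

```python
# Minimal sketch of the bounded ring buffer: deque with maxlen=10_000 evicts
# the oldest counterfactual automatically, so storage can never explode the
# way the unbounded key set did.
from collections import deque

MAX_COUNTERFACTUALS = 10_000
counterfactuals: deque = deque(maxlen=MAX_COUNTERFACTUALS)

def record_counterfactual(entry: dict) -> None:
    """Append in O(1); the oldest entry is dropped once the cap is reached."""
    counterfactuals.append(entry)
```

The design choice is that eviction is implicit and constant-time, so the simulator never pays a scan cost proportional to the number of keys ever written.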
ARCHITECTURE OVERVIEW
Python 3.12
Rust (dark_matter_rs, certifier_rs)
ib_insync 0.9.86
IB Gateway 10.37 + IBC
Redis 7
auth + rate limit + versioning
CHAPTER 03
The Argus-to-Apex signal interface used Redis Streams with consumer groups for at-least-once delivery. Each signal payload carried: signal_id (UUID), symbol, direction, confidence, regime classification, source binary name, feature vector hash, and a suggested Kelly fraction for position sizing.
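The payload fields listed above can be summarized as a typed record. This is an illustrative shape only; field names such as feature_hash are assumptions where the source gives a description rather than an identifier.

```python
# Assumed shape of an Argus-to-Apex stream payload, based on the field list
# above. Field names are illustrative, not the production schema.
from dataclasses import dataclass

@dataclass
class SignalPayload:
    signal_id: str        # UUID
    symbol: str
    direction: str
    confidence: float
    regime: str           # regime classification
    source: str           # producing binary name
    feature_hash: str     # hash of the feature vector
    kelly_fraction: float # suggested position-sizing fraction
```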
The Apex consumer group had three members: the Master Allocator, the risk gate, and the signal logger. The Master Allocator acknowledged signals only after the IBKR order was submitted, ensuring that a crash between signal receipt and order submission would result in redelivery rather than signal loss.
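The ack-after-submit pattern reduces to a strict ordering: hand the order to the broker first, acknowledge the stream entry second. A hedged sketch, where submit_order and the Redis client are stand-ins for the real Apex components:

```python
# Sketch of at-least-once handling: if submit_order raises, the entry is never
# XACKed, stays pending in the consumer group, and is redelivered on restart.

def handle_signal(redis_client, submit_order, stream, group, entry_id, payload):
    submit_order(payload)                       # may raise; entry stays pending
    redis_client.xack(stream, group, entry_id)  # ack only after submission
```

A crash between the two calls leaves the entry pending, which is exactly the redelivery-over-loss trade-off the consumer group was chosen for.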
The working IBKR path used IB Gateway 10.37 (not Client Portal Gateway). Client Portal had a known no-bridge issue on paper accounts that caused competing session errors and 5-minute timeout loops. IB Gateway over a single TCP socket on port 4002 had none of these failure modes. IBC auto-login handled the mandatory 23:55 UTC daily restart cycle by re-authenticating automatically.
CHAPTER 04
65 bugs fixed in a single audit session. Signal filter pass rate corrected from approximately 0% (score overflow blocking all dark_matter_rs output) to 22% of candidate signals. CPU load average dropped from 32 to 19 after the SCAN_BATCH key explosion was fixed. Redis memory fell from 11.8GB to 7.15GB after orphaned counterfactual keys were purged. IB Gateway uptime: 100% over the 7-day observation window. Zero no-bridge errors after switching from Client Portal.
CHAPTER 05
DECISION · 01
Two connection libraries for the same broker is two codebases to maintain. The original IBKR setup attempted to use Client Portal Gateway alongside IB Gateway for different use cases. Client Portal had a no-bridge failure mode that appeared intermittently and consumed approximately 8 hours of debugging before being abandoned. When a library has a known failure mode with a working alternative, switch immediately.
DECISION · 02
Score ranges must be enforced at the producer, not the consumer. The dark_matter_rs overflow bug could have been caught at the consumer by logging and discarding out-of-range scores. It was not, because the consumer assumed the producer respected the contract. The fix added range clamping at the producer and a WARN log at the consumer for any future out-of-range values.
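The consumer-side guard from this decision can be sketched as a small predicate. This is an illustrative rendering; the logger name and function are assumptions.

```python
# Sketch of the consumer-side check added alongside the producer clamp:
# out-of-range scores are discarded, but loudly, so a future contract
# violation shows up in the logs instead of silently zeroing the pipeline.
import logging

log = logging.getLogger("apex.signal_filter")

def accept_score(score: float) -> bool:
    """Return True for scores inside [0.0, 1.0]; WARN and reject otherwise."""
    if 0.0 <= score <= 1.0:
        return True
    log.warning("score out of range, discarding: %s", score)
    return False
```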
DECISION · 03
ONE account, ONE track record, ONE story. The earlier two-tier architecture created two P&L curves with different histories. Consolidating to master_fund produced a single curve and made performance attribution unambiguous.