We audited 168 running services consuming 33GB of RAM, culled the dead weight, and reduced the Argus footprint to 25 production services using 12GB.
168
Services before audit (33GB RAM)
~25
Services after cleanup (12GB RAM)
21GB
RAM reclaimed
30 sec
QuestDB restart time (was 8 min)
CHAPTER 01
By late April 2026 the Argus server was running 168 active systemd services consuming 33GB of RAM, with 2.7GB of swap in use on a 125GB host. Of those 168 services, only about 25 produced measurable output. The rest either monitored nothing, voted on nothing, or hosted AI models that never served a single inference request.
The sprawl was not the result of negligence. It was the natural output of a build-first, audit-later approach: each feature prototype shipped as a new service with its own systemd unit, and the cumulative weight went unexamined until swap pressure made the problem visible. The audit found five distinct categories of waste: 10+ Ollama model instances totaling roughly 31GB on disk, of which only bge-m3 was actually used for embeddings; duplicate processes, including two instances of apex.py; sentience-layer services reading health metrics from services that produced no signals worth measuring; a historical replay service generating 748K replay trades per week that fed a brain database but never the live execution path; and signal pipeline services reporting to Redis streams that held 0 entries.
CHAPTER 02
Argus services were designed from the start as systemd units, not Docker containers. The decision was deliberate: the Hetzner AX101 runs a single-tenant workload, container isolation adds no meaningful security boundary beyond what systemd's User= and PrivateTmp= directives already provide, and Docker adds roughly 2GB to 4GB of baseline memory overhead.
Each Argus binary gets a dedicated .service file. The 31 units shipped cover the full pipeline: 10 exchange ingest daemons, the feature engine, regime classifier, correlation engine, novelty detector, early warning system, alert dispatcher, API server, WebSocket server, health monitor, archive daemon, meta-learner, and ML scorer.
The restart policy for production services uses Restart=on-failure with RestartSec=5. Services in the ingest tier additionally set StartLimitIntervalSec=60 and StartLimitBurst=3: if a service crashes more than 3 times in 60 seconds, systemd stops restarting it and marks it failed, triggering a persistent alert rather than a crash loop.
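A unit along these lines covers the policy described above; the unit name, binary path, and service account are illustrative, not taken from the Argus repository:

```ini
# argus-ingest-okx.service (illustrative name) — ingest-tier unit with the directives described above
[Unit]
Description=Argus OKX ingest daemon
After=network-online.target
Wants=network-online.target
# After 3 failures within 60 seconds, systemd stops restarting and marks the unit failed
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
ExecStart=/opt/argus/bin/argus-ingest-okx
User=argus
PrivateTmp=true
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

The failed state, rather than an endless crash loop, is what triggers the persistent alert.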
ARCHITECTURE OVERVIEW
[Architecture diagram: ingress, cluster, storage, and observability tiers, built on systemd 252, Rust 1.84, Tokio 1.40, and ClickHouse 26.3.9.8]
CHAPTER 03
The audit used a structured approach: for each running service, query its output streams in Redis or its tables in ClickHouse and check the counts. Services writing to streams with 0 entries were candidates for shutdown unless they had a clear explanation.
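The check itself is simple; a sketch of the two queries involved, with placeholder stream and table names rather than the real Argus ones:

```sh
# Entries in a service's Redis output stream (0 => shutdown candidate)
redis-cli XLEN stream:signals:example

# Rows the service wrote to ClickHouse over the last 7 days
clickhouse-client --query \
  "SELECT count() FROM argus.example_table WHERE ts > now() - INTERVAL 7 DAY"
```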
The Ollama situation required careful inventory. Only bge-m3 appeared in active code paths for embedding similarity. The remaining models had no callers in production code and were removed with ollama rm, reclaiming approximately 30GB of disk.
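The removal itself is a one-liner per model; a sketch, with placeholder model names (only bge-m3 is real here):

```sh
# Inventory pulled models and their on-disk size
ollama list

# Remove models with no production callers, keeping bge-m3 for embeddings
ollama rm llama3:70b
ollama rm mistral:7b
```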
A timestamp validation gate was added to argus-common/src/questdb.rs to prevent the bad-timestamp garbage that caused the QuestDB partition explosion. All 30+ downloader crates were patched to call this validator before writing any row. The check runs in under a nanosecond and eliminates the problem entirely at the source.
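The validator itself isn't reproduced in this write-up; a minimal sketch of the idea, a bounds check on the row timestamp before anything is written, with the function name and accepted range as assumptions:

```rust
/// Reject timestamps that would land a row in a bogus QuestDB partition.
/// The accepted window (2017-01-01 to 2100-01-01 UTC, in microseconds) is an
/// illustrative assumption, not the production bounds.
pub fn validate_timestamp_micros(ts_micros: i64) -> Result<i64, String> {
    const MIN_TS_MICROS: i64 = 1_483_228_800_000_000; // 2017-01-01T00:00:00Z
    const MAX_TS_MICROS: i64 = 4_102_444_800_000_000; // 2100-01-01T00:00:00Z
    if ts_micros < MIN_TS_MICROS || ts_micros > MAX_TS_MICROS {
        return Err(format!("timestamp {ts_micros} outside accepted range"));
    }
    Ok(ts_micros)
}
```

Each downloader calls the gate before writing a row, so a rejected timestamp never reaches QuestDB.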
CHAPTER 04
Before the audit: 168 running services consuming 33GB of RAM, with 2.7GB of swap in use. After removing the dead-weight services and unused Ollama models: approximately 25 production services, an Argus RAM footprint of roughly 12GB, and zero swap usage. The 21GB reclaimed came from Ollama model memory (10 to 12GB), duplicate processes (2GB), and sentience/monitoring services with nothing to monitor (6 to 8GB).
The QuestDB garbage partition removal cut restart time from approximately 8 minutes to under 30 seconds. CPU load on restart dropped from a load average of 38 to 3.
CHAPTER 05
DECISION · 01
The architecture pattern that worked: services in the ingest tier are fully independent of each other. An OKX ingest daemon crashing does not affect the Binance daemon. This isolation made it safe to restart, upgrade, or remove individual exchange connectors without coordinating a maintenance window.
DECISION · 02
The pattern that created the most waste: services that depended on other services being fully operational before they could do useful work, but had Restart=always and no output validation. They ran continuously, consuming memory and CPU, and their health checks reported OK because the binary was alive even though the business logic never executed.
DECISION · 03
"Rows written per hour" is a better health metric than "the service is running." All Argus production services now write their cycle output count to a health Redis key with a TTL. A health fanout script reads all health keys and alerts if any service has written 0 rows across 3 consecutive cycles.
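A minimal sketch of the reporting side, using the redis crate; the key naming scheme and TTL are assumptions, not the production values:

```rust
use redis::Commands;

/// Publish this cycle's output row count under a TTL'd health key.
/// If the service stalls, the key expires and the fanout script treats it as silent.
fn report_cycle_health(
    con: &mut redis::Connection,
    service: &str,
    rows_written: u64,
) -> redis::RedisResult<()> {
    let key = format!("health:{service}:rows"); // key scheme is an illustrative assumption
    con.set_ex(key, rows_written, 300) // 300s TTL is an assumption (~3 cycles of headroom)
}
```

The fanout script then only has to read the health keys and alert when a service has reported 0 rows for 3 consecutive cycles.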