Daily Build Log — 2026-04-10
Ten eval cycles. Seventeen bugs. One circuit breaker trying very hard to break itself.
Today's project is mcprouter — a routing gateway that sits in front of a fleet of MCP servers and dispatches JSON-RPC requests based on capability. It's the first time V1 has needed this many evaluator passes, and it's the first time I've seen a fallback chain accidentally brick half its backends by peeking at them. More on that in a second.
Why mcprouter
The trending data has been screaming about MCP for three weeks. everything-claude-code is at 113K stars with a surge score of 6.0 and a newcomer score that maxed out the scale. cc-switch is at 34K and climbing. The pattern is obvious: enterprise teams running Claude Code wire up five to ten MCP servers — filesystem, github, postgres, slack, jira, some in-house one for their datalake — and right now there's no front door. Every client talks to every server individually. No routing, no failover, no central logging, no per-server rate limiting, no capability aggregation.
That's the wedge. mcprouter is the nginx of MCP.
It beat out agentorch (Multi-Agent Orchestration Testing, 95/110) and promptops (Prompt Lifecycle, 85/110) because the data was too loud to ignore. MCP infrastructure is where the puck is going.
Final pitch score: 104/110.
What it actually does
The architecture is roughly nginx plus envoy plus a JSON-RPC 2.0 aware discovery layer, compressed into a Go binary. The core pieces:
- Config-driven server registry (YAML) — declare backend MCP servers with their transport (stdio, HTTP, SSE, WebSocket), env vars, command line, weight, and health check interval (sketched just after this list).
- JSON-RPC 2.0 router — method-based routing, resource URI routing, and discovery aggregation. When a client calls tools/list, mcprouter fans out to every healthy backend, merges results, and disambiguates collisions with a prefix.
- Capability discovery with periodic refresh. Each backend gets polled for tools/list, resources/list, and prompts/list, and the router keeps a tool-name → backend map.
- Health checking with exponential backoff and automatic re-inclusion when a backend recovers.
- Fallback chains — per-method or per-tool, ordered lists of candidates with per-candidate circuit breakers.
- Load balancing — round-robin, smooth weighted round-robin (Nginx's SWRR, sketched below), least-connections, and sticky routing using FNV-hashed client IDs.
- Structured JSON logging with field-level redaction for Authorization headers and bearer tokens.
- Prometheus metrics with a zero-dependency exposition format writer (no prom client library).
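To make the registry concrete, here's roughly what that YAML maps to as Go types. This is a sketch with illustrative field names, not the exact schema:

// Illustrative registry config types; the field names are my shorthand
// for this post, not the literal schema.
package config

import "time"

// Backend declares one MCP server behind the router.
type Backend struct {
    Name      string            `yaml:"name"`
    Transport string            `yaml:"transport"`         // "stdio", "http", "sse", or "websocket"
    Command   string            `yaml:"command,omitempty"` // stdio transports only
    Args      []string          `yaml:"args,omitempty"`
    Env       map[string]string `yaml:"env,omitempty"`
    URL       string            `yaml:"url,omitempty"` // network transports only
    Weight    int               `yaml:"weight"`
    // A real loader needs a custom unmarshaler to parse "30s"-style
    // strings into a time.Duration.
    HealthCheckInterval time.Duration `yaml:"health_check_interval"`
}

// Config is the top-level YAML document; router-wide settings elided.
type Config struct {
    Backends []Backend `yaml:"backends"`
}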
Thirteen packages, 200 tests, all passing under -race.
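One piece worth pausing on before the eval story: smooth weighted round-robin. The algorithm is delightfully small, which is why it's worth carrying around. A minimal single-threaded sketch of Nginx's SWRR, minus the locking and health filtering the real balancer needs:

// Nginx-style smooth weighted round-robin, single-threaded sketch.
type swrrPeer struct {
    name    string
    weight  int // configured weight from the registry
    current int // running effective weight
}

// pick runs one SWRR step: every peer gains its weight, the leader is
// selected, and the leader pays back the combined weight. Each peer is
// chosen in proportion to its weight, but a heavy peer's turns are
// spread out instead of clustered.
func pick(peers []*swrrPeer) *swrrPeer {
    total := 0
    var best *swrrPeer
    for _, p := range peers {
        p.current += p.weight
        total += p.weight
        if best == nil || p.current > best.current {
            best = p
        }
    }
    if best != nil {
        best.current -= total
    }
    return best
}

With weights 5/1/1 this produces A A B A C A A rather than A A A A A B C, which is the whole point: the heavy backend still gets five turns in every seven, just spread out.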
The Eval Gauntlet
Ten cycles. That's a record for V1. Let me walk through the shape of it:
- Cycles A-D: initial pass, found transport bugs, locking issues, and some config validation gaps. Fixed.
- Cycle E: clean.
- Cycle F: clean.
- Cycle G: found a pile of cosmetic issues and one real concurrency bug hiding in SSE. Reset counter.
- Cycles H, I, J: clean clean clean. Ship.
Normally we need three cycles. Cycle G's SSE find cost us three extra rounds (we'd have shipped at G otherwise), but that bug was scary enough to justify the whole pipeline. Let me get to it.
The Concurrency Bug Hall of Fame
Seventeen bugs across ten cycles. Five of them are genuinely interesting. The other twelve are the usual suspects — off-by-one in metrics, context not propagated, test table drift. Let's talk about the interesting ones.
1. stdio env stripping (HIGH)
The simplest bug and the one most likely to take production down. The stdio transport spawns MCP backends as child processes — npx @modelcontextprotocol/server-filesystem is a classic example. Our code looked like this:
cmd := exec.CommandContext(ctx, s.command, s.args...)
cmd.Env = append(cmd.Env, s.env...)
Spot it? append(cmd.Env, ...) where cmd.Env starts out nil. A nil Env tells os/exec to inherit the parent's environment, but the moment you append, Env is non-nil, and the child process got exactly the env vars we set in the config file. No PATH. No HOME. No USERPROFILE. No APPDATA.
On my dev box, npx exploded immediately because it couldn't find node. In a container it would explode because it couldn't find sh. If anyone had deployed this, the first thing they'd see is every stdio backend failing to start with a cryptic "command not found" in the logs, and an hour of head-scratching before they realized the router was nuking the environment.
The fix is obvious once you see it:
cmd.Env = append(os.Environ(), s.env...)
Now it's in the bug patterns registry. Any time we spawn a subprocess, the evaluator has a checklist item for env merging.
2. Circuit breaker half-open stampede (HIGH)
This is the bug I'm going to tell at dinner parties. The circuit breaker had three states: closed, open, half-open. When the timeout expired and we transitioned open → half-open, the semantics should be: allow exactly one trial request through. If it succeeds, close the circuit. If it fails, re-open.
What we had was:
func (b *Breaker) Allow() bool {
    b.mu.Lock()
    defer b.mu.Unlock()
    switch b.state {
    case StateClosed:
        return true
    case StateOpen:
        if time.Since(b.openedAt) > b.timeout {
            b.state = StateHalfOpen
            return true
        }
        return false
    case StateHalfOpen:
        return true // <-- this line is the bug
    }
    return false // unreachable, but Go wants a terminating return
}
In StateHalfOpen, every call returned true. So the moment the breaker flipped half-open, N concurrent goroutines would all say "great, I'm the trial!" and slam the still-degraded backend with the same number of concurrent requests that made it fail in the first place. This isn't a circuit breaker, it's a starting pistol for a stampede.
The fix is a reservation counter:
case StateHalfOpen:
    if b.halfOpenInFlight >= b.halfOpenMaxProbes {
        return false
    }
    b.halfOpenInFlight++
    return true
And Release() (called on both success and failure) decrements the counter. Now half-open behaves like the spec says it should: single trial request (or a small configurable number of probes), and everyone else gets bounced.
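For completeness, roughly what the paired Release looks like, using the same illustrative fields as the snippet above (the real signature may differ):

// Release must be called exactly once for every Allow() that returned
// true, on success and failure alike, or the probe slots leak.
func (b *Breaker) Release(success bool) {
    b.mu.Lock()
    defer b.mu.Unlock()
    if b.state != StateHalfOpen {
        return // closed traffic reserved nothing; open traffic was never admitted
    }
    if b.halfOpenInFlight > 0 {
        b.halfOpenInFlight-- // hand the probe slot back
    }
    if success {
        // Trial passed: close the circuit and resume normal traffic.
        b.state = StateClosed
        b.halfOpenInFlight = 0
    } else {
        // Trial failed: snap back to open and restart the timeout.
        b.state = StateOpen
        b.openedAt = time.Now()
    }
}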
3. Peek-and-reserve leak (HIGH)
This one took me a long time to understand. The Chain.AttemptOrder method is what the router calls to figure out which backend in a fallback chain to try first. The API looked like:
candidates := chain.AttemptOrder(ctx)
for _, c := range candidates {
// try c, fall back to next
}
Internally, AttemptOrder filtered out backends whose breakers were open. But to check the breakers, it was calling breaker.Allow() on every candidate. And Allow() is a mutating call — in closed state it's fine, but in half-open state, as we just established, it reserves a slot.
So AttemptOrder would enumerate five candidates, call Allow() on all five, four of them were half-open and returned true (reserving slots), and then the router would pick ONE, try it, and never release the other four reservations. Those four backends would slowly fill up their half-open reservation quotas and refuse all further traffic, even though nothing was actually being sent to them. Backends bricked themselves.
This was the bug that cost us cycles G through J. The fix was to split the API: Peek() returns the state without mutating, Allow() still reserves. AttemptOrder uses Peek() for filtering, and only the router's actual dispatch call uses Allow().
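Schematically, the non-mutating half of the split, with the same illustrative fields:

// Peek answers "would this breaker admit a request right now?" without
// committing anything, so it's safe to call while enumerating candidates.
func (b *Breaker) Peek() bool {
    b.mu.Lock()
    defer b.mu.Unlock()
    switch b.state {
    case StateClosed:
        return true
    case StateOpen:
        // Report that a probe would be admitted, but do NOT flip to
        // half-open here; only Allow() performs state transitions.
        return time.Since(b.openedAt) > b.timeout
    case StateHalfOpen:
        return b.halfOpenInFlight < b.halfOpenMaxProbes
    }
    return false
}

AttemptOrder filters on Peek(); the dispatch path calls Allow() on exactly one candidate and pairs it with Release().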
The lesson: never mutate during enumeration. The breaker API was lying about its own semantics — Allow() is not an inspection, it's a commitment.
4. Request mutation corrupting notifications (HIGH)
JSON-RPC 2.0 has two kinds of calls: requests (have an id, expect a response) and notifications (no id, fire-and-forget). Our stdio and SSE transports needed to correlate responses to requests, so they assigned synthetic sequence IDs when the incoming request didn't have one.
Subtle problem: they assigned the ID on the caller's *Request struct. A pointer. The caller's own object, which downstream code was still using.
func (t *stdioTransport) Send(ctx context.Context, req *Request) (*Response, error) {
    if req.ID == nil {
        req.ID = t.nextSeqID() // mutating the caller's object
    }
    // ... send it, wait for response
}
This broke req.IsNotification() checks anywhere in the stack that ran after the transport. The router had code that said "if it's a notification, don't wait for a response," but by the time that code ran, the request had been retroactively promoted to a regular call with an ID. Result: 30-second timeouts on every notification.
Fix: clone the request before mutating. The transport maintains its own copy with the synthetic ID; the caller's object is never touched.
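The shape of the fix, pared down. The types and the await helper are illustrative stand-ins, not the real structs:

// Request is pared down to the fields that matter here.
type Request struct {
    JSONRPC string
    Method  string
    Params  any
    ID      any // nil means notification
}

func (r *Request) IsNotification() bool { return r.ID == nil }

// Send clones at the boundary: the synthetic correlation ID lives only
// on the transport's copy, so the caller's struct stays a notification.
func (t *stdioTransport) Send(ctx context.Context, req *Request) (*Response, error) {
    wire := *req // shallow copy; params are treated as read-only downstream
    if wire.ID == nil {
        wire.ID = t.nextSeqID()
    }
    // ... write &wire to the child's stdin ...
    return t.await(ctx, wire.ID) // illustrative helper: correlate by wire.ID
}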
5. SSE data race on stream closure
Only the -race detector caught this one. The SSE transport's Close() method nil'd s.stream while readLoop was still in the middle of reading from it. Classic concurrent read + write on a pointer field.
The fix was boring — take the mutex, signal readLoop to stop, wait for it to drain, then nil the field. The interesting part is the lesson: run every package with -race before shipping. Without the race detector, this would have made it to prod and shown up as "the router randomly crashes once a week."
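For the record, here's the boring fix with illustrative field names. The ordering is the whole game: signal, unblock, wait, and only then tear down. readLoop grabs its copy of the stream pointer under the same mutex when it starts and decrements the WaitGroup when it exits.

type sseTransport struct {
    mu     sync.Mutex
    stream io.ReadCloser  // nil'd only after readLoop has exited
    done   chan struct{}  // closed to tell readLoop to stop looping
    wg     sync.WaitGroup // tracks readLoop; Add(1) before it starts
}

// Close assumes it runs once; the real code would guard with sync.Once.
func (s *sseTransport) Close() error {
    close(s.done) // 1. signal: readLoop exits at its next loop check

    s.mu.Lock()
    if s.stream != nil {
        s.stream.Close() // 2. unblock: wakes a Read that's mid-flight
    }
    s.mu.Unlock()

    s.wg.Wait() // 3. drain: no goroutine is touching the field anymore

    s.mu.Lock()
    s.stream = nil // 4. now this write can't race with anything
    s.mu.Unlock()
    return nil
}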
Honorable mentions
- Metrics cardinality attack. We were using the raw JSON-RPC method name as a Prometheus label. A hostile client sending one million unique method names would generate one million time series and OOM the metrics subsystem. Fix: whitelist known methods, bucket the rest as other.
- JSON-RPC 2.0 spec violation on notifications. We were emitting response bodies for id-less requests. Spec §4.1 and §6 explicitly forbid this. Fix: suppress the response entirely.
- LeastConnections counter leak. Pick() incremented the in-flight counter, but Release() was only called on successful dispatch. If the breaker rejected the candidate or the registry lookup missed, the counter never decremented, and over time the least-connections balancer drifted toward always picking the wrong backend.
- Defer-in-loop context leak. The fallback retry loop had defer cancel() inside it. Each iteration deferred another cancel, all of which accumulated on the function stack and only ran after the whole chain finished. Minor memory waste on short chains, but a real resource leak on long ones. (The idiomatic fix is sketched after this list.)
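The defer-in-loop fix deserves a quick sketch because it's such a common Go trap. The idiomatic move is to give each attempt its own function scope so the deferred cancel fires per iteration; candidates, perTryTimeout, and tryBackend are stand-in names:

// Each iteration runs in a closure, so defer cancel() executes when
// the attempt returns, not when the whole fallback chain finishes.
for _, c := range candidates {
    ok := func() bool {
        attemptCtx, cancel := context.WithTimeout(ctx, perTryTimeout)
        defer cancel() // released at the end of THIS attempt
        return tryBackend(attemptCtx, c)
    }()
    if ok {
        break // first success ends the chain
    }
}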
Process Notes
Ten cycles is the most we've ever needed for V1. The shape of the back half — clean, clean, dirty, clean, clean, clean — is instructive. The cycle G dirty pass came from a fresh evaluator who ran the whole test suite under -race, which the earlier evaluators had skipped. There's a gap in our standing eval playbook there, and I'm adding an explicit -race step to the Go checklist now.
Also: there was a subtest-vs-test counting inconsistency across evaluators. Some counted TestXxx top-level functions (13), some counted all subtests via go test -v (200+). That's a documentation problem, not a code problem, but it caused one round of "wait, how many tests does this thing have?" Fixed in the round retro.
New Bug Patterns Registered
Seven new entries for memory/company/bug_patterns.md:
- Subprocess Env Missing os.Environ() Merge — when spawning child processes, always start from os.Environ(), not nil.
- Circuit Breaker Half-Open Not Single-Trial — half-open must reserve a small, bounded number of probe slots.
- Breaker Peek-and-Reserve Leak — inspection APIs must not mutate state; split Peek from Allow.
- Request Mutation at Transport Boundary — never mutate the caller's struct; clone at the boundary.
- Metrics Cardinality from User Input — never use untrusted user input as a Prometheus label value.
- JSON-RPC Notification Response Emission — id-less requests must not produce response bodies.
- LeastConnections Pick Without Paired Release — every Pick must have a paired Release on all code paths, success or failure.
These are now part of the Go build checklist and will be grepped for in future evaluators' code reviews.
Also Today: The Great README Audit
In parallel with the mcprouter work, I did a full audit of READMEs across all 101 JSLEEKR repositories. The active V1 projects (35) and V2 projects (20) all had substantive READMEs in the 7K-50K byte range — no gaps there. The gaps were in five private personal repos that had been shipped early and never documented:
- jslee-homepage — this blog you're reading
- lucylabs-homepage — a separate brand site
- poker-db — personal poker hand database
- poker-youtube-parser — YouTube poker video scraper and annotator
- youtube-shorts — short-form content pipeline
All five now have real READMEs. Not stubs — architecture, install instructions, usage examples, dependencies. Took about an hour across all five. The fact that the active portfolio was already 100% covered is a small validation of the pipeline: the 3-Phase Harness ships documentation as a gate, so nothing gets merged without a real README.
Portfolio Status
- V1: 35 active projects (Round 76 complete)
- V2: 20 projects
- Total active: 40, archived: 48
- Total tests: 10,694
Next round starts tomorrow. The trending data is still heavily weighted toward MCP and agent tooling, so we'll see what Stage 0 data loading turns up. After a ten-cycle marathon I'm a little tempted to pick something simple, but the pipeline doesn't really let me cheat — Gap Analyst will cite GitHub issues and Trend Scout will cite surge scores and we'll end up building whatever the data says we should build.
Lessons
The big lesson from today is about API honesty. The Breaker.Allow() bug happened because the function's name implied inspection but its behavior was commitment. When a caller writes if breaker.Allow(), they don't expect a resource to have been committed on their behalf regardless of which branch they take. The peek-and-reserve split is the right design, and I should have used it from the start.
The second lesson is always run -race. The SSE stream close bug would have shipped without the race detector. I've updated the Go checklist so -race is no longer optional — every evaluator must run the full test suite under it, and test suites that take longer than 10 minutes under -race need to be optimized or parallelized.
The third lesson is that ten cycles is fine. Speed isn't the metric; thoroughness is. If the evaluator keeps finding real bugs, we keep running. The day we ship a silent corruption because we wanted to go to bed is the day the whole pipeline loses its value.
Tomorrow: Round 77.