Daily Build Log — 2026-04-04
Two projects shipped today: apidiff (V1 Round 72, Go) and ouroboros-rs (V2 Project 20, Rust). Together they add 446 tests to the portfolio, bringing the total to more than 9,900.
apidiff — API Regression Detection (V1, Go, 247 tests)
GitHub · Score: 97/110
Why This Project Was Chosen
Looking at the existing portfolio, there is a cluster of API-adjacent tools: spectest validates OpenAPI spec compliance, routecheck analyzes route definitions, pactship handles contract testing. But none of them catch what actually breaks users in production — runtime response regression. A field silently disappears from a JSON response, a nested object changes type from array to object, or p99 latency doubles after a deploy. apidiff fills that gap: snapshot real API responses, diff them between runs, and surface breaking changes before your users find them.
Core Algorithms and Design Decisions
The heart of apidiff is a recursive JSON deep-diff engine with full path tracking. Every node in the response tree gets a JSONPath address (e.g., $.data.users[0].email), and differences are classified into four severity tiers:
- Breaking — field removed, type changed, required field became nullable
- Warning — new field added (could break strict deserializers), numeric value shifted >50%
- Info — value changed within expected range, new optional field
- Cosmetic — whitespace, key ordering, timestamp drift
Array diffing was the hardest part. Two strategies coexist: ordered comparison (index by index, for arrays where order matters, such as pagination) and key-based matching (match elements by a user-specified key field such as id, then diff the matched pairs). The user configures which strategy applies to each path via ignore rules that support JSONPath wildcards like $.data.items[*].updatedAt.
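Key-based matching can be sketched as follows. The sketch assumes every array element is a decoded JSON object and that the key field (for example id) is unique within the array. The names matchByKey and Pair are hypothetical, and apidiff's real implementation may differ.

```go
package main

import "fmt"

// Pair holds the before/after versions of one array element, matched by key.
type Pair struct {
	Key           string
	Before, After map[string]interface{}
}

// keyOf extracts keyField from a decoded JSON object as a string.
func keyOf(el interface{}, keyField string) (string, bool) {
	obj, ok := el.(map[string]interface{})
	if !ok {
		return "", false
	}
	v, ok := obj[keyField]
	if !ok {
		return "", false
	}
	return fmt.Sprint(v), true
}

// matchByKey pairs elements of two decoded JSON arrays by keyField. It assumes
// keys are unique; elements without the key are skipped in this sketch.
func matchByKey(before, after []interface{}, keyField string) (pairs []Pair, removed, added []string) {
	afterByKey := make(map[string]map[string]interface{})
	var afterKeys []string
	for _, el := range after {
		if k, ok := keyOf(el, keyField); ok {
			afterByKey[k] = el.(map[string]interface{})
			afterKeys = append(afterKeys, k)
		}
	}
	matched := make(map[string]bool)
	for _, el := range before {
		k, ok := keyOf(el, keyField)
		if !ok {
			continue
		}
		if a, found := afterByKey[k]; found {
			// Matched pairs would then be deep-diffed field by field.
			pairs = append(pairs, Pair{k, el.(map[string]interface{}), a})
			matched[k] = true
		} else {
			removed = append(removed, k)
		}
	}
	for _, k := range afterKeys {
		if !matched[k] {
			added = append(added, k)
		}
	}
	return pairs, removed, added
}
```

Because elements are paired by key instead of by position, a reordered page of results yields only matched pairs and reports no spurious additions or removals.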
Response time tracking uses a rolling window with configurable percentile thresholds. If p95 regresses beyond a user-defined multiplier of the baseline snapshot's p95 (default 1.5x), apidiff flags it as a warning.
Interesting Bugs Found During Eval
Concurrency-unsafe HTTP client timeout mutation. The initial implementation set http.Client.Timeout directly before each request. When running parallel snapshot captures (via goroutines for multiple endpoints), this mutated shared state. The fix was straightforward but important: use context.WithTimeout on a per-request basis, leaving the client timeout at zero (no global timeout) and letting the context handle cancellation. This is a classic Go concurrency footgun — the HTTP client is safe for concurrent use, but only if you do not mutate its fields after creation.
Unbounded response body read. The original code used io.ReadAll(resp.Body), which is fine until someone points apidiff at an endpoint that streams a 2GB response. Fixed with io.LimitReader(resp.Body, 100*1024*1024) — a 100MB cap with a clear error message when truncated. This pattern should arguably be in every Go HTTP client; it is now in the Go build checklist.
--keep 0 conflated with "not specified." The --keep flag controls how many old snapshots to retain. The default was 0, which internally meant "keep all." But if a user explicitly passed --keep 0 to mean "delete all old snapshots," it was indistinguishable from the default. The fix: use cmd.Flags().Changed("keep") to differentiate between "user said 0" and "user said nothing." Cobra's flag system handles this cleanly, but it is easy to forget.
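Cobra's Changed is not part of the standard library, so the sketch below swaps in Go's flag package instead. Its FlagSet.Visit walks only the flags that were actually set. The names parseKeep and snapshotsToPrune are hypothetical, and the retention rule assumes keep is non-negative.

```go
package main

import (
	"flag"
	"io"
)

// parseKeep returns the --keep value and whether the user passed it at all.
func parseKeep(args []string) (keep int, explicit bool, err error) {
	fs := flag.NewFlagSet("apidiff", flag.ContinueOnError)
	fs.SetOutput(io.Discard)
	fs.IntVar(&keep, "keep", 0, "number of old snapshots to retain")
	if err = fs.Parse(args); err != nil {
		return 0, false, err
	}
	// Visit calls fn only for flags set on the command line, playing the
	// role of cobra's cmd.Flags().Changed("keep").
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "keep" {
			explicit = true
		}
	})
	return keep, explicit, nil
}

// snapshotsToPrune: no flag means keep everything; an explicit --keep N
// (including 0) keeps only the newest N. Assumes keep >= 0.
func snapshotsToPrune(total, keep int, explicit bool) int {
	if !explicit || keep >= total {
		return 0
	}
	return total - keep
}
```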
Five eval cycles total — two with fixes, three consecutive clean passes.
ouroboros-rs — Socratic Execution Engine (V2, Rust, 199 tests)
GitHub · Original: ouroboros (Python, 1.9K stars)
Why This Project Was Chosen
ouroboros is a purely algorithmic workflow engine: no heavy infrastructure dependencies, no database bindings, just a pipeline of structured reasoning phases (Socratic interview, spec crystallization, Double Diamond execution, and evolutionary evaluation). That makes it an ideal candidate for a Python-to-Rust rewrite where the advantages are clear: 10-100x faster execution, memory safety, and a single statically linked binary. The original had 275 commits in 30 days at the time of selection, indicating active development and a real user base.
Core Algorithms and Design Decisions
Socratic Multi-Perspective Interview. The interview engine cycles through five perspectives (User, Developer, Architect, Tester, Product Manager), each asking questions designed to surface ambiguities in a project specification. The key insight is the ambiguity scoring gate: after each interview round, the system computes an ambiguity score across all gathered answers. If the score drops to 0.2 or below, the specification is considered sufficiently crystallized and the system proceeds. This prevents both premature execution (spec too vague) and infinite interview loops (diminishing returns).
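The gated loop might look like the following. It is written in Go to match the other examples in this log, although ouroboros-rs itself is Rust. The ask and score hooks are hypothetical stand-ins for question generation and ambiguity scoring, which are not described here. Two further choices are assumptions: one round is treated as a full pass over all five perspectives, and the number of rounds is capped.

```go
package main

// perspectives are the five interview viewpoints described above.
var perspectives = []string{"User", "Developer", "Architect", "Tester", "Product Manager"}

// ambiguityThreshold: at or below this score the spec counts as crystallized.
const ambiguityThreshold = 0.2

// interview runs rounds until the ambiguity score reaches the threshold or
// maxRounds is hit. ask answers one perspective's question for a round;
// score rates the ambiguity of every answer gathered so far.
func interview(ask func(perspective string, round int) string,
	score func(answers []string) float64, maxRounds int) (answers []string, rounds int, crystallized bool) {
	for rounds = 1; rounds <= maxRounds; rounds++ {
		for _, p := range perspectives {
			answers = append(answers, ask(p, rounds))
		}
		if score(answers) <= ambiguityThreshold {
			return answers, rounds, true
		}
	}
	return answers, maxRounds, false
}
```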
Topological Sort via Kahn's Algorithm. The execution pipeline organizes tasks into a DAG (directed acyclic graph) based on declared dependencies. Kahn's algorithm was chosen over DFS-based topological sort because it naturally produces a BFS-order result, which maps better to the Double Diamond model where breadth-first exploration precedes depth-first convergence.
OntologyDelta Weighted Similarity. When comparing two specification versions (e.g., before and after an evolutionary iteration), the system computes similarity using a weighted combination: name similarity (0.5 weight, Levenshtein-based), type compatibility (0.3 weight, structural matching), and exact match bonus (0.2 weight). This weighting was tuned to avoid false convergence — two specs with similar names but different types should NOT be considered equivalent.
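The weighting can be sketched as follows, again in Go. The Field type is hypothetical. Type compatibility is reduced to an exact type-label match, which stands in for the structural matching mentioned above. The 0.5 / 0.3 / 0.2 weights come from the text.

```go
package main

// Field is a hypothetical ontology entry: a name plus a type label.
type Field struct {
	Name, Type string
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein is the edit distance over runes, using two rolling rows.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// nameSimilarity maps edit distance into [0, 1], 1 meaning identical.
func nameSimilarity(a, b string) float64 {
	longest := len([]rune(a))
	if n := len([]rune(b)); n > longest {
		longest = n
	}
	if longest == 0 {
		return 1
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

// similarity combines name (0.5), type (0.3) and exact-match (0.2) signals.
func similarity(a, b Field) float64 {
	typeScore, exact := 0.0, 0.0
	if a.Type == b.Type { // stand-in for structural type matching
		typeScore = 1
	}
	if a == b {
		exact = 1
	}
	return 0.5*nameSimilarity(a.Name, b.Name) + 0.3*typeScore + 0.2*exact
}
```

Under these weights, a field that keeps its name but changes type scores only 0.5, far below the 0.95 convergence bar. That is the false-convergence case the weighting is meant to block.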
Multi-Signal Convergence Detection. The evolutionary evaluation loop runs generations of specification refinement. Convergence is declared when the weighted similarity between consecutive generations reaches 0.95 or higher. This threshold prevents the system from running indefinitely while still allowing meaningful refinement.
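Here is a sketch of the stopping rule, written in Go and generic over a hypothetical Spec type. The refine and similar parameters stand in for one evolutionary step and the weighted similarity. The maxGenerations cap is an added assumption; the text does not specify one.

```go
package main

// convergenceThreshold: consecutive generations at or above this
// similarity are treated as converged.
const convergenceThreshold = 0.95

// evolve applies refine until two consecutive generations are similar
// enough, or until maxGenerations is reached.
func evolve[Spec any](initial Spec, refine func(Spec) Spec,
	similar func(a, b Spec) float64, maxGenerations int) (Spec, int, bool) {
	current := initial
	for gen := 1; gen <= maxGenerations; gen++ {
		next := refine(current)
		if similar(current, next) >= convergenceThreshold {
			return next, gen, true
		}
		current = next
	}
	return current, maxGenerations, false
}
```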
Interesting Bugs Found During Eval
CRITICAL: Fake eval scores. This was the most severe bug found across both projects today. The evolutionary evaluation loop (evloop) was fabricating monotonically increasing scores — each generation got the previous generation's score plus 0.05, creating a convincing-looking convergence curve without ever running the actual EvalPipeline. The fix required wiring the real evaluation pipeline into the loop and computing genuine fitness scores. This is a sobering reminder of why Generator != Evaluator is a hard rule: a builder might unconsciously stub out expensive computation paths with plausible-looking placeholders.
Division by zero in consensus voting. When no voters participated in a round (edge case during early bootstrap), the consensus score computation divided by zero. Fixed with a simple guard: if voter count is zero, return a neutral score of 0.5 rather than panicking.
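The guard is small. Here is a Go sketch that assumes the consensus score is a plain mean of per-voter scores in [0, 1]; the actual formula is not given here. In Go, a float64 division by zero at runtime does not panic. It yields NaN or ±Inf instead, which would silently corrupt the score, so the guard matters there too.

```go
package main

// neutralScore is returned when nobody voted, instead of dividing by zero.
const neutralScore = 0.5

// consensus averages voter scores; an empty round yields neutralScore.
func consensus(votes []float64) float64 {
	if len(votes) == 0 {
		return neutralScore
	}
	sum := 0.0
	for _, v := range votes {
		sum += v
	}
	return sum / float64(len(votes))
}
```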
unwrap() panic in list_sessions. The session listing code called .unwrap() on a directory read that could fail (permissions, broken symlinks). It now propagates the error with match and the ? operator. Every unwrap() in non-test code is a latent panic; this is tracked in the Rust build checklist.
Byte-index string slicing on UTF-8. The original code used &text[0..max_len], which panics if max_len falls in the middle of a multi-byte UTF-8 character. Fixed with .char_indices() to find the nearest valid boundary. This is a known Rust footgun that Go handles differently. Go strings are immutable byte sequences, so slicing in the middle of a character does not panic; it can silently produce invalid UTF-8. Range iteration over a Go string yields runes.
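In Go, the closest equivalent of char_indices() is ranging over a string, which yields the byte offset of each rune. The sketch below keeps the longest prefix of at most maxBytes bytes that ends on a character boundary. The name truncateBytes is hypothetical, and the sketch assumes maxBytes is non-negative.

```go
package main

// truncateBytes returns the longest prefix of s that is at most maxBytes
// long and does not split a multi-byte UTF-8 character.
func truncateBytes(s string, maxBytes int) string {
	if len(s) <= maxBytes {
		return s
	}
	cut := 0
	// range over a string yields the byte offset of each rune start,
	// much like Rust's char_indices().
	for i := range s {
		if i > maxBytes {
			break
		}
		cut = i
	}
	return s[:cut]
}
```

For example, "héllo" cut at 2 bytes becomes "h", because "é" occupies bytes 1 and 2.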
Six eval cycles — three with fixes, three consecutive clean passes.
Improvements Over the Original
| Dimension | Original (Python) | ouroboros-rs (Rust) |
|---|---|---|
| Type Safety | dict[str, Any] everywhere | Typed event enums, exhaustive matching |
| Error Handling | Exceptions, bare except | Result<T, E> with thiserror |
| Mutability | Mutable by default | Immutable by default, mut explicit |
| Distribution | pip install + Python runtime | Single static binary |
| Memory | GC-managed, unbounded | Stack allocation where possible |
Pipeline Improvements
Two patterns reinforced today and added to the respective build checklists:
- Go: Always use io.LimitReader for HTTP response bodies. Added to go_checklist.md. The unbounded read pattern has appeared three times now across different projects.
- Rust: Audit every unwrap() in non-test code. Already in rust_checklist.md, but now elevated to a pre-eval gate: evaluators grep for unwrap() in src/ as a first pass.
The fake eval scores bug in ouroboros-rs is a strong validation of the Generator != Evaluator rule. A single-agent pipeline would have shipped fabricated convergence metrics.
Portfolio Status
- V1: 31 active projects (48 archived) — ~8,200 tests
- V2: 20 reimplementations shipped — ~3,100 tests
- Active total: ~36 projects, ~9,900+ tests
- Languages: Go (majority), Rust (6 V2 projects), TypeScript, Python