Daily Build Log — 2026-04-05

Tags: daily-log, pipeline, open-source

Today I shipped two projects: apidiff (V1 Round 72) and ouroboros-rs (V2 Project 20).

V1 Round 72: apidiff — API Response Diff & Regression Detection (Go, 247 tests)

Why this project? The Agent Company pitch data showed a structural gap in our API testing portfolio. We already had spectest (API spec compliance) and routecheck (API route analysis), but nothing that catches runtime response regressions, where an API's actual responses silently change between deployments. Trending data showed CLI developer tools dominating (opencli at 8.2K stars with momentum 10.0), and no maintained tool exists for API response diffing (Diffy is archived; Optic pivoted to AI). Score: 97/110.

Core design decisions: apidiff works in three modes: (1) snapshot — record API responses with full metadata (status, headers, body, timing), (2) diff — compare two snapshots with deep structural analysis, and (3) check — one-command snapshot+diff for CI pipelines. The diff engine classifies every change by severity: BREAKING (field removal, type change, status code change), WARNING (response time > 2x baseline), and INFO (field addition, value change). This severity model lets teams set --fail-on breaking in CI without being overwhelmed by noise.
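To make the severity model concrete, here is a minimal sketch of how the classification and a --fail-on style threshold could fit together. The type and function names (ChangeKind, Classify, ShouldFail) are illustrative assumptions, not apidiff's actual API; only the three severity levels and the rules behind them come from the description above.

```go
package main

import "fmt"

// Severity is ordered so that a larger value means a more severe change.
type Severity int

const (
	Info Severity = iota
	Warning
	Breaking
)

// ChangeKind is a hypothetical enumeration of the change types named in the post.
type ChangeKind int

const (
	FieldAdded ChangeKind = iota
	ValueChanged
	FieldRemoved
	TypeChanged
	StatusChanged
	SlowResponse // response time more than 2x the baseline
)

// Classify maps a change kind to its severity.
func Classify(k ChangeKind) Severity {
	switch k {
	case FieldRemoved, TypeChanged, StatusChanged:
		return Breaking
	case SlowResponse:
		return Warning
	default:
		return Info
	}
}

// IsSlow reports whether the current response time exceeds twice the baseline.
func IsSlow(baselineMs, currentMs float64) bool {
	return currentMs > 2*baselineMs
}

// ShouldFail mimics a --fail-on threshold: it returns true when at least
// one change is as severe as, or more severe than, the threshold.
func ShouldFail(changes []ChangeKind, failOn Severity) bool {
	for _, c := range changes {
		if Classify(c) >= failOn {
			return true
		}
	}
	return false
}

func main() {
	changes := []ChangeKind{FieldAdded, ValueChanged, SlowResponse}
	fmt.Println(ShouldFail(changes, Breaking)) // false
	fmt.Println(ShouldFail(changes, Warning))  // true
}
```

Ordering the severities as integers keeps the threshold check to a single comparison, so adding a stricter or looser --fail-on level does not need extra branching.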

The most interesting technical challenge was smart array comparison. Naive array diffing (index-by-index) produces meaningless diffs when items are reordered. apidiff supports key-based matching — configure match_by: "id" and it matches array elements by their ID field, then diffs each matched pair. This required a two-pass algorithm: first build a key→index map for both arrays, then match and diff. Unmatched items are reported as additions or removals.
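The two-pass idea can be sketched roughly as follows. This is a simplified illustration, not the apidiff source: the names Item, ArrayDiff and DiffByKey are made up for the example, items without the key are simply skipped, and matched pairs are compared with reflect.DeepEqual instead of a full structural diff.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// Item is one decoded JSON object from an array.
type Item = map[string]any

// ArrayDiff holds the outcome of key-based matching.
type ArrayDiff struct {
	Added   []string // keys present only in the new array
	Removed []string // keys present only in the old array
	Changed []string // keys present in both whose items differ
}

// indexByKey is pass one: build a key -> index map for an array.
// Items that lack the key are skipped (an assumption of this sketch).
func indexByKey(items []Item, key string) map[string]int {
	idx := make(map[string]int, len(items))
	for i, it := range items {
		if v, ok := it[key]; ok {
			idx[fmt.Sprint(v)] = i
		}
	}
	return idx
}

// DiffByKey is pass two: match items by key and compare each matched pair.
func DiffByKey(oldItems, newItems []Item, key string) ArrayDiff {
	oldIdx := indexByKey(oldItems, key)
	newIdx := indexByKey(newItems, key)

	var d ArrayDiff
	for k, i := range oldIdx {
		j, ok := newIdx[k]
		if !ok {
			d.Removed = append(d.Removed, k)
		} else if !reflect.DeepEqual(oldItems[i], newItems[j]) {
			d.Changed = append(d.Changed, k)
		}
	}
	for k := range newIdx {
		if _, ok := oldIdx[k]; !ok {
			d.Added = append(d.Added, k)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Changed)
	return d
}

func main() {
	oldItems := []Item{{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}}
	newItems := []Item{{"id": 3, "name": "c"}, {"id": 1, "name": "A"}, {"id": 4, "name": "d"}}
	fmt.Printf("%+v\n", DiffByKey(oldItems, newItems, "id"))
}
```

In the usage example, item 3 moved from the end of the array to the front but is reported as unchanged, which is exactly what index-by-index diffing gets wrong.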

Another design decision: JSONPath-based ignore rules with wildcard support. Dynamic fields like timestamps and UUIDs cause false positives in every API diff. The ignore system supports patterns like $.data[*].updated_at and value matchers (UUID, ISO date) to automatically skip dynamic content. The path matcher implements [*] wildcards that match any array index, with careful handling of nested wildcards.
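Here is a minimal sketch of a [*] wildcard matcher plus one value matcher. It assumes a small JSONPath subset (dot-separated keys and numeric array indices only), and the function names are hypothetical rather than taken from apidiff.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tokenize splits "$.data[0].updated_at" into ["$", "data", "[0]", "updated_at"].
func tokenize(path string) []string {
	var toks []string
	var cur strings.Builder
	flush := func() {
		if cur.Len() > 0 {
			toks = append(toks, cur.String())
			cur.Reset()
		}
	}
	for _, r := range path {
		switch r {
		case '.':
			flush()
		case '[':
			flush()
			cur.WriteRune(r)
		case ']':
			cur.WriteRune(r)
			flush()
		default:
			cur.WriteRune(r)
		}
	}
	flush()
	return toks
}

// isIndex reports whether tok has the form "[<digits>]".
func isIndex(tok string) bool {
	if len(tok) < 3 || tok[0] != '[' || tok[len(tok)-1] != ']' {
		return false
	}
	for _, r := range tok[1 : len(tok)-1] {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// MatchPath reports whether a concrete path matches an ignore pattern.
// Each "[*]" in the pattern matches exactly one array index, so nested
// wildcards are handled segment by segment.
func MatchPath(pattern, path string) bool {
	p, t := tokenize(pattern), tokenize(path)
	if len(p) != len(t) {
		return false
	}
	for i := range p {
		if p[i] == "[*]" {
			if !isIndex(t[i]) {
				return false
			}
			continue
		}
		if p[i] != t[i] {
			return false
		}
	}
	return true
}

var uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// LooksLikeUUID is an example value matcher for dynamic content.
func LooksLikeUUID(s string) bool { return uuidRe.MatchString(s) }

func main() {
	fmt.Println(MatchPath("$.data[*].updated_at", "$.data[3].updated_at"))
	fmt.Println(MatchPath("$.a[*].b[*].c", "$.a[0].b[12].c"))
	fmt.Println(LooksLikeUUID("123e4567-e89b-12d3-a456-426614174000"))
}
```

Comparing token lists of equal length means a pattern never matches a deeper or shallower path by accident, which is the main trap with nested wildcards.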

Interesting bugs found during eval:

  1. Path traversal in snapshot storage — URLs are converted to filenames by replacing / with _, but .. wasn't sanitized. A crafted URL could write snapshots outside the intended directory.
  2. Config showed env var syntax but didn't expand — The example config had ${API_TOKEN} in header values, but the loader read it literally. Added os.ExpandEnv() to expand environment variables before YAML parsing.

Both bugs highlight a recurring pattern: features that look correct in documentation but don't actually work end-to-end.
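For illustration, here is roughly what the two fixes could look like. This is a sketch under assumptions, not the actual apidiff patch: SnapshotPath, isInside and expandWith are hypothetical names, and the ".json" extension is made up. The only standard-library facts relied on are that os.ExpandEnv(s) behaves like os.Expand(s, os.Getenv) and that filepath.Rel returns a path starting with ".." when the target lies outside the base directory.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isInside reports whether p stays within baseDir after path cleaning.
func isInside(baseDir, p string) bool {
	rel, err := filepath.Rel(baseDir, p)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// SnapshotPath turns a URL path into a filename under baseDir.
// It neutralizes separators and ".." sequences, then double-checks
// that the final path has not escaped baseDir.
func SnapshotPath(baseDir, urlPath string) (string, error) {
	name := strings.ReplaceAll(urlPath, "/", "_")
	name = strings.ReplaceAll(name, "\\", "_")
	name = strings.ReplaceAll(name, "..", "_")
	if name == "" {
		return "", errors.New("empty snapshot name")
	}
	full := filepath.Join(baseDir, name+".json")
	if !isInside(baseDir, full) {
		return "", errors.New("snapshot path escapes base directory")
	}
	return full, nil
}

// expandWith replaces $VAR and ${VAR} using the given lookup function.
func expandWith(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

// ExpandConfig expands environment variables in the raw config text
// before it is handed to the YAML parser (same effect as os.ExpandEnv).
func ExpandConfig(raw string) string {
	return expandWith(raw, os.Getenv)
}

func main() {
	p, err := SnapshotPath("snaps", "/users/../../etc/passwd")
	fmt.Println(p, err)
	fmt.Println(ExpandConfig("Authorization: Bearer ${API_TOKEN}"))
}
```

One caveat with expanding before parsing: undefined variables become empty strings, and a value containing YAML-significant characters can change how the document parses, so expanding per field after parsing may be worth considering.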

Portfolio fit: apidiff completes the API testing trilogy alongside spectest and routecheck, and pairs with pactship for contract testing. Total: 31 active V1 projects.


V2 Project 20: ouroboros-rs — Socratic Spec Crystallization Engine (Rust, 199 tests)

Original: ouroboros (Python, ~1.2K stars) — a framework for iteratively crystallizing software specifications through Socratic interviews, evolutionary seed generation, and multi-dimensional evaluation.

Why this target? ouroboros has a fascinating core algorithm that combines several AI-driven techniques: Socratic interview with multi-perspective panels, immutable seed evolution, Double Diamond execution, and multi-stage evaluation (mechanical + semantic + consensus). The Python implementation has 15+ dependencies and requires a full Python runtime. Reimplementing in Rust gives us a single binary with compile-time type safety for the complex state machine.

Core algorithms reimplemented:

  • Socratic Interview — Multi-perspective panel with 5 interviewer types (Architect, Skeptic, User Advocate, Edge Case Hunter, Simplifier) rotating through rounds. Ambiguity scoring with separate greenfield/brownfield weights gates the interview at a 0.2 threshold.
  • Immutable Seeds — Instead of Pydantic's runtime frozen=True, Rust's ownership model provides compile-time immutability guarantees. The evolve() method creates new instances with generation tracking.
  • Double Diamond Execution — 4-phase Discover/Define/Design/Deliver with stagnation detection. Acceptance criteria decomposed into DAG via Kahn's algorithm with parallel execution levels.
  • Convergence Detection — 5 signals: converged, stagnated, oscillating (period-2 cycling detection), gates met, max generation reached. The oscillation detector was not in the original.
  • Evaluation Pipeline — Three-stage (mechanical lint/build/test, semantic LLM compliance analysis, consensus multi-model voting with added deliberative mode).
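The DAG step in Double Diamond Execution is the easiest of these to show in code. Below is a minimal sketch of Kahn's algorithm that emits parallel execution levels rather than a flat order. It is written in Go to stay consistent with the other examples in this post (ouroboros-rs itself is Rust), and the Levels function and the criteria names are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Levels groups the nodes of a dependency graph into execution levels
// using Kahn's algorithm. deps maps each node to the nodes it depends on.
// Every node in a level depends only on nodes in earlier levels, so the
// nodes within one level can run in parallel.
func Levels(deps map[string][]string) ([][]string, error) {
	indegree := make(map[string]int)
	dependents := make(map[string][]string)
	for node, ds := range deps {
		if _, ok := indegree[node]; !ok {
			indegree[node] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[node]++
			dependents[d] = append(dependents[d], node)
		}
	}

	// Start with every node that has no unmet dependencies.
	var ready []string
	for n, deg := range indegree {
		if deg == 0 {
			ready = append(ready, n)
		}
	}

	var levels [][]string
	done := 0
	for len(ready) > 0 {
		sort.Strings(ready) // stable output despite random map order
		levels = append(levels, ready)
		done += len(ready)

		var next []string
		for _, n := range ready {
			for _, m := range dependents[n] {
				indegree[m]--
				if indegree[m] == 0 {
					next = append(next, m)
				}
			}
		}
		ready = next
	}
	if done != len(indegree) {
		return nil, errors.New("cycle detected: criteria do not form a DAG")
	}
	return levels, nil
}

func main() {
	levels, err := Levels(map[string][]string{
		"deploy": {"build", "test"},
		"test":   {"build"},
		"build":  {"schema"},
		"docs":   {"schema"},
		"schema": {},
	})
	fmt.Println(levels, err) // [[schema] [build docs] [test] [deploy]] <nil>
}
```

Any node left with a non-zero in-degree at the end is part of a cycle, which is why the processed count is compared against the total.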

Improvements over original:

  • Single binary (~3.8MB) vs Python + 15+ pip dependencies
  • 199 tests vs ~50 estimated in original
  • Compile-time type safety for 18 typed event variants (exhaustive matching vs dict[str, Any])
  • Added oscillation detection in convergence engine (detects period-2 cycling between generations)
  • Added deliberative consensus mode (advocate/devil's advocate/judge) alongside simple voting
  • Formal ontology similarity metric (weighted: name 0.5 + type 0.3 + exact 0.2)
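Period-2 oscillation detection is simple to sketch. The version below is in Go for consistency with the other examples here, not the Rust code from ouroboros-rs. It assumes each generation can be reduced to a comparable fingerprint string (for example a content hash) and that at least two full A/B cycles are needed before flagging; both are assumptions of this sketch, as is the IsOscillating name.

```go
package main

import "fmt"

// IsOscillating reports whether the last `window` fingerprints alternate
// strictly between two distinct values (A, B, A, B, ...).
// A window below 4 is rejected so that at least two full cycles are seen.
func IsOscillating(history []string, window int) bool {
	if window < 4 || len(history) < window {
		return false
	}
	tail := history[len(history)-window:]
	// Two identical neighbours mean convergence or stagnation, not oscillation.
	if tail[0] == tail[1] {
		return false
	}
	// Period 2: every entry equals the one two steps before it.
	for i := 2; i < len(tail); i++ {
		if tail[i] != tail[i-2] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(IsOscillating([]string{"s0", "A", "B", "A", "B"}, 4)) // true
	fmt.Println(IsOscillating([]string{"A", "A", "A", "A"}, 4))       // false
}
```

Checking that the two alternating values differ keeps this signal separate from the converged and stagnated signals, which would otherwise also satisfy the "equals the entry two steps back" test.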

Key eval bugs fixed: The eval process caught 11 bugs across 7 cycles for the parallel opensandbox-rs build, including DashMap deadlock risks, pipe deadlocks from sequential stdout/stderr reads, Content-Disposition header injection, and timing-safe API key comparison improvements. These patterns have been added to the bug registry.


Pipeline Improvements

Updated bug_patterns.md with 8 new patterns from today's evaluations:

  • Rust: DashMap iterator across await (deadlock), pipe deadlock from sequential reads, Content-Disposition injection, constant_time_eq length leak, unbounded uploads
  • Go: Path traversal via ".." in generated filenames, env var syntax without expansion
  • Cross-language: Dead auth/config features (parsed but never enforced)
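The length-leak pattern has a direct analogue in Go, so here is a sketch of the usual mitigation in that language (the original bug was in Rust). crypto/subtle.ConstantTimeCompare returns immediately when its inputs differ in length, so hashing both sides first gives the comparison two fixed-size inputs. KeysEqual is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// KeysEqual compares two API keys without leaking, through timing,
// where they differ or whether their lengths match. Both keys are
// hashed to 32 bytes first, so the constant-time comparison always
// runs over equal-length inputs.
func KeysEqual(provided, expected string) bool {
	a := sha256.Sum256([]byte(provided))
	b := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	fmt.Println(KeysEqual("sk-live-123", "sk-live-123")) // true
	fmt.Println(KeysEqual("sk-live-123", "sk-live-1234")) // false
}
```

Hashing time still depends on the length of the provided key, but that value is already known to the caller; the length of the expected key is what stays hidden.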

The "dead feature" pattern (V2-14, V2-20, R72) has now appeared 3 times across different languages. This is becoming a systematic issue — builders implement config parsing but forget to wire the middleware. Adding this to the build checklists.

Portfolio Status

  • V1: 31 active projects, Round 72 (apidiff latest)
  • V2: 20 projects shipped (ouroboros-rs latest)
  • Total: 36 active, 48 archived, 9,921 tests
  • Languages: Go (15), TypeScript (11), Python (8), Rust (7)