Blog

Build daily, learn daily, write it down.

12 min read

Daily Build Log — 2026-04-26

R89 dagulint shipped — 44-cycle deep evaluation marathon, 8 architectural refactors, 72 bugs, first user-override-ship

daily-log pipeline open-source rust linter yaml dagu harness
11 min read

Daily Build Log — 2026-04-24

Round 88: evaltrack (TypeScript) — regression and trend CLI for promptfoo eval histories. 366 tests, 11 eval cycles, 17 bugs. The hostile-fixture obligation caught 100% stub-faking on its first real test, and a new bug class (pipe-buffer truncation on process.exit) surfaced that every Node CLI I've shipped this year is latently vulnerable to.

daily-log pipeline open-source typescript llm-eval promptfoo
7 min read

Daily Build Log — 2026-04-19

Round 86: difyctl (Go) — Dify workflow DSL linter/differ. 274 tests, 16 eval cycles, 31 bugs. The cross-command parity cascade: 4 cycles of 'fmt accepts what lint rejects' before parse.Validate became the single source of truth.

daily-log pipeline open-source golang yaml dify
16 min read

Daily Build Log — 2026-04-18

Round 85: skilldigest (Rust, 343 tests, 33 eval cycles, 47 bugs + 1 architectural refactor). The bug-reporting story of the year — when a class of bugs gets large enough, you refactor instead of patch.

daily-log pipeline open-source rust skilldigest bug-reporting commonmark
9 min read

Daily Build Log — 2026-04-17

Round 84: mcpbench (Go, 235 tests, 9 eval cycles, 10 bugs fixed). Protocol-aware MCP load tester with per-tool p50/p95/p99, stdio + HTTP/SSE, compare-for-CI.

daily-log pipeline open-source mcp go
4 min read

Daily Build Log — 2026-04-16

Round 82: ragcheck (Python, 434 tests). Round 83: agentlint (TypeScript, 307 tests). Two projects shipped: RAG evaluation harness and AI agent config linter.

daily-log pipeline open-source
12 min read

Daily Build Log — 2026-04-15

Round 81: skillpack — a package manager, lockfile, and bundler for agent skills (SKILL.md / .cursorrules / AGENT.md / skill.yaml). Resolves semver dep

daily-log pipeline open-source
9 min read

Daily Build Log — 2026-04-14

Round 80: mcptrace-rs — a Rust observability proxy for MCP servers. Drop-in between your agent and any MCP server, records every tool-call as a span, exports to OTLP / Zipkin / stdout, enforces SLO burn-rate budgets. 199 tests, 12 eval cycles, two counter resets.

daily-log pipeline open-source rust mcp observability opentelemetry
7 min read

Daily Build Log — 2026-04-13

Round 79: benchdiff-rs — the first V1 Rust project. A statistical benchmark regression detector for Criterion, Go bench, and hyperfine output. Welch's t-test matching scipy to 8 decimals, 180 tests, 8 eval cycles, single static binary.

daily-log pipeline open-source rust benchmarking statistics milestone
6 min read

Daily Build Log — 2026-04-12

Round 78: envaudit — a Python CLI that audits environment variable configurations across .env, CI/CD, Docker Compose, and Kubernetes for leaked secrets, cross-environment drift, and missing required vars. 25 rules, 274 tests, 9 eval cycles.

daily-log pipeline open-source python security devops
12 min read

Daily Build Log — 2026-04-11

Round 77: mcpaudit — a static security auditor for MCP server definitions. 20 rules, 327 tests, 92% coverage, hardened across 23 adversarial eval cycles. Plus six new bug patterns about regex false positives, schema inference, and arithmetic overflow.

daily-log pipeline open-source mcp python security static-analysis
12 min read

Daily Build Log — 2026-04-10

Round 76: mcprouter — a routing gateway for MCP servers. 10 eval cycles, 17 bugs, and a masterclass in everything that can go wrong with circuit breakers, half-open states, and stdio child processes.

daily-log pipeline open-source mcp go concurrency
8 min read

Daily Build Log — 2026-04-09

Round 75: sessaudit — CLI Agent Session Auditing (TypeScript, 205 tests). A security compliance tool for CLI agent sessions with policy enforcement and risk scoring.

daily-log pipeline open-source
3 min read

Daily Build Log — 2026-04-07

Today I shipped agentmem (V1 Round 74) — a testing framework for AI agent memory systems.

daily-log pipeline open-source
3 min read

Daily Build Log — 2026-04-06

Today I shipped promptdiff (V1 Round 73) — a prompt regression testing CLI for LLM-powered applications.

daily-log pipeline open-source
5 min read

Daily Build Log — 2026-04-05

Today I shipped two projects: apidiff (V1 Round 72) and ouroboros-rs (V2 Project 20).

daily-log pipeline open-source
2 min read

Graft v6.1: Safe Compilation — CLAUDE.md Is No Longer Overwritten

graft compile now outputs to orchestration.md instead of CLAUDE.md. Existing project support, repo cleanup, marketing push.

graft compiler llm claude productization
2 min read

Graft v6.2: Using Graft in Coding Workflows

Graft handles NL sub-steps (review, analysis, planning) within coding workflows. Import existing .claude/ structures with graft import. Positioning refined to natural-language I/O pipelines.

graft compiler llm claude productization
1 min read

Graft v6.2: graft import — Reverse-Compile Existing Harnesses into .gft

New graft import command reads existing .claude/ structures and generates .gft files. Round-trip validated, 1,722 tests.

graft compiler llm claude productization
7 min read

Daily Build Log — 2026-04-04

Two projects shipped: apidiff (API regression detection, Go, 247 tests) and ouroboros-rs (Socratic execution engine in Rust, 199 tests)

daily-log pipeline open-source
4 min read

Graft v6.0: The Closed Loop — Generate, Execute, Validate, Fix

Natural language pipeline generation, Claude Code native integration, runtime quality validation with automatic .gft fix suggestions. 1,712 tests.

graft compiler llm claude productization
6 min read

Daily Build Log — 2026-04-03

Two projects shipped: spectest (TypeScript, OpenAPI spec compliance testing) and edict-rs (Rust reimplementation of edict multi-agent orchestration).

daily-log pipeline open-source
4 min read

Graft v5.7: First npm Publish — You Can Install It and Run It

npm publish, 9 bug fixes from deep debugging, e2e Claude Code verification. 1,574 tests. M1 milestone complete.

graft compiler llm npm claude productization
4 min read

Graft v5.8: Conditional Routing, Rustc Errors, and Honest Positioning

Conditional edge codegen, rustc-style errors with did-you-mean, graft watch, graft visualize. Honest execution model docs. 1,614 tests.

graft compiler llm claude productization
5 min read

Daily Build Log — 2026-04-02

Two projects shipped: strmtest (Go, streaming protocol testing) and understand-rs (Rust reimplementation of Understand-Anything).

daily-log pipeline open-source
5 min read

Graft v3.0: Multi-Backend Codegen, Field-Level Writes, and What Happens When You Skip Debate

Graft v3.0 adds pluggable codegen backends, field-level memory writes, multi-field partial reads, and failure strategies. 477 tests, 168 ratchet decisions, 8 rounds — half without any debate.

graft compiler adversarial-debate ai-agents typescript
4 min read

Graft v3.1: LSP Completions, API Surface, and 12 Rounds Without a Bug

Graft v3.1 adds LSP autocomplete, programmatic API exports, parallel failure fix, and fallback cycle detection. 537 tests, 172 ratchets, 5 rounds — only 1 needed debate.

graft compiler lsp adversarial-debate ai-agents typescript
5 min read

Graft v3.2: Parser Error Recovery and the Brace-Depth Bug That Almost Was

Graft v3.2 adds panic-mode parser error recovery, keyword hover docs, LRU cache eviction, and import completion wiring. 582 tests, 180 ratchets, 4 rounds — under 10 agent calls.

graft compiler parser lsp adversarial-debate ai-agents typescript
4 min read

Graft v3.3: Auto-Import Code Actions and Runtime Conditional Routing

Graft v3.3 adds LSP code actions for auto-importing undefined references, runtime conditional edge routing, document symbols, and condition type validation. 636 tests, 190 ratchets, 4 rounds.

graft compiler lsp runtime adversarial-debate ai-agents typescript
4 min read

Graft v3.4: LSP Rename and the 200th Ratchet Decision

Graft v3.4 adds LSP rename support, conditional edge token estimation, hierarchical document symbols, and splits the LSP features module. 690 tests, 200 ratchets, 4 rounds.

graft compiler lsp adversarial-debate ai-agents typescript
4 min read

Graft v3.5: Rename Hardening and the A3 Backlog

Graft v3.5 hardens the rename feature with comment/string filtering, CRLF normalization, cross-file conflict detection, and adds FlowNode locations to the parser. 739 tests, ~210 ratchets, 4 rounds.

graft compiler lsp adversarial-debate ai-agents typescript
3 min read

Graft v3.6: Find All References and the First Bug in 24 Rounds

Graft v3.6 adds LSP find-all-references, unifies the keyword system, fixes document symbol ranges, and catches the first NEEDS_CHANGES in 24 rounds. 790 tests, ~220 ratchets, 4 rounds.

graft compiler lsp adversarial-debate ai-agents typescript
4 min read

Graft v3.7: Runtime Hardening

Foreach failure handling and multi-hop conditional routing. 832 tests, 3 bugs caught by dual-agent debate, first runtime-focused version since v3.1.

graft compiler adversarial-debate runtime claude-code
4 min read

Graft v3.8: Estimator-Runtime Parity

Closing the longest-running tech debt, extracting the densest code block, and the first clean sweep in 7 rounds. 864 tests, ~10 agent calls, 0 NEEDS_CHANGES.

graft compiler adversarial-debate tech-debt
4 min read

Graft v3.9: Quality Polish

Final v3.x release — 890 tests, conditional edge transforms, tech debt zero. 3 rounds, ~9 agent calls.

graft compiler llm adversarial-debate claude
4 min read

Graft v4.0: Variables, Expressions, and Graph Composition

First major version bump — let bindings, expression evaluation, graph parameters, graph calls. 980 tests, 6 rounds, 33 new ratchets.

graft compiler llm adversarial-debate claude
3 min read

Graft v4.1: Quality Hardening

Quality-only release — multi-segment condition fix, output isolation, scope extraction, exhaustive switches. 1,001 tests, 4 rounds, 0 new features.

graft compiler llm adversarial-debate claude
3 min read

Graft v4.2: Expression Functions

Built-in functions (len, max, min, str), graph call return values, equality unification. 1,048 tests, 4 rounds, 2 new error codes.

graft compiler llm adversarial-debate claude
3 min read

Graft v4.3: Arithmetic Operators

Multiplication, modulo, abs/round/keys builtins, division precedence fix. 1,077 tests, 3 rounds, 17 consecutive PASS.

graft compiler llm adversarial-debate claude
3 min read

Graft v4.4: String Interpolation

Template expressions, evaluateExpr extraction, BUILTIN_FUNCTIONS enrichment. 1,123 tests, 4 rounds, 21 consecutive PASS.

graft compiler llm adversarial-debate claude
3 min read

Graft v1.1: Parallel Execution and Foreach Loops

Adding parallel and foreach flow control to Graft with an adaptive adversarial debate process. The debate even found a bug in the benchmark file itself. 135 tests, 58 ratchet-locked decisions.

graft compiler ai claude-code adversarial-debate
3 min read

Graft v1.2: From Code Generator to Execution Engine

Adding graft run — compile and execute AI pipelines by spawning Claude Code subagents. A4-Specialist proposed reusing generateAgent(), then self-retracted as forced dissenter. A3-Skeptic caught two showstoppers. 171 tests, 70 ratchet-locked decisions.

graft compiler ai claude-code adversarial-debate runtime
5 min read

Graft v2.0: Import System and Persistent Memory

Multi-file imports and persistent memory across pipeline runs. 5 debate rounds across all 7 pipeline stages, foreach memory staleness bug caught by A3-Skeptic alone, and a 3:1 vote for field-matching merge over full overwrite. 249 tests, 92 ratchet-locked decisions.

graft compiler ai claude-code adversarial-debate imports memory
5 min read

Graft v2.1: Token Tracking & Correctness

Quality release with runtime token tracking, 4 correctness fixes, and shared module extraction. 288 tests, 21 agent calls (42% under budget), zero review failures.

graft compiler adversarial-debate token-tracking claude-code
5 min read

Graft v2.2: LSP, npm, and the Last Mile

Developer tooling release: LSP server with diagnostics/hover/go-to-definition, VS Code extension, npm distribution as @graft-lang/graft. 376 tests, ~33 agent calls, 132 ratchet-locked decisions.

graft compiler adversarial-debate lsp vscode npm claude-code
4 min read

Daily Build Log — 2026-03-31

Today was a milestone day for the daily-challenge project — we shipped 7 projects total, including our first-ever Rust reimplementation in the V2 pipe

daily-log pipeline open-source
5 min read

Graft v1.0: Building a Programming Language with Adversarial AI Debate

From idea to working compiler in one sprint. 4 AI agents argued over every design decision, caught 3 critical bugs, and produced a graph-native language that compiles .gft to Claude Code harness structures. 110 tests, 46 ratchet-locked decisions.

graft compiler ai claude-code adversarial-debate language-design
9 min read

Building rtk from Scratch in Go — What I Learned

I reimplemented rtk (14.6K stars, Rust) in Go with zero dependencies. Here's the performance comparison and what I discovered.

reimplementation go rust llm token-optimization benchmark
3 min read

Day 8: Pipeline Overhaul — Skills, Safety, Self-Evolution

Rewrote the pipeline infrastructure: Node.js safety hooks, bug pattern auto-learning, 11 Claude Code skills, 6 plugins published. Then shipped 7 more projects to reach 57 total.

daily-log pipeline infrastructure plugins
4 min read

Day 7: 14 Projects in One Session — Hitting 50 Projects

Built 14 open-source tools in a single session, reaching the 50-project milestone. Redesigned the scoring engine, added safety gates, and taught the pipeline to evolve itself.

daily-log pipeline milestone open-source
7 min read

Building One Tool a Day — My AI-Powered Open Source Pipeline

How I built a pipeline that produces one open-source developer tool every day — 35 projects and 10,777 tests so far — by combining AI agents, adversarial evaluation, and trend data.

ai open-source pipeline claude-code