Graft v3.0: Multi-Backend Codegen, Field-Level Writes, and What Happens When You Skip Debate
Graft is a graph-native language for AI agent harness engineering. It compiles .gft source files into execution harnesses, declaring how agents communicate, what context they share, and how token budgets flow through a pipeline. The compiler itself is built through a multi-agent adversarial debate process in which 2-4 AI agents independently analyze, cross-critique, and converge on implementations.
v3.0 is the biggest architectural release since v2.0. Three new capabilities, a quality cleanup pass, and an accidental experiment that proved half the debate process is optional.
What's New
Pluggable Codegen Backends
Graft's codegen previously hardcoded Claude Code output. v3.0 introduces a CodegenBackend interface with four methods, making the output target swappable:
# Compile with the default Claude Code backend
graft compile pipeline.gft
# Or specify a backend explicitly
graft compile pipeline.gft --backend claude-code
The ClaudeCodeBackend delegates to the existing standalone functions — zero restructuring of working code. New backends (Cursor rules, Windsurf, raw markdown) can implement the same four-method interface: generateAgent, generateHook, generateOrchestration, generateSettings.
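For readers who want the shape of the seam, here is a minimal TypeScript sketch. The four method names come from this release; everything else (the GeneratedFile type, the registry, resolveBackend) is an illustrative assumption, not Graft's actual source:
type GeneratedFile = { path: string; contents: string };
interface CodegenBackend {
  generateAgent(node: unknown): GeneratedFile;          // one agent definition per node
  generateHook(hook: unknown): GeneratedFile;           // lifecycle hooks
  generateOrchestration(graph: unknown): GeneratedFile; // the pipeline runner
  generateSettings(config: unknown): GeneratedFile;     // harness settings
}
// Backends are registered by name and selected via the --backend flag.
const backends = new Map<string, () => CodegenBackend>();
function resolveBackend(name = "claude-code"): CodegenBackend {
  const make = backends.get(name);
  if (!make) throw new Error(`unknown codegen backend: ${name}`);
  return make();
}
Keeping the interface this small is what lets the default backend wrap the existing standalone functions without restructuring them.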
Field-Level Memory Writes
v2.0 introduced memory declarations. v3.0 makes writes precise. Instead of writing an entire memory object, nodes can target specific fields:
memory UserProfile(max_tokens: 1k, storage: file) {
name: String
preferences: Map<String, String>
history: List<String>
}
node Updater(model: haiku, budget: 2k/1k) {
reads: [UserProfile]
writes: [UserProfile.preferences] # only touches this field
produces Result { status: String }
}
Under the hood, writes: string[] became WriteRef { memory, field?, location }. The runtime saveMemory accepts an optional fields parameter for surgical JSON updates instead of full object replacement.
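A sketch of what that change implies, in TypeScript. WriteRef's three fields and the optional fields parameter are from the release notes; the store shape, SourceSpan placeholder, and function body are illustrative assumptions:
type SourceSpan = { line: number; column: number }; // placeholder for Graft's location type
type WriteRef = { memory: string; field?: string; location: SourceSpan };

// Field-level save: merge only the named fields into the stored object
// instead of replacing it wholesale.
function saveMemory(
  store: Record<string, Record<string, unknown>>,
  memory: string,
  value: Record<string, unknown>,
  fields?: string[],
): void {
  if (!fields) {
    store[memory] = value; // full-object write, the pre-v3.0 behavior
    return;
  }
  const current = store[memory] ?? {};
  for (const f of fields) current[f] = value[f]; // surgical JSON update
  store[memory] = current;
}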
Multi-Field Partial Reads
Nodes can now read multiple fields from a context or memory in a single declaration using brace syntax:
node Analyzer(model: sonnet, budget: 4k/2k) {
reads: [UserProfile.{name, preferences}] # two fields, one source
produces Analysis { summary: String }
}
This compiles to ContextRef.field: string[] (previously string | undefined). The token estimator scales with field count, applying Math.min(PARTIAL_FIELD_FACTOR * fieldCount, 1.0) at all three estimation sites, so reading more fields costs proportionally more but is never estimated above a full read.
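Concretely, the scaling works like this. Only the Math.min formula comes from the compiler; the factor's value and the function wrapper are assumptions for illustration:
const PARTIAL_FIELD_FACTOR = 0.25; // assumed value, for illustration only

function estimatePartialReadTokens(fullTokens: number, fieldCount: number): number {
  // Each field adds a fraction of the full-read cost, capped at 1.0 so a
  // many-field read is never estimated above reading the whole object.
  const scale = Math.min(PARTIAL_FIELD_FACTOR * fieldCount, 1.0);
  return Math.ceil(fullTokens * scale);
}

estimatePartialReadTokens(1000, 2); // 500
estimatePartialReadTokens(1000, 5); // 1000, capped at the full-read cost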
Failure Strategies
The on_failure clause, deferred since v1.2, is now fully implemented:
node Risky(model: sonnet, budget: 5k/2k) {
on_failure: retry(3)
# or: fallback(SafeNode), skip, abort, retry_then_fallback(2, SafeNode)
produces Result { data: String }
}
executeWithFailureStrategy in the flow runner handles all five strategies. The scope checker validates fallback references at compile time via SCOPE_INVALID_FALLBACK.
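A sketch of how such a dispatcher can be structured. The five strategy names match the release; all signatures, shapes, and the runNode callback are illustrative assumptions:
type FailureStrategy =
  | { kind: "retry"; attempts: number }
  | { kind: "fallback"; node: string }
  | { kind: "skip" }
  | { kind: "abort" }
  | { kind: "retry_then_fallback"; attempts: number; node: string };

async function executeWithFailureStrategy(
  run: () => Promise<unknown>,                 // the node being executed
  strategy: FailureStrategy,
  runNode: (name: string) => Promise<unknown>, // executes a fallback node
): Promise<unknown> {
  const retry = async (attempts: number) => {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
      try { return await run(); } catch (err) { lastError = err; }
    }
    throw lastError;
  };
  switch (strategy.kind) {
    case "retry":
      return retry(strategy.attempts);
    case "fallback":
      try { return await run(); } catch { return runNode(strategy.node); }
    case "skip":
      try { return await run(); } catch { return undefined; } // swallow and continue
    case "abort":
      return run(); // let the error propagate and halt the flow
    case "retry_then_fallback":
      try { return await retry(strategy.attempts); }
      catch { return runNode(strategy.node); }
  }
}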
The Debate That Mattered
v3.0 ran 8 rounds. The first 4 used multi-agent debate (MEDIUM and HIGH tiers). The last 4 were direct implementation — no debate at all.
Only one debate round produced bugs the process actually caught: R2 (WriteRef + multi-field reads). A3-Skeptic found two silent runtime corruption bugs that all three other agents missed:
- .join() on objects: Existing codegen called writes.join(', '). When writes changed from string[] to WriteRef[], this produced [object Object], [object Object] instead of memory names. No error was thrown; the generated harness just contained garbage.
- String iteration on arrays: Code iterating ref.field when field changed from string to string[] would iterate individual characters ('n', 'a', 'm', 'e') instead of field names ('name'). Again, silent corruption.
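Both failure modes reproduce in a few lines of TypeScript:
// Bug 1: Array.prototype.join stringifies objects as "[object Object]".
const writes = [{ memory: "UserProfile", field: "preferences" }];
console.log(writes.join(", ")); // prints "[object Object]", no error thrown

// Bug 2: for...of over a string yields characters, not field names.
const field: string | string[] = "name";
for (const f of field) console.log(f); // 'n', 'a', 'm', 'e'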
Both bugs would have shipped undetected without adversarial analysis. This is the third consecutive version where A3-Skeptic's silent-corruption instinct was the single highest-value debate contribution.
Quality Cleanup
R6 reorganized GraftErrorCode from a flat 21-member union into sub-unions: ParseErrorCode | ScopeErrorCode | TypeErrorCode | BudgetErrorCode | ImportErrorCode | GraphErrorCode | ConfigErrorCode. The parser gained PARSE_UNEXPECTED_TOKEN and PARSE_MISSING_FIELD codes — unlocking a ratchet that had kept it throw-only since v2.2. SourceLocation gained a length field on all tokens, enabling the LSP to render non-zero-width diagnostic squiggles. Dead code (bench script, stale error names) was removed.
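In sketch form, the sub-union names, the two new parse codes, and SCOPE_INVALID_FALLBACK are from this release; the remaining members and the exact SourceLocation shape are illustrative assumptions:
type ParseErrorCode = "PARSE_UNEXPECTED_TOKEN" | "PARSE_MISSING_FIELD";
type ScopeErrorCode = "SCOPE_INVALID_FALLBACK"; // plus other scope codes
type TypeErrorCode = "TYPE_MISMATCH";           // illustrative member
type BudgetErrorCode = "BUDGET_EXCEEDED";       // illustrative member
type ImportErrorCode = "IMPORT_NOT_FOUND";      // illustrative member
type GraphErrorCode = "GRAPH_CYCLE";            // illustrative member
type ConfigErrorCode = "CONFIG_INVALID";        // illustrative member

type GraftErrorCode =
  | ParseErrorCode | ScopeErrorCode | TypeErrorCode | BudgetErrorCode
  | ImportErrorCode | GraphErrorCode | ConfigErrorCode;

// The length field lets the LSP draw a squiggle spanning the whole token
// rather than a zero-width caret.
interface SourceLocation { line: number; column: number; length: number }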
What Skipping Debate Proved
After R4, the context window overflowed, forcing session recovery. R5-R8 were implemented without any debate infrastructure: no multi-agent analysis, no cross-critique, no convergence step. A single agent implemented each round directly.
All four passed review on the first attempt. They produced 37 new tests and 15 new ratchet decisions.
Why it worked: By R5, the codebase had strong conventions (ProgramIndex threading, error code sub-unions, CodegenBackend interface). Each round had narrow scope. The 430+ existing tests caught regressions. Common memory provided enough design context without live debate.
The decision rule: Use debate when introducing new concepts (interfaces, subsystems, data structures). Skip it when applying existing concepts to new inputs. v3.0 proved this empirically.
Stats
| Metric | Value |
|--------|-------|
| Tests | 477 (101 new) |
| Ratchet decisions | 168 (17 new, 4 unlocked) |
| Rounds | 8 (4 debated, 4 direct) |
| Agent calls | ~26 |
| First-try pass rate | 100% (8/8) |
| Critical bugs caught | 2 (both A3-Skeptic, R2) |
| Cross-critique used | 0 |
| New source files | 0 (all changes in existing modules) |
Try It
npm install -g @graft-lang/graft
# Compile a pipeline
graft compile pipeline.gft --out-dir ./output
# Check without generating files
graft check pipeline.gft
# Run with dry-run mode
graft run pipeline.gft --input '{"task": "review"}' --dry-run
Source: github.com/JSLEEKR/graft