
Graft v3.0: Multi-Backend Codegen, Field-Level Writes, and What Happens When You Skip Debate

5 min read
graft · compiler · adversarial-debate · ai-agents · typescript

Graft is a graph-native language for AI agent harness engineering. It compiles .gft source files into execution harnesses that declare how agents communicate, what context they share, and how token budgets flow through a pipeline. The compiler itself is built through a multi-agent adversarial debate process in which 2-4 AI agents independently analyze a change, cross-critique each other's proposals, and converge on an implementation.

v3.0 is the biggest architectural release since v2.0. Four new capabilities, a quality cleanup pass, and an accidental experiment that proved half the debate process is optional.

What's New

Pluggable Codegen Backends

Graft's codegen previously hardcoded Claude Code output. v3.0 introduces a CodegenBackend interface with four methods, making the output target swappable:

# Compile with the default Claude Code backend
graft compile pipeline.gft

# Or specify a backend explicitly
graft compile pipeline.gft --backend claude-code

The ClaudeCodeBackend delegates to the existing standalone functions — zero restructuring of working code. New backends (Cursor rules, Windsurf, raw markdown) can implement the same four-method interface: generateAgent, generateHook, generateOrchestration, generateSettings.
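
A minimal sketch of the shape that interface might take (the four method names are from the release; the parameter and return types here are assumptions standing in for the compiler's internal IR):

// Hypothetical sketch: only the four method names come from the release.
// The Spec types below are placeholders for the compiler's real IR.
type AgentSpec = Record<string, unknown>;
type HookSpec = Record<string, unknown>;
type GraphSpec = Record<string, unknown>;
type SettingsSpec = Record<string, unknown>;

interface GeneratedFile {
  path: string;
  contents: string;
}

interface CodegenBackend {
  generateAgent(agent: AgentSpec): GeneratedFile;
  generateHook(hook: HookSpec): GeneratedFile;
  generateOrchestration(graph: GraphSpec): GeneratedFile;
  generateSettings(settings: SettingsSpec): GeneratedFile;
}

A new backend (say, Cursor rules) would implement these four methods and be selected via its own --backend name.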

Field-Level Memory Writes

v2.0 introduced memory declarations. v3.0 makes writes precise. Instead of writing an entire memory object, nodes can target specific fields:

memory UserProfile(max_tokens: 1k, storage: file) {
  name: String
  preferences: Map<String, String>
  history: List<String>
}

node Updater(model: haiku, budget: 2k/1k) {
  reads: [UserProfile]
  writes: [UserProfile.preferences]    # only touches this field
  produces Result { status: String }
}

Under the hood, writes: string[] became WriteRef { memory, field?, location }. The runtime saveMemory accepts an optional fields parameter for surgical JSON updates instead of full object replacement.
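
As a sketch of what the surgical update might look like (WriteRef's three property names and the optional fields parameter are from the release; applyFieldUpdate is a hypothetical helper, not the actual runtime code):

// Hypothetical illustration of a field-targeted memory update.
interface WriteRef {
  memory: string;                  // which memory declaration to write
  field?: string;                  // optional: restrict the write to one field
  location: { line: number; column: number };
}

function applyFieldUpdate(
  stored: Record<string, unknown>,
  incoming: Record<string, unknown>,
  fields?: string[],
): Record<string, unknown> {
  if (!fields) return incoming;                  // no fields: full replacement
  const next = { ...stored };
  for (const f of fields) next[f] = incoming[f]; // merge only the named fields
  return next;
}

So writes: [UserProfile.preferences] updates one key and leaves name and history untouched.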

Multi-Field Partial Reads

Nodes can now read multiple fields from a context or memory in a single declaration using brace syntax:

node Analyzer(model: sonnet, budget: 4k/2k) {
  reads: [UserProfile.{name, preferences}]    # two fields, one source
  produces Analysis { summary: String }
}

This compiles to ContextRef.field: string[] (previously string | undefined). The token estimator now scales with field count, applying Math.min(PARTIAL_FIELD_FACTOR * fieldCount, 1.0) at all three estimation sites.
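
Sketched out, with an assumed factor value purely for illustration:

// PARTIAL_FIELD_FACTOR is named in the release; 0.25 is an assumed
// value for illustration only.
const PARTIAL_FIELD_FACTOR = 0.25;

// Scale a full-object token estimate by the number of fields read,
// capped so a partial read never costs more than the whole object.
function partialReadEstimate(fullEstimate: number, fieldCount: number): number {
  return fullEstimate * Math.min(PARTIAL_FIELD_FACTOR * fieldCount, 1.0);
}

// UserProfile.{name, preferences} -> 2 fields -> 0.5x the full estimate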

Failure Strategies

The on_failure clause, deferred since v1.2, is now fully implemented:

node Risky(model: sonnet, budget: 5k/2k) {
  on_failure: retry(3)
  # or: fallback(SafeNode), skip, abort, retry_then_fallback(2, SafeNode)
  produces Result { data: String }
}

executeWithFailureStrategy in the flow runner handles all five strategies. The scope checker validates fallback references at compile time via SCOPE_INVALID_FALLBACK.
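
A condensed sketch of how that dispatch might look (the five strategy keywords are from the language; the function and type names here are assumptions):

// Hypothetical dispatch over the five failure strategies.
type FailureStrategy =
  | { kind: 'retry'; attempts: number }
  | { kind: 'fallback'; node: string }
  | { kind: 'skip' }
  | { kind: 'abort' }
  | { kind: 'retry_then_fallback'; attempts: number; node: string };

async function runWithStrategy<T>(
  run: () => Promise<T>,
  strategy: FailureStrategy,
  runFallback: (node: string) => Promise<T>,
): Promise<T | undefined> {
  const attempts =
    strategy.kind === 'retry' || strategy.kind === 'retry_then_fallback'
      ? strategy.attempts
      : 1;
  for (let i = 0; i < attempts; i++) {
    try {
      return await run();
    } catch (err) {
      if (i < attempts - 1) continue;      // retry budget remaining
      switch (strategy.kind) {
        case 'skip':
          return undefined;                // swallow failure, drop output
        case 'retry':
        case 'abort':
          throw err;                       // surface the failure
        case 'fallback':
        case 'retry_then_fallback':
          return runFallback(strategy.node);
      }
    }
  }
  return undefined;                        // unreachable
}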

The Debate That Mattered

v3.0 ran 8 rounds. The first 4 used multi-agent debate (MEDIUM and HIGH tiers). The last 4 were direct implementation — no debate at all.

Only one debate round produced bugs the process actually caught: R2 (WriteRef + multi-field reads). A3-Skeptic found two silent runtime corruption bugs that all three other agents missed (both reproduced in the sketch after this list):

  1. .join() on objects: Existing codegen called writes.join(', '). When writes changed from string[] to WriteRef[], this produced [object Object], [object Object] instead of memory names. No error thrown — just garbage in the generated harness.

  2. String iteration on arrays: Code iterating ref.field when field changed from string to string[] would iterate individual characters ('n', 'a', 'm', 'e') instead of field names ('name'). Again, silent corruption.
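
Condensed reproductions of both failure modes (illustrative values, not the compiler's actual code):

// Bug 1: Array.prototype.join stringifies each element, so an array
// of objects joins to "[object Object], [object Object]".
interface WriteRef { memory: string; field?: string }
const writes: WriteRef[] = [{ memory: 'UserProfile' }, { memory: 'Cache' }];
writes.join(', ');                       // "[object Object], [object Object]"
writes.map(w => w.memory).join(', ');    // "UserProfile, Cache"

// Bug 2: for...of over a string yields characters, so code written
// for string[] silently degrades when handed a bare string.
const field: string = 'name';            // pre-v3.0 shape
for (const f of field) console.log(f);   // 'n', 'a', 'm', 'e'
const fields: string[] = ['name'];       // v3.0 shape
for (const f of fields) console.log(f);  // 'name'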

Both bugs would have shipped undetected without adversarial analysis. This is the third consecutive version where A3-Skeptic's silent-corruption instinct was the single highest-value debate contribution.

Quality Cleanup

R6 reorganized GraftErrorCode from a flat 21-member union into sub-unions: ParseErrorCode | ScopeErrorCode | TypeErrorCode | BudgetErrorCode | ImportErrorCode | GraphErrorCode | ConfigErrorCode. The parser gained PARSE_UNEXPECTED_TOKEN and PARSE_MISSING_FIELD codes — unlocking a ratchet that had kept it throw-only since v2.2. SourceLocation gained a length field on all tokens, enabling the LSP to render non-zero-width diagnostic squiggles. Dead code (bench script, stale error names) was removed.
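
For orientation, a hypothetical sketch of the reorganized shape (the sub-union names, the two new parse codes, SCOPE_INVALID_FALLBACK, and the length field are from the release; every other member is a placeholder):

// Placeholder members are marked; the real unions carry more codes.
type ParseErrorCode = 'PARSE_UNEXPECTED_TOKEN' | 'PARSE_MISSING_FIELD';
type ScopeErrorCode = 'SCOPE_INVALID_FALLBACK';  // one of several
type TypeErrorCode = 'TYPE_MISMATCH';            // placeholder
type BudgetErrorCode = 'BUDGET_EXCEEDED';        // placeholder
type ImportErrorCode = 'IMPORT_NOT_FOUND';       // placeholder
type GraphErrorCode = 'GRAPH_CYCLE';             // placeholder
type ConfigErrorCode = 'CONFIG_INVALID';         // placeholder

type GraftErrorCode =
  | ParseErrorCode
  | ScopeErrorCode
  | TypeErrorCode
  | BudgetErrorCode
  | ImportErrorCode
  | GraphErrorCode
  | ConfigErrorCode;

interface SourceLocation {
  line: number;
  column: number;
  length: number;  // new: lets the LSP draw non-zero-width squiggles
}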

What Skipping Debate Proved

After R4, the context window overflowed, forcing session recovery. R5-R8 were implemented without any debate infrastructure: no multi-agent analysis, no cross-critique, no convergence step. A single agent implemented each round directly.

All four passed review on the first attempt. They produced 37 new tests and 15 new ratchet decisions.

Why it worked: By R5, the codebase had strong conventions (ProgramIndex threading, error code sub-unions, CodegenBackend interface). Each round had narrow scope. The 430+ existing tests caught regressions. Common memory provided enough design context without live debate.

The decision rule: Use debate when introducing new concepts (interfaces, subsystems, data structures). Skip it when applying existing concepts to new inputs. v3.0 proved this empirically.

Stats

| Metric | Value |
|--------|-------|
| Tests | 477 (101 new) |
| Ratchet decisions | 168 (17 new, 4 unlocked) |
| Rounds | 8 (4 debated, 4 direct) |
| Agent calls | ~26 |
| First-try pass rate | 100% (8/8) |
| Critical bugs caught | 2 (both A3-Skeptic, R2) |
| Cross-critique used | 0 |
| New source files | 0 (all changes in existing modules) |

Try It

# Install the CLI
npm install -g @graft-lang/graft

# Compile a pipeline
graft compile pipeline.gft --out-dir ./output

# Check without generating files
graft check pipeline.gft

# Run with dry-run mode
graft run pipeline.gft --input '{"task": "review"}' --dry-run

Source: github.com/JSLEEKR/graft