Graft v1.0: Building a Programming Language with Adversarial AI Debate
What is Graft?
Graft is a graph-native language for AI agent harness engineering. A .gft source file declares contexts, nodes, edges with transforms, and a graph — and the compiler produces a complete .claude/ directory with agent definitions, hook scripts, orchestration docs, and settings.
The core insight: multi-agent AI systems waste tokens by passing full natural language context between agents. When Agent A produces a 2,000-token analysis and Agent B only needs three fields, current frameworks still forward the entire blob. Graft fixes this at the language level with typed schemas and edge transforms.
```
context BugReport(max_tokens: 800) {
  title: String
  description: String
  stack_trace: Optional<String>
}

node Classifier(model: sonnet, budget: 2k/1k) {
  reads: [BugReport]
  produces Classification {
    category: enum(crash, regression, feature_request, performance, security)
    priority: enum(p0, p1, p2, p3)
    confidence: Float(0..1)
  }
}

node Assigner(model: haiku, budget: 1k/500) {
  reads: [Classification]
  produces Assignment {
    team: String
    suggested_owner: String
    reason: String
  }
}

edge Classifier -> Assigner
  | select(category)
  | select(priority)
  | compact

graph BugTriage(input: BugReport, output: Assignment, budget: 5k) {
  Classifier -> Assigner -> done
}
```
That select(category) | select(priority) | compact pipeline extracts only what the downstream node needs — the compiler estimates ~79% token savings on that hop.
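A back-of-the-envelope version of that estimate can be sketched in TypeScript. The per-field token counts below are invented for illustration (the extra "summary" field stands in for free-form model output that a transform drops); the select/compact semantics are simplified assumptions, not Graft's actual estimator.

```typescript
// Hypothetical sketch of a per-edge token-savings estimate.
type FieldEstimate = { name: string; tokens: number };

const classificationOutput: FieldEstimate[] = [
  { name: "category", tokens: 8 },
  { name: "priority", tokens: 4 },
  { name: "confidence", tokens: 5 },
  { name: "summary", tokens: 40 }, // natural-language padding the transform cuts
];

// select(a) | select(b) keeps only the named fields; everything else is dropped.
function estimateSavings(fields: FieldEstimate[], keep: string[]): number {
  const before = fields.reduce((sum, f) => sum + f.tokens, 0);
  const after = fields
    .filter((f) => keep.includes(f.name))
    .reduce((sum, f) => sum + f.tokens, 0);
  return 1 - after / before;
}

const saved = estimateSavings(classificationOutput, ["category", "priority"]);
console.log(`~${Math.round(saved * 100)}% token savings on this hop`); // ~79%
```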
The Adversarial Debate Process
Rather than writing code directly, I ran every implementation task through a structured adversarial debate among four AI agents, all running on Claude Opus 4.6:
- A1 - Architect: System design, module boundaries, interface contracts
- A2 - Pragmatist: YAGNI enforcement, shipping speed, practical tradeoffs
- A3 - Skeptic: Bug finder, edge case hunter
- A4 - Specialist: Compiler theory, domain expertise
7-Step Process per task: Research → Analysis → Cross-Critique → Convergence → TDD Implementation → Review → Memory Update
Forced Dissenter: The agent with the highest confidence must argue against their own position. This broke confirmation bias in every single task.
Ratchet System: Once confirmed, design decisions are locked. No circular debates.
~14 agent calls per task, 8 tasks, ~99 total agent calls.
The Language Design
The specification started in Korean, defining everything from lexical structure to memory systems. The brainstorming session narrowed scope:
- TypeScript as implementation language — fast iteration with vitest, Node.js ecosystem alignment
- Hand-written recursive descent parser — full control over error messages with caret-style diagnostics
- End-to-end scope — parser through codegen in one milestone, not "parser first, codegen later"
- k-suffix notation — 4k means 4000, 2k/1k means budget-in 2000 / budget-out 1000
- Pipe transforms on edges — | select(findings) | compact instead of nested blocks
Implementation: 8 Tasks via Adversarial Debate
T1: Scaffolding
tsc-only build, ESM, NodeNext resolution. A3 argued against pre-creating empty directories (YAGNI won). GraftError class with SourceLocation and caret-style formatting locked in.
T2: Lexer
A3 found critical bug #1: GraftError didn't extend Error — breaks all instanceof checks and vitest's toThrow().
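The fix is mechanical but easy to miss. A minimal reconstruction (the SourceLocation shape and field names here are assumptions, not Graft's actual definitions):

```typescript
// GraftError must extend Error, otherwise `err instanceof Error` is false
// and vitest's toThrow() matcher never sees it.
interface SourceLocation {
  line: number;
  column: number;
}

class GraftError extends Error {
  loc: SourceLocation;

  constructor(message: string, loc: SourceLocation) {
    super(message);
    this.name = "GraftError";
    this.loc = loc;
    // Keeps the prototype chain intact even when compiling to older targets.
    Object.setPrototypeOf(this, GraftError.prototype);
  }
}

const err = new GraftError("unexpected token", { line: 3, column: 7 });
console.log(err instanceof Error); // true — instanceof checks now work
```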
A3+A4 found a float-parsing bug on input like 42.}: without checking that a digit follows the dot, the lexer would consume the . as part of a float literal.
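A sketch of that one-character lookahead (the function shape is an assumption, not Graft's actual lexer):

```typescript
// Consume '.' as part of a float literal only when a digit follows it;
// otherwise leave the dot for the parser to handle.
function lexNumber(src: string, start: number): { text: string; end: number } {
  let i = start;
  while (i < src.length && /[0-9]/.test(src[i])) i++;
  // Peek one character past the dot before committing to a float.
  if (src[i] === "." && /[0-9]/.test(src[i + 1] ?? "")) {
    i++; // consume '.'
    while (i < src.length && /[0-9]/.test(src[i])) i++;
  }
  return { text: src.slice(start, i), end: i };
}

console.log(lexNumber("42.}", 0).text);  // "42" — dot left for the parser
console.log(lexNumber("42.5}", 0).text); // "42.5"
```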
T3: AST
A3 proposed narrow literal unions (name: 'String' | 'Int' | 'Float' | 'Bool' instead of name: string). A2 opposed (YAGNI). A3 won — the narrow unions caught 2 codegen bugs in T6.
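The payoff of the narrow unions is exhaustiveness checking in downstream code. A simplified illustration (the AST and codegen shapes here are assumptions):

```typescript
// Narrow literal union instead of plain `string`: typos in type names
// become compile errors rather than silent codegen bugs.
type PrimitiveName = "String" | "Int" | "Float" | "Bool";

interface TypeNode {
  kind: "primitive";
  name: PrimitiveName;
}

function tsTypeFor(t: TypeNode): string {
  switch (t.name) {
    case "String": return "string";
    case "Int":
    case "Float": return "number";
    case "Bool": return "boolean";
    // No default: the switch is exhaustive over the union, so adding a new
    // primitive forces a compile error here — the property that surfaced
    // the two codegen bugs in T6.
  }
}

console.log(tsTypeFor({ kind: "primitive", name: "Bool" })); // "boolean"
```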
T4: Parser — The Show-Stopper Bug
A3-Skeptic found the CRITICAL keyword-identifier collision:
The parser had expectIdentifier() which only accepts Identifier tokens. But keywords like input, compact, skip are valid field names. The lexer tokenizes input as a keyword token, so expectIdentifier() rejects it. Every common English word used as a field name would crash the parser.
```
produces Result {
  input: String   // "input" is TokenType.Input, not Identifier
  output: String  // "output" is TokenType.Output
  model: String   // "model" is TokenType.Model
}
```
Fix: expectIdentifierOrKeyword() backed by a KEYWORD_TYPES Set. All three other agents missed this.
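A sketch of that fix (token shapes and the contents of KEYWORD_TYPES are assumptions; the real set would cover every Graft keyword):

```typescript
// Accept keyword tokens wherever a field name is expected, since words
// like `input` or `model` are legal field names even though the lexer
// tokenizes them as keywords.
type TokenType = "Identifier" | "Input" | "Output" | "Model" | "Compact" | "LBrace";

interface Token {
  type: TokenType;
  text: string;
}

const KEYWORD_TYPES = new Set<TokenType>(["Input", "Output", "Model", "Compact"]);

function expectIdentifierOrKeyword(tok: Token): string {
  if (tok.type === "Identifier" || KEYWORD_TYPES.has(tok.type)) {
    return tok.text;
  }
  throw new Error(`expected identifier, got ${tok.type}`);
}

console.log(expectIdentifierOrKeyword({ type: "Input", text: "input" })); // "input"
```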
T5: Analyzer
Three-class decomposition: ScopeChecker (name validation), TypeChecker (field validation), TokenEstimator (budget analysis). A3 found that the retry_then_fallback worst-case estimate omitted the fallback's cost.
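The shape of that omission, in a hedged sketch (the cost model, names, and numbers are illustrative assumptions, not Graft's analyzer):

```typescript
// Worst case for retry_then_fallback: the primary runs once plus all
// retries, AND the fallback runs once. The buggy version dropped the
// final `+ fallback.perCall` term.
interface NodeCost {
  perCall: number; // estimated tokens per attempt
}

function worstCase(primary: NodeCost, fallback: NodeCost, retries: number): number {
  return primary.perCall * (1 + retries) + fallback.perCall;
}

// e.g. 3k-token primary retried twice, then a 1.5k-token fallback:
console.log(worstCase({ perCall: 3000 }, { perCall: 1500 }, 2)); // 10500
```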
T6: Code Generator
Five codegen files: agents, hooks, orchestration, settings, codegen orchestrator. A2 found toLocaleString() locale dependency — on a Japanese-locale machine, numbers format differently. Fix: always 'en-US'.
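The failure mode is easy to reproduce (the variable here is illustrative):

```typescript
// Number.prototype.toLocaleString() with no argument uses the host
// machine's locale, so generated docs differ across machines.
const tokens = 1234567;

// Machine-dependent: "1,234,567" under en-US, "1 234 567" under fr-FR, etc.
const unstable = tokens.toLocaleString();

// Pinning the locale makes codegen output deterministic everywhere.
const stable = tokens.toLocaleString("en-US");

console.log(stable); // "1,234,567"
```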
T7: Compiler Pipeline & CLI
A3 found: no graph = silent success. A .gft file with contexts and nodes but no graph would produce degenerate output. Fix: explicit guard.
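A sketch of such a guard (the Program shape and error message are assumptions, not Graft's actual pipeline):

```typescript
// Fail compilation when a .gft file declares no graph, instead of
// silently emitting a degenerate .claude/ directory.
interface Program {
  contexts: unknown[];
  nodes: unknown[];
  graphs: unknown[];
}

function checkHasGraph(program: Program): void {
  if (program.graphs.length === 0) {
    throw new Error("no graph declaration found: nothing to compile");
  }
}

// checkHasGraph({ contexts: [...], nodes: [...], graphs: [] }) // throws
```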
T8: Final Verification
1-agent smoke test. All 110 tests passing, all 14 benchmarks passing, clean build.
A3-Skeptic: 100% Hit Rate
| Task | Bug | Severity |
|------|-----|----------|
| T2 | GraftError not extending Error | Critical |
| T2 | Float parsing 42.} | Medium |
| T3 | Loose name: string vs narrow literals | Medium |
| T4 | Keyword-identifier collision | Critical (show-stopper) |
| T5 | Fallback cost omitted from worst case | Medium |
| T7 | No-graph silent success | Critical |
At least one significant finding in every task from T2–T7.
Honest Retrospective
What worked: The adversarial debate found real bugs that would have shipped. YAGNI discipline kept the compiler at ~2,000 lines. TypeScript + vitest was the right toolchain.
What didn't work: The plan document's code snippets went stale immediately. 3-4 agent calls per task were spent catching mechanical mismatches. The toLocaleString bug should have been caught by a linter, not a 4-agent debate.
What I'd do differently: Start with the benchmark suite for acceptance criteria. Use 2 agents instead of 4 for simple tasks (T1, T3, T8). Regenerate plan code from source before each task.
Stats
| Metric | Value |
|--------|-------|
| Source lines | ~1,936 |
| Unit tests | 110 |
| Benchmarks | 14 |
| Agent calls | ~99 |
| Ratchet-locked decisions | 46 |
| Critical bugs caught | 3 |
| Dependencies (runtime) | 1 (commander) |
Try It
```
git clone https://github.com/JSLEEKR/graft.git
cd graft && npm install && npm run build
node dist/index.js compile examples/hello.gft --out-dir ./output
node dist/index.js check examples/hello.gft
```
Built with Claude Opus 4.6 via Claude Code. March 2026.