Graft v6.0: The Closed Loop — Generate, Execute, Validate, Fix
Graft is a DSL that compiles .gft pipeline definitions into Claude Code harness structures. v6.0 closes the loop — from natural language intent to validated execution results with automatic fix suggestions.
The Core Idea
Before v6.0, the workflow was open-ended:
Write .gft → compile → run → ???
The user had to manually read JSON output files, judge quality, and figure out what to fix. v6.0 closes every gap:
Describe what you want (natural language)
→ Claude Code writes .gft (it knows the syntax)
→ graft compile → graft run
→ Formatted results + quality validation
→ Automatic fix suggestions
→ Apply fixes → rerun
What Shipped
Claude Code Native Integration
graft init now generates .claude/CLAUDE.md containing the full .gft syntax reference. Any Claude Code session in an initialized project automatically understands Graft.
graft init my-project
cd my-project
claude
Then just say: "Make a code review pipeline with security, logic, and performance reviewers running in parallel, then a senior reviewer."
Claude Code writes the .gft file, runs graft compile, done. No DSL learning required.
Token Efficiency
We measured the difference. For a 4-node data analysis pipeline:
| Metric | Manual (natural language) | Graft (.gft) |
|--------|---------------------------|--------------|
| LLM input tokens | ~483 | ~387 |
| LLM output tokens | ~3,888 (12 files) | ~388 (1 file) |
| Time | ~48 sec | ~4 sec |
| Files to manage | 12 | 1 |
82% fewer total tokens, 10.8x faster. The key insight: output token savings (90%) dominate. The LLM writes one .gft file instead of 12 config files, and the compiler handles the rest.
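The headline percentages follow directly from the table; a quick sketch of the arithmetic (token counts are the measured values above):

```typescript
// Measured token counts from the comparison table above.
const manual = { input: 483, output: 3888 };
const graft = { input: 387, output: 388 };

// Percentage saved, rounded to the nearest whole percent.
function savings(before: number, after: number): number {
  return Math.round((1 - after / before) * 100);
}

const totalSavings = savings(manual.input + manual.output, graft.input + graft.output);
const outputSavings = savings(manual.output, graft.output);

console.log(`total: ${totalSavings}%, output: ${outputSavings}%`); // 82% and 90%
```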
Result Summary Formatter
graft run now outputs human-readable results instead of raw JSON:
Graph 'DataAnalysis' completed in 15.2s
✓ Classifier haiku 1.2s 1,200 tok
✓ StatAnalyzer sonnet 3.4s 5,420 tok
✓ TrendAnalyzer sonnet 3.1s 5,420 tok
✓ ReportWriter opus 7.5s 15,520 tok
Token usage: 27,560 / 40,000 (69%)
[████████████████████░░░░░░░░░░]
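The usage bar at the bottom is straightforward to reproduce. A minimal sketch, assuming a 30-character bar width; the function name `renderTokenBar` is illustrative, not the actual `result-formatter.ts` API:

```typescript
// Hypothetical helper that renders a usage bar like the one `graft run` prints.
function renderTokenBar(used: number, budget: number, width = 30): string {
  const ratio = used / budget;
  const filled = Math.floor(ratio * width); // floor, so the bar never overstates usage
  const bar = "█".repeat(filled) + "░".repeat(width - filled);
  const pct = Math.round(ratio * 100);
  return `Token usage: ${used} / ${budget} (${pct}%)\n[${bar}]`;
}

console.log(renderTokenBar(27560, 40000));
```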
Runtime Quality Validation
After execution, the validator checks every node's output against the .gft schema:
- Schema: Are all declared `produces` fields present?
- Type: Is a `String` actually a string? Is a `List` actually an array?
- Range: Is a `Float(0..1)` actually between 0 and 1?
- Empty: Are lists or strings unexpectedly empty?
- Budget: Did token usage exceed 90% of the declared budget?
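The per-field checks can be sketched as a single function. This is a hypothetical outline of what `result-validator.ts` does per field; the type names and signature are assumptions, not the module's real API:

```typescript
// Illustrative field types mirroring the .gft schema types named in the text.
type FieldType = "String" | "List" | { float: [number, number] };

interface Check { field: string; ok: boolean; message?: string }

// Run the schema / type / range / empty checks against one node output field.
function checkField(name: string, type: FieldType, value: unknown): Check {
  if (value === undefined)
    return { field: name, ok: false, message: `Field '${name}' is missing` }; // schema
  if (type === "String" && typeof value !== "string")
    return { field: name, ok: false, message: `Field '${name}' is not a string` }; // type
  if (type === "List" && !Array.isArray(value))
    return { field: name, ok: false, message: `Field '${name}' is not an array` }; // type
  if (typeof type === "object") {
    const [lo, hi] = type.float;
    if (typeof value !== "number" || value < lo || value > hi)
      return { field: name, ok: false, message: `Field '${name}' is out of range` }; // range
  }
  if (value === "" || (Array.isArray(value) && value.length === 0))
    return { field: name, ok: false, message: `Field '${name}' is empty` }; // empty
  return { field: name, ok: true };
}
```

For example, `checkField("score", { float: [0, 1] }, 0.87)` passes, while an empty `recommendations` list fails the empty check.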
── Quality Check ─────────────────────────────
✓ score OK [Analyzer.score]
✓ items OK [Analyzer.items]
⚠ Field 'recommendations' is empty [ReportWriter.recommendations]
✓ Token budget OK: 69%
────────────────────────────────────────────────
Quality: 75% (3/4 checks passed)
Automatic Fix Suggestions
When quality issues are found, the feedback engine suggests specific .gft modifications:
── Suggestions ───────────────────────────────
⚠ Field 'recommendations' is empty. The node may need more output tokens.
→ Increase ReportWriter output budget: budget: 10k/10k
────────────────────────────────────────────────
Each suggestion maps to a concrete .gft change:
| Problem | Suggestion |
|---------|-----------|
| Empty field | Increase node output budget |
| Budget exhaustion | Add edge transforms (truncate, compact) |
| Node failure | Add on_failure: retry(2) |
| Type mismatch | Check produces schema |
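The mapping above is essentially a lookup from issue kind to fix text. A hedged sketch of how `feedback.ts` could implement it; the issue kinds and exact suggestion strings are illustrative:

```typescript
// Hypothetical issue classification; the real feedback.ts types may differ.
type IssueKind = "empty_field" | "budget_exhausted" | "node_failed" | "type_mismatch";

interface Issue { kind: IssueKind; node: string; field?: string }

// Map a detected quality issue to a concrete .gft change, per the table above.
function suggest(issue: Issue): string {
  switch (issue.kind) {
    case "empty_field":
      return `Increase ${issue.node} output budget`;
    case "budget_exhausted":
      return `Add edge transforms (truncate, compact) on edges into ${issue.node}`;
    case "node_failed":
      return `Add on_failure: retry(2) to ${issue.node}`;
    case "type_mismatch":
      return `Check the produces schema of ${issue.node}`;
  }
}

console.log(suggest({ kind: "empty_field", node: "ReportWriter", field: "recommendations" }));
```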
graft generate (CLI fallback)
For environments without Claude Code, graft generate calls Claude Code as a subprocess to generate .gft from natural language. It validates the output with the Graft compiler and retries up to 2 times on parse failure.
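The generate → validate → retry control flow can be expressed over injected functions. A sketch under the assumption that generation and validation are pluggable; the real CLI shells out to Claude Code and the Graft compiler, and none of these names are the actual API:

```typescript
// Hypothetical outline of the retry loop behind `graft generate`.
function generateWithRetry(
  generate: (prompt: string) => string, // e.g. spawn Claude Code, capture the .gft text
  validate: (gft: string) => boolean,   // e.g. run the Graft compiler in check mode
  prompt: string,
  retries = 2,                          // matches the "up to 2 retries" behavior
): string {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const hint = attempt === 0 ? "" : "\n(previous attempt failed to parse; fix the syntax)";
    const gft = generate(prompt + hint);
    if (validate(gft)) return gft; // parse succeeded, we're done
  }
  throw new Error(`failed to produce a valid .gft after ${retries + 1} attempts`);
}
```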
Architecture
Three new runtime modules:
RunResult → result-formatter.ts → human-readable output
→ result-validator.ts → QualityReport (schema/type/range/empty/budget)
→ feedback.ts → Suggestion[] (concrete .gft fixes)
All three integrate into graft run automatically. Use --json for machine-readable output.
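The wiring between the three modules can be pictured at the type level. Everything here beyond the names mentioned in the text (RunResult, QualityReport, Suggestion, and the three module roles) is an assumption for illustration:

```typescript
// Illustrative shapes for the data flowing through graft run's result stage.
interface RunResult { graph: string; nodes: { name: string; output: unknown; tokens: number }[] }
interface QualityReport { passed: number; total: number; issues: string[] }
interface Suggestion { message: string; fix: string }

// Compose formatter → validator → feedback, as the diagram above shows.
function summarizeRun(
  result: RunResult,
  format: (r: RunResult) => string,
  validate: (r: RunResult) => QualityReport,
  feedback: (q: QualityReport) => Suggestion[],
): { summary: string; report: QualityReport; suggestions: Suggestion[] } {
  const summary = format(result);
  const report = validate(result);
  // Suggestions are only generated when the validator found issues.
  return { summary, report, suggestions: report.issues.length ? feedback(report) : [] };
}
```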
Numbers
- 1,712 tests (from 1,668 in v5.9)
- 3 new runtime modules + CLI integration
- ~8x compression ratio (`.gft` to generated files)
- ~82% token savings vs manual configuration
What's Next
The loop is closed for single-pipeline workflows. Next priorities:
- Multi-pipeline composition (import and chain pipelines)
- Other codegen backends (Cursor, Windsurf)
- Community example gallery