Graft v6.0: The Closed Loop — Generate, Execute, Validate, Fix

4 min read
graft · compiler · llm · claude · productization

Graft is a DSL that compiles .gft pipeline definitions into Claude Code harness structures. v6.0 closes the loop — from natural language intent to validated execution results with automatic fix suggestions.

The Core Idea

Before v6.0, the workflow was open-ended:

Write .gft → compile → run → ???

The user had to manually read JSON output files, judge quality, and figure out what to fix. v6.0 closes every gap:

Describe what you want (natural language)
  → Claude Code writes .gft (it knows the syntax)
  → graft compile → graft run
  → Formatted results + quality validation
  → Automatic fix suggestions
  → Apply fixes → rerun

What Shipped

Claude Code Native Integration

graft init now generates .claude/CLAUDE.md containing the full .gft syntax reference. Any Claude Code session in an initialized project automatically understands Graft.

graft init my-project
cd my-project
claude

Then just say: "Make a code review pipeline with security, logic, and performance reviewers running in parallel, then a senior reviewer."

Claude Code writes the .gft file, runs graft compile, done. No DSL learning required.
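For flavor, here is a purely illustrative sketch of what such a generated file might look like. It is pieced together from the syntax fragments quoted elsewhere in this post (`produces`, `budget: 10k/10k`, `on_failure: retry(2)`, typed fields like `Float(0..1)`); the actual .gft grammar may differ.

```
# Illustrative only: assembled from fragments shown in this post,
# not a verified .gft program.
graph CodeReview {
  node SecurityReviewer { model: sonnet
                          produces: { findings: List } }
  node LogicReviewer    { model: sonnet
                          produces: { findings: List } }
  node PerfReviewer     { model: sonnet
                          produces: { findings: List } }
  node SeniorReviewer   { model: opus
                          budget: 10k/10k
                          on_failure: retry(2)
                          produces: { verdict: String, score: Float(0..1) } }

  [SecurityReviewer, LogicReviewer, PerfReviewer] -> SeniorReviewer
}
```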

Token Efficiency

We measured the difference. For a 4-node data analysis pipeline:

| Metric | Manual (natural language) | Graft (.gft) |
|--------|---------------------------|--------------|
| LLM input tokens | ~483 | ~387 |
| LLM output tokens | ~3,888 (12 files) | ~388 (1 file) |
| Time | ~48 sec | ~4 sec |
| Files to manage | 12 | 1 |

82% fewer total tokens, 10.8x faster. The key insight: output token savings (90%) dominate. The LLM writes one .gft file instead of 12 config files, and the compiler handles the rest.
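The headline percentages fall straight out of the table; a quick sanity check on the approximate token counts:

```typescript
// Approximate token counts from the comparison table above.
const manual = { inTok: 483, outTok: 3_888 };
const graft = { inTok: 387, outTok: 388 };

// Total savings across input + output tokens.
const totalSavings = 1 - (graft.inTok + graft.outTok) / (manual.inTok + manual.outTok);
// Output-only savings, the dominant term.
const outputSavings = 1 - graft.outTok / manual.outTok;

console.log(`total: ${(totalSavings * 100).toFixed(0)}%`);   // ≈ 82%
console.log(`output: ${(outputSavings * 100).toFixed(0)}%`); // ≈ 90%
```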

Result Summary Formatter

graft run now outputs human-readable results instead of raw JSON:

Graph 'DataAnalysis' completed in 15.2s

  ✓ Classifier       haiku    1.2s    1,200 tok
  ✓ StatAnalyzer     sonnet   3.4s    5,420 tok
  ✓ TrendAnalyzer    sonnet   3.1s    5,420 tok
  ✓ ReportWriter     opus     7.5s   15,520 tok

Token usage: 27,560 / 40,000 (69%)
  [████████████████████░░░░░░░░░░]
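A bar like this is cheap to render. A minimal sketch, with the caveat that the function name and floor rounding are assumptions, not Graft's actual formatter code:

```typescript
// Render a usage bar: `width` cells, floor-rounded fill, clamped at 100%.
function usageBar(used: number, budget: number, width = 30): string {
  const ratio = Math.min(used / budget, 1);
  const filled = Math.floor(ratio * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}

console.log(usageBar(27_560, 40_000));
// → ████████████████████░░░░░░░░░░  (20 of 30 cells at 69%)
```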

Runtime Quality Validation

After execution, the validator checks every node's output against the .gft schema:

  • Schema: Are all declared produces fields present?
  • Type: Is a String actually a string? Is a List actually an array?
  • Range: Is Float(0..1) actually between 0 and 1?
  • Empty: Are lists or strings unexpectedly empty?
  • Budget: Did token usage exceed 90% of the declared budget?

── Quality Check ─────────────────────────────
  ✓ score OK [Analyzer.score]
  ✓ items OK [Analyzer.items]
  ⚠ Field 'recommendations' is empty [ReportWriter.recommendations]
  ✓ Token budget OK: 69%
────────────────────────────────────────────────
Quality: 75% (3/4 checks passed)
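In TypeScript terms, the five checks amount to a small predicate pass over each node's parsed output. A minimal sketch, assuming hypothetical type names and shapes rather than the actual result-validator.ts API:

```typescript
// Hypothetical field spec; the real .gft schema model is richer.
type FieldSpec =
  | { kind: "string" }
  | { kind: "list" }
  | { kind: "float"; min: number; max: number };

interface Issue { level: "ok" | "warn" | "error"; msg: string }

function checkField(name: string, spec: FieldSpec, value: unknown): Issue {
  if (value === undefined)
    return { level: "error", msg: `Field '${name}' missing` };          // Schema
  if (spec.kind === "string" && typeof value !== "string")
    return { level: "error", msg: `Field '${name}' is not a string` };  // Type
  if (spec.kind === "list" && !Array.isArray(value))
    return { level: "error", msg: `Field '${name}' is not an array` };  // Type
  if (spec.kind === "float" && (typeof value !== "number" || value < spec.min || value > spec.max))
    return { level: "error", msg: `Field '${name}' out of range` };     // Range
  if ((typeof value === "string" || Array.isArray(value)) && value.length === 0)
    return { level: "warn", msg: `Field '${name}' is empty` };          // Empty
  return { level: "ok", msg: `${name} OK` };
}

// Budget check: warn once usage passes 90% of the declared budget.
const budgetIssue = (used: number, budget: number): Issue =>
  used > 0.9 * budget
    ? { level: "warn", msg: `Token budget at ${Math.round((used / budget) * 100)}%` }
    : { level: "ok", msg: `Token budget OK: ${Math.round((used / budget) * 100)}%` };
```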

Automatic Fix Suggestions

When quality issues are found, the feedback engine suggests specific .gft modifications:

── Suggestions ───────────────────────────────
  ⚠ Field 'recommendations' is empty. The node may need more output tokens.
    → Increase ReportWriter output budget: budget: 10k/10k
────────────────────────────────────────────────

Each suggestion maps to a concrete .gft change:

| Problem | Suggestion |
|---------|------------|
| Empty field | Increase node output budget |
| Budget exhaustion | Add edge transforms (truncate, compact) |
| Node failure | Add on_failure: retry(2) |
| Type mismatch | Check produces schema |
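Conceptually this is a lookup from issue kind to a templated .gft edit. A sketch with assumed issue-kind names and suggestion wording, not the real feedback.ts code:

```typescript
type IssueKind = "empty_field" | "budget_exhaustion" | "node_failure" | "type_mismatch";

// Template a concrete suggestion per issue kind; `node` is the offending node's name.
function suggest(kind: IssueKind, node: string): string {
  switch (kind) {
    case "empty_field":
      return `Increase ${node} output budget, e.g. budget: 10k/10k`;
    case "budget_exhaustion":
      return `Add edge transforms (truncate, compact) on edges into ${node}`;
    case "node_failure":
      return `Add on_failure: retry(2) to ${node}`;
    case "type_mismatch":
      return `Check the produces schema declared on ${node}`;
    default:
      throw new Error(`unknown issue kind: ${kind satisfies never}`);
  }
}

console.log(suggest("empty_field", "ReportWriter"));
```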

graft generate (CLI fallback)

For workflows outside an interactive Claude Code session, graft generate invokes Claude Code as a subprocess to produce a .gft file from a natural-language description. It validates the output with the Graft compiler and retries up to 2 times on parse failure.
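The generate-validate-retry loop is generic. A sketch, with `generate` and `compiles` as hypothetical stand-ins for the Claude Code subprocess call and the Graft parser check (synchronous here for brevity; the real calls would be async):

```typescript
// Retry-until-valid loop; maxRetries = 2 matches the behavior described above.
function generateValidated(
  generate: (prompt: string) => string,  // stand-in for the Claude Code subprocess
  compiles: (gft: string) => boolean,    // stand-in for the Graft compiler's parse check
  prompt: string,
  maxRetries = 2,
): string {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // On retries, tell the model its previous output failed to parse.
    const gft = generate(
      attempt === 0 ? prompt : `${prompt}\n(previous output failed to parse, try again)`,
    );
    if (compiles(gft)) return gft;
  }
  throw new Error(`No parseable .gft after ${maxRetries + 1} attempts`);
}
```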

Architecture

Three new runtime modules:

RunResult → result-formatter.ts → human-readable output
         → result-validator.ts → QualityReport (schema/type/range/empty/budget)
         → feedback.ts → Suggestion[] (concrete .gft fixes)

All three integrate into graft run automatically. Use --json for machine-readable output.

Numbers

  • 1,712 tests (from 1,668 in v5.9)
  • 3 new runtime modules + CLI integration
  • ~8x compression ratio (.gft to generated files)
  • ~82% token savings vs manual configuration

What's Next

The loop is closed for single-pipeline workflows. Next priorities:

  • Multi-pipeline composition (import and chain pipelines)
  • Other codegen backends (Cursor, Windsurf)
  • Community example gallery