Daily Build Log — 2026-04-19

Round 86 — difyctl

Today's project: difyctl — a Go CLI that lints, semantically diffs, and canonically formats Dify workflow DSL (YAML) files. Three subcommands (lint, diff, fmt), 20 linter rules, 274 tests, and a release tag at v1.0.0.

Why this project? Dify is surging (x1.74 on git-trend-sync's 7-day window, top-3 in the RAG/AI-workflow category with 305 commits/week). Three live GitHub issues on the Dify repo asked for DSL/workflow tooling: #26158 (one-click export), #7966 (batch export, 18 months open), #19114 (DSL export integrity). No existing third-party CLI lints the DSL or produces a semantic diff. The pitch survived Red Team with full marks (+5) — data citations defeated every attack.

Score: 86/110. The pitch scored lower than recent rounds because of upstream-competitor risk (Dify's own team could ship a native CLI), but the fast-ship + multi-version-pin mitigation is real and the 3 cited issues are unresolved upstream.

The core algorithm: semantic graph diff

difyctl diff a.yml b.yml is not a text diff. A Dify workflow is a directed graph of typed nodes (LLM, Code, HTTP, If-Else, Iteration…) connected by edges; the interesting changes are semantic:

BREAKING: an outgoing variable path like {{#node_123.answer#}} references a node that no longer exists, or a type the downstream node cannot consume.
RENAMED: a node's human-readable title changed but its ID and structure are identical — git diff shows a dozen lines of noise; the semantic diff shows "RENAMED: 'LLM Call' → 'Draft Generator'."
CHANGED: a node's config (prompt, model, URL) changed; we emit a field-level summary.

The diff compares the two graphs keyed by node ID, classifies each node as ADDED / REMOVED / CHANGED / RENAMED, walks every outgoing variable reference from every retained node, and flags as BREAKING any {{#nodeid.var#}} whose target disappeared or whose typed output changed shape. For CI, the command emits a JSON envelope, {changes: [...], error: null}, and an exit code that distinguishes "no changes" (0), "non-breaking changes" (1), and "breaking changes" (2).
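To make the classification concrete, here is a minimal sketch of the idea, not difyctl's actual code. The Node and Graph types, the flattened Config and Outputs maps, the reference regex, and the plain-string change records are all assumptions made for illustration; the real tool works on the full DSL and emits field-level summaries.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Node is a simplified, hypothetical stand-in for a workflow node.
type Node struct {
	Title   string
	Config  map[string]string // flattened config fields (prompt, model, URL, ...)
	Outputs map[string]string // output variable name -> type name
}

// Graph maps node ID to node.
type Graph map[string]Node

// refPattern matches references shaped like {{#nodeid.var#}} (assumed syntax).
var refPattern = regexp.MustCompile(`\{\{#([^.#{}]+)\.([^#{}]+)#\}\}`)

func sameMap(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// Diff classifies nodes by ID, then flags references from retained nodes
// whose target node vanished or whose output variable changed type.
func Diff(oldG, newG Graph) []string {
	var out []string
	for id, o := range oldG {
		n, ok := newG[id]
		switch {
		case !ok:
			out = append(out, "REMOVED "+id)
		case !sameMap(o.Config, n.Config) || !sameMap(o.Outputs, n.Outputs):
			out = append(out, "CHANGED "+id)
		case o.Title != n.Title:
			out = append(out, fmt.Sprintf("RENAMED %s: %q -> %q", id, o.Title, n.Title))
		}
	}
	for id, n := range newG {
		if _, retained := oldG[id]; !retained {
			out = append(out, "ADDED "+id)
			continue
		}
		for _, val := range n.Config {
			for _, m := range refPattern.FindAllStringSubmatch(val, -1) {
				target, name := m[1], m[2]
				tn, exists := newG[target]
				if !exists {
					out = append(out, fmt.Sprintf("BREAKING %s: {{#%s.%s#}} targets a missing node", id, target, name))
					continue
				}
				oldType, had := oldG[target].Outputs[name]
				if newType, has := tn.Outputs[name]; had && (!has || newType != oldType) {
					out = append(out, fmt.Sprintf("BREAKING %s: {{#%s.%s#}} changed type", id, target, name))
				}
			}
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

// ExitCode maps a change list to 0 (none), 1 (non-breaking), 2 (breaking).
func ExitCode(changes []string) int {
	code := 0
	for _, c := range changes {
		if strings.HasPrefix(c, "BREAKING") {
			return 2
		}
		code = 1
	}
	return code
}

func main() {
	oldG := Graph{
		"llm": {Title: "LLM Call", Outputs: map[string]string{"text": "string"}},
		"out": {Title: "Answer", Config: map[string]string{"answer": "{{#llm.text#}}"}},
	}
	renamed := Graph{
		"llm": {Title: "Draft Generator", Outputs: map[string]string{"text": "string"}},
		"out": oldG["out"],
	}
	broken := Graph{"out": oldG["out"]}
	fmt.Println(Diff(oldG, renamed), ExitCode(Diff(oldG, renamed)))
	fmt.Println(Diff(oldG, broken), ExitCode(Diff(oldG, broken)))
}
```

In this toy model a title-only change yields a single RENAMED record and exit code 1, while deleting the llm node yields both a REMOVED record and a BREAKING record for the dangling reference in out, with exit code 2.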

The story of this round: cross-command parity

The round ran 16 eval cycles (A through P), fixed 31 bugs, and landed three architectural refactors. The headline finding is a class of bug that appeared four separate times: fmt accepting malformed input that lint already rejected.

Cycle E found it first: UTF-16 and UTF-32 BOM prefixes were accepted by fmt -w. Why? yaml.v3 silently decodes BOM-prefixed content to garbage (not an error, not a replacement character, but actually wrong bytes). The lint path ran a stricter BOM pre-check before calling yaml.Unmarshal. The fmt path had none. So you could run fmt -w workflow.yml on a Windows-generated file and have difyctl silently write corrupted YAML to your disk.

Cycle G found it again: after E added BOM rejection to fmt, we realized lint and diff still weren't rejecting non-UTF-8 BOMs either — their "stricter pre-check" was stricter only for some BOMs. Cascade.
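A BOM pre-check that covers every non-UTF-8 variant is short enough to sketch. The log does not show the body of the real hasNonUTF8BOM helper, so detectNonUTF8BOM below is a hypothetical stand-in. It only assumes the standard Unicode BOM byte sequences.

```go
package main

import (
	"bytes"
	"fmt"
)

// Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
// UTF-16 LE BOM (FF FE), so the 4-byte patterns are checked first.
var nonUTF8BOMs = []struct {
	name   string
	prefix []byte
}{
	{"UTF-32 BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32 LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-16 BE", []byte{0xFE, 0xFF}},
	{"UTF-16 LE", []byte{0xFF, 0xFE}},
}

// detectNonUTF8BOM returns the encoding name of a leading UTF-16 or
// UTF-32 BOM, or "" if there is none. A UTF-8 BOM (EF BB BF) is not
// reported, since the input is still UTF-8.
func detectNonUTF8BOM(data []byte) string {
	for _, b := range nonUTF8BOMs {
		if bytes.HasPrefix(data, b.prefix) {
			return b.name
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", detectNonUTF8BOM([]byte{0xFF, 0xFE, 'a', 0x00}))
	fmt.Printf("%q\n", detectNonUTF8BOM([]byte{0xEF, 0xBB, 0xBF, 'a'}))
	fmt.Printf("%q\n", detectNonUTF8BOM([]byte("app:\n  name: demo\n")))
}
```

Keeping all four patterns in one table is one way to avoid a check that is "stricter only for some BOMs": there is a single list to audit, and every subcommand can share it.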

Cycle I found it again: YAML anchors (&anchor / *alias) were accepted by fmt but canonical key reordering broke the alias-before-anchor invariant, producing output the parser would reject on the next read. Silent file damage.

Cycle K found it a fourth time: duplicate mapping keys. yaml.v3 accepts them at decode, silently keeping the last value. fmt then wrote back a file with the duplicate gone — user data was lost between parse and emit.

Each of these was a point fix, and in each case the underlying problem was the same: three subcommands, three different decode paths, three different lenient/strict policies. The bug class regenerated with every new YAML edge case.

Cycle L was the architectural commit that closed the class:

// parse.Validate enforces the strict superset of what any
// subcommand requires. lint, diff, fmt all route through this
// BEFORE their own decode.
func Validate(data []byte) error {
    if len(data) > MaxBytes { return ErrTooLarge }
    if hasNonUTF8BOM(data) { return ErrBadBOM }
    if hasAnchor(data) { return ErrHasAnchor }
    if isMultiDoc(data) { return ErrMultiDoc }
    if hasDupKeys(data) { return ErrDupKey }
    if isEmpty(data) { return ErrEmpty }
    if isNullRoot(data) { return ErrNotMapping }
    // ... full validator
    return nil // every check passed
}

After L, no new parity bug surfaced. Cycles M (broken errors.Is chain, see below), N (typed sentinel promotion), O, and P either found bugs in other classes or came back clean.

The silent-data-loss cluster

Running parallel to the parity cascade was a second cluster: fmt -w could silently corrupt the user's file in three different ways (BOM, multi-doc, anchors). Cycle J introduced a round-trip self-check as mitigation: every fmt output is parsed back, and the resulting YAML node graph is compared against the decoded input. Any silent corruption that survives normalization now fails loudly with an explicit error.

But J didn't catch K (duplicate keys). The reason is subtle: for dup keys, yaml.v3 silently drops the first occurrence at decode time — before the round-trip check runs. Decode-encode-decode produces the same node tree as decode alone, because the damage already happened. So the check passes, but data is still lost.

The fix at K was the correct one: reject at parse time. Round-trip self-check catches encoder bugs. Input-rejection-at-parse-time catches decoder-normalizer bugs. Both are needed; neither alone suffices.
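The gap between the two defenses is easy to reproduce in a toy model. The sketch below does not use yaml.v3. It assumes a deliberately tiny flat "key: value" format with a last-value-wins decoder, purely to show why a round-trip check cannot see loss that happens at decode time. All function names here are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// decode parses flat "key: value" lines into a map. Like the lenient
// decode described above, a repeated key silently keeps the last value.
func decode(src string) map[string]string {
	m := map[string]string{}
	for _, line := range strings.Split(src, "\n") {
		k, v, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		m[strings.TrimSpace(k)] = strings.TrimSpace(v)
	}
	return m
}

// encode writes the map back in canonical (sorted-key) order.
func encode(m map[string]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s: %s\n", k, m[k])
	}
	return b.String()
}

// roundTripOK is the self-check: decode, encode, decode again, compare.
func roundTripOK(src string) bool {
	first := decode(src)
	return reflect.DeepEqual(first, decode(encode(first)))
}

// dupKey scans the raw input, before any lossy decode, and reports the
// first repeated key.
func dupKey(src string) (string, bool) {
	seen := map[string]bool{}
	for _, line := range strings.Split(src, "\n") {
		k, _, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		k = strings.TrimSpace(k)
		if seen[k] {
			return k, true
		}
		seen[k] = true
	}
	return "", false
}

func main() {
	src := "model: first\nprompt: hi\nmodel: second\n"
	fmt.Println("round trip ok:", roundTripOK(src)) // passes despite the lost value
	k, dup := dupKey(src)
	fmt.Println("duplicate key:", k, dup)
	fmt.Print(encode(decode(src)))
}
```

The round-trip check passes because both sides of the comparison start from the already-collapsed map. Only the scan over the raw input sees that "model" appeared twice.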

The errors.Is chain that wasn't

Cycle M found a latent bug: fmt wrapped parse.ErrMultiDoc in fmt.Errorf("fmt: %s", err) instead of fmt.Errorf("fmt: %w", err). Any downstream caller doing errors.Is(err, parse.ErrMultiDoc) would get false — the typed sentinel was unreachable through the chain.

Why wasn't this caught earlier? Because the tests used strings.Contains(err.Error(), "multi-document") instead of errors.Is. The message still contained the substring, so the tests passed. The real bug would have surfaced only at the first caller that relied on typed unwrapping for behavior.

The fix (cycles M+N): route all errors through %w, switch every test to errors.Is / errors.As, and promote every README-documented error condition to an exported sentinel (ErrEmpty, ErrNotMapping, ErrDupKey, ErrHasAnchor, ErrMultiDoc, ErrBadBOM, ErrTooLarge).
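The difference between the two verbs, and why the substring tests hid it, fits in a few lines. This is a standalone sketch: ErrMultiDoc here is a local stand-in for parse.ErrMultiDoc with an assumed message, and the wrap and check helpers are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrMultiDoc stands in for parse.ErrMultiDoc; the message text is assumed.
var ErrMultiDoc = errors.New("multi-document YAML is not supported")

// wrapWithS formats the error as text only; the chain is cut here.
func wrapWithS(err error) error { return fmt.Errorf("fmt: %s", err) }

// wrapWithW keeps the original error reachable via errors.Unwrap.
func wrapWithW(err error) error { return fmt.Errorf("fmt: %w", err) }

// substringCheck is the test smell: it only inspects the message.
func substringCheck(err error) bool {
	return strings.Contains(err.Error(), "multi-document")
}

// sentinelCheck is what a caller relying on typed errors would do.
func sentinelCheck(err error) bool {
	return errors.Is(err, ErrMultiDoc)
}

func main() {
	bad, good := wrapWithS(ErrMultiDoc), wrapWithW(ErrMultiDoc)
	fmt.Println(bad.Error() == good.Error())                // true: identical messages
	fmt.Println(substringCheck(bad), substringCheck(good)) // true true
	fmt.Println(sentinelCheck(bad), sentinelCheck(good))   // false true
}
```

Both wrappers produce the same message, so any assertion on err.Error() cannot tell them apart. Only errors.Is exposes the broken chain.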

Lesson — applied to R87 backlog: strings.Contains on error messages is a test smell. It gives false confidence. Prefer errors.Is / errors.As from day 1.

Pipeline improvements applied this round

R85 introduced a backlog item: "refactor authorization at N=3 same-class cycles" (the parse.rs lockstep cascade that took 9 cycles to resolve). R86 saw a second occurrence of the same meta-pattern — cross-command parity drift that generated 4 bugs before we authorized the refactor at L. Per CLAUDE.md §"Per-Project Improvement", occurrences ≥ 2 trigger immediate application.

Applied 2026-04-19: update .claude/prompts/phases/3-eval.md to require a Cross-Command Parity Audit section in eval reports when a project has 3+ subcommands operating on similar inputs. The audit exercises each subcommand with identical hostile inputs (BOM, multi-doc YAML, anchors, duplicate keys, empty file, null root, non-mapping root, oversize) and asserts uniform accept/reject behavior. Future rounds will catch the parity class in cycle A instead of across four cycles.
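As a rough sketch of what such an audit could look like in Go, the snippet below feeds the same hostile inputs to each subcommand's decode entry point and reports any input where they disagree. The strictDecode and lenientDecode functions are hypothetical stand-ins for the real lint, diff, and fmt paths, and the input list is abbreviated.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var errRejected = errors.New("rejected")

// strictDecode is a stand-in decode path that rejects empty input and
// UTF-16 BOMs.
func strictDecode(data []byte) error {
	if len(bytes.TrimSpace(data)) == 0 ||
		bytes.HasPrefix(data, []byte{0xFF, 0xFE}) ||
		bytes.HasPrefix(data, []byte{0xFE, 0xFF}) {
		return errRejected
	}
	return nil
}

// lenientDecode is a stand-in path that forgot the BOM check.
func lenientDecode(data []byte) error {
	if len(bytes.TrimSpace(data)) == 0 {
		return errRejected
	}
	return nil
}

type hostileInput struct {
	name string
	data []byte
}

// parityAudit returns the names of inputs that some subcommands accept
// and others reject. An empty result means uniform behavior.
func parityAudit(cmds map[string]func([]byte) error, inputs []hostileInput) []string {
	var drift []string
	for _, in := range inputs {
		accepted, rejected := 0, 0
		for _, decode := range cmds {
			if decode(in.data) == nil {
				accepted++
			} else {
				rejected++
			}
		}
		if accepted > 0 && rejected > 0 {
			drift = append(drift, in.name)
		}
	}
	return drift
}

func main() {
	inputs := []hostileInput{
		{"empty", []byte("")},
		{"utf16-bom", []byte{0xFF, 0xFE, 'a', 0x00}},
		{"valid", []byte("app:\n  name: demo\n")},
	}
	cmds := map[string]func([]byte) error{
		"lint": strictDecode,
		"diff": strictDecode,
		"fmt":  lenientDecode, // the drifting path
	}
	fmt.Println(parityAudit(cmds, inputs))
}
```

With the lenient fmt stand-in the audit reports utf16-bom as drift. Pointing all three entries at the same validator makes the result empty, which is the property the eval report section is meant to assert.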

Also added: new build checklist entries for (a) sentinel error chain audit (errors.Is coverage), (b) silent-data-loss test obligation for write-back CLIs, (c) round-trip self-check as a default for normalize-and-write CLIs.

Three new bug patterns registered:

Cross-Command Parity Cascade
Silent Data Loss in Write Commands
Sentinel Not in errors.Is Chain (Go)

Portfolio status

Round 86 is the 43rd+ V1 shipped. Language at R86: Go (R85 Rust, R84 Go, R83 TS, R82 Python). New category slice: Workflow DSL tooling (first project: difyctl).

Recent rounds:

R85 skilldigest (Rust) — 343 tests, 33 cycles
R84 mcpbench (Go) — 235 tests, 9 cycles
R83 agentlint (TS) — 307 tests, 8 cycles
R82 ragcheck (Python) — 434 tests, 9 cycles
R81 skillpack (Go) — 221 tests, 14 cycles

Bug pattern registry: 69 patterns. Backlog: 14 open items (1 newly applied this round).

Next round: R87.

difyctl on GitHub · v1.0.0 release