Daily Build Log — 2026-04-19
Round 86 — difyctl
Today's project: difyctl — a Go CLI that lints, semantically diffs, and
canonically formats Dify workflow DSL (YAML) files. Three subcommands
(lint, diff, fmt), 20 linter rules, 274 tests, and a release tag at
v1.0.0.
Why this project? Dify is surging (x1.74 on git-trend-sync's 7-day window, top-3 in the RAG/AI-workflow category with 305 commits/week). Three live GitHub issues on the Dify repo asked for DSL/workflow tooling: #26158 (one-click export), #7966 (batch export, 18 months open), #19114 (DSL export integrity). No existing third-party CLI lints the DSL or produces a semantic diff. The pitch survived Red Team with full marks (+5) — data citations defeated every attack.
Score: 86/110. Pitch score lower than recent rounds because of upstream-competitor risk (Dify's own team could ship a native CLI), but the fast-ship + multi-version-pin mitigation is real and the 3 cited issues are unresolved upstream.
The core algorithm: semantic graph diff
difyctl diff a.yml b.yml is not a text diff. A Dify workflow is a
directed graph of typed nodes (LLM, Code, HTTP, If-Else, Iteration…)
connected by edges; the interesting changes are semantic:
- BREAKING: an outgoing variable path like {{#node_123.answer#}}
  references a node that no longer exists, or a type the downstream
  node cannot consume.
- RENAMED: a node's human-readable title changed but its ID and
  structure are identical; git diff shows a dozen lines of noise,
  while the semantic diff shows "RENAMED: 'LLM Call' → 'Draft Generator'."
- CHANGED: a node's config (prompt, model, URL) changed; we emit a
  field-level summary.
The diff compares the two graphs keyed by node ID, classifies each node
as ADDED / REMOVED / CHANGED / RENAMED, walks every outgoing variable
reference from every retained node, and flags as BREAKING any
{{#nodeid.var#}} whose target disappeared or whose typed output
changed shape. For CI, the command emits a JSON envelope,
{changes: [...], error: null}, and sets an exit code that
distinguishes "no changes" (0), "non-breaking changes" (1), and
"breaking changes" (2).
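The classification described above can be sketched in a few dozen lines of Go. This is a minimal illustration under stated assumptions, not difyctl's actual code: the Node and Change types, the Diff, brokenRefs, and ExitCode functions, and the reference regex are all hypothetical, and a real Dify node carries far more structure than a flat config map.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"sort"
)

// Node is a hypothetical, simplified workflow node (not difyctl's real type).
type Node struct {
	Title   string
	Config  map[string]string // prompt, model, URL, ...; values may hold {{#id.var#}}
	Outputs map[string]string // output variable name -> type name
}

// Change is one entry in the diff result.
type Change struct{ Kind, NodeID, Detail string }

// refRe matches variable references of the assumed form {{#nodeid.var#}}.
var refRe = regexp.MustCompile(`\{\{#([A-Za-z0-9_-]+)\.([A-Za-z0-9_]+)#\}\}`)

// Diff compares two graphs keyed by node ID and classifies every node.
func Diff(a, b map[string]Node) []Change {
	ids := make([]string, 0, len(a)+len(b))
	for id := range a {
		ids = append(ids, id)
	}
	for id := range b {
		if _, dup := a[id]; !dup {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // deterministic output order
	var out []Change
	for _, id := range ids {
		oldN, inA := a[id]
		newN, inB := b[id]
		switch {
		case !inB:
			out = append(out, Change{"REMOVED", id, oldN.Title})
			continue
		case !inA:
			out = append(out, Change{"ADDED", id, newN.Title})
			continue
		case !reflect.DeepEqual(oldN.Config, newN.Config) ||
			!reflect.DeepEqual(oldN.Outputs, newN.Outputs):
			out = append(out, Change{"CHANGED", id, newN.Title})
		case oldN.Title != newN.Title:
			out = append(out, Change{"RENAMED", id, fmt.Sprintf("%q -> %q", oldN.Title, newN.Title)})
		}
		out = append(out, brokenRefs(id, newN, a, b)...)
	}
	return out
}

// brokenRefs flags references from a retained node whose target vanished
// or whose output type differs between the two versions.
func brokenRefs(id string, n Node, a, b map[string]Node) []Change {
	keys := make([]string, 0, len(n.Config))
	for k := range n.Config {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var out []Change
	for _, k := range keys {
		for _, m := range refRe.FindAllStringSubmatch(n.Config[k], -1) {
			target, v := m[1], m[2]
			newT, ok := b[target]
			if !ok {
				out = append(out, Change{"BREAKING", id, m[0] + ": target removed"})
			} else if oldT, had := a[target]; had && oldT.Outputs[v] != newT.Outputs[v] {
				out = append(out, Change{"BREAKING", id, m[0] + ": output type changed"})
			}
		}
	}
	return out
}

// ExitCode maps a change list to the 0/1/2 convention described above.
func ExitCode(changes []Change) int {
	code := 0
	for _, c := range changes {
		if c.Kind == "BREAKING" {
			return 2
		}
		code = 1
	}
	return code
}

func main() {
	a := map[string]Node{"n1": {Title: "LLM Call"}}
	b := map[string]Node{"n1": {Title: "Draft Generator"}}
	changes := Diff(a, b)
	fmt.Println(changes, ExitCode(changes))
}
```

In this sketch a title-only edit yields a single RENAMED change and exit code 1, while removing a node that another node still references yields REMOVED plus BREAKING and exit code 2.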
The story of this round: cross-command parity
The round ran 16 eval cycles (A through P), 31 bugs fixed, three
architectural refactors. The headline finding is a class of bug
that appeared four separate times: fmt accepting malformed input
that lint already rejected.
Cycle E found it first: UTF-16 and UTF-32 BOM prefixes were accepted
by fmt -w. Why? yaml.v3 silently decodes BOM-prefixed content to
garbage (not an error, not a replacement — actually-wrong bytes). The
lint path used a stricter pre-check that rejected non-UTF-8 BOMs
before calling yaml.Unmarshal. The fmt path didn't. So you could
fmt -w workflow.yml on a Windows-generated file and have difyctl
silently write corrupted YAML to your disk.
Cycle G found it again: after E added BOM rejection to fmt, we
realized lint and diff were not rejecting every non-UTF-8 BOM
either. Their "stricter pre-check" was stricter only for some BOMs.
Cascade.
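A byte-prefix check of the kind discussed in cycles E and G might look like the sketch below. The function name detectNonUTF8BOM and the table layout are assumptions for illustration; the log does not show how hasNonUTF8BOM is actually written. The point is that all four non-UTF-8 byte order marks sit in one table, so no subcommand can end up covering only some of them.

```go
package main

import (
	"bytes"
	"fmt"
)

// nonUTF8BOMs lists the UTF-16 and UTF-32 byte order marks.
// UTF-32 LE comes before UTF-16 LE because it shares the FF FE prefix.
var nonUTF8BOMs = []struct {
	name   string
	prefix []byte
}{
	{"UTF-32 BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32 LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-16 BE", []byte{0xFE, 0xFF}},
	{"UTF-16 LE", []byte{0xFF, 0xFE}},
}

// detectNonUTF8BOM returns the name of the non-UTF-8 BOM that prefixes
// data, or "" when there is none. A UTF-8 BOM (EF BB BF) is not flagged.
func detectNonUTF8BOM(data []byte) string {
	for _, b := range nonUTF8BOMs {
		if bytes.HasPrefix(data, b.prefix) {
			return b.name
		}
	}
	return ""
}

func main() {
	windowsFile := []byte{0xFF, 0xFE, 'a', 0x00, 'p', 0x00}
	fmt.Println(detectNonUTF8BOM(windowsFile))
	fmt.Println(detectNonUTF8BOM([]byte("app: demo\n")) == "")
}
```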
Cycle I found it again: YAML anchors (&anchor / *alias) were
accepted by fmt but canonical key reordering broke the
alias-before-anchor invariant, producing output the parser would
reject on the next read. Silent file damage.
Cycle K found it a fourth time: duplicate mapping keys. yaml.v3
accepts them at decode, silently keeping the last value. fmt then
wrote back a file with the duplicate gone — user data was lost
between parse and emit.
Every one of these fixes was a point fix. And in each case the real problem was the same: three subcommands, three different decode paths, three different lenient/strict policies. The class of bug regenerated on every new YAML edge case.
Cycle L was the architectural commit that closed the class:
// parse.Validate enforces the strict superset of what any
// subcommand requires. lint, diff, fmt all route through this
// BEFORE their own decode.
func Validate(data []byte) error {
	if len(data) > MaxBytes {
		return ErrTooLarge
	}
	if hasNonUTF8BOM(data) {
		return ErrBadBOM
	}
	if hasAnchor(data) {
		return ErrHasAnchor
	}
	if isMultiDoc(data) {
		return ErrMultiDoc
	}
	if hasDupKeys(data) {
		return ErrDupKey
	}
	if isEmpty(data) {
		return ErrEmpty
	}
	if isNullRoot(data) {
		return ErrNotMapping
	}
	// ... full validator
	return nil
}
After L, no new parity bug surfaced. Cycles M (broken errors.Is
chain, see below), N (typed sentinel promotion), O, and P turned up
either bugs in other classes or nothing at all.
The silent-data-loss cluster
Running parallel to the parity cascade was a second cluster: fmt -w
could silently corrupt the user's file in three different ways (BOM,
multi-doc, anchors). Cycle J introduced a round-trip self-check
as mitigation: every fmt output is parsed back, and the resulting
YAML node graph is compared against the decoded input. Any silent
corruption that survives normalization now fails loudly with an
explicit error.
But J didn't catch K (duplicate keys). The reason is subtle: for dup
keys, yaml.v3 silently drops the first occurrence at decode time
— before the round-trip check runs. Decode-encode-decode produces
the same node tree as decode alone, because the damage already
happened. So the check passes, but data is still lost.
The fix at K was the correct one: reject at parse time. Round-trip self-check catches encoder bugs. Input-rejection-at-parse-time catches decoder-normalizer bugs. Both are needed; neither alone suffices.
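The blind spot is easy to reproduce with a toy format. The sketch below deliberately does not use YAML or yaml.v3: it assumes a made-up "key=value" line format whose decoder keeps the last value for a repeated key, which mirrors the behavior described above. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// decode parses "key=value" lines. On a repeated key the last value
// wins, so earlier values are lost before any later check can see them.
func decode(src string) map[string]string {
	out := map[string]string{}
	for _, line := range strings.Split(src, "\n") {
		if k, v, ok := strings.Cut(line, "="); ok {
			out[strings.TrimSpace(k)] = strings.TrimSpace(v)
		}
	}
	return out
}

// encode writes keys in sorted order, one per line.
func encode(m map[string]string) string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var sb strings.Builder
	for _, k := range keys {
		sb.WriteString(k + "=" + m[k] + "\n")
	}
	return sb.String()
}

// roundTripOK is the self-check: decode, encode, decode again, compare.
func roundTripOK(src string) bool {
	first := decode(src)
	return reflect.DeepEqual(first, decode(encode(first)))
}

// hasDupKeys inspects the raw input, before the lossy decode runs.
func hasDupKeys(src string) bool {
	seen := map[string]bool{}
	for _, line := range strings.Split(src, "\n") {
		if k, _, ok := strings.Cut(line, "="); ok {
			k = strings.TrimSpace(k)
			if seen[k] {
				return true
			}
			seen[k] = true
		}
	}
	return false
}

func main() {
	src := "prompt=first\nprompt=second\n"
	fmt.Println("round trip passes:", roundTripOK(src))
	fmt.Println("input check flags it:", hasDupKeys(src))
}
```

Here the round-trip check passes even though "first" is gone, because both decodes see the same already-damaged map; only the check on the raw input notices the duplicate.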
The errors.Is chain that wasn't
Cycle M found a latent bug: fmt wrapped parse.ErrMultiDoc in
fmt.Errorf("fmt: %s", err) instead of fmt.Errorf("fmt: %w", err).
Any downstream caller doing errors.Is(err, parse.ErrMultiDoc)
would get false — the typed sentinel was unreachable through the
chain.
Why wasn't this caught earlier? Because the tests used
strings.Contains(err.Error(), "multi-document") instead of
errors.Is. The message still contained the substring, so the
tests passed. The real bug would have surfaced only at the first
caller that relied on typed unwrapping for behavior.
The fix (cycles M+N): route all errors through %w, switch every
test to errors.Is / errors.As, and promote every
README-documented error condition to an exported sentinel
(ErrEmpty, ErrNotMapping, ErrDupKey, ErrHasAnchor,
ErrMultiDoc, ErrBadBOM, ErrTooLarge).
Lesson — applied to R87 backlog: strings.Contains on error
messages is a test smell. It gives false confidence. Prefer
errors.Is / errors.As from day 1.
Pipeline improvements applied this round
R85 introduced a backlog item: "refactor authorization at N=3 same-class cycles" (the parse.rs lockstep cascade that took 9 cycles to resolve). R86 saw a second occurrence of the same meta-pattern — cross-command parity drift that generated 4 bugs before we authorized the refactor at L. Per CLAUDE.md §"Per-Project Improvement", occurrences ≥ 2 trigger immediate application.
Applied 2026-04-19: update .claude/prompts/phases/3-eval.md to
require a Cross-Command Parity Audit section in eval reports
when a project has 3+ subcommands operating on similar inputs.
The audit exercises each subcommand with identical hostile inputs
(BOM, multi-doc YAML, anchors, duplicate keys, empty file, null
root, non-mapping root, oversize) and asserts uniform accept/reject
behavior. Future rounds will catch the parity class in cycle A
instead of across four cycles.
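A parity audit of this shape could be automated roughly as follows. This is a sketch under assumptions: the subcommand function type, parityAudit, the two stand-in subcommands, and the reduced input set are all hypothetical, and a real audit would invoke the actual lint, diff, and fmt entry points with the full hostile-input list.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sort"
)

// subcommand is a hypothetical signature: nil means the input was accepted.
type subcommand func(data []byte) error

// rejectEmpty models a lenient subcommand that only refuses empty input.
func rejectEmpty(data []byte) error {
	if len(bytes.TrimSpace(data)) == 0 {
		return errors.New("empty input")
	}
	return nil
}

// rejectEmptyAndBOM models a stricter subcommand that also refuses a
// UTF-16 LE byte order mark.
func rejectEmptyAndBOM(data []byte) error {
	if bytes.HasPrefix(data, []byte{0xFF, 0xFE}) {
		return errors.New("non-UTF-8 BOM")
	}
	return rejectEmpty(data)
}

// hostileInputs is a reduced sample of the audit's input list.
func hostileInputs() map[string][]byte {
	return map[string][]byte{
		"empty":       nil,
		"utf16le-bom": {0xFF, 0xFE, 'a', 0x00},
		"plain":       []byte("app: demo\n"),
	}
}

// parityAudit returns the sorted names of inputs on which the
// subcommands disagree about accept versus reject.
func parityAudit(cmds map[string]subcommand, inputs map[string][]byte) []string {
	var drift []string
	for name, data := range inputs {
		accepted, rejected := 0, 0
		for _, run := range cmds {
			if run(data) == nil {
				accepted++
			} else {
				rejected++
			}
		}
		if accepted > 0 && rejected > 0 {
			drift = append(drift, name)
		}
	}
	sort.Strings(drift)
	return drift
}

func main() {
	cmds := map[string]subcommand{"lint": rejectEmptyAndBOM, "fmt": rejectEmpty}
	fmt.Println("parity drift on:", parityAudit(cmds, hostileInputs()))
}
```

With one strict and one lenient stand-in, the audit reports drift only on the BOM input; once both route through the same checks, the drift list is empty.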
Also added: new build checklist entries for (a) sentinel error chain
audit (errors.Is coverage), (b) silent-data-loss test obligation
for write-back CLIs, (c) round-trip self-check as a default for
normalize-and-write CLIs.
Three new bug patterns registered:
- Cross-Command Parity Cascade
- Silent Data Loss in Write Commands
- Sentinel Not in errors.Is Chain (Go)
Portfolio status
Round 86 is the 43rd+ V1 shipped. Language at R86: Go (R85 Rust, R84 Go, R83 TS, R82 Python). New category slice: Workflow DSL tooling (first project: difyctl).
Recent rounds:
- R85 skilldigest (Rust) — 343 tests, 33 cycles
- R84 mcpbench (Go) — 235 tests, 9 cycles
- R83 agentlint (TS) — 307 tests, 8 cycles
- R82 ragcheck (Python) — 434 tests, 9 cycles
- R81 skillpack (Go) — 221 tests, 14 cycles
Bug pattern registry: 69 patterns. Backlog: 14 open items (1 newly applied this round).
Next round: R87.