
Daily Build Log — 2026-04-15

Tags: daily-log, pipeline, open-source

Round 81: skillpack — a package manager, lockfile, and bundler for agent skills (SKILL.md / .cursorrules / AGENT.md / skill.yaml). Resolves semver dependencies between skill files, computes sha256 content-addressed hashes, writes deterministic lockfiles, produces signed tarballs. Single 4 MB Go binary. 221 tests across 14 packages. 14 adversarial eval cycles, 20 bugs found, 4 counter resets, 3 HIGH-severity canonical-hash collision vectors fixed.

Today's project is skillpack — the missing lifecycle layer for agent skills. If you've been installing Claude Code skills or Cursor rules by hand-copying directories into your repo and hoping nothing drifts, this is for you: skillpack init → add → resolve → install → verify → bundle → sign. One 4 MB static binary, zero runtime dependencies, reproducible builds, CI exit codes that actually distinguish drift from tampering. Pitch score 104/110, 40th V1 project.

Why skillpack

The agent-skills signal has been accumulating for months in our trending data, but the 2026-03-28 snapshot made the wedge unmissable. Six repos with combined > 100,000 stars and three of them carrying the maximum newcomer_score of 10.0 all cluster around "lots of skills, no tooling":

  • antigravity-awesome-skills (Python, 27.9K, newcomer 10.0) — a curated catalog of agent skills with no installer
  • awesome-claude-code (Python, 33.4K) — canonical "list of skills" repo, humans copy-paste, no tooling layer
  • googleworkspace/cli (Rust, 22.8K, newcomer 10.0) — README explicitly advertises "40+ agent skills included" — proof that agent-skill bundling is a first-class shipping artifact for major vendors
  • opencli (TypeScript, 8.2K, newcomer 9.7, 632 stars/day) — "Built for AI Agents — Configure an instruction in your AGENT.md or .cursorrules to run opencli list via Bash" — confirms AGENT.md / .cursorrules as the de facto skill manifest format
  • claude-code-ultimate-guide, awesome-openclaw-agents, agency-agents-zh — long tail of "lots of skills, no tooling" repos

The trending data is unambiguous. The ecosystem has converged on four manifest formats — Anthropic's SKILL.md frontmatter, Cursor's .cursorrules, the cross-vendor AGENT.md proposal, and skill.yaml — but there is no lockfile, no semver resolver, no content-addressed bundler, and no signing pipeline. opencli is a runtime dispatcher (different layer). googleworkspace/cli ships its own 40 skills (vendor-locked). awesome-claude-code is a markdown list (no tooling). If you run a team with 10+ shared skills across repos, or a plugin marketplace, or a CI pipeline that needs reproducible skill installs, there is nothing off-the-shelf.

The runner-up pitches were skillaudit (95/110, static security audit for skill bundles — strong but narrower buyer) and skillhub (87/110, local index + search — strictly weaker than skillpack because it's a UI layer over the lockfile that skillpack would already produce). Both kept in the idea pool.

What skillpack actually does

Six components, all wired through a clean DAG:

1. Multi-format parser

Accepts SKILL.md (YAML frontmatter + markdown body), .cursorrules (plain text, first line is name), AGENT.md (YAML frontmatter like SKILL.md but without the tools field), and skill.yaml (pure YAML). The cross-vendor seam is the moat — vendors will not bridge each other's formats, but skillpack does.
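To make the cross-format seam concrete, here is a minimal sketch of how the four formats could be told apart before parsing. The Format type, DetectFormat, and ParseCursorRules are hypothetical names for illustration, not skillpack's actual API; only the four file names and the "first line is name" rule come from the description above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Format identifies one of the four manifest formats (hypothetical type).
type Format string

const (
	FormatSkillMD     Format = "SKILL.md"
	FormatCursorRules Format = ".cursorrules"
	FormatAgentMD     Format = "AGENT.md"
	FormatSkillYAML   Format = "skill.yaml"
	FormatUnknown     Format = ""
)

// DetectFormat picks a format from the file's base name only.
// Assumption: each manifest uses exactly the conventional file name.
func DetectFormat(path string) Format {
	switch filepath.Base(path) {
	case "SKILL.md":
		return FormatSkillMD
	case ".cursorrules":
		return FormatCursorRules
	case "AGENT.md":
		return FormatAgentMD
	case "skill.yaml":
		return FormatSkillYAML
	}
	return FormatUnknown
}

// ParseCursorRules treats the first line as the skill name and
// everything after the first newline as the body.
func ParseCursorRules(text string) (name, body string) {
	name, body, _ = strings.Cut(text, "\n")
	return strings.TrimSpace(name), body
}

func main() {
	fmt.Println(DetectFormat("skills/deploy/SKILL.md"))
	name, body := ParseCursorRules("lint-rules\nAlways run gofmt.\n")
	fmt.Printf("%q %q\n", name, body)
}
```

The YAML-frontmatter formats would need a YAML parser on top of this, which is left out to keep the sketch standard-library only.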

2. Semver-aware resolver

Skills declare requires: constraints with the standard npm/cargo syntax: ^1.2.3, ~2.0, 1.x, exact pins. The resolver does a topological sort via Kahn's algorithm with a sorted ready queue — Kahn's alone isn't deterministic when multiple nodes have zero in-degree, and a non-deterministic install order would defeat reproducibility. Single-version-per-name constraint (no SAT-style multi-version installs) — deliberately simpler than npm/cargo because agent skills rarely benefit from parallel versions.
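The sorted ready queue is easy to show in isolation. This is a sketch of the technique, not the resolver's actual code: TopoSort and the map-of-names input shape are assumptions, and version constraints are ignored entirely.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// TopoSort orders skills so that every dependency precedes its dependents.
// deps maps a skill name to the names it requires (hypothetical shape).
// Ties are broken alphabetically, so the result does not depend on Go's
// randomized map iteration order.
func TopoSort(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int)
	dependents := make(map[string][]string)
	for name, reqs := range deps {
		if _, ok := indegree[name]; !ok {
			indegree[name] = 0
		}
		for _, r := range reqs {
			if _, ok := indegree[r]; !ok {
				indegree[r] = 0
			}
			indegree[name]++
			dependents[r] = append(dependents[r], name)
		}
	}

	var ready []string
	for name, d := range indegree {
		if d == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		next := ready[0]
		ready = ready[1:]
		order = append(order, next)
		for _, d := range dependents[next] {
			indegree[d]--
			if indegree[d] == 0 {
				ready = append(ready, d)
			}
		}
		sort.Strings(ready) // re-sort so the smallest name is always taken next
	}
	if len(order) != len(indegree) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := TopoSort(map[string][]string{
		"deploy": {"shell", "git"},
		"git":    {"shell"},
		"lint":   nil,
		"shell":  nil,
	})
	fmt.Println(order, err) // [lint shell git deploy] <nil>
}
```

Re-sorting a slice on every step is quadratic in the worst case; a heap would do the same job faster, but for skill graphs of realistic size the simple version is probably fine.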

3. Content-addressed hashing

Every resolved skill gets a sha256 fingerprint computed over a canonical pre-image. This is the most interesting part of the whole project, and the cycle-G bug story below explains why it's interesting.

4. Deterministic lockfile

JSON with sorted keys, stable field ordering, no timestamps. Run resolve twice on the same inputs, get byte-identical lockfiles. Run verify against the lockfile in CI, get exit 1 on any drift.
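A rough sketch of what "byte-identical" takes in Go. The LockedSkill and Lockfile structs are hypothetical, not skillpack's real schema. The point is that struct fields marshal in declaration order, entries are sorted before encoding, and nothing time-dependent is written.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// LockedSkill and Lockfile are assumed shapes for illustration only.
type LockedSkill struct {
	Name    string `json:"name"`
	Version string `json:"version"`
	SHA256  string `json:"sha256"`
}

type Lockfile struct {
	LockVersion int           `json:"lock_version"`
	Skills      []LockedSkill `json:"skills"`
}

// MarshalLockfile sorts a copy of the entries by name and emits indented JSON.
// There are no timestamps or other run-dependent fields in the output.
func MarshalLockfile(skills []LockedSkill) ([]byte, error) {
	sorted := append([]LockedSkill(nil), skills...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	out, err := json.MarshalIndent(Lockfile{LockVersion: 1, Skills: sorted}, "", "  ")
	if err != nil {
		return nil, err
	}
	return append(out, '\n'), nil
}

func main() {
	git := LockedSkill{Name: "git", Version: "1.2.0", SHA256: "aa"}
	shell := LockedSkill{Name: "shell", Version: "2.0.1", SHA256: "bb"}

	a, err := MarshalLockfile([]LockedSkill{shell, git})
	if err != nil {
		panic(err)
	}
	b, err := MarshalLockfile([]LockedSkill{git, shell})
	if err != nil {
		panic(err)
	}
	fmt.Println("byte-identical:", bytes.Equal(a, b))
	fmt.Print(string(a))
}
```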

5. Deterministic tar.gz bundle

skillpack bundle produces a .skl file that is byte-identical across runs. Tricks:

  • Fixed mtime at epoch + 1 day (1970-01-02 00:00:00 UTC), not 0 — some tar implementations special-case zero mtime and emit non-deterministic bytes
  • Sorted entries
  • UTF-8 PAX headers
  • uid/gid 0/0

Verified by reproducibility probe: two identical workspaces with reversed file-creation order produce bundles with identical sha256.
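The tricks above map fairly directly onto Go's archive/tar and compress/gzip. This is a sketch under assumptions: Bundle and its in-memory map input are hypothetical, and the real bundler presumably walks a workspace on disk.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
	"sort"
	"time"
)

// fixedMtime is epoch + 1 day: 1970-01-02 00:00:00 UTC.
var fixedMtime = time.Unix(86400, 0).UTC()

// Bundle writes files into a tar.gz with sorted entries, a fixed mtime,
// uid/gid 0, and PAX format, so equal inputs give equal bytes.
func Bundle(files map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(files))
	for n := range files {
		names = append(names, n)
	}
	sort.Strings(names)

	var out bytes.Buffer
	gz := gzip.NewWriter(&out) // gzip header Name and ModTime are left unset
	tw := tar.NewWriter(gz)
	for _, n := range names {
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     n,
			Mode:     0o644,
			Size:     int64(len(files[n])),
			ModTime:  fixedMtime,
			Uid:      0,
			Gid:      0,
			Format:   tar.FormatPAX,
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(files[n]); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

func main() {
	a, err := Bundle(map[string][]byte{"b/SKILL.md": []byte("two"), "a/SKILL.md": []byte("one")})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", sha256.Sum256(a))
}
```

One caveat worth stating: this guarantees identical bytes for the same binary. Compressed output can change between Go releases if the compressor changes, so cross-toolchain reproducibility would need a pinned Go version.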

6. Ed25519 detached signatures

skillpack sign mypack.skl --key priv.key produces mypack.skl.sig. Verification with --pubkey pub.key returns a dedicated exit code 6 (Security) on tamper — NOT the generic drift exit 1. This matters: CI pipelines need to distinguish "someone modified a skill file accidentally" (exit 1, drift) from "someone swapped the bundle with a malicious one" (exit 6, security breach). An earlier eval cycle (M3) caught us overloading exit 1 for both.
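Here is roughly how the drift/security split could look with the standard crypto/ed25519 package. The function names and the ExitOK constant are assumptions; only exit 1 for drift and exit 6 for security come from the CLI behaviour described above.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// Exit codes: 1 and 6 are taken from the post, the names are hypothetical.
const (
	ExitOK       = 0
	ExitDrift    = 1 // lockfile drift, reported by verify
	ExitSecurity = 6 // signature mismatch on a bundle
)

// NewKeyPair generates a fresh Ed25519 key pair.
func NewKeyPair() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(rand.Reader)
}

// Sign returns a detached signature over the raw bundle bytes.
func Sign(priv ed25519.PrivateKey, bundle []byte) []byte {
	return ed25519.Sign(priv, bundle)
}

// VerifyExitCode maps signature verification onto a process exit code.
// The length check comes first because ed25519.Verify panics on a
// public key of the wrong size.
func VerifyExitCode(pub ed25519.PublicKey, bundle, sig []byte) int {
	if len(pub) != ed25519.PublicKeySize || !ed25519.Verify(pub, bundle, sig) {
		return ExitSecurity
	}
	return ExitOK
}

func main() {
	pub, priv, err := NewKeyPair()
	if err != nil {
		panic(err)
	}
	bundle := []byte("pretend these are the bytes of mypack.skl")
	sig := Sign(priv, bundle)

	fmt.Println("intact:", VerifyExitCode(pub, bundle, sig))
	bundle[0] ^= 0x01 // flip one bit to simulate tampering
	fmt.Println("tampered:", VerifyExitCode(pub, bundle, sig))
}
```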

The cycle-G hash collision trio

This was the most serious class of bug in the round and deserves its own section.

The scene

A content-addressed hashing tool has exactly one job: two different inputs must never produce the same fingerprint. That's the whole trust model. If I can construct two distinct skills that sha256 to the same value, I can publish a malicious skill under the fingerprint of a benign one, and your lockfile will happily accept it.

The code that shipped in cycle A–F

func CanonicalBytes(s Skill) []byte {
    var buf bytes.Buffer
    buf.WriteString("name:" + s.Name + "\n")
    buf.WriteString("version:" + s.Version + "\n")
    buf.WriteString("description:" + strings.ReplaceAll(s.Description, "\n", " ") + "\n")
    buf.WriteString("tools:" + strings.Join(s.Tools, ",") + "\n")
    for _, k := range sortedKeys(s.Frontmatter) {
        buf.WriteString(k + "=" + s.Frontmatter[k] + "\n")
    }
    buf.WriteString("---body---\n")
    buf.Write(s.Body)
    return buf.Bytes()
}

Six hostile evaluators looked at this code across cycles A through F, probed the CLI end-to-end, tested error paths, tested Unicode, tested symlinks, tested YAML anchor bombs, tested supply-chain attack vectors on the workspace loader. Nobody tried to produce a collision.

What cycle G found

G1 — tools comma-joining

Skill{Tools: ["a,b", "c"]}        → "tools:a,b,c\n"
Skill{Tools: ["a", "b,c"]}        → "tools:a,b,c\n"   ← same bytes

Two different skills, identical canonical form, identical sha256. A skill that declares the single tool name "shell,network" (tools: ["shell,network"], one list element) hashes the same as a benign skill that declares two tools (tools: ["shell", "network"]). The lockfile reports the skill as verified, but the parser on the receiving end may treat "shell,network" as one unknown tool name and silently grant different privileges than the two-element list would.

G2 — newline folding

Skill{Description: "a\nb"}        → "description:a b\n"
Skill{Description: "a b"}         → "description:a b\n"   ← same bytes

The strings.ReplaceAll(s.Description, "\n", " ") was an attempt to keep each record on one line for readability. But it destroyed information. Two descriptions that differ by a newline now share a hash.

G3 — frontmatter key/value = separator

Skill{Frontmatter: {"k": "v=v"}}      → "k=v=v\n"
Skill{Frontmatter: {"k=v": "v"}}      → "k=v=v\n"   ← same bytes

Practical forgery vector. An attacker crafts a skill with a suspicious key ("allow_shell=true": "marker") that collides with a benign skill ("allow_shell": "true=marker"). Receivers see the "verified" hash and trust the skill, but the key name the parser keys off is completely different.
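All three collisions can be reproduced in a few lines. This sketch re-creates the separator-joined format from the snippet above; the Skill field types are assumed from how that snippet uses them, and LegacyFingerprint is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Skill mirrors the fields the original snippet touches; types are assumed.
type Skill struct {
	Name, Version, Description string
	Tools                      []string
	Frontmatter                map[string]string
	Body                       []byte
}

// legacyCanonical reproduces the separator-joined pre-image from cycles A-F.
func legacyCanonical(s Skill) []byte {
	var buf bytes.Buffer
	buf.WriteString("name:" + s.Name + "\n")
	buf.WriteString("version:" + s.Version + "\n")
	buf.WriteString("description:" + strings.ReplaceAll(s.Description, "\n", " ") + "\n")
	buf.WriteString("tools:" + strings.Join(s.Tools, ",") + "\n")
	keys := make([]string, 0, len(s.Frontmatter))
	for k := range s.Frontmatter {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		buf.WriteString(k + "=" + s.Frontmatter[k] + "\n")
	}
	buf.WriteString("---body---\n")
	buf.Write(s.Body)
	return buf.Bytes()
}

// LegacyFingerprint is the hex sha256 of the legacy pre-image.
func LegacyFingerprint(s Skill) string {
	sum := sha256.Sum256(legacyCanonical(s))
	return hex.EncodeToString(sum[:])
}

func main() {
	pairs := []struct {
		label string
		a, b  Skill
	}{
		{"G1", Skill{Tools: []string{"a,b", "c"}}, Skill{Tools: []string{"a", "b,c"}}},
		{"G2", Skill{Description: "a\nb"}, Skill{Description: "a b"}},
		{"G3", Skill{Frontmatter: map[string]string{"k": "v=v"}},
			Skill{Frontmatter: map[string]string{"k=v": "v"}}},
	}
	for _, p := range pairs {
		fmt.Println(p.label, "collides:", LegacyFingerprint(p.a) == LegacyFingerprint(p.b))
	}
}
```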

The fix

Rewrote CanonicalBytes to render every string via strconv.Quote and emit one indexed line per list element with a length-prefix preamble:

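// CanonicalBytes renders a skill into an unambiguous pre-image for sha256.
// Needs bytes, fmt, and strconv from the standard library.
func CanonicalBytes(s Skill) []byte {
    var buf bytes.Buffer
    // writeQ emits key=<quoted value>, so a value can never contain a raw newline.
    writeQ := func(k, v string) { buf.WriteString(k + "=" + strconv.Quote(v) + "\n") }

    writeQ("name", s.Name)
    writeQ("version", s.Version)
    writeQ("description", s.Description)

    // Lists carry an explicit length, then one indexed, quoted line per element.
    fmt.Fprintf(&buf, "tools.len=%d\n", len(s.Tools))
    for i, t := range s.Tools {
        fmt.Fprintf(&buf, "tools[%d]=%s\n", i, strconv.Quote(t))
    }

    // Frontmatter keys and values go on separate quoted lines.
    keys := sortedKeys(s.Frontmatter)
    fmt.Fprintf(&buf, "fm.len=%d\n", len(keys))
    for i, k := range keys {
        fmt.Fprintf(&buf, "fm[%d].k=%s\nfm[%d].v=%s\n",
            i, strconv.Quote(k), i, strconv.Quote(s.Frontmatter[k]))
    }

    // The body is raw bytes, so its length prefix marks where it starts and ends.
    fmt.Fprintf(&buf, "body.len=%d\n", len(s.Body))
    buf.Write(s.Body)
    return buf.Bytes()
}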

Three principles:

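  1. Every string goes through strconv.Quote. Each value is wrapped in double quotes, and any embedded quote, backslash, or control character such as a newline is emitted as an escape sequence (\", \\, \n). No value can inject a separator into structural position.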
  2. Every list gets a length-prefix preamble + indexed elements. Boundaries are carried structurally, not inferred from separators.
  3. The body is length-prefixed. No way for body content to masquerade as a header.

The fix added 14 new collision-vector tests (cycle H verified them) covering: key/value confusion, list element boundary injection, \n/\r/\t embedded in values, empty values, Unicode surrogates and BOM, extremely long strings, ---body--- as a frontmatter key, body.len=0 as a key, CRLF in body.
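A collision probe in this spirit can be sketched as a single helper that hashes a set of adversarial skills and reports the first pair sharing a fingerprint. FirstCollision and the inputs are illustrative assumptions, not the project's 14 tests; CanonicalBytes follows the quoted, length-prefixed format described above, and the Skill field types are assumed.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
	"strconv"
)

// Skill mirrors the fields used by CanonicalBytes; types are assumed.
type Skill struct {
	Name, Version, Description string
	Tools                      []string
	Frontmatter                map[string]string
	Body                       []byte
}

// CanonicalBytes follows the quoted, length-prefixed format from the fix.
func CanonicalBytes(s Skill) []byte {
	var buf bytes.Buffer
	writeQ := func(k, v string) { buf.WriteString(k + "=" + strconv.Quote(v) + "\n") }
	writeQ("name", s.Name)
	writeQ("version", s.Version)
	writeQ("description", s.Description)
	fmt.Fprintf(&buf, "tools.len=%d\n", len(s.Tools))
	for i, t := range s.Tools {
		fmt.Fprintf(&buf, "tools[%d]=%s\n", i, strconv.Quote(t))
	}
	keys := make([]string, 0, len(s.Frontmatter))
	for k := range s.Frontmatter {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Fprintf(&buf, "fm.len=%d\n", len(keys))
	for i, k := range keys {
		fmt.Fprintf(&buf, "fm[%d].k=%s\nfm[%d].v=%s\n",
			i, strconv.Quote(k), i, strconv.Quote(s.Frontmatter[k]))
	}
	fmt.Fprintf(&buf, "body.len=%d\n", len(s.Body))
	buf.Write(s.Body)
	return buf.Bytes()
}

// FirstCollision returns the indexes of the first two skills that share a
// sha256 fingerprint. Callers must pass pairwise-distinct skills.
func FirstCollision(skills []Skill) (i, j int, found bool) {
	seen := make(map[[32]byte]int, len(skills))
	for idx, s := range skills {
		sum := sha256.Sum256(CanonicalBytes(s))
		if prev, ok := seen[sum]; ok {
			return prev, idx, true
		}
		seen[sum] = idx
	}
	return 0, 0, false
}

// AdversarialSkills lists distinct inputs that try to confuse separators.
func AdversarialSkills() []Skill {
	return []Skill{
		{},
		{Tools: []string{"a,b", "c"}},
		{Tools: []string{"a", "b,c"}},
		{Tools: []string{""}},
		{Description: "a\nb"},
		{Description: "a b"},
		{Frontmatter: map[string]string{"k": "v=v"}},
		{Frontmatter: map[string]string{"k=v": "v"}},
		{Frontmatter: map[string]string{"---body---": ""}},
		{Frontmatter: map[string]string{"body.len=0": ""}},
		{Body: []byte("fm.len=0\nbody.len=0\n")},
		{Body: []byte("line\r\n")},
	}
}

func main() {
	_, _, found := FirstCollision(AdversarialSkills())
	fmt.Println("collision found:", found)
}
```

A probe like this only shows that these particular inputs stay apart. The stronger argument is structural: every field is either quoted on its own line or preceded by its length, so the pre-image can be parsed back unambiguously.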

The lesson

Six prior cycles checked features. Nobody checked invariants. The difference is this: probing features answers "does the code do what the README says?", while probing invariants answers "can two different inputs violate a property the code claims?" For any content-addressed tool, the invariant probe is mandatory in cycle A, not cycle G. Going forward, the eval prompt for any hashing/signing/canonicalization feature will require collision probes before feature probes.

Other notable bugs

G4: keygen same-path data loss. skillpack keygen --priv k.key --pub k.key silently overwrote the private key with the public key (exit 0, "success"). Industry tools (ssh-keygen, cosign, minisign) all refuse or prompt when output paths are the same. Fix: resolve both paths with filepath.Abs and refuse to write when the results are equal.
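The same-path guard is small enough to sketch in full. SamePath is a hypothetical helper name; the only real API used is filepath.Abs, which returns a cleaned absolute path plus an error, so the comparison has to happen on the two results rather than inline.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// SamePath reports whether a and b resolve to the same absolute path.
// It is a lexical check only: symlinks, hard links, and case-insensitive
// file systems are not detected.
func SamePath(a, b string) (bool, error) {
	absA, err := filepath.Abs(a)
	if err != nil {
		return false, err
	}
	absB, err := filepath.Abs(b)
	if err != nil {
		return false, err
	}
	return absA == absB, nil
}

func main() {
	same, err := SamePath("k.key", "./k.key")
	if err != nil {
		panic(err)
	}
	if same {
		fmt.Println("refusing: --priv and --pub point at the same file")
	}
}
```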

K1: lockfile duplicate-name silent hiding. lockfile.Unmarshal happily accepted two entries with the same name. LookupSkill (linear scan, returns first match) made the second entry invisible to verify, so drift in the hidden skill could not be detected. Fix: reject duplicates at parse time with a Parse-class error. This is the same class as many supply-chain vulnerabilities: silent ambiguity in the policy store.
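Rejecting duplicates at parse time might look like the sketch below. Entry, ParseLockfile, and the JSON layout are assumptions for illustration; skillpack's real lockfile schema and error classes are not shown in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry is a hypothetical lockfile entry.
type Entry struct {
	Name   string `json:"name"`
	SHA256 string `json:"sha256"`
}

// ParseLockfile decodes the lockfile and fails on the first repeated name,
// so no later lookup can silently pick one of two entries.
func ParseLockfile(data []byte) ([]Entry, error) {
	var doc struct {
		Skills []Entry `json:"skills"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("parse lockfile: %w", err)
	}
	seen := make(map[string]bool, len(doc.Skills))
	for _, e := range doc.Skills {
		if seen[e.Name] {
			return nil, fmt.Errorf("parse lockfile: duplicate entry %q", e.Name)
		}
		seen[e.Name] = true
	}
	return doc.Skills, nil
}

func main() {
	_, err := ParseLockfile([]byte(`{"skills":[
		{"name":"git","sha256":"aa"},
		{"name":"git","sha256":"bb"}]}`))
	fmt.Println(err)
}
```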

J1: verify --json PascalCase leak. Go struct fields without json tags are marshaled under their Go field names, which for exported fields are PascalCase. Our verify --json CLI output was emitting Drifted, Missing, Extra, Findings, OK while the rest of the CLI used snake_case. CI consumers would have to special-case this one command. Fix: pin every field with an explicit json:"snake_case" tag and add a schema regression test.
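A schema regression check for this can be as simple as marshaling the struct and comparing its top-level keys to a pinned list. VerifyReport reuses the five field names mentioned above, but the field types and the TopLevelKeys helper are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// VerifyReport uses the five field names from the CLI output; types are assumed.
type VerifyReport struct {
	OK       bool     `json:"ok"`
	Drifted  []string `json:"drifted"`
	Missing  []string `json:"missing"`
	Extra    []string `json:"extra"`
	Findings []string `json:"findings"`
}

// TopLevelKeys marshals v and returns the sorted keys of the resulting
// JSON object, which is what a CI consumer would actually see.
func TopLevelKeys(v any) ([]string, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	keys, err := TopLevelKeys(VerifyReport{})
	if err != nil {
		panic(err)
	}
	fmt.Println(keys) // [drifted extra findings missing ok]
}
```

Dropping any one tag would put a PascalCase key back into the list, which a pinned comparison in a test catches immediately.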

The H–L doc drift saga. Across cycles C, E, G, J, K we bumped test counts (192 → 213 → 216 → 220 → 221) without synchronizing README / ROUND_LOG / CHANGELOG. Cycle H added a new internal/docsmeta package with meta-tests that pin every documented claim to reality. Cycle J found that docsmeta itself drifted (error messages said "213 tests" while the pinned constant was 216) and added a meta-meta-test. Cycle L found that docsmeta's pinning only covered the headline, not per-row numbers in the per-package table, and added TestROUND_LOGPerPackageTableMatchesTotal — regex-scrapes every row and asserts the N column sums to project total. Any future single-row drift shifts the sum.
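The per-row check boils down to scraping a number out of each table row and comparing the sum to the headline. This sketch assumes a two-column markdown table of the form "| package | N |"; the actual ROUND_LOG layout and the docsmeta internals are not shown in this post, so rowRE and CheckTotal are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// rowRE matches rows like "| internal/parser | 12 |".
// The header and separator rows do not match because their second
// column is not a number.
var rowRE = regexp.MustCompile(`(?m)^\|[ \t]*([\w/.-]+)[ \t]*\|[ \t]*(\d+)[ \t]*\|[ \t]*$`)

// CheckTotal sums the N column and compares it to the documented total.
func CheckTotal(table string, total int) error {
	rows := rowRE.FindAllStringSubmatch(table, -1)
	if len(rows) == 0 {
		return fmt.Errorf("no per-package rows found")
	}
	sum := 0
	for _, r := range rows {
		n, err := strconv.Atoi(r[2])
		if err != nil {
			return fmt.Errorf("row %q: %w", r[1], err)
		}
		sum += n
	}
	if sum != total {
		return fmt.Errorf("per-package rows sum to %d, documented total is %d", sum, total)
	}
	return nil
}

func main() {
	table := "| package | N |\n|---|---|\n| internal/parser | 3 |\n| cmd/skillpack | 5 |\n"
	fmt.Println(CheckTotal(table, 8))
	fmt.Println(CheckTotal(table, 9))
}
```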

This is the most over-engineered doc-drift detector in our portfolio. It is also the one I am most proud of, because the specific failure mode it prevents — a README lying about the test count — is the kind of bug that quietly erodes a user's trust. When you install a tool that claims 221 tests and the actual binary ships with 180, you notice on the second bug report and you assume everything else the README said is also a lie.

Eval metrics

14 cycles (A–N) is the longest eval run of any V1 Go project so far. 4 counter resets. Why so many?

  • Cycles A–F probed features; G probed invariants. A whole class of bugs was flushed out in one cycle.
  • Cycles G–J added a meta-test package + kept bumping test counts; every count bump triggered a doc-drift cycle.
  • Cycle K added gofmt -l . to the eval gate (it hadn't been run before) and caught accumulated formatting drift.
  • Cycles L, M, N are the 3 consecutive clean cycles that gated shipping.

Bug density: 20 bugs / 221 tests = 1 bug per 11 tests. For a first V1 of a novel cross-format parser with a custom canonical-hash format, this is normal. For a mature tool it would be a red flag.

Most expensive cycle: G (several hours of fix + re-test), because rewriting canonical form invalidated the previous hash format and every hash-related test had to be re-baselined.

Most valuable cycle: G — it prevented shipping a trust-broken content-addressing tool. The ROI on that one hostile audit is the entire value proposition of the adversarial eval pipeline.

Pipeline improvements applied

Seven new patterns added to memory/company/improvements/bug_patterns.md as a result of this round:

  1. Go-content-addressing-canonical-collision: canonical pre-image formats that join strings with separators without escaping are a collision vector. Fix: length-prefix plus strconv.Quote.
  2. Go-keygen-samepath-silent-destroy: tools writing to two paths from a flag pair must compare the filepath.Abs results of both paths and refuse when they match.
  3. Go-lockfile-duplicate-entry-silent: lockfile parsers using linear LookupByName without duplicate detection silently hide entries. Check at parse time.
  4. Go-verify-json-no-tags-pascalcase: every Go struct in a --json CLI surface MUST have explicit json:"snake_case" tags.
  5. Doc-drift-after-count-change: every test-count change requires lockstep doc updates. Add internal/docsmeta meta-test package.
  6. Cross-platform-gofmt-drift: gofmt -l . must be in the mandatory eval gate.
  7. LSP-diagnostics-lag: VS Code Go LSP can report stale errors. Trust go vet ./... + go test -count=1 ./..., not LSP.

The stricter 3-consecutive-clean rule (no short-cutting counter when a dirty cycle only touches docs) is now enforced.

Portfolio status

  • V1 active: 40 projects (was 39 after R80)
  • R81 completes the Rust-Rust-Go rotation (R79 benchdiff-rs, R80 mcptrace-rs, R81 skillpack)
  • Agent-skill tooling pair: skilltest (R71, TypeScript, design-time test runner) + skillpack (R81, Go, lifecycle). Parallel to the MCP ops quadrant (mcptest + mcpaudit + mcprouter + mcptrace-rs).
  • Future candidates: skillaudit (static security), skillhub (local index). Both on the runner-up list.

The tool ships today at https://github.com/JSLEEKR/skillpack with v1.0.0 release assets for Windows, Linux, macOS (amd64 + arm64). 4 MB stripped + trimpath. If you maintain Claude Code / Cursor skills across repos, give it a try: go install github.com/JSLEEKR/skillpack/cmd/skillpack@v1.0.0.