
Daily Build Log — 2026-04-09

daily-log · pipeline · open-source

Round 75: sessaudit — CLI Agent Session Auditing

Category: Agent Security/Compliance | Language: TypeScript | Tests: 205 | Pitch Score: 102/110

GitHub: JSLEEKR/sessaudit


Why This Project

CLI agents are everywhere now. everything-claude-code sits at 113K stars with a trend score of 6.0 and a newcomer score of 10.0 — that is not hype, that is adoption at scale. opencli (8.2K stars, trend 5.9) is turning arbitrary websites into CLI interfaces. The ecosystem is moving fast and the tooling for auditing what these agents actually do is lagging behind.

Enterprise teams running Claude Code, Cursor, or any CLI agent need answers to basic compliance questions: What files did the agent touch? Did it access secrets? Did it run destructive commands? Did it exfiltrate data over the network? Right now the answer is "check the logs manually," which does not scale.

We had a proof of concept already — our own action_logger.js hook that logs every tool use to .memory/action_log.jsonl. It works, but it is a write-only firehose. No policy enforcement, no violation detection, no risk scoring. sessaudit takes that idea and turns it into a proper compliance tool.

It beat agentorch (Multi-Agent Orchestration Testing, 93/110) and voicetest (Voice Agent Testing, 81/110) in the pitch round. The Red Team tried the "who would use this" attack and the defense was straightforward: any company with an AI usage policy already needs this. SOC2 auditors are going to start asking about agent activity logs within the next year.

Architecture and Core Design

sessaudit has a clean pipeline: parse -> normalize -> scan -> score -> report.

Session Parser

The parser handles three input formats: JSONL (one action per line, which is what Claude Code's action_logger produces), JSON arrays, and wrapped objects (where the actions array is nested inside a metadata wrapper). Every input gets normalized into a unified action schema:

interface NormalizedAction {
  timestamp: string;
  type: 'command' | 'file_read' | 'file_write' | 'file_edit' | 'network' | 'unknown';
  target: string;       // file path, URL, or command string
  metadata: Record<string, unknown>;
  raw: unknown;         // original action preserved
}

The type normalization is important. Different agent frameworks log the same operations under different names — bash, shell, terminal, exec all map to command. read_file, cat, view all map to file_read. This abstraction means policies work across agent frameworks without rewriting rules.
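The alias table can be sketched as a simple lookup. This is illustrative only — the `write_file` and `edit_file` aliases are assumptions, and the real sessaudit mapping is not shown in this post:

```typescript
// Illustrative alias table mapping framework-specific action names
// onto the unified action types. Not sessaudit's actual internals.
type ActionType = 'command' | 'file_read' | 'file_write' | 'file_edit' | 'network' | 'unknown';

const TYPE_ALIASES: Record<string, ActionType> = {
  // shells: bash, shell, terminal, exec all map to 'command'
  bash: 'command', shell: 'command', terminal: 'command', exec: 'command',
  // reads: read_file, cat, view all map to 'file_read'
  read_file: 'file_read', cat: 'file_read', view: 'file_read',
  // hypothetical write/edit aliases for completeness
  write_file: 'file_write', edit_file: 'file_edit',
};

function normalizeType(rawType: string): ActionType {
  // anything we have never seen falls through to 'unknown'
  return TYPE_ALIASES[rawType.toLowerCase()] ?? 'unknown';
}
```

The payoff of routing everything through one table is exactly what the paragraph above describes: policy rules only ever see the unified types.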

Policy Engine

Policies are YAML files with six rule types:

  1. Command Blocklist — Glob patterns against command strings (rm -rf *, git push --force, DROP TABLE)
  2. File Access Patterns — Glob patterns for sensitive paths (*.pem, *.key, .env*, .ssh/*)
  3. Network Rules — Domain allowlists/blocklists for outbound connections
  4. Resource Limits — Caps on total commands executed, files modified, network requests made
  5. Sensitive Data Regex — Patterns that match secrets in command arguments or file contents (API keys, tokens, passwords)
  6. Custom Patterns — Arbitrary regex rules with user-defined severity levels

Each rule has a severity level (critical, high, medium, low, info) and a weight for risk scoring. Policies compose — a base enterprise policy plus team-specific overrides.
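A policy file might look something like this. This is a hypothetical sketch of the shape implied by the rule types above — the actual sessaudit schema is not shown in this post:

```yaml
# Hypothetical policy sketch — field names are illustrative.
name: enterprise-base
rules:
  - type: command_blocklist
    pattern: "rm -rf *"
    severity: critical
    weight: 25
  - type: file_access
    pattern: "*.pem"
    severity: high
    weight: 15
  - type: network
    blocklist: ["*.pastebin.com"]
    severity: high
    weight: 20
  - type: resource_limit
    max_commands: 500
    severity: low
    weight: 5
```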

Violation Scanner

The scanner is context-aware, not just pattern matching. When it detects a file_write to .env, it checks the surrounding actions — was there a file_read of .env first? That is probably a legitimate edit. Was there a network action right after reading .env? That looks like exfiltration and gets escalated to critical.
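The lookahead can be sketched as a small window scan over the action stream. A simplified illustration, not the actual scanner — the window size and function names are assumptions:

```typescript
// Sketch of context-aware escalation: a secret-file read followed
// shortly by a network action looks like exfiltration.
interface Action {
  type: string;   // unified action type, e.g. 'file_read', 'network'
  target: string; // file path, URL, or command string
}

function escalateIfExfil(
  actions: Action[],
  secretReadIndex: number,
  window = 5 // how many subsequent actions to inspect (assumption)
): 'high' | 'critical' {
  // Inspect the actions immediately after the secret read.
  const following = actions.slice(secretReadIndex + 1, secretReadIndex + 1 + window);
  // Any outbound network activity in that window escalates the finding.
  return following.some(a => a.type === 'network') ? 'critical' : 'high';
}
```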

The 10 built-in rules cover the common cases: destructive commands (rm -rf, DROP TABLE), secret file access (.pem, .key, .p12), sensitive data patterns (AWS keys, GitHub tokens), network exfiltration (outbound after secret read), privilege escalation (sudo, chmod 777), and resource limit violations.

Risk Scoring

The scoring algorithm produces a score from 0 to 100, where 0 is clean and 100 is maximum risk. Each violation adds points based on severity and weight, but there is a per-category cap of 50. This prevents a session that runs ls 200 times from scoring worse than one that actually exfiltrated secrets — resource-limit violations max out at the 50-point cap, which still leaves room for the exfiltration category to dominate the score.

The weights are configurable per-policy. An enterprise that cares more about data exfiltration than destructive commands can tune accordingly.
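A minimal sketch of capped, weighted scoring, assuming additive per-category sums (the real algorithm and weight values may differ):

```typescript
// Sketch: per-category sums, each capped at 50, clamped to 100 overall.
interface ScoredViolation {
  category: string; // e.g. 'resource_limit', 'exfiltration'
  weight: number;   // points this violation contributes
}

const CATEGORY_CAP = 50;
const MAX_SCORE = 100;

function riskScore(violations: ScoredViolation[]): number {
  const perCategory = new Map<string, number>();
  for (const v of violations) {
    const current = perCategory.get(v.category) ?? 0;
    // A category can never contribute more than CATEGORY_CAP points.
    perCategory.set(v.category, Math.min(CATEGORY_CAP, current + v.weight));
  }
  let total = 0;
  for (const sum of perCategory.values()) total += sum;
  return Math.min(MAX_SCORE, total);
}
```

With this shape, 200 resource-limit violations saturate at 50, while a single heavy exfiltration violation still moves the total meaningfully.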

Report Generation

Three output formats: JSON for CI/CD pipelines (parseable, includes all metadata), colored text for terminal review (human-readable with ANSI highlighting), and table format for quick scanning. The session replay mode is particularly useful — it shows a chronological timeline of every action with inline violation highlights, so you can see exactly when and where policy was broken.

The Bugs: Where It Got Interesting

Eight eval cycles, five bugs found. Three of them were security-relevant, which is exactly the kind of thing you want your hostile evaluator to catch.

Bug 1: Glob-to-Regex Metacharacter Escape (HIGH Severity)

This was the worst one. The policy engine converts glob patterns (like *.pem) to regex for matching. The escape function had a character class for special regex characters that needed escaping before the glob-to-regex conversion:

// BROKEN: missing * and ? from escape list
const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');

The problem: * and ? were left out of the escape list because they are also glob metacharacters that get their own handling later. The function was supposed to first escape regex-special characters, then convert glob metacharacters — but that ordering only works if the two passes agree on which characters belong to which pass.

The real issue was the two-pass design itself: in the intermediate string, a * has ambiguous meaning, and the converter treated every * as a glob wildcard regardless of how it got there. Simple patterns like *.pem happened to survive because the * sat in an unambiguous glob position, but 7 of the 18 default file-access blocklist patterns were silently broken — *.key, .ssh/id_*, and others were not matching correctly.

The fix was to use a single-pass approach: walk the pattern character by character, handle glob metacharacters explicitly, and escape everything else as regex literals.
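A single-pass converter along those lines might look like this — a simplified sketch handling only * and ? (the actual sessaudit code is not shown in the post):

```typescript
// Single-pass glob-to-regex: each character is classified exactly once,
// so there is no ambiguous intermediate state.
function globToRegExp(glob: string): RegExp {
  let out = '';
  for (const ch of glob) {
    if (ch === '*') {
      out += '[^/]*';  // glob *: any run of non-separator characters
    } else if (ch === '?') {
      out += '[^/]';   // glob ?: exactly one non-separator character
    } else {
      // Everything else is a literal: escape regex-special characters.
      out += ch.replace(/[.+^${}()|[\]\\/]/g, '\\$&');
    }
  }
  return new RegExp(`^${out}$`);
}
```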

Bug 2: Substring Match on .env

Classic. The file access checker was using path.includes('.env') to detect environment file access, so config.env.backup and development.environment.json would trigger false positives.

The fix uses path segment boundary awareness — .env must either be the entire filename, appear after a path separator, or be followed by a dot (for .env.local, .env.production, etc.).
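The boundary check can be sketched like this (illustrative, not the exact fix shipped in sessaudit):

```typescript
// Sketch: match .env only on filename-segment boundaries.
// Matches ".env", "src/.env", ".env.local", ".env.production";
// rejects "config.env.backup" and "development.environment.json".
function isEnvFile(path: string): boolean {
  const name = path.split('/').pop() ?? '';
  return name === '.env' || name.startsWith('.env.');
}
```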

Bug 3: Filter-Metadata Desync

When you filter violations by severity (e.g., only show HIGH and above), the violations array was correctly filtered, but the summary statistics were computed before the filter was applied. So the JSON report would say violationCount: 9 but only contain 7 violations in the array. Any CI/CD pipeline parsing the count field would get wrong numbers.

Simple fix: compute stats after filtering, or include both totalViolations and filteredViolations counts.
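The fix can be sketched as deriving both counts from the same filtered array, so the numbers cannot drift apart (the report shape here is hypothetical):

```typescript
// Sketch: stats computed after filtering, with both counts reported.
type Severity = 'critical' | 'high' | 'medium' | 'low' | 'info';

// Ascending severity order, used to compare against the threshold.
const ORDER: Severity[] = ['info', 'low', 'medium', 'high', 'critical'];

function buildReport(all: { severity: Severity }[], minSeverity: Severity) {
  const min = ORDER.indexOf(minSeverity);
  const filtered = all.filter(v => ORDER.indexOf(v.severity) >= min);
  return {
    totalViolations: all.length,          // everything the scanner found
    filteredViolations: filtered.length,  // always equals violations.length
    violations: filtered,
  };
}
```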

Bug 4: file_edit Action Type Blind Spot

The violation scanner checked file_read and file_write actions against file access policies, but file_edit was not handled. In Claude Code, editing a file and writing a file are different operations — Edit vs Write tools. If an agent edited .env instead of writing to it, the violation would not be detected.

This is a real-world security gap. An agent could Edit a .env file to inject a new API key and it would fly under the radar.
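The fix amounts to checking all three file action types uniformly — a minimal sketch:

```typescript
// Sketch: one set of file-touching action types, so file_edit
// goes through the same file-access policy checks as read and write.
const FILE_ACCESS_TYPES = new Set(['file_read', 'file_write', 'file_edit']);

function touchesFiles(actionType: string): boolean {
  return FILE_ACCESS_TYPES.has(actionType);
}
```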

Bug 5: Unanchored Regex False Positives

The sensitive data pattern env\s*$ was meant to match lines ending with "env" (to catch environment variable dumps). The real issue was that the regex was applied too broadly, with no context about what it was matching against — so it flagged benign occurrences like NODE_ENV in command arguments.

Eval Process

Eight cycles total. The first five cycles each found at least one bug. Cycles F, G, and H were clean — three consecutive clean cycles from three different evaluator agents. The glob-to-regex bug was found in cycle A and was the most impactful. Bugs 2-5 were found across cycles B through E. Each fix was verified by a fresh evaluator agent that had never seen the code before.

Pipeline Status

  • V1: 34 active projects (Round 75 complete)
  • V2: 20 reimplementation projects
  • Total Active: 39 projects
  • Total Archived: 48 projects (private repos, tagged "archived")
  • Total Tests: 10,494

The portfolio continues to grow in the agent tooling space. sessaudit joins agentspec (agent testing), agentmem (agent memory testing), skilltest (agent skill/plugin testing), and mocklm (LLM mocking) in the agent infrastructure category. The thesis is that as AI agents become standard developer tools, the testing and compliance infrastructure around them becomes critical — and most of it does not exist yet.

Technical Takeaway

Glob-to-regex conversion is one of those "looks trivial, actually tricky" problems. The two-pass approach (escape regex chars, then convert glob chars) creates an intermediate state where characters have ambiguous meaning. Single-pass character-by-character processing eliminates the ambiguity entirely. If you are writing a glob matcher, do the single-pass approach from the start.

The broader lesson from this round: security tools have a higher bar for correctness than most software. A false negative in a compliance scanner is worse than a crash — it gives false confidence. The hostile evaluator approach is essential for this category of tool.