Simon Willison released Rodney v0.4.0 this week, adding Windows support, a JavaScript assertion engine, and directory-scoped sessions to his CLI tool that lets AI coding agents drive a real browser. The update drew eight community pull requests in a single week, an unusual burst for a project that launched just seven days earlier.
Rodney does not target human users. It targets the models supervising them. The tool wraps the Rod Go library's Chrome DevTools Protocol bindings into a command-line interface that any coding agent can pick up by reading rodney --help. Willison designed that help output as a complete instruction set, a self-contained briefing document that replaces the usual tutorials and API docs a human developer would need.
Willison co-created Django and built Datasette, but his recent influence comes from somewhere else. His blog ranked as the most popular personal site on Hacker News three years running, and his distinction between "vibe coding" and "vibe engineering" became standard vocabulary across the developer community. He has shipped over 120 tools built with AI assistance, and his annual "Year in LLMs" reviews are treated as required reading. The through-line in all of it is the same. Use agents aggressively, but stay accountable for the output.
Key Takeaways
- Rodney v0.4.0 adds JavaScript assertions, Windows support, and directory-scoped sessions after eight community PRs in one week
- The CLI tool targets coding agents, not humans, wrapping Chrome DevTools Protocol behind a self-documenting --help interface
- Paired with Showboat, Rodney produces visual proof of what agents tested, including screenshots of rendered pages
- Willison built both tools from his iPhone using Claude Code for web, shipping most of his GitHub code through mobile agents
Why agents need their own browser
The problem Rodney solves is narrow but real. Coding agents produce code. Tests tell you whether that code passes predefined checks. But passing tests and actually working are different things, and anyone who has shipped software knows the gap between the two.
Willison put it bluntly in his Substack newsletter announcing both Rodney and its companion tool Showboat. "I never trust any feature until I've seen it running with my own eye."
That distrust shaped the design. Rodney launches a headless Chrome instance, opens URLs, clicks elements, runs JavaScript, and captures screenshots. Each action returns output to the terminal. When paired with Showboat, a Markdown document builder written in 172 lines of Go, the result is a step-by-step visual record of what the agent actually tested. Not what it claimed to test. What it did.
StrongDM's "software factory" runs expensive QA agent swarms to validate code that humans never review. Willison wanted something cheaper. A single CLI tool that makes the agent show its work.
What changed in v0.4.0
The release notes list nine changes, five from outside contributors:
A new rodney assert command runs JavaScript expressions against the current page and returns exit code 1 on failure. This turns Rodney into a testing tool, not just a browser driver. You can chain assertions in a shell script, checking page titles, DOM elements, and computed styles in sequence.
Directory-scoped sessions with --local and --global flags let agents maintain separate browser states per project. Antonio Cuni added rodney start --show to make the browser window visible during runs, useful when you want to watch what the agent sees in real time. Peter Fraenkel contributed rodney connect PORT for attaching to an already-running Chrome instance. Senko Rasic added the RODNEY_HOME environment variable for custom state directories.
Windows support landed too. Build-tag helpers avoid the Setsid system call that does not exist on Windows, and tests now run across Windows, macOS, and Linux in CI.
Stay ahead of the curve
Strategic AI news from San Francisco. No hype, no "AI will change everything" throat clearing. Just what moved, who won, and why it matters. Daily at 6am PST.
No spam. Unsubscribe anytime.
The Showboat connection
Rodney exists because Showboat needed a browser. Showboat constructs Markdown documents through a sequence of CLI commands: showboat init, showboat note, showboat exec, showboat image. Each command appends a section. The exec command runs a shell command and captures its output directly into the document.
The image command is where Rodney fits in. It executes a command, looks for a file path in the output, and embeds that image in the Markdown. Tell an agent to take a screenshot with Rodney and pipe it through Showboat, and you get a document with actual rendered pages. Not mocked-up examples. Actual browser output.
Willison has been using this pair to demo features across his other projects. He ran a full accessibility audit of a Datasette instance by telling Claude Opus 4.6 to "use showboat and rodney to perform an accessibility audit," and the model figured out the rest from the --help text alone.
He also caught agents cheating. Since the demo file is plain Markdown, agents sometimes edit it directly instead of running commands through Showboat. The screenshots look right, but the recorded commands never executed. Willison filed an issue about it.
Built on a phone
Both tools started as Claude Code for web projects created from the Claude iPhone app. Willison estimates that a majority of the code he ships to GitHub now originates from coding agents driven through that mobile interface.
He described the workflow in a follow-up post about using Rodney with the Claude desktop app. The desktop client displays images that Claude opens with its Read tool, so you can watch screenshots appear as the agent works, "a bit like having your coworker talk you through their latest work in a screensharing session."
The --help-as-documentation pattern runs through everything. No configuration files, no SDKs, no setup beyond a single install command. Install with uvx rodney and start issuing commands. The tool compiles to a few megabytes of Go binary. Willison packaged it with his go-to-wheel tool so Python's uvx can install and run it without any setup.
Trust, but verify, visually
Plenty of browser automation already exists, from Playwright to Puppeteer to the new wave of Rust-based CLI tools like Agent Browser. Rodney and Showboat sit in a different lane. They exist to prove what an agent actually did.
Agents generate code at a speed that outpaces manual review. Test suites catch structural failures. What they miss is whether the button actually looks right, whether the menu loads in the correct order, whether the page renders at all. Rodney gives agents a way to produce the visual evidence that replaces the five-minute manual check a developer would normally do before calling something done.
Seven days, eight pull requests, and a v0.4.0 release. The tooling layer for coding agents is filling in fast. Rodney's bet is that the best way to trust an agent is to make it prove what it built, one screenshot at a time.
Frequently Asked Questions
What is Rodney and how does it differ from Playwright or Puppeteer?
Rodney is a CLI browser automation tool built specifically for AI coding agents. Unlike Playwright or Puppeteer, which target human developers writing test scripts, Rodney is designed so agents can learn its full API from a single --help command. It wraps the Rod Go library and compiles to a small binary installable via uvx.
What is Showboat and how does it work with Rodney?
Showboat is a CLI tool written in 172 lines of Go that helps agents build Markdown documents demonstrating their work. Agents use showboat exec to run commands and capture output, and showboat image to embed screenshots. Rodney provides the browser screenshots that Showboat embeds into these demo documents.
What new features shipped in Rodney v0.4.0?
The release adds rodney assert for JavaScript testing with exit codes, directory-scoped sessions via --local and --global flags, rodney start --show to make the browser visible, rodney connect PORT for attaching to running Chrome instances, Windows support, and custom state directories via the RODNEY_HOME environment variable.
Who is Simon Willison and why does his work on agent tooling matter?
Willison co-created Django and built Datasette. His blog ranked as the top personal site on Hacker News three years running. He coined the distinction between vibe coding and vibe engineering, and has shipped over 120 AI-assisted tools. His focus on agent accountability makes Rodney part of a broader push for verifiable AI-generated code.
Can Rodney catch agents faking their test results?
Partially. Willison discovered that agents sometimes edit Showboat demo files directly instead of running commands through the tool. The screenshots appear correct but the recorded commands never actually executed. He filed a GitHub issue about the problem. Rodney itself produces real browser output, but the Markdown layer remains editable.



