OpenAI released a macOS desktop app for Codex today, turning its AI coding agent into a standalone application that can run multiple agents across different projects at the same time. The company also doubled rate limits on all paid ChatGPT plans and temporarily opened Codex access to Free and Go subscribers. More than a million developers used Codex last month, according to OpenAI, and overall usage has doubled since the GPT-5.2-Codex model shipped in mid-December.
The desktop app is a dispatch desk. Not the kind of tool you use to write code yourself, but the kind you sit behind while four agents write code for you across two different projects, each one filing status updates into a queue you can scan between meetings. The metaphor OpenAI chose, "a command center for agents," lands closer to the truth than most marketing copy. What they built is a shift supervisor's console for software that runs semi-autonomously for hours at a stretch.
Key Takeaways
• OpenAI released a macOS desktop app for Codex that manages multiple AI coding agents across projects simultaneously
• Rate limits doubled on all paid ChatGPT plans; Free and Go users get temporary Codex access
• Codex consumed seven million tokens to build a 3D racing game autonomously from a single prompt
• Engineer Michael Bolin explained that resending the full conversation each turn makes total token processing grow quadratically over a session, with context compaction as the workaround
The problem with CLIs and IDE plugins
Codex launched as a CLI tool in April 2025. It worked. You typed a prompt, the agent wrote code, you reviewed the diff. Simple enough for single tasks.
Then developers got greedy. One agent refactoring the authentication module. A second writing tests for the payment flow. A third grinding through lint errors. All hitting the same repo at the same time. Try managing that in a terminal: three sessions, three blinking cursors, and you're alt-tabbing between them trying to remember which one is stuck. An IDE is no better, just tab chaos and scrolling output panes, with no way to tell at a glance which agent finished ten minutes ago. Neither interface was designed to let you track what four agents are doing across two different projects while you eat lunch.
The Codex desktop app organizes agents into threads grouped by project, the way a dispatch board groups jobs by vehicle. You can review diffs, comment on changes, and open files in your preferred editor without losing track of which agent did what. Built-in worktree support means each agent works on an isolated copy of the code. No merge conflicts between agents. No accidental commits to main while an agent is mid-refactor.
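The isolation property maps directly onto plain git worktrees: one branch and one directory per agent. A minimal sketch of the idea in Python (Codex's internals aren't described in this detail, so the helper below is hypothetical, but the isolation it provides is exactly what OpenAI advertises):

```python
import subprocess
from pathlib import Path

def spawn_agent_worktree(repo: Path, agent_id: str, base: str = "main") -> Path:
    """Give one agent an isolated checkout via `git worktree`.

    Hypothetical helper, not Codex's actual code. Each agent gets
    its own branch and directory, so none of them can step on the
    others or commit to main mid-refactor.
    """
    branch = f"agent/{agent_id}"
    workdir = repo.parent / f"{repo.name}-{agent_id}"
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(workdir), base],
        check=True,
    )
    return workdir  # the agent edits here; main stays untouched
```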
If you've used Claude Code's macOS app, the layout will feel familiar. OpenAI is catching up here, not breaking new ground, and you can sense the irritation in how hard the company is pushing access and pricing to compensate. Anthropic got to the desktop first. OpenAI wants to make sure that head start doesn't compound.
What seven million tokens of unsupervised work looks like
The most ambitious feature in the Codex app is Skills, which are extension packs bundled as folders of instructions, resources, and scripts. Think of them as dispatch protocols: written procedures that tell an agent how to interact with a specific external tool. OpenAI ships a curated library out of the box: Figma integration for translating designs into UI code, Linear for project management, and deployment skills for Cloudflare, Netlify, Vercel, and Render.
You can also write your own. Create a skill in the app and it follows you everywhere: CLI, IDE extension, web interface. Check it into your repo and the whole team gets it. OpenAI says it built hundreds of custom skills internally for tasks like running evals, monitoring training runs, and generating release notes.
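The folder-of-files design is simple enough to sketch. Assuming a skill is a directory holding an instructions file plus optional scripts (the exact file names below are illustrative, not a published spec), a loader is a few lines:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Skill:
    name: str
    instructions: str    # prose the agent reads before using the tool
    scripts: list[Path]  # executables the agent may invoke

def load_skill(folder: Path) -> Skill:
    """Read one skill folder. File names here are an assumption
    for illustration; the on-disk layout isn't detailed in the post."""
    instructions = (folder / "SKILL.md").read_text()
    scripts = sorted(p for p in folder.glob("scripts/*") if p.is_file())
    return Skill(name=folder.name, instructions=instructions, scripts=scripts)
```

Because the skill is just files, checking it into a repo really is the whole distribution story: clone the repo, get the skills.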
To demonstrate what sustained agent work looks like, OpenAI had Codex build a 3D racing game from a single prompt. The agent used an image generation skill and a web game development skill, consumed more than seven million tokens, and played the game itself to test for bugs. Picture a session running for hours on a developer's machine, no human touching the keyboard, the agent cycling through design, implementation, and QA on its own. It designed tracks, created character sprites, implemented a drift-boost system, and built eight maps. The whole thing ran on one initial instruction with generic follow-up prompts telling it to keep improving.
The dispatch desk exists for exactly this scenario. You're not going to babysit a seven-million-token session in a terminal window, squinting at scrolling output, wondering whether the agent is still productive or just burning through your bill.
Automations take this further. You set a schedule, attach skills, and let Codex run in the background. When it finishes, results land in a review queue, the way a night shift's completed work orders sit on the desk when the morning crew arrives. OpenAI's internal teams use automations for daily issue triage, CI failure summaries, and release briefs. Cloud-based triggers are on the roadmap, which would let agents run even when your laptop lid is closed.
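OpenAI hasn't published a schema for automations, but conceptually each one is a schedule plus a prompt plus attached skills plus an output queue. A hypothetical definition, with every field name invented for illustration:

```python
# Hypothetical -- OpenAI hasn't published an automation schema.
# Conceptually: schedule + prompt + skills + a place for results to land.
nightly_triage = {
    "schedule": "0 6 * * *",                # cron syntax: daily at 6am
    "prompt": "Triage overnight issues; label them and draft replies.",
    "skills": ["linear", "release-notes"],  # skill names are illustrative
    "deliver_to": "review-queue",           # results wait for a human
}
```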
Context windows bloat and caches break
A week before the desktop launch, OpenAI engineer Michael Bolin published a detailed technical breakdown of the Codex agent architecture. The transparency felt deliberate, almost defensive, as if OpenAI wanted to prove it understood its own plumbing before competitors picked it apart. The company hasn't provided similar internals for ChatGPT.
Every Codex session runs on a repeating cycle. The user sends a prompt. The model either returns a response or requests a tool call, like running a shell command or reading a file. The agent executes the tool, appends the output to the prompt, and sends everything back to the model. This loop repeats until the model stops calling tools and delivers a final answer.
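The loop is short enough to write down. A sketch of its shape, where `model` and `tools` are stand-ins rather than OpenAI's real interfaces:

```python
def run_session(model, tools, user_prompt: str) -> str:
    """The core agent loop as Bolin describes it: call the model,
    execute any tool it requests, append the output, repeat.
    `model` and `tools` are stand-ins, not the actual Codex API."""
    history = [{"role": "user", "content": user_prompt}]
    while True:
        reply = model.complete(history)          # full history sent every call
        if reply.tool_call is None:
            return reply.text                    # final answer, loop ends
        output = tools.execute(reply.tool_call)  # e.g. run a shell command
        history.append({"role": "tool", "content": output})
```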
The engineering headache is that prompts grow with every turn. Codex sends the full conversation history with each API call, a stateless design that avoids storing user data on OpenAI's servers. Good for privacy. Bad for performance. Each turn's prompt grows linearly, which means the total tokens processed over a session grow quadratically, and cache misses, triggered by something as small as switching the available tools mid-session, force the model to reprocess everything from scratch. You can almost hear the servers grinding when a developer swaps out an MCP tool halfway through a session and invalidates the entire cache.
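The quadratic claim is just arithmetic. If turn i resends all i prior payloads, a session of n turns processes 1 + 2 + ... + n = n(n+1)/2 payloads in total (the token figures below are made up for illustration):

```python
def total_tokens_processed(tokens_per_turn: int, turns: int) -> int:
    """Stateless resend: turn i carries all i payloads so far,
    so the session total is 1 + 2 + ... + n = n(n+1)/2 payloads."""
    return tokens_per_turn * turns * (turns + 1) // 2

total_tokens_processed(2_000, 10)   # 110,000 tokens
total_tokens_processed(2_000, 100)  # 10,100,000 -- 10x the turns, ~100x the work
```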
Codex handles this by automatically compacting conversations when they hit a token threshold. Earlier versions made developers do this manually with a slash command. Now the system compresses context through a specialized API endpoint that preserves a summarized version of the conversation in an encrypted content item. Claude Code uses a similar approach, Bolin noted. Neither company has solved the underlying problem. The shift notes just get shorter as the conversation gets longer, and the dispatcher has to trust that the summary captured what mattered.
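Mechanically, compaction is a threshold check wrapped around a lossy summarize step. A sketch of the shape, with an invented threshold and a stand-in `summarize` call (the real system routes through a specialized API endpoint and returns the summary as an encrypted content item):

```python
COMPACT_THRESHOLD = 200_000  # illustrative; the real threshold isn't published

def maybe_compact(history: list[dict], count_tokens, summarize) -> list[dict]:
    """Swap a long history for a summary once it crosses the threshold.
    `summarize` stands in for OpenAI's compaction endpoint."""
    if count_tokens(history) < COMPACT_THRESHOLD:
        return history
    summary = summarize(history)  # lossy: this is where detail gets dropped
    return [{"role": "system", "content": summary}]
```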
The competitive math
OpenAI is playing the access card, and there's a nervousness to it. Doubled rate limits across Plus, Pro, Business, Enterprise, and Edu plans. Temporary Codex access for Free and Go subscribers, though OpenAI didn't specify those rate limits or how long "limited time" actually lasts. That's the tell. When a company floods the market with free usage and won't say when the faucet turns off, the goal is adoption speed, not revenue.
The pricing strategy echoes what OpenAI did with ChatGPT's early growth: get developers locked into the workflow, then adjust later. Anthropic already has Claude Code with Opus 4.5 running on macOS, and it got there months ago. Both tools are fast enough at simple tasks to feel magical, brittle enough at complex ones to require a human staring at every diff. Anthropic looks comfortable with its lead. OpenAI looks like it's trying to buy back the gap.
The honest assessment from developers who've used both: scaffolding a project comes together fast. Filling in the details means debugging the agent's mistakes, working around its blind spots, and watching your token budget evaporate on a refactor that a mid-level engineer would have finished before lunch. OpenAI's own Codex team uses the tool to build Codex itself. They're eating their own cooking, but Bolin's technical post was candid about the problems on the stove: prompt bloat, cache fragility, inconsistent MCP tool enumeration.
What the dispatch desk doesn't solve
The Codex app ships with a sandbox by default. Agents can only edit files in their working directory and use cached web search. Anything requiring network access or elevated permissions triggers a permission prompt. You can configure rules to auto-approve certain commands, but the default is cautious. The Codex CLI is open source, which makes the sandboxing model auditable.
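The approval model reduces to an allowlist check in front of every command. A sketch of the concept (the rule format is invented here; the CLI's actual configuration lives in its open-source repo):

```python
AUTO_APPROVE = {"git status", "git diff", "pytest"}  # illustrative rules

def gate(command: str, ask_user) -> bool:
    """Cautious default: anything not explicitly allowlisted
    escalates to a human prompt, the way the app escalates
    network access or elevated permissions."""
    if command in AUTO_APPROVE:
        return True
    return ask_user(f"Agent wants to run: {command!r}. Allow?")
```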
What the dispatch desk doesn't address is the deeper problem sitting underneath every coding agent: what happens when the context window fills up and the model starts forgetting things? Compaction helps. Encrypted summaries preserve some understanding. But a seven-million-token session is not the same as seven million tokens of understanding. Information gets lost in compression. The agent that built your racing game at token one million doesn't have the same grasp of the codebase at token six million. It's still filing reports to the dispatch desk. The reports just get thinner.
OpenAI says Windows support is coming, along with faster inference and continued model improvements. The desktop app is macOS-only for now. For developers already running Codex through the CLI or IDE extensions, the app picks up existing session history and configuration. Nothing to migrate.
A million developers used Codex last month. The desktop app gives them a better console to watch the agents work. The agents are still the same agents, filing the same diffs, hitting the same context walls. A nicer dispatch desk doesn't make the workers smarter. But it does make you faster at catching the ones who've gone off-script.
Frequently Asked Questions
Q: What is the Codex desktop app?
A: A macOS application from OpenAI that lets developers manage multiple AI coding agents at once. Agents run in separate threads organized by project, with built-in worktree support so they don't create merge conflicts. It replaces the need to juggle CLI sessions or IDE tabs.
Q: How does Codex compare to Claude Code?
A: Both are AI coding agents running on macOS. Anthropic shipped Claude Code's desktop version first, and OpenAI is catching up with the Codex app. Both tools handle simple tasks well but remain brittle on complex production work. OpenAI is competing on pricing and access rather than features.
Q: What are Codex Skills?
A: Extension packs bundled as folders containing instructions, resources, and scripts. They tell the agent how to interact with external tools like Figma, Linear, or cloud deployment platforms. You can create custom skills and share them across your team by checking them into a repository.
Q: Why did OpenAI double Codex rate limits?
A: The move appears aimed at accelerating adoption against Anthropic's Claude Code. Doubled limits apply to Plus, Pro, Business, Enterprise, and Edu plans. Free and Go users also get temporary access, though OpenAI hasn't disclosed those specific rate limits or the end date.
Q: What is the context window problem Bolin described?
A: Every turn in a Codex session resends the full conversation history, so each prompt grows linearly and the total tokens processed over a session grow quadratically. Cache misses force complete reprocessing. Codex now auto-compacts conversations at a token threshold, but compressed summaries lose information over long sessions.