OpenAI’s ChatGPT Agent Signals a New Era of Autonomous AI Work Tools

💡 TL;DR - The 30-Second Version

👉 OpenAI launched ChatGPT Agent Thursday, an AI that can work across a full computer environment to handle tasks like calendar management and research reports.

📊 The agent scored 41.6% on Humanity's Last Exam, double what OpenAI's o3 and o4-mini achieved, and hit 27.4% on FrontierMath versus 6.3% previously.

🏭 Tasks take 15-30 minutes to complete, but the agent asks permission before "irreversible" actions like sending emails or making bookings.

🛡️ OpenAI activated safeguards for "high biological and chemical capabilities" and restricts financial transactions with Watch Mode for financial sites.

🌍 The tool combines Operator and Deep Research capabilities, rolling out to Pro, Plus, and Team subscribers with Enterprise coming this summer.

🚀 This represents the first time an OpenAI agent can actually control a computer instead of just browsing the web, moving beyond chatbots to real task automation.

OpenAI just dropped its most ambitious tool yet. ChatGPT Agent launched Thursday, and it's not content to just answer questions anymore. This thing wants to control your entire computer.

The new agent can navigate your calendar, book restaurants, generate slide decks, and write research reports. It handles tasks while you work on other things. But here's the catch - it might take 15 to 30 minutes to complete tasks that would normally take you hours.

"Even if it takes 15 minutes, half an hour, it's quite a big speed-up compared to how long it would take you to do it," says Isa Fulford, research lead on ChatGPT Agent. Translation: grab a snack while your AI does the heavy lifting.

More Than Just Web Browsing

This isn't your typical chatbot upgrade. ChatGPT Agent combines OpenAI's previous tools - Operator and Deep Research - into something that can actually manipulate software. Where Operator could only click around websites, this new agent has access to terminals, APIs, and a full computer environment.

The model behind it has no fancy name. OpenAI just calls it "the model behind ChatGPT Agent." Marketing department must've been out sick that day.

During demos, OpenAI showed the agent planning date nights by cross-referencing Google Calendar with OpenTable. It generated research reports comparing Labubu toys to Beanie Babies. One executive started using it to request office parking every Thursday - apparently forgetting to do this manually was becoming a weekly problem.

Safety First (Because AI Controlling Computers Sounds Terrifying)

OpenAI isn't naive about the risks. The company activated safeguards typically reserved for models with "high biological and chemical capabilities." They don't think ChatGPT Agent can help novices build weapons, but they're not taking chances.

The agent asks permission before doing anything "irreversible" like sending emails or making bookings. Financial transactions are restricted "for now." There's also something called Watch Mode - if you navigate to financial sites, you have to stay on that tab or the agent stops working. It's essentially a safety switch for sensitive pages.
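To make the permission flow concrete, here is a minimal sketch of an approval gate of this general shape. It is not OpenAI's implementation: the action names, the `Action` type, and the `run_action` helper are all hypothetical, picked only to mirror the three behaviors described above (blocked, ask first, proceed).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action categories for illustration; not OpenAI's real taxonomy.
IRREVERSIBLE = {"send_email", "make_booking"}
BLOCKED = {"financial_transaction"}


@dataclass
class Action:
    kind: str
    description: str


def run_action(action: Action, approve: Callable[[Action], bool]) -> str:
    """Return 'blocked', 'declined', or 'executed' for a proposed action."""
    if action.kind in BLOCKED:
        # Restricted outright, no matter what the user says.
        return "blocked"
    if action.kind in IRREVERSIBLE and not approve(action):
        # The user was asked and said no.
        return "declined"
    # Reversible actions, or irreversible ones the user approved.
    return "executed"


if __name__ == "__main__":
    always_no = lambda action: False
    print(run_action(Action("read_calendar", "Check Thursday"), always_no))
    print(run_action(Action("send_email", "Email the team"), always_no))
    print(run_action(Action("financial_transaction", "Pay invoice"), always_no))
```

The point of the design is that the approval callback only ever fires for the irreversible category, so routine steps don't interrupt you while the agent works.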

Performance Jumps Are Real

ChatGPT Agent scored 41.6% on Humanity's Last Exam, a brutal test covering over 100 subjects. That's double what OpenAI's o3 and o4-mini managed. On FrontierMath, one of the hardest math benchmarks around, it hit 27.4% with tool access. The previous best score was just 6.3%.

These aren't perfect scores, but they're substantial improvements. The agent isn't replacing human intelligence - it's just getting better at specific tasks.
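For a sense of scale, the arithmetic behind those figures can be worked through in a few lines. This is a back-of-the-envelope sketch using only the numbers quoted above. The Humanity's Last Exam baseline is merely implied by the word "double" and is an assumption, not a reported score, and the variable and function names are made up for illustration.

```python
# Scores quoted in the article, in percent.
HLE_AGENT = 41.6
FRONTIERMATH_AGENT = 27.4
FRONTIERMATH_PREVIOUS = 6.3


def improvement_ratio(new: float, old: float) -> float:
    """How many times higher the new score is than the old one."""
    return new / old


def point_gain(new: float, old: float) -> float:
    """Absolute gain in percentage points."""
    return new - old


frontier_ratio = improvement_ratio(FRONTIERMATH_AGENT, FRONTIERMATH_PREVIOUS)
frontier_gain = point_gain(FRONTIERMATH_AGENT, FRONTIERMATH_PREVIOUS)

# Assumption: "double" taken literally, so the earlier models sat near half.
implied_hle_baseline = HLE_AGENT / 2

if __name__ == "__main__":
    print(f"FrontierMath: {frontier_ratio:.1f}x, +{frontier_gain:.1f} points")
    print(f"Implied HLE baseline: about {implied_hle_baseline:.1f}%")
```

On these numbers the FrontierMath result is roughly 4.3 times the previous best, a gain of about 21 percentage points, which is a steeper relative jump than the doubling on Humanity's Last Exam.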

The AI Agent Arms Race

OpenAI isn't alone in this space. Anthropic released Computer Use last October. Google, Meta, and Amazon executives won't shut up about their agent strategies on earnings calls. Everyone wants to build the next JARVIS.

The trend started gaining steam after Klarna announced its AI agent handled two-thirds of customer service chats in one month - equivalent to 700 full-time workers. That got everyone's attention.

But here's the thing about AI agents: they've been promising the moon for years while delivering incremental improvements. Most still struggle with complex tasks and feel more like tech demos than useful products.

Rolling Out to the Masses

ChatGPT Agent is available now to Pro, Plus, and Team subscribers. Just select "agent mode" in the tools menu or type "/agent" to access it. Enterprise and Education users get it later this summer. Users in the European Economic Area and Switzerland will have to wait - no timeline yet.

The tool works best when you can walk away from it. Fulford and Yash Kumar, the product lead on ChatGPT Agent, emphasize this isn't meant for real-time interaction. Set it up, let it work, come back to results.

Still Has Growing Pains

Don't expect lightning speed. The agent prioritizes thoroughness over velocity. For online shopping, Fulford found it more thorough than using Operator alone, but that thoroughness comes with wait times.

The team of 20 to 35 people who built this thing focused on "optimizing for hard tasks" rather than quick responses. They're betting users will accept slower performance for better results.

Why this matters:

We've had AI agents that could browse websites for a while now. ChatGPT Agent is OpenAI's first that can actually operate a computer - clicking buttons, running programs, managing files. That's a big jump from what Operator could do.

The benchmark scores aren't just numbers. When an AI doubles its performance on complex tests, it usually means the underlying capabilities took a real step forward. We're seeing AI move from "sounds smart in conversation" to "can handle actual work tasks."

❓ Frequently Asked Questions

Q: How is this different from OpenAI's previous Operator tool?

A: Operator could only click around websites. ChatGPT Agent has access to an entire computer environment, including terminals and APIs, and can connect to apps like Gmail and GitHub. It combines Operator's web browsing with Deep Research's analysis capabilities.

Q: What exactly are "irreversible actions" that require permission?

A: These include sending emails, making bookings, or any action that can't be undone. The agent will ask you to approve before executing these tasks. Financial transactions are completely restricted for now.

Q: How does Watch Mode work for financial sites?

A: If you navigate to financial websites, you must stay on that tab while the agent works. If you click away or switch tabs, the agent stops working entirely. It's an extra safety layer for sensitive sites.

Q: Why does it take 15-30 minutes to complete tasks?

A: The team optimized for handling complex tasks rather than speed. They want thoroughness over quick responses. The idea is you start a task, walk away, and come back to completed work.

Q: What do those benchmark scores actually mean?

A: Humanity's Last Exam covers over 100 subjects from math to literature. The 41.6% score means it answered correctly on about 4 out of 10 questions. FrontierMath is one of the hardest math tests available.

Q: How big is the team that built this?

A: OpenAI combined the Operator and Deep Research teams into one group of 20-35 people across product and research roles. They unified the teams specifically to create this more capable agent.

Q: When will this be available in Europe?

A: OpenAI hasn't announced a timeline for the European Economic Area and Switzerland. Enterprise and Education users in available regions get access later this summer.

Q: How does this compare to Anthropic's Computer Use tool?

A: Anthropic launched Computer Use in October 2024 with similar computer control capabilities. ChatGPT Agent adds terminal access, API connections, and combines web browsing with research analysis in one tool.

Robert Brown
