OpenAI's new ChatGPT Agent can control your entire computer to handle tasks like calendar management and research reports. It moves beyond chatbots to actual task automation, but it takes 15-30 minutes per task.
💡 TL;DR - The 30 Seconds Version
👉 OpenAI launched ChatGPT Agent Thursday, its first tool that can control your entire computer to handle tasks like calendar management and research reports.
📊 The agent scored 41.6% on Humanity's Last Exam, double what OpenAI's o3 and o4-mini achieved, and hit 27.4% on FrontierMath versus 6.3% previously.
🏭 Tasks take 15-30 minutes to complete, and the agent asks permission before "irreversible" actions like sending emails or making bookings.
🛡️ OpenAI activated safeguards for "high biological and chemical capabilities" and restricts financial transactions with Watch Mode for financial sites.
🌍 The tool combines Operator and Deep Research capabilities, rolling out to Pro, Plus, and Team subscribers with Enterprise coming this summer.
🚀 This is the first time an OpenAI tool can actually control a computer instead of just browsing the web, moving beyond chatbots to real task automation.
OpenAI just dropped its most ambitious tool yet. ChatGPT Agent launched Thursday, and it's not content to just answer questions anymore. This thing wants to control your entire computer.
The new agent can navigate your calendar, book restaurants, generate slide decks, and write research reports. It handles tasks while you work on other things. But here's the catch - it might take 15 to 30 minutes to complete tasks that would normally take you hours.
"Even if it takes 15 minutes, half an hour, it's quite a big speed-up compared to how long it would take you to do it," says Isa Fulford, research lead on ChatGPT Agent. Translation: grab a snack while your AI does the heavy lifting.
This isn't your typical chatbot upgrade. ChatGPT Agent combines OpenAI's previous tools - Operator and Deep Research - into something that can actually manipulate software. Where Operator could only click around websites, this new agent has access to terminals, APIs, and a full computer environment.
The model behind it has no fancy name. OpenAI just calls it "the model behind ChatGPT Agent." Marketing department must've been out sick that day.
During demos, OpenAI showed the agent planning date nights by cross-referencing Google Calendar with OpenTable. It generated research reports comparing Labubu toys to Beanie Babies. One executive started using it to request office parking every Thursday - apparently forgetting to do this manually was becoming a weekly problem.
OpenAI isn't naive about the risks. The company activated safeguards typically reserved for models with "high biological and chemical capabilities." They don't think ChatGPT Agent can help novices build weapons, but they're not taking chances.
The agent asks permission before doing anything "irreversible" like sending emails or making bookings. Financial transactions are restricted "for now." There's also something called Watch Mode - on financial sites, you can't leave the tab or the agent stops working. It's essentially a safety switch for sensitive pages.
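For readers who think in code, the rules described above can be sketched as a small decision function. This is an illustration only, not OpenAI's implementation: the action names, the `Action` type, and the `decide` function are all hypothetical, and the real agent's logic is not public.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action categories, named after the article's examples.
IRREVERSIBLE = {"send_email", "make_booking"}
BLOCKED = {"financial_transaction"}


@dataclass
class Action:
    kind: str                        # e.g. "send_email", "browse"
    description: str                 # human-readable summary shown to the user
    on_financial_site: bool = False  # would trigger the Watch Mode check


def decide(action: Action,
           approve: Callable[[Action], bool],
           user_watching: bool = True) -> str:
    """Return "run", "skipped", "paused", or "blocked" for one action."""
    # Financial transactions are restricted outright.
    if action.kind in BLOCKED:
        return "blocked"
    # Watch Mode: on financial sites the agent stops if the user leaves the tab.
    if action.on_financial_site and not user_watching:
        return "paused"
    # Irreversible actions need explicit approval from the user.
    if action.kind in IRREVERSIBLE:
        return "run" if approve(action) else "skipped"
    # Everything else proceeds without interruption.
    return "run"


if __name__ == "__main__":
    always_yes = lambda a: True
    print(decide(Action("send_email", "Email the parking request"), always_yes))
    print(decide(Action("browse", "Check bank balance", on_financial_site=True),
                 always_yes, user_watching=False))
```

The ordering is a design choice in this sketch: the outright block is checked first, then the Watch Mode pause, and only then the approval prompt, so the user is never asked to approve something that would be refused anyway.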
ChatGPT Agent scored 41.6% on Humanity's Last Exam, a brutal test covering over 100 subjects. That's double what OpenAI's o3 and o4-mini managed. On FrontierMath, one of the hardest math benchmarks around, it hit 27.4% with tool access. The previous best score was just 6.3%.
These aren't perfect scores, but they're substantial improvements. The agent isn't replacing human intelligence - it's just getting better at specific tasks.
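To put the benchmark figures above in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in this article; the function names are made up for illustration.

```python
# Figures quoted in the article (percent correct).
FRONTIERMATH_NEW = 27.4   # ChatGPT Agent, with tool access
FRONTIERMATH_PREV = 6.3   # previous best score
HLE_SCORE = 41.6          # Humanity's Last Exam


def improvement_factor(new: float, old: float) -> float:
    """How many times larger the new score is than the old one."""
    return new / old


def correct_per_ten(score_percent: float) -> float:
    """Convert a percentage score into 'questions right out of ten'."""
    return score_percent / 100 * 10


if __name__ == "__main__":
    factor = improvement_factor(FRONTIERMATH_NEW, FRONTIERMATH_PREV)
    print(f"FrontierMath: about {factor:.1f}x the previous best")
    print(f"Humanity's Last Exam: about {correct_per_ten(HLE_SCORE):.1f} of 10 correct")
```

In other words, the FrontierMath result is a bit more than four times the previous best, while the Humanity's Last Exam score still leaves most questions unanswered.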
OpenAI isn't alone in this space. Anthropic released Computer Use last October. Google, Meta, and Amazon executives won't shut up about their agent strategies on earnings calls. Everyone wants to build the next JARVIS.
The trend started gaining steam after Klarna announced its AI agent handled two-thirds of customer service chats in one month - equivalent to 700 full-time workers. That got everyone's attention.
But here's the thing about AI agents: they've been promising the moon for years while delivering incremental improvements. Most still struggle with complex tasks and feel more like tech demos than useful products.
ChatGPT Agent is available now to Pro, Plus, and Team subscribers. Just select "agent mode" in the tools menu or type "/agent" to access it. Enterprise and Education users get it later this summer. The European Economic Area and Switzerland will have to wait - no timeline yet.
The tool works best when you can walk away from it. Fulford and Yash Kumar, product lead on ChatGPT Agent, emphasize this isn't meant for real-time interaction. Set it up, let it work, come back to results.
Don't expect lightning speed. The agent prioritizes thoroughness over velocity. For online shopping, Fulford found it more thorough than using Operator alone, but that thoroughness comes with wait times.
The team of 20 to 35 people who built this thing focused on "optimizing for hard tasks" rather than quick responses. They're betting users will accept slower performance for better results.
Why this matters:
Q: How is this different from OpenAI's previous Operator tool?
A: Operator could only click around websites. ChatGPT Agent has access to an entire computer environment, including terminals and APIs, and can connect to apps like Gmail and GitHub. It combines Operator's web browsing with Deep Research's analysis capabilities.
Q: What exactly are "irreversible actions" that require permission?
A: These include sending emails, making bookings, or any action that can't be undone. The agent will ask you to approve before executing these tasks. Financial transactions are completely restricted for now.
Q: How does Watch Mode work for financial sites?
A: If you navigate to financial websites, you must stay on that tab while the agent works. If you click away or switch tabs, the agent stops working entirely. It's an extra safety layer for sensitive sites.
Q: Why does it take 15-30 minutes to complete tasks?
A: The team optimized for handling complex tasks rather than speed. They want thoroughness over quick responses. The idea is you start a task, walk away, and come back to completed work.
Q: What do those benchmark scores actually mean?
A: Humanity's Last Exam covers over 100 subjects from math to literature. The 41.6% score means it answered about 4 out of 10 questions correctly. FrontierMath is one of the hardest math tests available.
Q: How big is the team that built this?
A: OpenAI combined the Operator and Deep Research teams into one group of 20-35 people across product and research roles. They unified the teams specifically to create this more capable agent.
Q: When will this be available in Europe?
A: OpenAI hasn't announced a timeline for the European Economic Area and Switzerland. Enterprise and Education users in available regions get access later this summer.
Q: How does this compare to Anthropic's Computer Use tool?
A: Anthropic launched Computer Use in October 2024 with similar computer control capabilities. ChatGPT Agent adds terminal access, API connections, and combines web browsing with research analysis in one tool.