Chinese researchers ship agent that finds tools mid-thought, beating OpenAI's rigid workflows
Chinese researchers abandon AI's rigid think-act-observe loops for fluid reasoning that discovers tools mid-thought. DeepAgent hits 89% success where competitors reach 55%, revealing the bottleneck was never intelligence but architectural rigidity.
Chinese researchers from Renmin University and Xiaohongshu released DeepAgent, an AI system that discovers and uses tools dynamically within its reasoning stream. The approach abandons predetermined workflows entirely. Instead of cycling through "think, act, observe" loops like current agents, DeepAgent treats tool discovery as part of thinking itself, achieving 89% success rates on complex tasks where the best competing models hit 55%.
What's actually new
Previous agents followed scripts: plan the task, select tools, execute actions, observe results, repeat. Even OpenAI's latest research agents work with a fixed toolkit of search, code, and browsing. DeepAgent discovers tools as it reasons. Traditional agents are GPS navigation following predetermined routes. DeepAgent navigates like a local who spots shortcuts mid-journey.
The system can tap 16,000+ real-world APIs without pre-selection. When stuck, it compresses its entire interaction history into structured memory and starts fresh, preventing the death spirals that trap conventional agents in wrong exploration paths.
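To make the idea of finding tools mid-thought concrete, here is a minimal sketch of retrieval over a tool registry. It is an illustration, not DeepAgent's implementation: the four-entry `TOOLS` registry, the `search_tools` function, and the bag-of-words cosine scoring are all assumptions made for this example. A real system searching 16,000+ APIs would more plausibly use a learned embedding index.

```python
import math
import re
from collections import Counter

# Hypothetical toy registry (tool name -> description). Illustrative only.
TOOLS = {
    "vimeo_search": "search vimeo for videos by keyword",
    "spotify_play": "play a music track or playlist on spotify",
    "tmdb_movie": "look up movie details cast and ratings",
    "shop_search": "search products in an online shopping catalog",
}


def _vec(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_tools(query, k=2):
    """Return the k tool names whose descriptions best match the query.

    The agent would call this from inside its reasoning whenever it
    decides it needs a capability, instead of receiving a fixed toolkit.
    """
    q = _vec(query)
    ranked = sorted(
        TOOLS, key=lambda name: _cosine(q, _vec(TOOLS[name])), reverse=True
    )
    return ranked[:k]


print(search_tools("find videos on vimeo", k=1))
```

The design point is that the toolset is queried on demand with whatever need the reasoning has just produced, so nothing has to be pre-selected before the task starts.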
The Breakdown
• DeepAgent discovers and uses tools dynamically within reasoning, achieving 89% success versus 55% for best competitors
• Memory folding lets agents compress history and pivot when stuck, preventing death spirals in exploration paths
• System taps 16,000+ real APIs without pre-selection while OpenAI limits agents to search, code, and browsing
• Chinese labs now lead autonomous tool use through architectural innovation rather than model size increases
The architectural revolution
From Beijing's perspective: finally, agent architecture that matches how reasoning models actually think. Western labs built scaffolding around reasoning models, forcing them into rigid cycles. Chinese researchers let reasoning flow naturally.
OpenAI constrains its agents to research tools. Search, browse, code. That's it. DeepAgent pulls from movie databases, music players, robotic environments, shopping APIs, whatever the task demands. The performance gap shows: on Spotify integration tasks, DeepAgent hits 75% success versus 52% for the strongest baseline.
Enterprises get agents that actually work. Previous systems required pre-configuring tool access, defining workflows, managing state transitions. DeepAgent discovers what it needs. A film festival query triggers Vimeo search, identifies cinema professionals, constructs YouTube links, all within one coherent thought stream.
Memory as infrastructure
The memory folding mechanism solves a fundamental constraint. Long interactions explode context windows. Agents accumulate interaction history until they can't process new information. Most systems just truncate old data. Information vanishes.
DeepAgent compresses history into three memories: episodic (long-term progress), working (current state), tool (usage patterns). JSON schemas, not prose. Structure preserves precision. When an exploration path fails, the agent folds its memory and pivots. It's controlled forgetting that enables progress.
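The three-part structure can be sketched as a small data type plus a fold step. This is a hedged illustration only: the `FoldedMemory` class, the `fold` function, and every field name inside the memories are hypothetical, and the rule-based compression below stands in for whatever summarization the real system performs.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FoldedMemory:
    episodic: list = field(default_factory=list)  # long-term progress
    working: dict = field(default_factory=dict)   # current state
    tool: dict = field(default_factory=dict)      # usage patterns per tool


def fold(history):
    """Compress a list of raw steps into structured JSON memory.

    Each step is assumed to be a dict with keys "tool", "ok", and "note".
    Returns a JSON string, mirroring the article's point that the
    memories are schemas rather than prose.
    """
    mem = FoldedMemory()
    for step in history:
        stats = mem.tool.setdefault(step["tool"], {"ok": 0, "failed": 0})
        stats["ok" if step["ok"] else "failed"] += 1
        if step["ok"]:
            # Keep only successful milestones as long-term progress.
            mem.episodic.append(step["note"])
    if history:
        last = history[-1]
        mem.working = {"last_tool": last["tool"], "blocked": not last["ok"]}
    return json.dumps(asdict(mem))


history = [
    {"tool": "vimeo_search", "ok": True, "note": "found festival videos"},
    {"tool": "youtube_link", "ok": False, "note": "link lookup failed"},
]
print(fold(history))
```

After folding, the agent would restart from the compact JSON rather than the full transcript, which keeps the record of what failed while freeing the context window.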
The training innovation matters. Real APIs break, rate-limit, cost money. DeepAgent simulates them with auxiliary LLMs during training. This trades perfect fidelity for stable, fast, cheap iteration. ToolPO, their reinforcement learning method, assigns credit to specific tool-call tokens rather than whole sequences. Precise feedback accelerates learning.
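The difference between sequence-level and token-level credit can be shown with a few lines of arithmetic. The sketch below is one possible reading, not the paper's formula: the `token_advantages` function, its parameter names, and the choice to add a tool-specific signal on top of a shared outcome signal are all assumptions for illustration.

```python
def token_advantages(is_tool_token, outcome_adv, tool_adv):
    """Return one advantage value per generated token.

    is_tool_token: list of bools, True where a token is part of a tool call.
    outcome_adv:   scalar signal for the trajectory's final outcome
                   (assumed here to be shared by every token).
    tool_adv:      scalar signal for tool-call quality, applied only to
                   the tokens that actually form tool calls.
    """
    return [outcome_adv + (tool_adv if flag else 0.0) for flag in is_tool_token]


# Four tokens; the middle two form a tool call.
mask = [False, True, True, False]
print(token_advantages(mask, outcome_adv=0.5, tool_adv=1.0))
# Sequence-level credit would instead give every token the same value.
print([0.5] * len(mask))
```

Under this reading, the tokens that chose and formatted the tool call receive a sharper signal than the surrounding reasoning text, which is the intuition behind the claim that precise feedback speeds up learning.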
The scalability test
Three signals determine whether this architecture wins. First, whether memory folding scales beyond 50-action sequences to hundred-step workflows. Current tests stop at 50. Real enterprise processes run longer.
Second, API simulation fidelity. Training on LLM-simulated APIs works until it doesn't. Edge cases, authentication flows, and complex state dependencies tend to surface only in production. The 89% success rate assumes well-behaved APIs.
Third, tool discovery latency. Searching 16,000 tools mid-thought adds overhead, and the paper doesn't report timing. That may be fast enough for research, but production systems measure latency in milliseconds.
DeepAgent reveals the real bottleneck wasn't reasoning capability but architectural rigidity. Reasoning models can handle complex tool orchestration when freed from predetermined workflows. The constraint was the scaffold, not the model.
Why this matters:
• Agent architecture shifts from scripted workflows to fluid reasoning. Performance gaps prove flexibility beats structure.
• Chinese labs now lead autonomous tool use, integrating thousands of APIs while Western approaches stay limited.
❓ Frequently Asked Questions
Q: How does memory folding actually work when DeepAgent gets stuck?
A: DeepAgent compresses its entire interaction history into three JSON-structured memories: episodic (overall progress), working (current state), and tool (usage patterns). This compression happens after 50 actions or when exploration fails. The agent then restarts with this condensed memory, avoiding previous dead ends while preserving critical information.
Q: What's ToolPO and why does training with fake APIs work?
A: ToolPO (Tool Policy Optimization) uses LLMs to simulate real APIs during training instead of calling actual services. This cuts training costs by 90%+ and prevents rate-limiting issues. The method assigns rewards to specific tool-call tokens rather than entire sequences, making learning 3-5x faster than standard reinforcement learning approaches.
Q: Which specific models did DeepAgent beat and by how much?
A: On ToolBench tasks, DeepAgent (69% success) beat GPT-4o with ReAct (52%), QwQ-32B with CodeAct (54%), and DeepSeek-R1 (57%). On complex GAIA benchmarks, DeepAgent scored 53.3% versus WebThinker's 37% and HiRA's 42.5%. The gaps widen further on music/movie database tasks.
Q: When can developers actually use DeepAgent?
A: The code and demo are available now at github.com/RUC-NLPIR/DeepAgent. It runs on QwQ-32B as the base model with Qwen2.5-32B handling auxiliary tasks. You'll need 64 NVIDIA H20-141GB GPUs for full training, but inference works on smaller setups. The paper was released October 24, 2025.
Q: What real-world tasks does DeepAgent handle better than alternatives?
A: DeepAgent excels at multi-tool coordination: film festival planning (searching Vimeo, finding speakers, linking YouTube), e-commerce workflows (91.8% success on WebShop vs 65.7% for competitors), and embodied AI tasks. It's strongest when tasks need 3-7 sequential API calls across different services, where rigid agents typically fail after 2-3 steps.
Tech journalist. Lives in Marin County, north of San Francisco. Got his start writing for his high school newspaper. When not covering tech trends, he's swimming laps, gaming on PS4, or vibe coding through the night.