Good Morning from San Francisco.
DeepSeek promises a sophisticated AI agent by Q4 2025 while admitting hallucinations remain "unavoidable." The tension between ambitious goals and candid limitations reveals the reality of today's agent race.
💡 TL;DR - The 30-Second Version
🤖 DeepSeek plans to release an AI agent system in Q4 2025 that can handle multi-step tasks and learn from prior actions.
💰 The company's R1 model disrupted the industry in January, reportedly costing under $6 million to train while matching top US systems.
⚠️ DeepSeek simultaneously published transparency docs admitting AI hallucinations remain "unavoidable," an unusually candid safety disclosure.
🏃 OpenAI, Microsoft, and Anthropic already ship agent features while Chinese rivals like Alibaba push aggressive releases, making DeepSeek's timeline risky.
🎯 DeepSeek's deliberate engineering approach bets that superior capability can beat rapid iteration as the agent market takes shape.
🌍 Success could reshape how enterprises calculate AI value based on task completion and cost rather than just model benchmarks.
A Chinese AI upstart promises a late-year agent. Its own paperwork warns “hallucinations” remain “unavoidable.” That’s the tension. Hangzhou-based DeepSeek is preparing to unveil a multi-step, self-improving agent in the final quarter of 2025, according to a detailed report on DeepSeek’s agent timeline. At the same time, the company published a data-sourcing and safety note acknowledging persistent accuracy limits as Beijing tightens oversight.
Two signals dropped at once. First, the company is pushing an “agentic” successor to the R1 reasoning model that stunned the market in January. The goal: handle multi-step tasks with minimal human guidance and adapt based on prior actions. That’s the agent pitch.
Second, DeepSeek described how it filters training data—removing hate speech, sexual content, violence, spam, and potentially infringing material—and conceded that hallucinations are still unavoidable. The disclosure, published Monday local time, reads like a compliance-forward move under China’s expanding AI rules. It’s a candid note. And unusual.
Bloomberg reports the new agent is slated for the year’s final quarter, with the system positioned as more autonomous than chatbots that simply reply with text. The company’s R1 model became a sensation in January after claims it matched or outperformed top U.S. systems on select benchmarks at a fraction of the cost; Reuters reported DeepSeek said training ran under $6 million. Those figures remain contested by researchers who argue headline training costs undercount data generation, ablations, and reinforcement learning work. Keep that caveat in view.
On safety, the South China Morning Post summarized DeepSeek’s disclosure: public-web and authorized third-party data; automated and human-reviewed filters; bias mitigation steps; and a plain-spoken statement that hallucinations can’t be eliminated yet. That last line matters, because agent systems amplify error if they string flawed steps together. A wrong click can cascade.
Agents are the next skirmish line. OpenAI introduced a ChatGPT agent in July that can browse, log in, and complete tasks across tools; Microsoft in May rolled out multi-agent orchestration for Copilot and an expanded Agents SDK; Anthropic has been publishing agent playbooks and shipping enterprise-grade “computer use” features. None of these efforts are magic. All are getting more practical.
China’s platform players are pushing too—fast. Alibaba’s Qwen team has been aggressive across agent frameworks and GUI automation research; Tencent and others are embedding agent capabilities into services people already use. Manus, a Chinese-founded startup now based in Singapore, drew global buzz—and policy scrutiny—by marketing a “general AI agent” that handles complex workflows. This is not a sideline anymore. It’s a feature race.
DeepSeek, by contrast, has moved deliberately since R1. Local media framed the slow R2 timeline as founder Liang Wenfeng’s perfectionism; others cite ordinary engineering delays. Betting on a single, more capable agent instead of churning interim releases is a high-variance strategy. If it lands, it lands big. If not, the market moves on.
Start with workflows. A competent agent that reliably plans, clicks, writes, and revises could consolidate five apps into one automated run. Vacation research, expense audits, vendor sourcing, even basic IT tickets—chunks of knowledge work become recipe calls. Latency, tool-use accuracy, and permissioning will decide how far it goes in real deployments.
Then pricing. DeepSeek’s rise challenged the “only billions win” thesis on training budgets. If the company delivers an agent with strong autonomy at competitive cost, it pressures enterprise buyers to renegotiate value: it’s not just model quality; it’s end-to-end task completion per dollar and minute. That would ripple through cloud margins, software bundles, and the way vendors pitch “AI inside.”
Finally, geopolitics. Export controls constrain China’s access to top GPUs, nudging companies toward efficiency, not brute-force scale. A credible DeepSeek agent would be read in Washington and Brussels as proof that controls slow, but don’t stop, capability diffusion. Expect louder calls for guardrails on how agents authenticate, transact, and audit their own actions.
Agents break in boring ways. Tool calls fail, session state gets lost, and subtle misreadings compound across steps. Even strong systems still need “adult supervision” for sensitive tasks. DeepSeek’s own safety note acknowledges hallucinations remain unsolved; that’s an honest baseline and a reminder to watch for containment features: sandboxing, step-level logs, replayable traces, and permission prompts. One more risk: arriving late to a crowded market with superior tech but no distribution. Adoption hinges on integrations and trust as much as benchmark wins. Speed still matters.
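Those containment features can be pictured concretely. Below is a minimal toy sketch, in Python, of step-level logging, a replayable trace, and permission prompts before sensitive actions; every class, method, and tool name here is a hypothetical illustration, not DeepSeek's (or any vendor's) actual API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentTrace:
    """A replayable, step-level log of everything the agent did."""
    steps: list = field(default_factory=list)

    def log(self, tool: str, args: dict, result):
        self.steps.append({"tool": tool, "args": args, "result": result})

class GuardedAgent:
    """Wraps tool calls with a permission check and an audit trail."""
    SENSITIVE = {"send_email", "make_payment"}  # actions needing approval

    def __init__(self, tools: dict[str, Callable], approve: Callable[[str], bool]):
        self.tools = tools        # available tool functions
        self.approve = approve    # permission-prompt callback (human or policy)
        self.trace = AgentTrace() # every step is recorded, even denials

    def run_step(self, tool: str, **args):
        # Sensitive actions are blocked unless explicitly approved.
        if tool in self.SENSITIVE and not self.approve(tool):
            self.trace.log(tool, args, "DENIED")
            return None
        result = self.tools[tool](**args)
        self.trace.log(tool, args, result)
        return result
```

The point of the sketch: even when a step is denied, it lands in the trace, so a cascading failure can be replayed and audited after the fact.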
Frequently asked questions
Q: What exactly makes an AI "agent" different from regular chatbots like ChatGPT?
A: Agents can take actions across multiple steps—logging into accounts, clicking buttons, filling forms, and completing entire workflows. Instead of just answering questions with text, they execute tasks like booking travel or processing expense reports with minimal human guidance.
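That distinction can be caricatured in a few lines of Python: a chatbot is a single call that returns text, while an agent loops, picking an action, executing it, and feeding the observation back into its next decision. This is a toy sketch under invented names, not any vendor's real interface.

```python
def chatbot(model, prompt):
    # One shot: text in, text out, no side effects.
    return model(prompt)

def agent(model, tools, goal, max_steps=5):
    # Multi-step: the model chooses an action, we execute it, and the
    # observation becomes context for the next decision.
    context = [goal]
    for _ in range(max_steps):
        action, arg = model("\n".join(context))  # e.g. ("search", "hotels")
        if action == "done":
            return arg
        observation = tools[action](arg)
        context.append(f"{action}({arg}) -> {observation}")
    return None  # gave up: real agents need exactly this kind of step cap
```

The loop is also where errors compound: one wrong `observation` poisons every later decision, which is why the hallucination caveat matters more for agents than for chatbots.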
Q: Why are researchers questioning DeepSeek's claimed $6 million training cost for R1?
A: The $6 million figure likely undercounts data preparation, failed experiments, and reinforcement learning iterations that happen before the final training run. Industry experts estimate true development costs for comparable models run $50-200 million when including all research phases.
Q: What do AI "hallucinations" actually look like in real use?
A: AI systems confidently state false information—claiming historical events that never happened, citing nonexistent research papers, or providing wrong calculations. In agents, this becomes dangerous because a hallucinated API call or wrong button click can cascade into serious errors.
Q: How far behind is DeepSeek compared to OpenAI and Microsoft's agent capabilities?
A: OpenAI launched its ChatGPT agent in July 2025, and Microsoft shipped multi-agent orchestration for Copilot in May 2025. DeepSeek's Q4 2025 timeline puts it roughly three to six months behind in market deployment, though the company claims its eventual system will be more autonomous and capable.
Q: What specific AI regulations is Beijing implementing that prompted DeepSeek's safety disclosure?
A: China's new AI regulations require companies to disclose training data sources, safety measures, and known limitations before public deployment. Companies must also report model capabilities to authorities and implement content filtering for Chinese users, driving more transparent safety documentation.
Get the 5-minute Silicon Valley AI briefing, every weekday morning — free.