Nvidia unveiled Nemotron 3 Ultra at its GTC Taipei keynote on Monday and released a benchmark, run with the evaluation firm Artificial Analysis, that places the model first among open-weight systems built in the United States but behind the Chinese-led frontier. The 550-billion-parameter model scores 48 on the Artificial Analysis Intelligence Index, ahead of Google's Gemma 4 31B at 39 and OpenAI's gpt-oss-120b at 33, and six points behind Moonshot's Kimi K2.6 at 54.

That gap is the context for the rest of the announcement. Nemotron 3 Ultra is the centerpiece of a free enterprise Agent Toolkit, and Nvidia is not really competing to build the world's smartest open model, a race Chinese labs are currently winning. It is competing to make Nvidia hardware the default place enterprise AI agents run. The model is free; the speed that sells it, and the runtime and skills bundled around it, are tuned to Nvidia silicon. It is a play the company has run before with its open models.


Where Nemotron 3 Ultra lands on the intelligence index

The model uses a mixture-of-experts design, with roughly 550 billion total parameters but only about 55 billion, around a tenth, active per token. That keeps its running cost closer to that of a far smaller system. Artificial Analysis, which partnered with Nvidia on the pre-release evaluation, called it "the most intelligent US open weights model" in the same writeup that ranked it second to China's open frontier.

Where Nemotron does lead is speed. On a pre-release endpoint at the cloud provider DeepInfra it served more than 300 tokens per second, against the 50 to 100 that comparably sized models from DeepSeek and Moonshot manage in the market today, Artificial Analysis said. That is the number Nvidia is selling, more than the intelligence score.

Nvidia's own headline is that Nemotron runs up to five times faster and up to 30% cheaper than open frontier rivals in its class. "Those are Nvidia's own numbers, against rivals Nvidia chose, so treat them as a starting point rather than gospel until independent testing catches up," wrote Abbas Ali at tbreak. Nvidia has disclosed a five-year, $26 billion plan to fund open-weight development, Decrypt reported, and says a next model, Nemotron 4, is already in progress.

What the toolkit wraps around the model

Nvidia paired the model with three other parts of the Agent Toolkit it released Monday. NemoClaw is an open framework for building the orchestration layer that turns a model into an agent. OpenShell is a secure runtime that sets privacy and policy controls. And a set of Nvidia's CUDA-X libraries is now exposed to agents as reusable "skills."

"NVIDIA NemoClaw provides enterprise software developers with the open building blocks to create more secure, long-running AI coworkers that amplify human expertise as they reshape how work gets done," Jensen Huang, Nvidia's chief executive, said in the announcement.

The building blocks are open source, and so are the skills. The CUDA-X libraries Nvidia exposed to agents, including cuDF for data processing and cuOpt for routing and scheduling, are released under open licenses such as Apache 2.0. What they share is a dependency on Nvidia's CUDA software and its GPUs. The local devices OpenShell names as deployment targets are also Nvidia machines, from RTX Spark laptops to DGX Station GB300 systems, though the runtime can run on-premises and in the cloud as well. "Nemotron 3 Ultra shows the company wants developers building agents on its models, not only buying its chips," Startup Fortune wrote of the strategy. The free software gives developers a reason to standardize on tools that run fastest on the chips Nvidia sells.


Cadence, CrowdStrike and Palantir sign on

The toolkit's first named customer is Nvidia itself. Cadence built ChipStack, a fully autonomous chip-verification agent, on its own design software and secured it with Nvidia's OpenShell runtime. Nvidia said it is the first customer using ChipStack to verify its own chip designs, which means the agent is checking the silicon the rest of the stack depends on.

Beyond that, CrowdStrike is running agents on Nemotron models to identify and remediate software vulnerabilities, and Palantir has folded the models into the air-gapped systems its Forward Deployed Engineer platform builds for clients, according to Nvidia. Siemens and Synopsys are using NemoClaw for chip-design workflows. On the factory floor, Foxconn is building a manufacturing agent called MoMClaw on the same stack. Nvidia says Foxconn projects an 80% improvement in root-cause analysis time, a 15% gain in labor productivity and a 10% drop in machine-failure rates. Those figures are projections, not audited results.

The investor read is straightforward. Goldman Sachs analyst James Schneider, who attended the keynote, told clients Nvidia is "aggressively investing to drive the adoption of agentic AI across developers and ecosystem partners," and kept a buy rating with a $285 price target.

Nemotron 3 Ultra is expected to reach Hugging Face, OpenRouter and build.nvidia.com on June 4 as an Nvidia NIM microservice; NemoClaw is available now and OpenShell is in early preview. Because the weights are open, enterprises are not bound to a single vendor's API the way a closed model binds them. Whether many of them run Nemotron on hardware other than Nvidia's, rather than staying on the tuned path the toolkit lays down, will show whether the open model loosened Nvidia's grip on enterprise AI or tightened it. The company, already at work on Nemotron 4, is building for the latter.

Frequently Asked Questions

What is Nvidia's Nemotron 3 Ultra?

A 550-billion-parameter open-weight AI model, with about 55 billion parameters active per token, that Nvidia unveiled at GTC Taipei on June 1, 2026. It scores 48 on the Artificial Analysis Intelligence Index, the highest of any US-built open model, and is designed for long-running enterprise AI agents across coding, research and operations.

How does it compare to Chinese open models?

It trails them on intelligence. Moonshot's Kimi K2.6 scores 54 on the same index, six points ahead. Nemotron's edge is speed: more than 300 tokens per second on a DeepInfra endpoint, versus 50 to 100 for comparable DeepSeek and Moonshot models, according to Artificial Analysis.

What is the Nvidia Agent Toolkit?

A free software stack for building autonomous AI agents. It pairs Nemotron models with NemoClaw, an orchestration framework; the OpenShell secure runtime; and CUDA-X libraries exposed as agent skills. NemoClaw is available now and OpenShell is in early preview.

Which companies are building on it?

Cadence, Siemens and Synopsys for chip design; CrowdStrike for security; Palantir for air-gapped systems; and Foxconn for factory operations. Nvidia is the first customer using Cadence's ChipStack agent to verify its own chip designs.

Is the toolkit truly open?

The model weights and CUDA-X skill libraries carry open licenses such as Apache 2.0. But they depend on Nvidia's CUDA software and GPUs, and OpenShell's named device targets are Nvidia machines, so the openness still routes developers toward Nvidia hardware.

AI-generated summary, reviewed by an editor. More on our AI guidelines.


Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai