Nvidia unveiled Nemotron 3 Ultra at its GTC Taipei keynote on Monday and released a benchmark, run with the evaluation firm Artificial Analysis, that places the model first among open-weight systems built in the United States and behind the Chinese-led frontier. The 550-billion-parameter model scores 48 on the Artificial Analysis Intelligence Index, ahead of Google's Gemma 4 31B at 39 and OpenAI's gpt-oss-120b at 33, and six points behind Moonshot's Kimi K2.6 at 54.
That gap is the context for the rest of the announcement. Nemotron 3 Ultra is the centerpiece of a free enterprise Agent Toolkit. Nvidia is not really competing to build the world's smartest open model, a race Chinese labs are currently winning; it is competing to make Nvidia hardware the default place enterprise AI agents run. The model is free, but the speed that sells it, and the runtime and skills bundled around it, are tuned to Nvidia silicon. It is a play the company has run before with its open models.
Key Takeaways
- Nvidia's Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index, the top US open model but behind China's Kimi K2.6 at 54.
- The model anchors a free Agent Toolkit (NemoClaw, OpenShell, CUDA-X skills) built so enterprise agents run fastest on Nvidia hardware.
- Cadence, CrowdStrike, Palantir, Siemens and Foxconn signed on; Nvidia is the first customer using Cadence's ChipStack agent to verify its own chips.
- The skills carry open licenses but stay CUDA- and GPU-bound; Nemotron 3 Ultra is expected June 4, with NemoClaw available now.
AI-generated summary, reviewed by an editor. More on our AI guidelines.
Where Nemotron 3 Ultra lands on the intelligence index
The model uses a mixture-of-experts design, with roughly 550 billion total parameters but about 55 billion active per token, which keeps its running cost closer to a far smaller system. Artificial Analysis, which partnered with Nvidia on the pre-release evaluation, called it "the most intelligent US open weights model" in the same writeup that ranked it second to China's open frontier.
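The active-parameter arithmetic is easy to check. Here is a minimal Python sketch that uses only the two rounded figures quoted above; the function name is illustrative and not part of any Nvidia tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used for each token."""
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active parameters must lie between 0 and the total")
    return active_params_b / total_params_b


# Rounded figures from the article, in billions of parameters.
TOTAL_PARAMS_B = 550.0
ACTIVE_PARAMS_B = 55.0

fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"About {fraction:.0%} of parameters are active per token")
```

By that measure roughly a tenth of the weights do work on any given token. In mixture-of-experts systems the full set of weights generally still has to be stored and served, so the saving shows up in compute per token rather than in memory footprint.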
Where Nemotron does lead is speed. On a pre-release endpoint at the cloud provider DeepInfra it served more than 300 tokens per second, against the 50 to 100 that comparably sized models from DeepSeek and Moonshot manage in the market today, Artificial Analysis said. That is the number Nvidia is selling, more than the intelligence score.
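To see what that throughput gap means for a single response, the sketch below converts tokens per second into wall-clock generation time. The 3,000-token response length is a hypothetical chosen for illustration, 300 is treated as a floor because Artificial Analysis reported "more than 300", and the calculation ignores time to first token and network latency.

```python
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to stream `tokens` at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return tokens / tokens_per_second


RESPONSE_TOKENS = 3000        # hypothetical length of a long agent response
NEMOTRON_TPS = 300.0          # floor: the article says "more than 300"
RIVAL_TPS_RANGE = (50.0, 100.0)  # range quoted for DeepSeek and Moonshot models

nemotron_s = seconds_to_generate(RESPONSE_TOKENS, NEMOTRON_TPS)
rival_slow_s = seconds_to_generate(RESPONSE_TOKENS, RIVAL_TPS_RANGE[0])
rival_fast_s = seconds_to_generate(RESPONSE_TOKENS, RIVAL_TPS_RANGE[1])

# Speedup is just the ratio of throughputs, independent of response length.
speedup_low = NEMOTRON_TPS / RIVAL_TPS_RANGE[1]
speedup_high = NEMOTRON_TPS / RIVAL_TPS_RANGE[0]

print(f"Nemotron: {nemotron_s:.0f} s")
print(f"Rivals:   {rival_fast_s:.0f} to {rival_slow_s:.0f} s")
print(f"Speedup:  {speedup_low:.0f}x to {speedup_high:.0f}x")
```

On those assumptions the same response takes about 10 seconds instead of 30 to 60, a gap of three to six times. For agents that chain many model calls, that difference compounds with each step.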
Nvidia's own headline is that Nemotron runs up to five times faster and up to 30% cheaper than open frontier rivals in its class. "Those are Nvidia's own numbers, against rivals Nvidia chose, so treat them as a starting point rather than gospel until independent testing catches up," wrote Abbas Ali at tbreak. Nvidia has disclosed a five-year, $26 billion plan to fund open-weight development, Decrypt reported, and says a next model, Nemotron 4, is already in progress.
What the toolkit wraps around the model
Nvidia paired the model with three other parts of the Agent Toolkit it released Monday. NemoClaw is an open framework for building the orchestration layer that turns a model into an agent. OpenShell is a secure runtime that sets privacy and policy controls. And a set of Nvidia's CUDA-X libraries is now exposed to agents as reusable "skills."
"NVIDIA NemoClaw provides enterprise software developers with the open building blocks to create more secure, long-running AI coworkers that amplify human expertise as they reshape how work gets done," Jensen Huang, Nvidia's chief executive, said in the announcement.
The building blocks are open source, and so are the skills. The CUDA-X libraries Nvidia exposed to agents, including cuDF for data processing and cuOpt for routing and scheduling, are released under open licenses such as Apache 2.0. What they share is a dependency on Nvidia's CUDA software and its GPUs. The local devices OpenShell names as deployment targets are also Nvidia machines, from RTX Spark laptops to DGX Station GB300 systems, though the runtime also runs on-premises and in the cloud. "Nemotron 3 Ultra shows the company wants developers building agents on its models, not only buying its chips," Startup Fortune wrote of the strategy. The free software gives developers a reason to standardize on tools that run fastest on the chips Nvidia sells.
Cadence, CrowdStrike and Palantir sign on
The toolkit's first named customer is Nvidia itself. Cadence built ChipStack, a fully autonomous chip-verification agent, on Cadence's design software and secured it with Nvidia's OpenShell runtime. Nvidia said it is the first customer using ChipStack to verify its own chip designs, which means the agent is checking the silicon the rest of the stack depends on.
Beyond that, CrowdStrike is running agents on Nemotron models to identify and remediate software vulnerabilities, and Palantir has folded the models into the air-gapped systems its Forward Deployed Engineer platform builds for clients, according to Nvidia. Siemens and Synopsys are using NemoClaw for chip-design workflows. On the factory floor, Foxconn is building a manufacturing agent called MoMClaw on the same stack. Nvidia says Foxconn projects an 80% improvement in root-cause analysis time, a 15% gain in labor productivity and a 10% drop in machine-failure rates. Those figures are projections, not audited results.
The investor read is straightforward. Goldman Sachs analyst James Schneider, who attended the keynote, told clients Nvidia is "aggressively investing to drive the adoption of agentic AI across developers and ecosystem partners," and kept a buy rating with a $285 price target.
Nemotron 3 Ultra is expected to reach Hugging Face, OpenRouter and build.nvidia.com on June 4 as an Nvidia NIM microservice; NemoClaw is available now and OpenShell is in early preview. Because the weights are open, enterprises are not bound to a single vendor's API the way a closed model binds them. Whether many of them run Nemotron on hardware other than Nvidia's, rather than staying on the tuned path the toolkit lays down, will show whether the open model loosened Nvidia's grip on enterprise AI or tightened it. The company, already at work on Nemotron 4, is building for the latter.
Frequently Asked Questions
What is Nvidia's Nemotron 3 Ultra?
A 550-billion-parameter open-weight AI model, with about 55 billion parameters active per token, that Nvidia unveiled at GTC Taipei on June 1, 2026. It scores 48 on the Artificial Analysis Intelligence Index, the highest of any US-built open model, and is designed for long-running enterprise AI agents across coding, research and operations.
How does it compare to Chinese open models?
It trails them on intelligence. Moonshot's Kimi K2.6 scores 54 on the same index, six points ahead. Nemotron's edge is speed: more than 300 tokens per second on a DeepInfra endpoint, versus 50 to 100 for comparable DeepSeek and Moonshot models, according to Artificial Analysis.
What is the Nvidia Agent Toolkit?
A free software stack for building autonomous AI agents. It pairs Nemotron models with NemoClaw, an orchestration framework; the OpenShell secure runtime; and CUDA-X libraries exposed as agent skills. NemoClaw is available now and OpenShell is in early preview.
Which companies are building on it?
Cadence, Siemens and Synopsys for chip design; CrowdStrike for security; Palantir for air-gapped systems; and Foxconn for factory operations. Nvidia is the first customer using Cadence's ChipStack agent to verify its own chip designs.
Is the toolkit truly open?
The model weights and CUDA-X skill libraries carry open licenses such as Apache 2.0. But they depend on Nvidia's CUDA software and GPUs, and OpenShell's named device targets are Nvidia machines, so the openness still routes developers toward Nvidia hardware.



IMPLICATOR