Nvidia Commits $26 Billion to Open-Weight AI, Ships Nemotron 3 Super

Nvidia disclosed $26B in open-weight AI spending and released Nemotron 3 Super, activating 12B of 120B parameters for multi-agent workloads.

Nvidia disclosed plans to spend $26 billion over five years building open-weight AI models, according to a 2025 financial filing confirmed by company executives in interviews with WIRED. The investment, buried in an SEC document until now, positions the chipmaker to compete directly with OpenAI, Anthropic, and a fast-growing field of Chinese AI labs that have dominated the open-model space. Alongside the disclosure, Nvidia released Nemotron 3 Super, a 120-billion-parameter model with 12 billion active parameters, built specifically for multi-agent AI systems.

The move escalates a strategy Nvidia signaled in December when it announced the Nemotron 3 family and shipped the smaller Nano variant. Super and Ultra were teased as coming attractions. Now $26 billion is earmarked, Super is live on Hugging Face, and the company that built its fortune selling GPUs to AI labs is building its own models on those same GPUs.

"Nvidia is taking open model development much more seriously," Bryan Catanzaro, VP of applied deep learning research, told WIRED. "And we are making a lot of progress."

The Breakdown

  • Nvidia disclosed $26 billion in planned open-weight AI spending over five years, buried in a 2025 SEC filing
  • Nemotron 3 Super activates 12B of 120B parameters, targeting multi-agent systems with a 1-million-token context window
  • Independent tests score Super at 36 on Intelligence Index, behind Qwen3.5 (42) but 11% faster per GPU
  • Weights, training data, and recipes published openly, but license includes safety guardrail and patent litigation clauses


A model built for the agent economy

Nemotron 3 Super exists to solve a specific engineering problem. Multi-agent AI systems, the kind that chain tool calls, code execution, and long reasoning sequences across dozens of steps, generate up to 15 times more tokens than a standard chat interaction. That volume creates two failure modes: context explosion, where agents lose track of their original objective after thousands of turns of accumulated history, and what Nvidia calls the "thinking tax," the cost of running a full-size reasoning model on every subtask in a pipeline.

Super attacks both. A 1-million-token context window lets agents hold entire codebases or document stacks in memory without trimming and regenerating context. And a mixture-of-experts architecture activates only 12 billion of 120 billion total parameters per pass, cutting inference cost while preserving the reasoning depth of a much larger model.
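As a rough illustration of why sparse activation matters, per-token inference compute scales with active parameters, not total parameters. The model sizes below come from the article; the 2N-FLOPs-per-token rule of thumb is a standard estimate, not an Nvidia figure:

```python
# Rough estimate: a forward pass costs about 2 * (active parameters)
# FLOPs per generated token.
def flops_per_token(active_params: float) -> float:
    return 2 * active_params

dense_120b = flops_per_token(120e9)  # hypothetical dense 120B model
super_moe = flops_per_token(12e9)    # Nemotron 3 Super: 12B active of 120B

# Sparse activation cuts per-token compute by 10x, while the full 120B
# parameters still shape which experts handle each token.
print(dense_120b / super_moe)  # -> 10.0
```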

The architecture is a triple hybrid. Mamba-2 state-space layers handle the bulk of sequence processing at linear-time complexity, keeping memory manageable at million-token scales. Transformer attention layers, interleaved at key depths, provide the precise associative recall that pure state-space models struggle with, the ability to find one specific fact buried in a million tokens of noise. And a novel Latent MoE routing system compresses token embeddings into a low-rank space before sending them to specialist experts, letting the model consult four times as many experts at the same computational cost.
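Nvidia has not published the routing math, but the Latent MoE idea as described, projecting token embeddings into a low-rank latent space and scoring experts there, can be sketched as follows. All dimensions, the expert count, and the top-k choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent = 512, 64   # latent space ~8x smaller (assumed sizes)
n_experts, top_k = 16, 2      # expert count and top-k are illustrative

W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
router = rng.standard_normal((d_latent, n_experts)) / np.sqrt(d_latent)

def route(token_embedding: np.ndarray):
    """Compress to the latent space, then score experts there.

    Scoring in d_latent instead of d_model makes each routing decision
    roughly d_model/d_latent times cheaper, which is how a model can
    afford to consult more experts at the same compute budget.
    """
    z = token_embedding @ W_down              # (d_latent,) compressed view
    scores = z @ router                       # (n_experts,) affinities
    top = np.argsort(scores)[-top_k:][::-1]   # pick the top-k experts
    weights = np.exp(scores[top] - scores[top].max())
    return top, weights / weights.sum()       # ids + mixing weights

experts, weights = route(rng.standard_normal(d_model))
print(experts, weights)  # two expert ids with normalized mixing weights
```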

Nvidia also baked in multi-token prediction. Most language models forecast one token per pass. Super predicts several at once, a trick that amounts to built-in speculative decoding. For structured output like code and tool calls, that means up to 3x faster wall-clock speed. No separate draft model required.

On Blackwell GPUs, the model runs in NVFP4, a 4-bit floating-point format Super was pretrained in natively. Most quantized models start at full precision and get compressed after training, which introduces accuracy loss. Super learned to work within 4-bit constraints from the first gradient update. Nvidia claims that cuts memory requirements and pushes inference four times faster than FP8 on the previous Hopper platform, with no measured accuracy loss.
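The article does not detail NVFP4 internals, but the core idea of a 4-bit float format is a coarse value grid plus a shared scale. The sketch below uses the standard e2m1 magnitude grid that 4-bit float formats are built on; the per-block scaling scheme is a simplified assumption, not Nvidia's implementation:

```python
import numpy as np

# Representable magnitudes of an e2m1 4-bit float; per-block scaling
# here is a simplified illustration of how NVFP4-style formats work.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(x: np.ndarray):
    """Scale a block so its max magnitude hits the top of the grid,
    then snap every value to the nearest representable FP4 number."""
    scale = np.abs(x).max() / FP4_GRID[-1]
    scaled = x / scale
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

block = np.array([0.1, -0.45, 0.9, 1.2])
q, scale = quantize_block(block)
print(q * scale)  # values reconstructed on the coarse 4-bit grid
```

Training natively in this format means the model's weights learn to live on that coarse grid from the start, instead of being snapped onto it after the fact.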

The training pipeline behind all of this was substantial. The pretraining corpus ran to 25 trillion tokens. Ten trillion were unique, the rest repeated for emphasis. Fine-tuning drew on 7 million samples from a 40-million-sample pool spanning reasoning, coding, and multi-step agent work. Reinforcement learning came last, 21 environments and 1.2 million rollouts. The objective was concrete: chain tool calls, write working code, produce plans that pass verifiable checks. Not a chatbot training regimen. An agent training regimen.

The benchmarks, and what they show

Independent testing from Artificial Analysis scored Nemotron 3 Super at 36 on its Intelligence Index. That places it ahead of OpenAI's gpt-oss-120b at 33 but behind Alibaba's Qwen3.5 122B at 42. In throughput tests on eight B200 GPUs with 50,000-token input workloads, Super delivered 11% higher throughput per GPU than gpt-oss in NVFP4 precision.

The tradeoff is clear. Super is not the smartest open model available today. Qwen3.5 scores six points higher. But Qwen achieves that at 40% lower throughput per GPU, according to Artificial Analysis. For enterprises running hundreds of concurrent agents, that efficiency gap matters more than raw benchmark leads.

On agentic tasks, the results look stronger. Super scored 85.6% on PinchBench, a benchmark measuring model performance as the brain of an OpenClaw agent, the top score among open models. It powers Nvidia's AI-Q research agent to the number one position on both DeepResearch Bench and DeepResearch Bench II.

Code review firm Greptile tested an early checkpoint against real pull requests and came away impressed. The model reviewed a 19-file, 134KB diff in 12.5 seconds with just two tool calls. It caught a CORS regression that could have slipped through a cleanup-heavy refactor, flagged a type mismatch in a refresh flag, and identified a negative-duration edge case. "Punches far above its weight class," the company wrote. For a model activating barely a tenth of the parameters frontier models use, that is a telling result.

But the benchmarks carry a caveat worth noting. Nemotron 3 Super consumed 110 million output tokens to complete the Artificial Analysis evaluation suite, roughly 40% more than gpt-oss-120b at high reasoning effort. Verbose reasoning has cost implications in production. You get smarter answers per GPU. You also burn more tokens getting there.
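Using the article's own numbers, 11% higher throughput per GPU but roughly 40% more output tokens for the same work, a back-of-envelope calculation shows the tradeoff. This is illustrative arithmetic, not a published benchmark:

```python
# GPU-seconds scale with tokens / throughput, so the relative cost of
# running the same evaluation suite is the ratio of the two.
throughput_ratio = 1.11  # Super tokens/sec per GPU vs gpt-oss-120b
tokens_ratio = 1.40      # Super output tokens vs gpt-oss, same suite

relative_gpu_time = tokens_ratio / throughput_ratio
print(f"{relative_gpu_time:.2f}x")  # ~1.26x the GPU-time for the suite
```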


Open, with fine print

Nvidia published the weights, 10 trillion tokens of pretraining data, 40 million post-training samples, 15 reinforcement learning environments, and complete training recipes. Artificial Analysis gave the release an 83 on its Openness Index, behind only Ai2 and MBZUAI among top-performing models. "By far the most intelligent model ever released with this level of openness," the organization wrote.

But the license complicates the picture. Nvidia's Open Model License Agreement, last updated October 2025, allows commercial use. Enterprises can build derivative models and owe nothing in royalties. Worldwide, perpetual. Nvidia makes no claim on outputs.

Two clauses keep it from being truly unconditional. Strip the model's safety guardrails without replacing them with something "substantially similar," and the license terminates. File patent or copyright litigation against Nvidia alleging the model infringes your intellectual property, and the license dies on contact.

For enterprises building production systems on Nemotron, neither clause is likely to trigger. For the open-source community that tracks license purity, the distinction matters. The model is inspectable, modifiable, commercially deployable. It is not unconditionally free. Call it "open enough for business, guarded enough for lawyers."

The China calculation

The $26 billion figure needs context beyond Nvidia's product roadmap. Open-weight AI has shifted since Meta released Llama in 2023. Zuckerberg signaled last year that Meta might not keep future models fully open. OpenAI's gpt-oss, its concession to open weights, remains inferior to the company's proprietary offerings and poorly suited to modification.

Into that vacuum, Chinese labs walked: DeepSeek, Alibaba's Qwen, Moonshot AI, Z.ai, MiniMax. Their models are used by startups and researchers across the world, free to download and modify. Qwen in particular has become a default choice for teams that need an open model they can fine-tune without friction. DeepSeek poses a sharper threat: a new model, widely rumored to have been trained entirely on Huawei chips, would pull compute demand toward Chinese hardware and away from Nvidia's.

Nvidia's models, though, run best on Nvidia GPUs. Every startup that builds on Nemotron instead of Qwen is a startup more likely to buy Nvidia silicon. Every researcher who adopts Nvidia's training recipes learns to optimize for Nvidia's stack. The models are genuinely useful. The hardware dependency they create is by design.

Catanzaro told WIRED that helping the ecosystem served Nvidia's interests directly. "It's in our interest to make the ecosystem diverse and strong everywhere," he said.

Nathan Lambert leads the ATOM Project at the Allen Institute for AI. He called the investment significant and argued the US government should fund open models too. Andy Konwinski, who runs the Laude Institute, went further. "They sit at the front of so many open and closed AI efforts," he told WIRED. No major chip company has committed this heavily to publishing weights and training recipes.

For Nvidia's biggest GPU customers, the companies now watching their chip supplier compete with them on models, the mood is nervous. Nvidia insists the models exist to strengthen its hardware business, not to replace its customers' products. But $26 billion is not a side project. It is a second business.

What comes next

GTC starts March 16. Nvidia has not announced a release date for Nemotron 3 Ultra, the roughly 500-billion-parameter model teased in December with about 50 billion active parameters. Catanzaro confirmed to WIRED that a 550-billion-parameter model recently finished pretraining.

Perplexity already runs Nemotron 3 Super as one of 20 orchestrated models in its Computer agent system. Siemens, Palantir, and Cadence are customizing the model for telecom, cybersecurity, and semiconductor design. Google Cloud and Oracle support deployment now, with AWS and Azure expected to follow.

Twenty-six billion dollars. A 120-billion-parameter model anyone can download. Training recipes published for anyone with enough GPUs to replicate. Nvidia sells the chips. It publishes models tuned for those chips. The ecosystem grows around its silicon. The loop feeds itself. Jensen Huang's company now occupies a position no one else in AI holds: the infrastructure vendor that also writes the software the infrastructure was built to run. The filing says $26 billion. The bet says everything.

Frequently Asked Questions

What makes Nemotron 3 Super different from other open models?

It combines three architectures: Mamba-2 state-space layers for efficient sequence processing, Transformer attention for precise recall, and Latent MoE routing that activates only 12 billion of 120 billion total parameters per pass. The hybrid design targets multi-agent AI systems that generate 15x more tokens than standard chat, cutting inference cost while preserving reasoning quality.

How does Nemotron 3 Super compare to Qwen3.5?

Qwen3.5 122B scores 42 on the Artificial Analysis Intelligence Index versus Super's 36, making it the smarter model on benchmarks. But Super delivers 11% higher throughput per GPU and runs natively in 4-bit precision on Blackwell GPUs. For workloads running hundreds of concurrent agents, the efficiency advantage offsets the benchmark gap.

Can enterprises use Nemotron 3 Super commercially?

Yes. Nvidia's Open Model License Agreement allows commercial use, derivative models, and worldwide deployment with no royalties. Two conditions apply: you cannot strip safety guardrails without equivalent replacements, and filing IP litigation against Nvidia terminates the license. For most enterprise use cases, neither clause restricts deployment.

Why is Nvidia investing $26 billion in open-weight models?

Nvidia's open models run best on Nvidia GPUs. By publishing weights and training recipes, Nvidia creates an ecosystem where startups and researchers build on Nemotron instead of Chinese alternatives like Qwen, keeping compute demand tied to Nvidia hardware. The investment also counters a vacuum left by Meta's retreat from fully open models.

What is Nemotron 3 Ultra and when will it launch?

Nemotron 3 Ultra is the largest planned model in the Nemotron 3 family, with roughly 500-550 billion total parameters and about 50 billion active. Bryan Catanzaro confirmed to WIRED that pretraining recently finished. No release date has been announced, but GTC on March 16 is a likely venue for updates.

