Is China’s MiniMax the New Contender in the Global AI Race?

Chinese AI startup MiniMax just trained a frontier model for $534,700 - a fraction of what competitors spend. Their open-source M1 beats DeepSeek while using 75% less computing power. The breakthrough suggests advanced AI no longer requires massive budgets.

💡 TL;DR - The 30-Second Version

🚀 Chinese startup MiniMax released an open-source AI model that beats DeepSeek while using 75% less computing power.

💰 Training cost was just $534,700 compared to DeepSeek's $5-6 million and OpenAI's $100+ million for GPT-4.

🧠 M1 processes 1 million tokens at once - eight times more than DeepSeek and enough for several books.

📊 The model scored 86% on math competitions and 65% on coding benchmarks, beating other open-source alternatives.

🔓 Apache 2.0 license means businesses can use, modify, and sell products built on M1 without restrictions.

🏭 Efficient training methods could make frontier AI models accessible to smaller companies with limited budgets.

Chinese AI startup MiniMax just released something that should make enterprise developers pay attention. Their new MiniMax-M1 model claims to beat DeepSeek's latest offering while using a fraction of the computing power. The kicker? It's completely open source.

MiniMax-M1 arrives with a context window of one million tokens. That's eight times larger than DeepSeek R1's capacity. To put this in perspective, most AI models struggle to remember a long conversation. M1 can process the equivalent of several books' worth of information in a single interaction.

The Shanghai-based company didn't just focus on size. They built M1 to be efficient. According to their technical documentation, the model uses only 25% of the computational resources that DeepSeek R1 requires when generating 100,000 tokens. This efficiency comes from what they call a "lightning attention mechanism" combined with a hybrid architecture.

Training costs that raise eyebrows

Here's where things get interesting from a business perspective. MiniMax trained their frontier-level model for $534,700. Compare that to DeepSeek R1's reported training cost of $5-6 million, or OpenAI's GPT-4, which reportedly cost more than $100 million to train.

The company achieved this through a custom reinforcement learning algorithm called CISPO. Instead of the typical approach of clipping token updates, CISPO clips importance sampling weights. This technical distinction might sound minor, but it appears to deliver significant cost savings.
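
For readers who want the intuition, here's a minimal sketch of the difference in Python. The simplified losses, variable names, and clipping bounds are illustrative assumptions for this article, not MiniMax's actual implementation; their technical report defines the exact objective.

```python
import torch

def ppo_style_loss(logp_new, logp_old, advantages, eps=0.2):
    # Standard PPO clips the probability ratio itself. Tokens whose ratio
    # falls outside [1 - eps, 1 + eps] stop contributing gradients.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

def cispo_style_loss(logp_new, logp_old, advantages, is_cap=2.0):
    # CISPO-style idea (simplified): clip and freeze the importance-sampling
    # weight, then use it only to re-weight a REINFORCE-style term, so every
    # token keeps sending a gradient signal to the policy.
    is_weight = torch.exp(logp_new - logp_old).detach()
    is_weight = torch.clamp(is_weight, max=is_cap)
    return -(is_weight * advantages * logp_new).mean()
```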

MiniMax used 512 Nvidia H800 GPUs for training. The $534,700 figure is the reported rental cost for that cluster over the full training run.

Benchmark performance tells the story

MiniMax-M1 comes in two variants: M1-40k and M1-80k, referring to their "thinking budgets" or maximum output lengths. The larger model shows impressive results across standard AI benchmarks.

On AIME 2024, a mathematics competition that tests advanced reasoning, M1-80k scored 86.0%. It achieved 65.0% on LiveCodeBench for coding tasks and 56.0% on SWE-bench Verified for software engineering problems.

The model particularly shines in long-context tasks. On OpenAI's MRCR benchmark with 128,000 tokens, M1 scored 73.4%. When pushed to the full million-token context, it maintained 56.2% accuracy.

These numbers place M1 ahead of other open-weight models like the original DeepSeek-R1 and Qwen3-235B on several complex tasks. Closed models like OpenAI's o3 and Google's Gemini 2.5 Pro still hold leads in some areas, but the gap has narrowed considerably.

Architecture that matters for deployment

MiniMax built M1 on their earlier MiniMax-Text-01 foundation. The model contains 456 billion parameters total, with 45.9 billion activated per token. This Mixture-of-Experts approach means the model can maintain high capability while keeping inference costs manageable.
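
To make the sparse-activation idea concrete, here is a toy Mixture-of-Experts layer in Python. The expert count, layer sizes, and top-2 routing are illustrative assumptions for demonstration, not M1's actual design; the point is simply that every expert sits in memory while each token only runs through a few of them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy MoE layer: all experts live in memory, but each token
    only passes through the top_k experts picked by the router."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, chosen = gate.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e_idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e_idx  # tokens routed to this expert
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```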

For deployment, MiniMax recommends vLLM as the serving backend. The model works with standard tools like Transformers, making integration into existing infrastructure straightforward.
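
A minimal serving sketch with vLLM might look like the snippet below. The Hugging Face identifier, parallelism degree, and context length are assumptions to verify against MiniMax's model card and the vLLM documentation before running.

```python
from vllm import LLM, SamplingParams

# Assumed model identifier; check MiniMax's Hugging Face page for the exact name.
llm = LLM(
    model="MiniMaxAI/MiniMax-M1-80k",
    trust_remote_code=True,      # likely required for the custom hybrid architecture
    tensor_parallel_size=8,      # a 456B-parameter model needs several GPUs
    max_model_len=128_000,       # raise toward 1M only if memory allows
)

params = SamplingParams(temperature=1.0, max_tokens=2048)
outputs = llm.generate(
    ["Summarize the key obligations in the following contract: ..."],
    params,
)
print(outputs[0].outputs[0].text)
```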

The company released M1 under an Apache 2.0 license. This means businesses can use, modify, and deploy the model commercially without restrictions or ongoing payments.

What this means for the AI landscape

MiniMax belongs to China's "Little Dragons" - a group of AI startups backed by internet giants Tencent and Alibaba. These companies raised billions in venture funding over the past year, but DeepSeek's success forced most to cut fundamental research and focus on applications.

M1's release suggests MiniMax found a different path. Instead of retreating from model development, they doubled down on efficiency and open access.

The timing isn't coincidental. This release kicks off what MiniMax calls "MiniMaxWeek" on social media, with more announcements expected. The company appears to be making a coordinated push for developer mindshare.

For enterprises evaluating AI options, M1 offers several advantages. The massive context window reduces preprocessing needs for large documents. The open-source license eliminates vendor lock-in concerns. The efficiency gains could translate to lower operational costs.

Why this matters:

  • A $534,700 training budget shows that frontier AI models no longer require massive corporate resources
  • Open-source models with million-token context windows change the cost equation for enterprise AI deployments

❓ Frequently Asked Questions

Q: What does "open source" mean for businesses using MiniMax-M1?

A: The Apache 2.0 license lets businesses use, modify, and deploy M1 commercially without paying fees or royalties. Companies can train custom versions, integrate it into products, and even sell services built on top of it without restrictions.

Q: How much does it cost to run MiniMax-M1 compared to other AI models?

A: According to MiniMax's figures, M1 uses roughly 25% of the compute DeepSeek R1 needs when generating 100,000 tokens, which works out to about 75% lower inference cost at that length. The exact savings depend on your cloud provider and usage patterns.

Q: What's the difference between the M1-40k and M1-80k versions?

A: The numbers refer to "thinking budgets" - how many tokens the model can use for internal reasoning before giving an answer. M1-80k can think longer about complex problems, leading to slightly better performance on math and coding tasks.

Q: Can MiniMax-M1 really process a million tokens at once?

A: Yes. A million tokens equals roughly 750,000 words - several novels' worth of text. The model can analyze entire books, legal documents, or code repositories in a single session without losing context.

Q: What hardware do I need to run MiniMax-M1?

A: The model has 456 billion parameters total but only activates 45.9 billion per token. The sparse activation saves compute, not memory: all 456 billion weights still have to be loaded, so you'll need a multi-GPU server with substantial VRAM. MiniMax recommends using vLLM for deployment, which optimizes memory usage and batch processing.

Q: Who are the "Little Dragons" that MiniMax belongs to?

A: The Little Dragons are six Chinese AI startups backed by Tencent and Alibaba. They raised billions in venture funding but most cut research after DeepSeek's success. MiniMax appears to be taking a different approach by focusing on efficiency.

Q: How does the CISPO training algorithm work?

A: CISPO clips importance sampling weights instead of token updates during reinforcement learning. This technical change helps the model learn more efficiently, contributing to the remarkably low $534,700 training cost compared to typical frontier model budgets.

Q: What other products is MiniMax announcing during "MiniMaxWeek"?

A: The company hasn't revealed specifics yet. MiniMax already offers Hailuo for video generation and various AI companion apps. Based on their pattern, expect tools that complement M1's reasoning capabilities or expand their developer ecosystem.
