Beijing-based AI startup Zhipu announced Friday it will accept only 20% of new daily subscriptions to its GLM Coding Plan. The reason: too many developers showed up at once after the company released GLM-4.7, and the servers buckled.
If you want to understand Chinese AI right now, this is the image to hold: a company that just went public, stock up 80%, telling customers to get in line.
Key Takeaways
• Zhipu limits new coding subscriptions to 20% of daily volume after GLM-4.7 demand overwhelms servers
• GLM-4.7-Flash scores 59.2% on SWE-bench Verified with 30B parameters, outperforming Qwen3-Coder's 480B model at 55.4%
• Zhipu claims first state-of-the-art model trained entirely on Huawei chips, no Nvidia hardware
• Chinese AI labs face structural compute disadvantage as each new Nvidia generation widens the gap
The constraint no one talks about
Zhipu released a multimodal model this month that it claims was trained entirely on Huawei's Ascend chips. First state-of-the-art Chinese AI model to complete training without touching Nvidia hardware. That's not marketing spin. That's what defiance looks like when you're building under sanctions.
American export controls have forced Chinese AI labs to work with chips that require two to four times more computational power to achieve equivalent training results, according to Lian Jye Su, chief analyst at Omdia. Zhipu can't buy H100s. Neither can DeepSeek or MiniMax. So they build around the gap, and there's a kind of emboldened stubbornness in the way they talk about it. We trained on domestic silicon. We shipped anyway.
But emboldened isn't the same as comfortable. The Huawei-trained model matters less for what it proves technically than for what it reveals about the mood in Beijing's AI labs: cornered, resourceful, and acutely aware that the compute disadvantage isn't going away.
And when those models ship, sometimes the servers can't handle what follows.
Thirty billion parameters, three billion active
Here's why the architecture matters before you see the numbers. If a smaller model can outperform a larger one on the benchmark that matters most for code generation, the entire economics of inference changes. Local deployment becomes viable. Cloud dependency becomes optional.
GLM-4.7-Flash, released January 19, is Zhipu's bet on that shift. Think of the model as a building with 64 floors where only five are lit at any given moment. The architecture stores 30 billion parameters but activates just 3 billion per token. The rest stay dark until needed.
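The floor analogy maps onto a Mixture-of-Experts router: a small gating network scores every expert per token, and only the top few actually run. The sketch below is illustrative, not Zhipu's implementation; the 64-expert, 5-active numbers come from the article's analogy, and GLM-4.7-Flash's real router internals are not public.

```python
import random

# Hypothetical numbers from the "64 floors, five lit" analogy.
NUM_EXPERTS = 64
ACTIVE_PER_TOKEN = 5

def route_token(router_scores):
    """Pick the top-k experts by router score; only these run for this token.
    The other experts' parameters stay in memory but do no compute."""
    ranked = sorted(range(len(router_scores)), key=lambda i: -router_scores[i])
    return ranked[:ACTIVE_PER_TOKEN]

# One token's router scores, one expert selection.
scores = [random.random() for _ in range(NUM_EXPERTS)]
active = route_token(scores)
print(f"{len(active)} of {NUM_EXPERTS} experts active for this token")
```

Because only the selected experts compute, per-token cost scales with the 3B active parameters rather than the 30B stored, which is what makes laptop-class inference plausible.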
On SWE-bench Verified, which measures how well models fix real GitHub issues, GLM-4.7-Flash scored 59.2%. Qwen3-Coder's full 480 billion parameter model manages 55.4%. A model one-sixteenth the size, outperforming.
EXO Labs tested it on an M4 Max MacBook Pro: 82 tokens per second. Other users reported speeds between 43 and 81 tokens per second across various hardware setups. A frontier-class coding model running locally on consumer laptops.
You could argue the benchmark scores don't tell the full story. One developer testing through LM Studio described the experience as "dramatically worse than gpt-oss-20b," with the model producing invalid code and getting stuck in loops. Pure reasoning still lags. But for practical coding tasks, local deployment is catching up to cloud faster than anyone predicted a year ago.
The open-source trap
Zhipu and its Chinese competitors face a strategic puzzle that American AI labs have mostly avoided. DeepSeek, Zhipu, and MiniMax have built their user bases on free, open-source technology. Users flood in. Revenue trickles.
Analyst Poe Zhao put it bluntly to AFP: these companies are burning cash faster than they can generate sustainable revenue. Zhipu's coding subscription costs $0.07 per million input tokens through its API. Anthropic charges roughly ten times that for Claude. Every user Zhipu acquires at those margins is a user it is paying to serve.
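The gap is easy to quantify from the article's figures. A back-of-envelope sketch, with the caveat that the Claude rate below is an assumption (ten times Zhipu's input price, per the article) and the monthly volume is hypothetical:

```python
ZHIPU_INPUT_PER_M = 0.07   # dollars per million input tokens, from the article
CLAUDE_INPUT_PER_M = 0.70  # assumption: "roughly ten times" Zhipu's rate

def input_cost(million_tokens, rate_per_m):
    """Dollar cost for a given volume of input tokens."""
    return round(million_tokens * rate_per_m, 2)

monthly_input = 200  # hypothetical heavy coding user: 200M input tokens/month
print(input_cost(monthly_input, ZHIPU_INPUT_PER_M))   # 14.0
print(input_cost(monthly_input, CLAUDE_INPUT_PER_M))  # 140.0
```

A heavy user costs Zhipu an order of magnitude less to bill, which is exactly the problem: the price is low enough to win adoption and too low to cover the compute behind it.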
OpenAI doesn't expect profitability until 2029, and that's with closed-source models commanding premium pricing. The Chinese approach prioritizes adoption over monetization. Developers love it. Accountants don't.
Beijing is betting public money that its labs can survive the gap. The government announced plans to deploy three to five general-purpose large AI models in manufacturing by 2027. Subsidies are flowing. But subsidies are a bridge, not a destination. At some point, the models have to pay for themselves.
What the demand surge reveals
DeepSeek went through the same growing pain a year ago. After the company's low-cost model stunned the industry by matching American performance at a fraction of the compute budget, demand overwhelmed its infrastructure. API access got restricted. Users complained. The company scrambled to expand capacity.
Now Zhipu follows the same trajectory. Strong model release triggers user surge triggers resource constraints triggers access limits. Too many people trying to enter the building at once, and not enough floors lit to serve them.
Zhipu's co-founder Tang Jie offered a sobering read at a Beijing conference. Despite Chinese achievements in large open-source models, he said, "the gap with the United States may actually be widening." Large language models in America remain mostly closed-source. The economic incentives differ. The compute access differs.
The Chinese LLM market is projected to reach $14.5 billion by 2030, according to Frost & Sullivan. China's engineering talent base runs deep. Electricity costs less there than in the United States.
But here's where I come down: the optimistic projections assume the compute gap stays constant. It won't. Every generation of Nvidia chips that Chinese labs can't access widens the training efficiency divide. Tang Jie isn't being modest. He's being honest about a structural disadvantage that won't vanish because Beijing writes checks.
The rationing problem
Twenty percent of new daily subscriptions. That's the bottleneck Zhipu announced. Existing subscribers keep their access. New users wait in line. The building is full.
This constraint doesn't show up in press releases about benchmark performance or IPO valuations. Compute is finite. Demand is not. When a model performs well enough to attract serious adoption, infrastructure stops being a detail and starts being the whole story.
Zhipu trained its image generation model on Huawei chips. It released an efficient coding model that runs on MacBooks. It went public and watched the stock price surge.
Now it's telling developers that the product they want has a waiting list. The company can build models that compete with American labs. Serving them at scale is a different problem. The resources aren't easy to acquire. The constraints can't be engineered around. And the walls aren't coming down.
Chinese AI in January 2026: technically capable, commercially promising, structurally rationed.
Frequently Asked Questions
Q: What is GLM-4.7-Flash and why does it matter?
A: GLM-4.7-Flash is Zhipu's new coding model using a Mixture-of-Experts architecture. It stores 30 billion parameters but activates only 3 billion per token. On SWE-bench Verified, it scores 59.2%, beating Qwen3-Coder's 480B model at 55.4%. The efficiency means it runs locally on consumer hardware at 43-82 tokens per second.
Q: Why is Zhipu limiting new subscriptions?
A: Demand after the GLM-4.7 release overwhelmed Zhipu's computing resources. The company will accept only 20% of its current daily new subscriptions to its GLM Coding Plan starting Friday. Existing subscribers keep their access. This mirrors DeepSeek's experience a year ago when their API access got restricted after a model release.
Q: What makes Zhipu's Huawei-trained model significant?
A: Zhipu claims its GLM-Image model is the first state-of-the-art Chinese AI model trained entirely on Huawei's Ascend chips without using any Nvidia hardware. This matters because US export controls have blocked Chinese labs from buying H100s, forcing them to work with domestic chips that typically require 2-4x more computational power.
Q: How does Zhipu's pricing compare to US competitors?
A: Zhipu charges $0.07 per million input tokens and $0.40 per million output tokens through its API. Anthropic charges roughly ten times that for Claude. Chinese AI labs have built user bases on open-source, low-cost strategies, but analyst Poe Zhao notes they're burning cash faster than they generate revenue.
Q: What's the outlook for China's AI chip gap with the US?
A: Zhipu co-founder Tang Jie said at a Beijing conference that "the gap with the United States may actually be widening." Each new Nvidia chip generation that Chinese labs can't access widens the training efficiency divide. The Chinese LLM market is projected at $14.5 billion by 2030, but those projections assume the compute gap stays constant.