GLM-5.2 vs Kimi K2.7 Code: Early Coding Tests Compared

The two strongest Chinese open models to launch this month, Z.ai's GLM-5.2 and Moonshot's Kimi K2.7 Code, have now been run side by side by five independent reviewers, and the early verdict gives GLM-5.2 a narrow edge. On quick one-shot tasks the two often traded wins, and Kimi sometimes finished faster. When a gap did show up, it tended to appear once reviewers inspected the generated code, and there GLM's builds held up better.

Both launched in mid-June within days of each other. GLM-5.2 carries 744 billion parameters with 40 billion active, ships under an MIT license, and runs a one-million-token context window. Artificial Analysis named it the top open-weight model on its Intelligence Index this month with a score of 51, up 11 points from GLM-5.1. It also beat GPT-5.5 on the GDPval benchmark and took first place on Design Arena's single-turn HTML web design leaderboard, making it the first model to top the Claude line there, including Fable 5. Kimi K2.7 Code is the narrower instrument: a trillion parameters with 32 billion active, built for agentic coding with thinking mode always on, and the only one of the two that reads images. Its context window runs about 256,000 tokens, which several reviewers flagged as tight for production code.

Key Takeaways

Across five independent reviews this month, GLM-5.2 came out narrowly ahead of Kimi K2.7 Code on real coding tasks, mostly on design, cost, and builds that held up to a second look.
On fast single-prompt tasks the two often traded wins; Kimi sometimes finished faster and added unprompted features, but several of its builds carried more bugs on closer inspection.
GLM-5.2 (744B/40B, MIT, 1M context) leads on design and cost, near 50 cents a task; Kimi K2.7 Code (1T/32B, ~256K context) is the only one that reads images and starts at $15 a month.
The comparisons are single, subjective runs by individual reviewers, not standardized benchmarks, so the pattern across them matters more than any one test.

AI-generated summary, reviewed by an editor. More on our AI guidelines.

On fast, single-prompt tasks, the models traded wins. Fahd Mirza ran both inside the Hermes agent against a planted goal-difference tiebreak bug in a World Cup standings app. He found that each model diagnosed the error and built a new round-of-32 bracket in one shot, with Kimi finishing faster, in just over five minutes, and adding a tournament-progression detail on its own. He scored the pair neck and neck. Samuel Gregory reached the same verdict after watching Kimi's agent swarm propagate a redesign across a site, calling both fantastic and putting them near Opus 4.5 in trustworthiness. A reviewer testing through Ollama Cloud and the Pi agent split a sorting-visualizer task, preferring Kimi's design and GLM's functionality, then watched both build a working Rust desktop file application from a single prompt, though Kimi shipped a dependency-file mistake that needed a manual fix.
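For readers unfamiliar with this class of bug, a goal-difference tiebreak boils down to a multi-key sort. The sketch below is a minimal illustration only, not the code from Mirza's test app: the `Team` fields and the `rank_group` helper are hypothetical names, and real tournament rules layer further tiebreakers on top.

```python
from dataclasses import dataclass


@dataclass
class Team:
    # Hypothetical record for one team in a group table.
    name: str
    points: int
    goals_for: int
    goals_against: int

    @property
    def goal_difference(self) -> int:
        return self.goals_for - self.goals_against


def rank_group(teams):
    """Order teams by points, then goal difference, then goals scored."""
    return sorted(
        teams,
        key=lambda t: (t.points, t.goal_difference, t.goals_for),
        reverse=True,  # higher values rank first on every key
    )
```

A typical version of the bug is to sort on points alone, or to compare goals scored instead of goal difference, which silently misorders any two teams that finish level on points.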

Where a gap showed up, it was usually on a second look. Web3 Wesley ran both on a website, a game and social copy, then had Claude inspect the output. Kimi's coffee-subscription site looked more polished at a glance, but the code review found more serious bugs in Kimi's output, including a broken mobile menu and a game whose difficulty came from a missing delta-time calculation rather than from design. He scored GLM higher on two of three tasks, with the copy a tie. A reviewer at Better Stack reached a similar split. GLM produced a working Three.js racing game in one prompt using about 40,000 tokens, while Kimi needed a follow-up prompt and ran to roughly 110,000. On a full-stack finance dashboard, GLM wired a Next.js and Prisma stack together without errors, where Kimi reached for a React and Express setup writing to a local SQL file, which the reviewer judged less scalable.
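The delta-time issue is worth a short illustration, because it explains how a game can end up hard by accident. The sketch below is not Kimi's output or Wesley's test code; the function names and the numbers are assumptions chosen for clarity. It contrasts movement applied once per frame with movement scaled by the time each frame actually took.

```python
def advance_per_frame(position, step_per_frame, frame_count):
    """Frame-rate-dependent movement: a fixed step is applied every frame."""
    for _ in range(frame_count):
        position += step_per_frame
    return position


def advance_with_delta(position, speed_per_second, frame_durations):
    """Frame-rate-independent movement: each step is scaled by elapsed time."""
    for dt in frame_durations:
        position += speed_per_second * dt
    return position


# One second of simulated play at two different frame rates.
one_second_at_30fps = [1 / 30] * 30
one_second_at_120fps = [1 / 120] * 120

# Without delta time, the faster display moves the object four times as far.
slow_display = advance_per_frame(0.0, 5.0, 30)    # 150.0
fast_display = advance_per_frame(0.0, 5.0, 120)   # 600.0

# With delta time, both displays land at roughly 300 units after one second.
slow_scaled = advance_with_delta(0.0, 300.0, one_second_at_30fps)
fast_scaled = advance_with_delta(0.0, 300.0, one_second_at_120fps)
```

On a high-refresh screen the first pattern makes everything move faster, which a player experiences as difficulty even though nobody designed it that way.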

Kimi K2.7 Code held advantages of its own. It is the only one of the two that reads images, its agent swarm can split a task across several files at once, and on raw speed several reviewers found it quick, sometimes quicker than GLM. On the other side, the reviewers who inspected its code found more bugs in it than in GLM's builds, its context window runs about a quarter of GLM's million tokens, and the cheaper Kimi plans hit their usage limits sooner.

None of this rests on a standardized coding benchmark. The five comparisons are individual reviewers running their own prompts once each, scoring design and playability by eye or handing the code to Claude to grade, and several called their judgments subjective. GLM-5.2 is token-hungry, averaging about 43,000 tokens per task, but cheap enough that Artificial Analysis clocked it near 50 cents a task, the lowest cost at its intelligence level, and the Better Stack reviewer said he could swap it in for Sonnet or Opus on simpler work and not notice. Moonshot lists Kimi K2.7 Code at about 75 cents per million input tokens and $3.50 per million output, with subscription plans starting at $15 a month.
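To put the per-token prices in task terms, the arithmetic can be sketched as follows. This is a rough estimator under stated assumptions, not Moonshot's billing logic: the constants are the approximate list prices quoted above, the function name is hypothetical, and the 80,000/30,000 split of input and output tokens is invented for illustration, since the reviewers did not report one.

```python
# Approximate list prices quoted for Kimi K2.7 Code, in US dollars.
KIMI_INPUT_USD_PER_MILLION = 0.75
KIMI_OUTPUT_USD_PER_MILLION = 3.50


def estimate_cost_usd(
    input_tokens,
    output_tokens,
    input_price=KIMI_INPUT_USD_PER_MILLION,
    output_price=KIMI_OUTPUT_USD_PER_MILLION,
):
    """Estimate API cost from token counts and per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical split of a roughly 110,000-token run: 80,000 in, 30,000 out.
example_cost = estimate_cost_usd(80_000, 30_000)
print(f"Estimated cost: ${example_cost:.3f}")
```

Under that assumed split the run would cost a little under 17 cents. Actual bills can differ, for instance when an agent resends context across turns or when a provider applies cache discounts.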

GLM-5.2 is MIT-licensed and available through Z.ai, and Kimi K2.7 Code is available now through Moonshot's API and the Kimi Code subscription. For developers choosing between the two Chinese open models, the reviewers gave GLM-5.2 the slight edge for general coding, while Kimi K2.7 Code is the one to reach for when a job needs image input or its agent swarm.

Frequently Asked Questions

Which is better for coding, GLM-5.2 or Kimi K2.7 Code?

Across five independent reviews in June 2026, most reviewers gave GLM-5.2 a narrow edge as the more consistent coder, particularly after inspecting the generated code. Kimi K2.7 Code traded wins on some quick single-prompt tasks and sometimes ran faster, but several of its builds carried more bugs.

How do the two models differ technically?

GLM-5.2 is a text-only model with 744 billion parameters, 40 billion of them active, an MIT license, and a one-million-token context window. Kimi K2.7 Code has a trillion parameters with 32 billion active, a context window of around 256,000 tokens, and thinking mode always on, and it can read images.

What does each model cost?

Artificial Analysis clocked GLM-5.2 near 50 cents a task, the lowest cost at its intelligence level. Moonshot lists Kimi K2.7 Code at about 75 cents per million input tokens and $3.50 per million output, with subscription plans starting at $15 a month.

Are these results from standardized benchmarks?

No. The five comparisons are individual reviewers running their own prompts once each and scoring design and playability by eye or handing the code to Claude to grade. Several called their judgments subjective, so the pattern matters more than any single test.

Marcus Schuler

San Francisco

Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: editor@implicator.ai
