Alibaba Group anonymously released an AI video generation model called HappyHorse-1.0 that climbed to the top of the Artificial Analysis Video Arena, according to The Information, citing two people with knowledge of Alibaba's involvement. The model scored an Elo of 1,333 in text-to-video and 1,392 in image-to-video on the blind-test leaderboard, beating ByteDance's Seedance 2.0 by 60 and 37 points respectively. Alibaba's cloud division is reportedly preparing to make the model available to enterprise clients, a step that would intensify pricing pressure across the AI video market.


A pseudonymous entry that broke records

HappyHorse-1.0 appeared on the Artificial Analysis leaderboard in early April with no launch event, no published research paper, and no named team. Artificial Analysis described the submission as "pseudonymous." Within days, it had accumulated enough blind user preference votes to displace every competitor, including Seedance 2.0, Kling 3.0, and PixVerse V6.

Elo works the same way it does in chess. Two videos, same prompt, side by side. Users pick the better one without knowing which model made which. Sixty points at this tier translates to roughly a 58 percent win rate in direct comparisons. That gap is real.
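The standard Elo expected-score formula makes that arithmetic checkable. A minimal sketch, using the leaderboard ratings reported above:

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected win rate of A over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# HappyHorse-1.0 vs Seedance 2.0, text-to-video (Elo 1,333 vs 1,273)
p = elo_win_probability(1333, 1273)
print(f"{p:.1%}")  # 58.5%
```

The same formula puts the 37-point image-to-video gap at roughly a 55 percent win rate, which is why the text-to-video lead is the more striking of the two.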

One category where HappyHorse trails: audio-inclusive generation, where Seedance 2.0 holds a narrow 14-point lead.

Following the trail to Sand.ai

Speculation about the model's origin started within hours. The language order on HappyHorse's website listed Mandarin and Cantonese ahead of English. The name references the Year of the Horse in the 2026 Chinese lunar calendar, echoing the "Pony Alpha" pseudonym that Z.ai used for its GLM-5 stealth launch earlier this year.

X user Vigo Zhao provided the strongest link. He compared HappyHorse-1.0's published benchmark data point by point against known models and found a near-perfect match with daVinci-MagiHuman, an open-source model that Sand.ai and Shanghai Innovation Institute's GAIR Lab released on GitHub on March 23.

Visual quality scores, text alignment metrics, physical consistency ratings, voice error rates: all matched. Both share the same skeleton: fifteen billion parameters, one Transformer, audio and video generated together. Before daVinci-MagiHuman, no open-source project had trained audio-video generation jointly from scratch. Everyone else bolted audio onto existing video pipelines after the fact.

36Kr's conclusion: HappyHorse is likely a tuned version of daVinci-MagiHuman, dropped anonymously onto the leaderboard to prove commercial viability through real user votes before a formal launch.

Trading architectural complexity for raw speed

Standard video models split the work. Separate encoders for text, video, and audio, stitched together through cross-attention. HappyHorse skips all of that. Text, video, and audio tokens go into one 40-layer self-attention Transformer as a single sequence. No cross-attention. No modality-specific sub-networks.
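Neither Alibaba nor Sand.ai has published architecture details beyond what 36Kr describes, but the unified-sequence idea can be sketched in a few lines. Every shape, sequence length, and the identity projections below are illustrative assumptions, not the real model:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical shared embedding width

# Each modality is embedded to the same width, then concatenated into
# one sequence; a single self-attention stack sees all of it at once.
text_tokens  = rng.normal(size=(12, d))   # prompt tokens
video_tokens = rng.normal(size=(256, d))  # latent video patches
audio_tokens = rng.normal(size=(80, d))   # audio frames

seq = np.concatenate([text_tokens, video_tokens, audio_tokens])  # (348, d)

def self_attention(x: np.ndarray) -> np.ndarray:
    """One attention head with identity Q/K/V projections (a sketch).
    Every token attends to every other token, so audio tokens see video
    tokens directly; no cross-attention bridge between modalities."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

out = self_attention(seq)
print(out.shape)  # (348, 64)
```

The point of the sketch is the data flow, not the layer count: once all three modalities live in one sequence, the "stitching" that standard pipelines do with cross-attention comes for free.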

That design yields a concrete advantage: the model requires only eight denoising steps, compared to 20 to 50 for standard diffusion models. Combined with the absence of classifier-free guidance, it produces a five-second 1080p clip in roughly 38 seconds on a single Nvidia H100 GPU.
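The back-of-the-envelope math behind that speed claim, assuming equal per-step cost and that a typical baseline pays for classifier-free guidance with two network passes per step (both assumptions, not measurements):

```python
# Forward passes per clip, not wall-clock time.
happyhorse_passes = 8 * 1   # 8 denoising steps, no classifier-free guidance
baseline_passes   = 30 * 2  # midpoint of the 20-50 step range, with CFG

print(baseline_passes / happyhorse_passes)  # 7.5
```

A rough 7.5x reduction in network evaluations per clip is the kind of margin that shows up directly in per-generation cost.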

Speed at that level matters for enterprise unit economics. And Alibaba's cloud division is reportedly the one preparing to sell it.

What the Elo scores hide

Portrait generation and voice-over content account for more than 60 percent of the arena's test samples, according to 36Kr. Since daVinci-MagiHuman was trained specifically for single-character portrait scenarios, HappyHorse carries a structural advantage in this testing environment. That advantage does not survive contact with complex multi-character scenes, extended sequences, or dynamic camera movements.

Chinese social media blogger @JACK's AI World deployed the underlying model and found it requires H100-class hardware. Consumer GPUs cannot handle it. Generation quality degrades beyond 10-second clips, and high-definition output still depends on super-resolution plugins. Overall usability, JACK concluded, falls short of LTX 2.3 until the community's quantization work is complete.

But those caveats matter less than the signal.

The pricing power question

Open-source video models have been chasing closed-source quality for years without catching it. This is the first time one has matched it on a blind-test leaderboard built entirely on real user perception. If you're evaluating video generation APIs right now, that changes the math.

Providers like ByteDance and Kuaishou should be nervous. They built their pricing power on the visible gap between open and closed. Kling 3.0 Pro charges $13.44 per minute. SkyReels V4 charges $7.20. If an open-source model can deliver comparable quality, the downstream effects on self-hosting costs and data control will squeeze every paid endpoint in the market.
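Converting those published per-minute rates to a per-clip figure makes the stakes concrete. This is straight unit conversion of the two prices quoted above, nothing more:

```python
# Cost of a single 5-second clip at the published per-minute API rates.
prices_per_minute = {"Kling 3.0 Pro": 13.44, "SkyReels V4": 7.20}
clip_seconds = 5

for model, per_min in prices_per_minute.items():
    cost = per_min * clip_seconds / 60
    print(f"{model}: ${cost:.2f} per 5s clip")
# Kling 3.0 Pro: $1.12 per 5s clip
# SkyReels V4: $0.60 per 5s clip
```

At roughly a dollar per five-second clip, any open-source alternative that runs on self-hosted hardware puts a hard ceiling on what those endpoints can charge.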

BABA stock reflected early optimism. Shares climbed nearly 5 percent from the April 7 close of $119.72 to $125.32, though gains partially reversed in extended trading. Alibaba has consolidated its AI operations into a division called Token Hub and reports triple-digit growth in AI product revenue.

HappyHorse's weights are still not publicly downloadable. GitHub and Hugging Face links on the model's website return 404 errors or "coming soon" pages. The anonymous team behind it may reveal itself within a week, according to edgen.tech. Until then, the highest-ranked AI video model on the most credible leaderboard remains one nobody can actually run.

Frequently Asked Questions

What is HappyHorse-1.0?

An AI video generation model that anonymously appeared on the Artificial Analysis Video Arena in early April 2026, reaching #1 in both text-to-video and image-to-video. It beat ByteDance's Seedance 2.0 by 60 and 37 Elo points. The Information reports Alibaba is behind it.

Who made HappyHorse-1.0?

The Information cites two sources identifying Alibaba's involvement. Technical analysis by X user Vigo Zhao links it to daVinci-MagiHuman, an open-source model from Sand.ai and Shanghai Innovation Institute. The prevailing theory is it is an optimized version submitted for blind testing.

Can you use HappyHorse-1.0 right now?

No. As of April 9, 2026, the model's GitHub and Hugging Face links return 404 errors or coming-soon pages. No public API, downloadable weights, or pricing exists. Alibaba's cloud division is reportedly preparing enterprise access but has not announced a timeline.

How does HappyHorse compare to Seedance 2.0?

HappyHorse leads in text-to-video (Elo 1,333 vs 1,273) and image-to-video (1,392 vs 1,355) without audio. Seedance 2.0 holds a narrow lead in audio-inclusive categories. The arena's heavy focus on portrait content may partly explain the gap.

What are HappyHorse-1.0's limitations?

Quality degrades in multi-character scenes and sequences beyond 10 seconds. The model requires H100-class hardware. Consumer GPUs cannot run it. Independent testers found its overall usability falls short of LTX 2.3 until the community completes quantization work.



Editor-in-Chief and founder of Implicator.ai. Former ARD correspondent and senior broadcast journalist with 10+ years covering tech. Writes daily briefings on policy and market developments. Based in San Francisco. E-mail: [email protected]