Alibaba Group anonymously released an AI video generation model called HappyHorse-1.0 that climbed to the top of the Artificial Analysis Video Arena, according to The Information, citing two people with knowledge of Alibaba's involvement. The model scored an Elo of 1,333 in text-to-video and 1,392 in image-to-video on the blind-test leaderboard, beating ByteDance's Seedance 2.0 by 60 and 37 points respectively. Alibaba's cloud division is reportedly preparing to make the model available to enterprise clients, a step that would intensify pricing pressure across the AI video market.
Key Takeaways
- Alibaba anonymously released HappyHorse-1.0, which hit #1 on the Artificial Analysis Video Arena in text-to-video and image-to-video.
- Technical analysis links the model to daVinci-MagiHuman, an open-source project by Sand.ai and Shanghai Innovation Institute's GAIR Lab.
- The 15B-parameter Transformer produces a five-second 1080p clip in 38 seconds using only 8 denoising steps on a single H100 GPU.
- Independent testing found quality degrades beyond single-character scenes and 10-second clips, and consumer GPUs cannot run it.
A pseudonymous entry that broke records
HappyHorse-1.0 appeared on the Artificial Analysis leaderboard in early April with no launch event, no published research paper, and no named team. Artificial Analysis described the submission as "pseudonymous." Within days, it had accumulated enough blind user preference votes to displace every competitor, including Seedance 2.0, Kling 3.0, and PixVerse V6.
Elo here works the same way it does in chess. Two videos, same prompt, side by side; users pick the better one without knowing which model made which. A 60-point gap at this tier translates to roughly a 58 percent win rate in direct comparisons. That gap is real.
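That 58 percent figure falls straight out of the standard Elo expectation formula. As a quick Python check (this is generic Elo math, nothing arena-specific):

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate for the higher-rated side under standard Elo."""
    return 1 / (1 + 10 ** (-rating_gap / 400))

# HappyHorse's leads over Seedance 2.0, per the leaderboard:
print(f"{elo_win_probability(60):.1%}")  # text-to-video, 60-point gap  -> 58.5%
print(f"{elo_win_probability(37):.1%}")  # image-to-video, 37-point gap -> 55.3%
```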
One category where HappyHorse trails: audio-inclusive generation, where Seedance 2.0 holds a narrow 14-point lead.
Following the trail to Sand.ai
Speculation about the model's origin started within hours. The language order on HappyHorse's website listed Mandarin and Cantonese ahead of English. The name references the Year of the Horse in the 2026 Chinese lunar calendar, echoing the "Pony Alpha" pseudonym that Z.ai used for its GLM-5 stealth launch earlier this year.
X user Vigo Zhao provided the strongest link. He compared HappyHorse-1.0's published benchmark data point by point against known models and found a near-perfect match with daVinci-MagiHuman, an open-source model that Sand.ai and Shanghai Innovation Institute's GAIR Lab released on GitHub on March 23.
Visual quality scores, text alignment metrics, physical consistency ratings, voice error rates. All matched item by item. Both share the same skeleton. Fifteen billion parameters, one Transformer, audio and video generated together. Before daVinci-MagiHuman, no open-source project had trained audio-video generation jointly from scratch. Everyone else bolted audio onto existing video pipelines after the fact.
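Zhao has not published his comparison script, but the method he describes reduces to a fingerprint check: line up each model's published metrics and count how many agree within a small tolerance. A minimal sketch, with placeholder values rather than the actual scores:

```python
def fingerprint_match(a: dict[str, float], b: dict[str, float],
                      tol: float = 0.005) -> float:
    """Fraction of shared benchmark metrics that agree within `tol`."""
    shared = a.keys() & b.keys()
    return sum(abs(a[k] - b[k]) <= tol for k in shared) / len(shared)

# Placeholder numbers -- the real comparison used the published scores.
happyhorse = {"visual_quality": 0.871, "text_alignment": 0.912, "voice_wer": 0.043}
davinci    = {"visual_quality": 0.871, "text_alignment": 0.910, "voice_wer": 0.043}
print(fingerprint_match(happyhorse, davinci))  # 1.0 -> near-perfect match
```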
36Kr's conclusion: HappyHorse is likely a tuned version of daVinci-MagiHuman, dropped anonymously onto the leaderboard to prove commercial viability through real user votes before a formal launch.
Trading architectural complexity for raw speed
Standard video models split the work. Separate encoders for text, video, and audio, stitched together through cross-attention. HappyHorse skips all of that. Text, video, and audio tokens go into one 40-layer self-attention Transformer as a single sequence. No cross-attention. No modality-specific sub-networks.
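Neither team has released architecture code, so the following is only a sketch of the described pattern in PyTorch: project every modality to one token width, tag each token with a learned modality embedding, and run the whole sequence through a single self-attention stack. All names and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn

class UnifiedSequenceTransformer(nn.Module):
    """One self-attention stack over text, video, and audio tokens joined
    into a single sequence: no cross-attention, no per-modality sub-networks."""

    def __init__(self, dim: int = 1024, depth: int = 40, heads: int = 16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)
        self.modality = nn.Embedding(3, dim)  # 0=text, 1=video, 2=audio

    def forward(self, text, video, audio):
        # Each input is (batch, tokens, dim), already projected to `dim`.
        tagged = [tokens + self.modality.weight[i]
                  for i, tokens in enumerate((text, video, audio))]
        seq = torch.cat(tagged, dim=1)   # one flat multimodal sequence
        return self.blocks(seq)          # every token attends to every token
```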
That design yields a concrete advantage: the model requires only eight denoising steps, compared to 20 to 50 for standard diffusion models. Combined with the absence of classifier-free guidance, it produces a five-second 1080p clip in roughly 38 seconds on a single Nvidia H100 GPU.
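The arithmetic behind the speedup: each denoising step is a full forward pass, and classifier-free guidance would double the passes per step (one conditional, one unconditional call). Eight steps without CFG means roughly 8 model calls per clip, versus around 100 for a 50-step CFG sampler. Below is a generic few-step Euler sampling loop to illustrate the shape of it; this is a standard diffusion-style sampler, not HappyHorse's actual scheduler.

```python
import torch

@torch.no_grad()
def sample_clip(model, shape, num_steps=8, device="cuda"):
    """Plain Euler sampling: one model call per step and no classifier-free
    guidance, which would otherwise require a second call per step."""
    x = torch.randn(shape, device=device)        # start from pure noise
    ts = torch.linspace(1.0, 0.0, num_steps + 1, device=device)
    for t_now, t_next in zip(ts[:-1], ts[1:]):
        update = model(x, t_now)                 # predicted denoising direction
        x = x + (t_next - t_now) * update        # one Euler step toward data
    return x
```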
Speed at that level matters for enterprise unit economics. And Alibaba's cloud division is reportedly the one preparing to sell it.
What the Elo scores hide
Portrait generation and voice-over content account for more than 60 percent of the arena's test samples, according to 36Kr. Since daVinci-MagiHuman was trained specifically for single-character portrait scenarios, HappyHorse carries a structural advantage in this testing environment. That advantage does not survive contact with complex multi-character scenes, extended sequences, or dynamic camera movements.
Chinese social media blogger @JACK's AI World deployed the underlying model and found it requires H100-class hardware. Consumer GPUs cannot handle it. Generation quality degrades beyond 10-second clips, and high-definition output still depends on super-resolution plugins. Overall usability, JACK concluded, falls short of LTX 2.3 until quantization work is complete.
But those caveats matter less than the signal.
The pricing power question
Open-source video models have been chasing closed-source quality for years without catching it. This is the first time one has matched it on a blind-test leaderboard built entirely on real user perception. If you're evaluating video generation APIs right now, that changes the math.
Providers like ByteDance and Kuaishou should be nervous. They built their pricing power on the visible gap between open and closed. Kling 3.0 Pro charges $13.44 per minute. SkyReels V4 charges $7.20. If an open-source model can deliver comparable quality, the downstream effects on self-hosting costs and data control will squeeze every paid endpoint in the market.
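A back-of-envelope comparison shows the squeeze. The 38-seconds-per-5-second-clip figure on one H100 comes from the reported benchmarks; the GPU rental rate below is an assumption, so substitute your own:

```python
# Self-hosted cost per minute of generated 1080p video, rough sketch.
H100_USD_PER_HOUR = 3.00      # assumed cloud rental rate; varies widely
GEN_SECONDS_PER_CLIP = 38     # reported time for one 5-second clip
CLIP_LENGTH_S = 5

clips_per_minute = 60 / CLIP_LENGTH_S                  # 12 clips per output minute
gpu_seconds = clips_per_minute * GEN_SECONDS_PER_CLIP  # 456 s of H100 time
cost = gpu_seconds / 3600 * H100_USD_PER_HOUR
print(f"~${cost:.2f} per minute of video")  # ~$0.38 vs $13.44 (Kling 3.0 Pro)
```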
BABA stock reflected early optimism. Shares climbed nearly 5 percent from the April 7 close of $119.72 to $125.32, though gains partially reversed in extended trading. Alibaba has consolidated its AI operations into a division called Token Hub and reports triple-digit growth in AI product revenue.
HappyHorse's weights are still not publicly downloadable. GitHub and Hugging Face links on the model's website return 404 errors or "coming soon" pages. The anonymous team behind it may reveal itself within a week, according to edgen.tech. Until then, the highest-ranked AI video model on the most credible leaderboard remains one nobody can actually run.
Frequently Asked Questions
What is HappyHorse-1.0?
An AI video generation model that anonymously appeared on the Artificial Analysis Video Arena in early April 2026, reaching #1 in both text-to-video and image-to-video. It beat ByteDance's Seedance 2.0 by 60 and 37 Elo points. The Information reports Alibaba is behind it.
Who made HappyHorse-1.0?
The Information cites two sources identifying Alibaba's involvement. Technical analysis by X user Vigo Zhao links it to daVinci-MagiHuman, an open-source model from Sand.ai and Shanghai Innovation Institute. The prevailing theory is that HappyHorse is an optimized version of that model, submitted for blind testing.
Can you use HappyHorse-1.0 right now?
No. As of April 9, 2026, the model's GitHub and Hugging Face links return 404 errors or coming-soon pages. No public API, downloadable weights, or pricing exists. Alibaba's cloud division is reportedly preparing enterprise access but has not announced a timeline.
How does HappyHorse compare to Seedance 2.0?
HappyHorse leads in text-to-video (Elo 1,333 vs 1,273) and image-to-video (1,392 vs 1,355) without audio. Seedance 2.0 holds a narrow lead in audio-inclusive categories. The arena's heavy focus on portrait content may partly explain the gap.
What are HappyHorse-1.0's limitations?
Quality degrades in multi-character scenes and sequences beyond 10 seconds. The model requires H100-class hardware. Consumer GPUs cannot run it. Independent testers found its overall usability falls short of LTX 2.3 until the community completes quantization work.