New AI Training Method Lets Models Teach Themselves
AI models just got smarter at teaching themselves. A breakthrough method called Test-Time Reinforcement Learning (TTRL) lets AI improve its skills without human guidance, marking a shift in how machines learn.
Researchers from Tsinghua University and Shanghai AI Lab developed TTRL to help AI models learn from their own mistakes. The method works like a study group where models check each other's work, rather than waiting for a teacher to grade them.
The results are striking. When tested on complex math problems, an AI model called Qwen2.5-Math-1.5B more than doubled its accuracy – jumping from 33% to 80%. It achieved this purely through self-learning, without seeing any correct answers.
This matters because current AI models need massive amounts of human-labeled data to improve. TTRL breaks that dependency by letting a model generate its own feedback through a voting system: the model samples several answers to the same problem, and when most of those answers agree, it treats the consensus as a provisional learning signal.
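In rough terms, the voting step works like majority-vote pseudo-labeling: sample several answers, take the most common one as a stand-in for the correct answer, and reward the samples that agree with it. The sketch below illustrates that idea only; the function name and the toy numbers are invented for illustration and are not the paper's code.

```python
from collections import Counter

def majority_vote_rewards(sampled_answers):
    """Turn a batch of sampled answers into pseudo-rewards via majority voting.

    The most common answer is treated as a provisional label; each sample
    earns reward 1.0 if it matches that consensus, 0.0 otherwise.
    No ground-truth answer is consulted at any point.
    """
    consensus, _ = Counter(sampled_answers).most_common(1)[0]
    rewards = [1.0 if answer == consensus else 0.0 for answer in sampled_answers]
    return consensus, rewards


# Toy illustration: eight sampled answers to the same math question.
samples = ["42", "42", "41", "42", "7", "42", "42", "41"]
consensus, rewards = majority_vote_rewards(samples)
print(consensus)  # "42" -- the voted pseudo-label
print(rewards)    # [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
```

In TTRL, rewards like these would then feed a standard reinforcement-learning update on the same model that produced the samples, nudging it toward answers that match its own consensus.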
Challenging Traditional AI Learning Models
The method's success challenges conventional wisdom about how AI systems learn. Traditional thinking suggests models need precise, human-verified feedback to improve. TTRL shows they can make progress with rough estimates, much like how humans often learn through trial and error.
"AI doesn't need perfect feedback to learn," explains lead researcher Yuxin Zuo. "It just needs signals pointing roughly in the right direction." This insight builds on what we know about human learning – we often improve through practice even without an expert constantly checking our work.
Limitations in Unfamiliar Territory
But TTRL isn't perfect. The method struggles when models tackle completely unfamiliar problems. It's like trying to learn quantum physics without knowing basic math – there's not enough foundation to build on. The researchers found this limitation when testing the system on extremely advanced math problems.
The timing of this breakthrough is significant. As AI systems handle more complex tasks, the old approach of relying on human-labeled training data becomes increasingly impractical. TTRL offers a path around this bottleneck.
The research team is now exploring ways to apply TTRL to real-time learning scenarios. Imagine AI assistants that get better at their jobs simply by doing them, learning from each interaction without waiting for human feedback.
From Static Models to Adaptive Systems
This development fits into a broader trend in AI research: moving from systems that learn in controlled training environments to ones that improve through direct experience. It's a shift from classroom-style learning to something more like on-the-job training.
The implications extend beyond just making better AI. TTRL could change how we think about machine learning. Instead of front-loading all the training, we might see AI systems that continuously evolve and adapt to new challenges.
Risks, Competitors, and the Road Ahead
Other tech labs are taking notice. While Google and OpenAI haven't commented directly on TTRL, similar self-improvement techniques are likely in development at major AI companies. The race is on to create systems that can teach themselves effectively.
The study also revealed some surprising findings about how AI learns. The researchers discovered that sometimes, lower-performing models improved more dramatically than their better-trained counterparts. They theorize this happens because making mistakes actually generates more useful learning signals.
Critics point out valid concerns. Without human oversight, how can we ensure AI systems don't learn harmful behaviors? The researchers acknowledge this challenge but argue that TTRL's consensus-based approach provides some built-in safeguards.
Looking ahead, the team plans to test TTRL on more diverse tasks beyond math problems. They're particularly interested in seeing how the method performs on tasks involving reasoning and decision-making.
Why this matters:
We're watching AI cross a threshold from being purely taught to being able to teach itself. This shift could dramatically speed up AI development while reducing the need for massive labeled datasets.
The success of TTRL suggests that future AI systems might improve naturally through use, like muscles getting stronger with exercise. This could lead to AI that gets better at helping us simply by doing its job.