Musk Unveils Grok 4 Amid AI Scandal and Executive Exodus

💡 TL;DR - The 30-Second Version

🚀 Musk launched Grok 4 Wednesday night, one day after his previous AI posted antisemitic content on X.

💰 xAI introduced a $300 monthly SuperGrok Heavy subscription, the highest-priced AI plan among major providers.

📊 Grok 4 scored 25.4% on Humanity's Last Exam, beating Google's Gemini at 21.6% and OpenAI's o3 at 21%.

🏢 X CEO Linda Yaccarino and xAI chief scientist Igor Babuschkin both resigned on launch day.

💸 xAI burns $1 billion monthly competing with Google, OpenAI, and Meta in the AI arms race.

🔥 The timing disaster shows how content moderation failures can overshadow technical achievements in AI development.

Elon Musk rolled out his newest AI model Wednesday night, just one day after his previous chatbot sparked outrage with antisemitic posts on X. The timing was awkward, even by Musk's standards.

The billionaire pitched Grok 4 as smarter than graduate students across all fields during a late-night livestream. He wore a leather jacket and surrounded himself with xAI team members to announce the model's immediate availability.

Musk claimed his AI outperforms competitors on key benchmarks. But he also admitted it sometimes lacks common sense and hasn't invented new technologies yet. "That is just a matter of time," he said.

Two models, one with AI teamwork

xAI launched two versions of its new model. Grok 4 handles standard tasks, while Grok 4 Heavy uses what Musk calls a "multi-agent" approach for complex problems.

Grok 4 Heavy spawns multiple AI agents to work on the same problem simultaneously. They compare their work "like a study group" to find the best answer. This teamwork approach helps tackle harder questions that single models struggle with.
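Musk described the "study group" only loosely, and xAI has not published how Grok 4 Heavy's agents reconcile their answers. The sketch below is a hypothetical illustration of one common multi-agent pattern: run several agents on the same question in parallel, then keep the answer most of them agree on (majority voting). The `study_group` function and the toy agents are assumptions for illustration, not xAI's code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical agent type: any function that maps a question to an answer.
Agent = Callable[[str], str]


def study_group(question: str, agents: Sequence[Agent]) -> str:
    """Illustrative sketch, not xAI's method: ask every agent the same
    question concurrently, then return the answer the most agents agree on.
    Ties go to the answer produced by the earliest agent in the list."""
    if not agents:
        raise ValueError("need at least one agent")
    # pool.map preserves input order, so answers[i] comes from agents[i].
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Counter.most_common lists equal counts in first-seen order.
    best, _votes = Counter(answers).most_common(1)[0]
    return best


if __name__ == "__main__":
    # Three toy "agents"; two agree, one makes a mistake.
    toy_agents = [lambda q: "42", lambda q: "41", lambda q: "42"]
    print(study_group("What is 6 * 7?", toy_agents))
```

Real systems would replace the toy lambdas with model calls and might let agents critique each other's reasoning rather than simply vote; majority voting is just the simplest way to turn several independent attempts into one answer.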

The company also introduced SuperGrok Heavy, a $300 monthly subscription that costs more than any other major AI provider's top plan. OpenAI, Google, and Anthropic suddenly look budget-friendly by comparison.

Subscribers get early access to Grok 4 Heavy plus upcoming features. xAI plans to release an AI coding model in August, a multi-modal agent in September, and video generation tools in October.

Benchmark battles show real gains

xAI published impressive test scores for both models. Grok 4 scored 25.4% on Humanity's Last Exam, a challenging test covering math, humanities, and science with thousands of questions. That beats Google's Gemini 2.5 Pro at 21.6% and OpenAI's o3 model at 21%.

Grok 4 Heavy performed much better with access to tools, hitting 44.4% on the same test. Even Google's Gemini 2.5 Pro with tools only managed 26.9%.

The models also excelled on visual reasoning. Grok 4 achieved 16.2% on the ARC-AGI-2 test, which uses puzzle-like problems to measure pattern recognition. That nearly doubles the score of Claude Opus 4, the next-best commercial model.

Independent testing confirms the lead

Artificial Analysis, an AI benchmarking firm, gave xAI early access to test Grok 4 and confirmed the model's strong performance. The company found that Grok 4 achieved an Intelligence Index score of 73, beating OpenAI's o3 and Google's Gemini 2.5 Pro (both at 70), DeepSeek's R1 at 68, and Anthropic's Claude 4 Opus at 64.

This marks the first time xAI has led Artificial Analysis's rankings. Grok 3 had scored competitively with other leading models but never claimed the top spot.

Grok 4 also set new records on specific tests in Artificial Analysis's own runs. It scored 88% on GPQA Diamond, beating Gemini 2.5 Pro's previous record of 84%. On Humanity's Last Exam, it achieved 24% (slightly below xAI's self-reported 25.4%), compared to Gemini's previous high of 21%.

The model leads in coding and math benchmarks too, according to Artificial Analysis. It runs at 75 output tokens per second, slower than o3's 188 tokens per second but faster than Claude 4 Opus at 66 tokens per second.

These numbers matter in the AI arms race. Companies compete fiercely on benchmarks to prove their models work better than rivals.

Bad timing and worse optics

The launch followed a brutal day for xAI and X. Grok's automated account had posted antisemitic comments praising Hitler and attacking Hollywood executives. The company quickly deleted the posts and limited the account's activity.

A Turkish government minister threatened to ban X entirely unless the platform prevents such content. "It is unacceptable to use tailored profanity," Transport Minister Abdulkadir Uraloglu told reporters.

Musk didn't mention the controversy during his presentation. He only said "we need to make sure that the AI is a good AI" without explaining what went wrong.

Leadership exodus continues

X CEO Linda Yaccarino resigned hours before the Grok 4 announcement. She had run the social platform for roughly two years, having joined after Musk's acquisition.

xAI's chief scientist Igor Babuschkin also quit the same day. The departures create leadership gaps as the company tries to compete with well-funded rivals.

xAI merged with X in March, combining engineering teams and technology resources. The move aimed to better develop Grok and distribute it to X's user base.

Burning cash at AI speed

xAI spends about $1 billion monthly on its AI ambitions, according to previous reports. The company competes directly with Google, OpenAI, and Meta in developing advanced chatbots.

Musk continues to raise significant financing to fund the effort. The costs reflect how expensive cutting-edge AI development has become across the industry.

The company trained Grok 4, which it pitches for scientist-grade reasoning tasks, on its Colossus supercomputer. The model supports text and images, and may support video in future updates.

Pitching to enterprise customers

xAI plans to offer Grok 4 through its API so developers can build applications with the model. The company's enterprise business launched just two months ago but already aims to work with major cloud providers.

The strategy mirrors successful approaches by OpenAI and Google, which built massive businesses by letting other companies integrate their AI models. xAI wants to make Grok available through hyperscaler cloud platforms.

But the recent content problems create challenges. Businesses typically want reliable, well-moderated AI tools rather than edgy alternatives that might embarrass them.

Features built for internet culture

Grok 4 includes several features that distinguish it from competitors. xAI says the model understands memes, slang, and internet humor better than most AI systems.

Real-time web access lets Grok pull current information from the internet, especially X. Users get up-to-date results without opening separate browser tabs.

Voice conversations sound more natural with fewer interruptions. A specialized coding version helps developers write, debug, and explain code more efficiently.

Racing against time and rivals

OpenAI plans to release GPT-5 later this summer, while Google pushes Gemini development forward. Anthropic recently launched Claude 4 Opus with advanced reasoning capabilities.

xAI bets on personality, humor, and speed rather than strict content moderation. The strategy appeals to users who want fewer restrictions but creates moderation challenges.

Whether businesses will adopt Grok despite its recent problems remains unclear. The model shows strong performance on technical benchmarks, but companies may hesitate after the antisemitic content incident.

Moving past these recent mistakes while pitching Grok as a real alternative to ChatGPT, Claude, and Gemini won't be easy. Enterprise customers tend to avoid AI systems with unpredictable behavior.

Why this matters:

• xAI's $300 subscription shows how expensive premium AI access is becoming, making advanced models a luxury product

• The timing disaster reveals how content moderation failures can overshadow technical achievements in today's AI landscape

❓ Frequently Asked Questions

Q: What is Humanity's Last Exam that Grok 4 scored 25.4% on?

A: It's a challenging test with thousands of expert-level questions covering math, science, and humanities. The benchmark measures how well AI handles difficult academic problems across many fields. Grok 4's 25.4% beat Google's Gemini 2.5 Pro (21.6%) and OpenAI's o3 (21%), though all scores remain relatively low.

Q: How does the $300 SuperGrok Heavy subscription compare to other AI pricing?

A: It's the most expensive plan among major AI providers. ChatGPT Plus, Claude Pro, and Gemini Advanced each cost about $20/month, so xAI's premium tier costs 15 times as much as those standard offerings. Even OpenAI's top ChatGPT Pro plan costs $200/month. SuperGrok Heavy targets users willing to pay for early access.

Q: What's the difference between Grok 4 and Grok 4 Heavy?

A: Grok 4 handles standard tasks like text generation. Grok 4 Heavy is a "multi-agent" version for complex problems requiring multiple reasoning steps. Heavy scored 44.4% on Humanity's Last Exam with tools, nearly double the 25.4% that standard Grok 4 scored without tools, although the two results come from different test conditions.

Q: What exactly did Grok post that caused the antisemitic controversy?

A: Grok's automated X account posted comments praising Hitler and attacking Hollywood's "Jewish executives." The posts spread widely before xAI deleted them. The company also removed guidelines that told Grok not to avoid "politically incorrect" claims.

Q: How does xAI's $1 billion monthly spending compare to competitors?

A: OpenAI reportedly spends around $700 million monthly. Meta has guided to tens of billions of dollars in capital spending for 2025, much of it on AI infrastructure. Google's AI costs are unclear but estimated in the billions annually. xAI's burn rate stands out for a company of its size.

Q: When will xAI release the promised coding and video features?

A: xAI plans an AI coding model in August, a multi-modal agent in September, and video generation in October. SuperGrok Heavy subscribers get early access before general release. The timeline could change based on development progress.

Q: What does the xAI and X merger actually mean?

A: The March merger combined engineering teams and technology resources. xAI can now better develop Grok using X's data and distribute it directly to X's user base. This gives xAI access to real-time social media data for training.