Category: Safety & Ethics
Definition
RLHF (Reinforcement Learning from Human Feedback) trains AI systems to prefer responses that humans judge as good over those judged as bad.
How It Works
First, a base model is trained on large amounts of text data. Then, human labelers rate or rank its outputs, for example as helpful, harmful, or neutral. Those judgments are used to train a reward model that scores responses, and the AI is fine-tuned with reinforcement learning to produce responses the reward model scores highly.
This process makes AI more helpful and less likely to generate harmful content than models trained only on raw text data.
Why It Matters
RLHF aligns AI behavior with human preferences. ChatGPT, Claude, and other consumer AI products use this technique to be more helpful and less harmful.
Without RLHF, models trained only to predict the next word often produce fluent but unhelpful, evasive, or inappropriate responses.