GitHub announced on March 25 that it will begin using interaction data from Copilot Free, Pro, and Pro+ subscribers to train its AI models, reversing a policy that previously excluded all plan tiers from AI model training. The change takes effect April 24 and enrolls millions of individual developers by default. Users who want out must manually disable the setting in their account privacy controls before that date.
Key Takeaways
- GitHub will use Copilot interaction data from Free, Pro, and Pro+ users to train AI models starting April 24, enrolled by default
- Code from private repos processed during active Copilot sessions is eligible for collection, though stored code at rest is excluded
- Copilot Business and Enterprise customers are fully exempt, creating a two-tier privacy system
- Users must manually opt out at github.com/settings/copilot/features before April 24 to prevent data collection
What GitHub actually wants
The scope of collection goes beyond what most developers would call "usage data." GitHub's updated terms of service spell it out in granular detail: accepted or modified code outputs, every prompt typed into Copilot Chat, surrounding code context when the assistant generates suggestions, comments and documentation written alongside AI-assisted work, file names, repository structure, navigation patterns, and thumbs-up/thumbs-down feedback on suggestions.
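To make the scope concrete, here is a hypothetical sketch of what a single interaction record covering those categories could look like. The field names and structure are invented for illustration; GitHub has not published an actual schema for Copilot interaction data.

```python
# Hypothetical illustration of the data categories GitHub's updated terms
# describe. Field names and structure are invented for clarity; GitHub has
# not published a real schema for Copilot interaction records.
interaction_record = {
    "prompt": "why does this query return duplicates?",  # Copilot Chat input
    "suggestion": "SELECT DISTINCT ...",                 # generated output
    "accepted": True,                                    # accept/reject signal
    "surrounding_context": "def fetch_orders(db): ...",  # code near the cursor
    "file_name": "billing/orders.py",
    "repo_structure": ["billing/", "auth/", "tests/"],
    "navigation": ["opened billing/orders.py", "ran tests"],
    "feedback": "thumbs_down",                           # explicit rating
}

# Each key maps to an item in the terms: prompts, outputs, surrounding
# context, file names, repository structure, navigation, and feedback.
print(sorted(interaction_record.keys()))
```

The point of the sketch is the breadth: a single debugging exchange can touch every category at once, not one at a time.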
Chief Product Officer Mario Rodriguez justified the shift by pointing to internal testing. Models trained on Microsoft employee interaction data showed "increased acceptance rates in multiple languages" compared to those built on public code and synthetic samples alone, he wrote in the announcement. Microsoft employees have been the guinea pigs since early 2025.
Rodriguez offered no specific metrics: no percentages, no A/B comparisons, no third-party audits or independent evaluations of the gains. "Meaningful improvements" is doing a lot of work without a number attached.
That argument is familiar by now. Better data makes better models. But the execution matters more than the pitch. GitHub is not asking developers to contribute. It is enrolling them and leaving a toggle buried in account settings for anyone who objects.
The private repo problem
GitHub draws a careful line between code "at rest" and code processed during active Copilot sessions. Stored private repository content will not feed the training pipeline, the company says. But any code from private repos that passes through Copilot while a developer is working becomes eligible for collection.
The distinction is thinner than it sounds. Picture a developer using Copilot Chat to debug proprietary business logic at 2 AM before a release deadline. Every prompt they type, every code snippet surrounding their cursor, every suggestion they accept or reject generates interaction data that GitHub can now harvest. The FAQ confirms this directly: "code snippets from private repositories can be collected and used for model training while the user is actively engaged with Copilot while working in that repository."
For freelancers and small teams running sensitive client projects on individual Copilot plans, this changes the calculation entirely. A contractor building internal tools for a bank, using Copilot to speed up the work, is now feeding that bank's code patterns into GitHub's training set unless they knew to check a settings page they probably never visit. The contractor's client almost certainly has no idea this is happening.
Upgrading to Business or Enterprise now carries a benefit beyond feature access: complete exclusion from the training data pool. That is not accidental. GitHub is creating a two-tier privacy system where protection costs more.
A policy that will not sit still
This is not the first time GitHub has adjusted who gets swept into Copilot's training apparatus. DevelopersIO traced the full arc: initially, only Free users were subject to training. Then GitHub pulled back and excluded all plans from training data collection. Now the company is reversing course a second time, enrolling Free, Pro, and Pro+ users by default.
The whiplash makes it hard to trust whatever policy comes next.
And the reversal lands at a moment when GitHub Copilot faces real competitive pressure. Anthropic's Claude Code, Google's Gemini code assistants, and Amazon's Q Developer are all fighting for the same developer attention. More training data from real workflows offers a path to staying competitive. GitHub said as much, framing the change as catching up to "established industry practices."
The Register noted that GitHub's FAQ explicitly cites Anthropic, JetBrains, and Microsoft itself as operating similar opt-out data use policies. The comparison is accurate but defensive. It is also the kind of defense that only works if you accept the industry baseline as reasonable. In Europe, where opt-in consent is the standard under GDPR, that baseline looks very different. GitHub structured this rollout according to US norms, The Register observed, "as opposed to European norms where opt-in is commonly required."
Enterprise gets a wall, everyone else gets a toggle
Copilot Business and Enterprise customers sit behind a hard boundary. Their data never touches the training pipeline. The setting to enable data collection for training does not even exist in Business accounts, DevelopersIO confirmed.
Individual developers get a different deal entirely. Enrolled by default. Collection covers everything they type into Copilot. Data shared with Microsoft as a corporate affiliate. Used to train models that serve the broader user base. The opt-out requires finding github.com/settings/copilot/features and flipping a dropdown under Privacy.
GitHub says it will not share interaction data with third-party AI model providers. But the data flows freely within the Microsoft corporate family. That family includes the closest financial partner of OpenAI, the company whose Codex model originally powered Copilot. Whether the boundary between "affiliate" and "third-party" holds as Microsoft deepens its AI partnerships is a question GitHub has not addressed.
Students and teachers who access Copilot are also exempt from the data collection, according to The Register. GitHub has not clarified whether this exemption applies automatically or requires some form of verification.
The community reaction tells the story
Developer response has been lopsided. In the GitHub community discussion responding to the changelog entry, users posted 59 thumbs-down emoji votes against just three rocket emojis, The Register reported. Among 39 comments at the time of that count, nobody besides Martin Woodward, GitHub's VP of developer relations, endorsed the change.
The anger connects to a longer grievance. GitHub Copilot's original models were trained on publicly available code hosted on the platform, a decision that triggered a class-action lawsuit. Developers who contributed open-source code watched it become training data for a commercial product without explicit consent. Now their real-time interactions with that product get folded back into the same machine. The value extraction happens twice: once when the code is published, again when the developer's behavior around that code gets harvested to improve the assistant.
gHacks raised several gaps in the announcement that feed the skepticism. GitHub has not specified a minimum interaction threshold or explained how data gets anonymized before training, if it does at all. No technical controls for preventing sensitive code patterns from surfacing in model outputs served to other users were mentioned. The only safeguard on offer is the opt-out toggle itself.
That silence lands differently when you remember GitHub hosts one of the world's largest collections of proprietary code. Developers working on authentication systems, payment processors, or medical software have no way to verify that their code patterns will not echo through Copilot suggestions served to strangers.
What opting out actually means
Disabling the setting stops future collection. GitHub says prior opt-out preferences from the old "product improvements" toggle carry over. If you already told GitHub to leave your data alone, that choice should stick. Should. The company recommends double-checking anyway, a concession that even GitHub knows its privacy settings have grown into a maze of overlapping toggles across product tiers.
For everyone else, the default flips on April 24. No pop-up will interrupt your coding session to ask for consent. The setting simply activates, and your interactions start flowing into GitHub's training pipeline. You will not know it happened unless you go looking for it.
Opting out after that date stops collection going forward but cannot claw back data already collected. Once interaction patterns enter a training dataset and influence model weights, functional deletion becomes impossible. The data lives on in the model's behavior even after the source records disappear. The cleanest approach is to disable the setting before the deadline hits.
The steps are straightforward. Go to github.com/settings/copilot/features. Find "Allow GitHub to use my data for AI model training" under Privacy. Set the dropdown to Disabled. Do it for every GitHub account you maintain. It takes two minutes, which is about 118 minutes less than reading the updated privacy statement and terms of service.
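For anyone juggling several accounts, those steps can be tracked with a trivial local checklist. This is purely bookkeeping on your own machine: GitHub exposes no public API for this Copilot setting, so the toggle itself must still be flipped in the browser, and the account names below are placeholders.

```python
# Local checklist for the opt-out across multiple GitHub accounts.
# GitHub has no public API for this setting; the script only tracks
# which accounts you have already handled in the browser.
SETTINGS_URL = "https://github.com/settings/copilot/features"

# Placeholder account names; True means the opt-out is confirmed.
accounts = {
    "work-account": False,
    "personal": True,
    "oss-maintainer": False,
}

def remaining(accounts: dict) -> list:
    """Return account names where the opt-out is still unconfirmed."""
    return [name for name, done in accounts.items() if not done]

for name in remaining(accounts):
    print(f"{name}: log in and disable training at {SETTINGS_URL}")
```

Running it prints one reminder line per unhandled account, which is about as much automation as GitHub's settings page currently allows.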
Opt-out is not consent
GitHub's move fits a pattern spreading across the AI industry. Companies that built their products on public data now want private data too, and they are using opt-out frameworks to acquire it at scale. The pitch always sounds reasonable. Real-world usage data makes models better. Participation improves the product for everyone. Anthropic drew similar criticism last year when it asked Claude users to choose between five-year data retention for training and the existing 30-day deletion policy. The pattern is consistent: expand collection first, defend it as industry standard second.
What sets GitHub apart is the population affected. GitHub has reported over 100 million developers on its platform, and Copilot has become embedded in the daily workflow of millions of them through Visual Studio Code alone. Every one of those developers on Free, Pro, or Pro+ plans will be opted in unless they take action before April 24.
Most will never see the policy change. They will never find the settings page. They will never realize their 11 PM debugging sessions and code reviews are feeding the next Copilot update. GitHub knows this. The 30-day heads-up before April 24 qualifies as notice. Barely.
It acknowledges what is coming. It does not make it easy to avoid.
Frequently Asked Questions
What data will GitHub collect from Copilot users for AI training?
GitHub will collect code snippets, prompts, cursor context, comments, file names, repository structure, navigation patterns, and feedback on suggestions. This includes everything typed into Copilot Chat and all code surrounding your cursor during active sessions.
Who is affected by the GitHub Copilot training data policy change?
Copilot Free, Pro, and Pro+ individual account users are affected and enrolled by default. Copilot Business, Enterprise, enterprise-owned repositories, students, and teachers are exempt.
Will code from private repositories be used for training?
Stored private repository content at rest will not be used. However, code from private repos that passes through Copilot during active sessions, including prompts and surrounding context, is eligible for collection.
How do I opt out of GitHub Copilot data collection?
Go to github.com/settings/copilot/features, find 'Allow GitHub to use my data for AI model training' under Privacy, and set the dropdown to Disabled. Repeat for each GitHub account you maintain.
Will my data be shared with third parties?
GitHub says interaction data will not be shared with third-party AI model providers. However, data will be shared with GitHub affiliates, which includes Microsoft. The boundary between affiliate and third-party sharing has not been fully clarified.
Implicator