A court filing just confirmed what most web publishers feared: while Google offered site owners a way to opt out of having their content used to train AI, it continued training its Search-based AI products on that same content! The only way to opt out is to opt out of being indexed in Google Search entirely.
The numbers reveal the scale of webmaster resistance:
50% of Content Opted Out!
During ongoing federal antitrust proceedings, a Department of Justice lawyer presented an internal Google document titled “Search GenAI <> Gemini v3”. According to the document, Google removed 80 billion out of 160 billion tokens – snippets of text – from its AI training data after filtering out content from publishers who opted out.
That means half the dataset was removed due to publisher objections! The volume alone shows that a massive segment of the web did not consent to having their content used in this way.
Judge Mehta, overseeing the trial, asked for clarification.
“That is correct,” said Eli Collins, VP at DeepMind.
Here’s the Catch: Google Still Used the Rest!
The opt-out mechanism was touted as a tool for publisher control. Now we find out that it only applies to Gemini/DeepMind’s AI models. It does not apply to Search’s use of content for Search AI and other search-specific applications. Confused yet? So are we. The distinction was about as clear as muddy glasses, but it was confirmed in testimony from Collins himself.
So while half the web tried to opt out, Google retained the ability to train its Search-based AI models using that very content. If you didn’t also block Googlebot from indexing your site, your content was fair game.
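To make the gap concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the AI opt-out is expressed with the `Google-Extended` robots.txt token (the article does not name the mechanism, so treat that as an assumption), and the sample robots.txt and URL are hypothetical. It only evaluates the rules as written; it says nothing about what Google actually does with the content.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opts out via the Google-Extended token
# (assumed here to be the AI opt-out) but leaves every other agent,
# including Googlebot, free to crawl.
SAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt, agents, url="https://example.com/article"):
    """Return {agent: allowed?} for each user-agent token under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = crawl_permissions(SAMPLE_ROBOTS_TXT, ["Google-Extended", "Googlebot"])
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these rules the opt-out token is blocked while Googlebot is still allowed, which is exactly the situation described above: the page stays in the Search index, and so stays available to Search-based AI.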
Google Is Using More Than Just Web Pages
The internal document also listed other sources of training data, including:
- Search session data – behavioral logs of user interactions with Search
- YouTube videos
- Additional content signals tied to Google’s ecosystem
This kind of proprietary behavioral data gives Google a major edge over competitors. It is part of the monopoly that a federal court found Google had illegally maintained. It also raises questions about what data is truly “opt-outable” and whether publishers or users have any say at all in how it’s used.
What This Means for SEOs
If you’re in SEO, this moment deserves your attention:
- A massive percentage of publishers said no to AI training.
- Google moved forward anyway.
- The only “real” way to stop Google from training AI on your content is to block indexing completely – sigh – cutting yourself off from search traffic in the process.
This isn’t just about privacy or copyright. It’s about control. Google is replacing traditional search results with AI summaries trained on the very web it destroyed – and it’s doing so with only wink-n-a-nod selective respect for consent.
What Can You Do?
- Audit your robots.txt and AI opt-out headers
Make sure you’ve applied any controls you intend to use, keeping in mind their limitations and that it is really questionable whether Google will follow them.
- Monitor your traffic for zero-click erosion
Look for signs of lost traffic to AI-generated summaries.
- Build outside the Google funnel
Focus on email, brand, and content ecosystems that don’t depend on Google’s benevolence.
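For the traffic-monitoring item in the list above, one rough way to spot zero-click erosion is to compare click-through rate between two periods and flag queries where impressions held steady but clicks fell. The sketch below is only an illustration: the `PeriodStats` type, the function names, the sample numbers, and the 20% threshold are all hypothetical, and you would feed in your own clicks and impressions from whatever analytics export you use.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Clicks and impressions for one query or page over one period."""
    clicks: int
    impressions: int

    @property
    def ctr(self) -> float:
        # Click-through rate; 0.0 when there were no impressions.
        return self.clicks / self.impressions if self.impressions else 0.0


def ctr_change(before: PeriodStats, after: PeriodStats) -> float:
    """Relative CTR change from `before` to `after` (negative means erosion)."""
    if before.ctr == 0:
        return 0.0
    return (after.ctr - before.ctr) / before.ctr


def looks_like_zero_click_erosion(before, after, threshold=-0.2):
    """Flag cases where visibility held up but CTR dropped past the threshold.

    The -0.2 (20% relative drop) default is an arbitrary, hypothetical cutoff.
    """
    same_or_more_visibility = after.impressions >= before.impressions
    return same_or_more_visibility and ctr_change(before, after) <= threshold


# Hypothetical numbers for a single query, before and after.
before = PeriodStats(clicks=500, impressions=10_000)
after = PeriodStats(clicks=300, impressions=10_000)
change = ctr_change(before, after)
flagged = looks_like_zero_click_erosion(before, after)
print(f"CTR change: {change:.0%}, flagged: {flagged}")
```

A falling CTR with flat or rising impressions is only a hint, not proof, that an AI summary is absorbing the click; seasonality and ranking changes can produce the same pattern.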
Links:
- Google trains AI search tools on publisher content despite opt-outs
- Google can train search AI on web content even if publishers opt out
- Google Can Train Search AI With Web Content After AI Opt-Out

As the CEO and founder of Pubcon Inc., Brett Tabke has been instrumental in shaping the landscape of online marketing and search engine optimization. His journey in the computer industry has spanned over three decades and has made him a pioneering force behind digital evolution.