Brussels Wants Google’s Search Data

Here’s a surveillance angle we probably didn’t see coming, and another reason not to use Google search directly; at the very least, use a privacy-focused search engine. Unfortunately, can you really trust the big tech companies that claim to offer private search? My favorite option is SearXNG, a privacy proxy that lets you query several search engines at once, filters what is sent and received for your privacy, and even hides your IP behind its own.
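
If you want to try that route, here’s a minimal sketch of querying a self-hosted SearXNG instance from Python. It assumes an instance running at http://localhost:8888 with the JSON output format enabled in settings.yml (it isn’t on by default); the URL and engine list are placeholders for your own setup.

```python
# Minimal sketch: send a query through a self-hosted SearXNG proxy.
# Assumes an instance at http://localhost:8888 with "json" added to
# search.formats in settings.yml; URL and engine list are placeholders.
import requests

SEARXNG_URL = "http://localhost:8888/search"

def proxied_search(query: str, engines: str = "duckduckgo,brave") -> list[dict]:
    """SearXNG fans the query out to the selected engines and merges the
    results, so the upstream engines only ever see the instance's IP."""
    resp = requests.get(
        SEARXNG_URL,
        params={"q": query, "format": "json", "engines": engines},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    for hit in proxied_search("digital markets act article 6(11)")[:5]:
        print(hit.get("title"), "-", hit.get("url"))
```

Self-hosting matters here: if you use someone else’s public instance, you’re simply shifting the trust to whoever runs it.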

https://reclaimthenet.org/brussels-wants-googles-search-data

Brussels wants the data shared at the same speed Google reads it itself, to recipients the proposal hasn’t finished naming.

By Ken Macon

The European Commission wants Google to hand over a forensic-level record of how Europeans use search, packaged into an API and refreshed as quickly as Google itself queries the data.

The preliminary proposal, published as a Digital Markets Act compliance measure under Article 6(11), is framed as helping rival search engines compete. However, what it actually builds is a pipeline that streams the most intimate behavioral dataset on the internet to parties the Commission hasn’t finished naming.

The dataset would cover every query input by end users, including the original wording and any subsequent modifications. That comes with timestamps, user location, language, and device type, and with whether the query arrived through the Chrome omnibox, the Google app, Gemini, Google Lens, or Circle to Search, and whether it was typed, spoken, or submitted as a photo. Search is rarely just words anymore, and the Commission’s measure captures all of it.

Everything Google shows you after the query goes too: organic results, paid results, Knowledge Panels, Short Answers, plus the contents of every tab from Web through Images, Videos, News, Forums, Books, and Short Videos. Each result carries identifiers for type, format, and position on the page. Ranking data follows the same logic, recording where a URL appeared, which column held it, its ordinal position relative to its neighbors, and which page of results it landed on.
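
To make the scope concrete, here is a rough, purely illustrative model of the per-query record those paragraphs describe. The field names are hypothetical; the Commission’s document lists categories of data, not a schema.

```python
# Illustrative only: a rough model of the per-query record the proposal
# describes (query text and revisions, context, and every result block
# with type, format, and on-page position). Field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ResultEntry:
    url: str
    block_type: str   # "organic", "paid", "knowledge_panel", "short_answer", ...
    tab: str          # "web", "images", "videos", "news", "forums", "books", ...
    fmt: str          # text, image card, video card, ...
    page: int         # which page of results it landed on
    column: str       # which column held it
    position: int     # ordinal position relative to its neighbors

@dataclass
class QueryRecord:
    query_text: str
    revisions: list[str]      # any subsequent modifications of the query
    timestamp: str            # to be coarsened under "anonymization"
    location: str
    language: str
    device_type: str
    entry_point: str          # Chrome omnibox, Google app, Gemini, Lens, Circle to Search
    input_mode: str           # typed, spoken, or submitted as a photo
    results: list[ResultEntry] = field(default_factory=list)
```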

The interaction data is where the surveillance turns granular. The API would deliver timing, order, and duration of clicks on URLs and ad blocks, alongside scrolling, hovering, swiping, and the act of expanding results.

Click-back behavior gets logged when users return to the results page after visiting a link, grouped into time intervals. The system tracks how long users viewed each URL or block and how long they remained on a given screen. Paid search URLs are excluded from the click data, a carve-out that protects advertiser revenue while leaving the rest of the user’s behavior fully exposed.
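
The same caveat applies to the interaction side. The sketch below shows what an event record and the interval-grouped click-back dwell times might look like; the bucket boundaries are invented, since the measure only says the dwell times are grouped into time intervals without fixing the intervals themselves.

```python
# Sketch of the interaction data: one event with timing and order, plus
# click-back dwell times bucketed into coarse intervals. The bucket
# boundaries and field names here are invented for illustration.
from bisect import bisect_right

DWELL_BUCKETS = [5, 30, 120, 600]  # seconds; hypothetical interval edges
DWELL_LABELS = ["<5s", "5-30s", "30s-2m", "2m-10m", ">10m"]

def bucket_dwell(seconds: float) -> str:
    """Map a click-back dwell time onto a coarse interval label."""
    return DWELL_LABELS[bisect_right(DWELL_BUCKETS, seconds)]

# One interaction event as the article describes it: what was acted on,
# in what order, and how long the user stayed before returning.
event = {
    "action": "click",            # click, scroll, hover, swipe, expand
    "target_block": "organic",    # paid URLs are carved out of the click data
    "order": 2,                   # second interaction on the page
    "dwell": bucket_dwell(47.0),  # -> "30s-2m"
}
print(event)
```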

A search query is one of the most revealing things a person produces. People type questions into Google that they wouldn’t say out loud to a doctor, a lawyer, or a partner. Symptoms, debts, fears about a marriage, suspicions about an employer, names of people they’re trying to find. Pair the query with location, device, language, and the precise sequence of clicks and pauses that followed, and you have a behavioral fingerprint that’s hard to mistake for anyone else.

The Commission’s answer to this is “anonymization.” That’s supposed to mean personal identifiers stripped out, precise timestamps removed, and rare or identifying queries filtered.

However, look at how anonymization actually performs in the wild. Research on supposedly anonymized datasets has found that re-identification is usually trivial, often requiring just a few data points to single someone out from millions. Search data is particularly vulnerable because queries themselves can be identifying.

The 2006 AOL release demonstrated this when journalists matched an “anonymous” user number to a specific 62-year-old woman in Georgia by combining her queries with public phonebook listings. Filtering rare queries reduces but doesn’t eliminate the problem, and behavioral signals like click patterns and dwell times add their own re-identification surface.
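
A toy version of that AOL-style linkage shows how little is needed: join the “anonymized” log against a public directory on the quasi-identifiers the queries themselves leak. The records below are loosely modeled on the publicly reported 2006 example; the directory, matching rule, and helper are invented for illustration.

```python
# Toy linkage attack: an "anonymized" log keyed by a user number is joined
# against a public directory on a town and surname that the user's own
# queries leak. Loosely modeled on the reported 2006 AOL example; the
# point is how few data points it takes to single someone out.
anon_log = {
    "user_4417749": [
        "landscapers in lilburn ga",
        "numb fingers",
        "dog that urinates on everything",
        "arnold family reunion",
    ],
}

public_directory = [
    {"name": "T. Arnold", "town": "lilburn ga"},
    {"name": "J. Smith", "town": "atlanta ga"},
]

def reidentify(queries: list[str], directory: list[dict]) -> list[dict]:
    """Return directory entries whose town and surname both appear in the
    user's own query text; in practice this narrows millions to a handful."""
    text = " ".join(queries).lower()
    return [
        person for person in directory
        if person["town"] in text and person["name"].split()[-1].lower() in text
    ]

for uid, queries in anon_log.items():
    print(uid, "->", reidentify(queries, public_directory))
```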

There’s also a structural question the proposal doesn’t resolve. Google currently holds this data because Google operates the search engine, and the company is bound by GDPR, by its own published policies, and by user expectations that searches stay between the user and the service. An API that pipes the same data to other entities multiplies the points of failure. Each recipient becomes a potential breach vector, a potential subject of law enforcement requests, a potential target for state-aligned hackers, and a potential reseller. The Commission says recipients will be vetted and access controlled, but the document doesn’t define who they are.

The frequency requirement also compounds the exposure. Google would have to share the data via an API at the same cadence that Google itself accesses it. Real-time or near-real-time streaming of European search behavior to an undefined set of third parties is a different proposition from a static research dataset. Real-time data enables real-time inference, including about individual users if anonymization fails.

A consultation opened on April 16 and closes on May 1; it will shape the final scope, including how much data is shared, who receives it, and what anonymization standards apply. The Commission expects to issue a final decision on the measures Google must implement by July 27, 2026.

What’s being proposed is a regulatory mechanism for redirecting one of the most sensitive data streams in Europe into channels the public hasn’t been told about, justified by competition objectives that sit awkwardly next to the GDPR framework the same institution wrote. The two-week consultation window is the only point at which any of this is up for discussion before the architecture starts being built.