AI Doxxing: When Chatbots Share Your Phone Number with Strangers

The phone calls began out of nowhere and continued, unsolicited, for over a month. Each caller was a different person seeking help – everything from legal advice to being locked out of a home. The one thing the strangers had in common was that they had found the phone number through Google’s AI.

This is the reported experience of one victim of a new trend known as “AI doxxing”, which involves popular platforms like Gemini or ChatGPT sharing people’s private information without their consent. In this instance, the victim’s personal phone number appears to have been used as a placeholder whenever users asked the AI to provide contact details for a company or service.

“Strangers are calling me constantly looking for a lawyer, a product designer, a locksmith – you name it,” they wrote in a post to Reddit’s r/Google forum. “Every single one of them tells me: ‘I got your number from Google’s AI’. This is a massive privacy violation and data leak. My phone doesn’t stop ringing with random people expecting a service, and my daily life is being completely disrupted.”

—

Wide Pickt banner — collaborative shopping lists app for Telegram, phone mockup with grocery list

Other reported instances of AI doxxing include Elon Musk’s Grok chatbot exposing home addresses of non-public figures, Meta’s WhatsApp AI assistant mistakenly sharing people’s private numbers, and ChatGPT hallucinating incriminating information about an individual.

Privacy experts warn of amplified data collection issues

Privacy experts warn that the cases highlight how generative artificial intelligence systems have amplified long-standing problems surrounding online data collection. Large language models (LLMs) generate responses from material gathered from across the internet, including outdated records, forum posts and scraped databases that can potentially surface incorrect or private information.

“Gemini’s problem is not a defect. It’s the result of unchecked years of data brokerage practices that meet generative AI,” a spokesperson for data removal service ClearNym told The Independent. “For a decade and counting, many organisations have been discreetly harvesting information such as personal phone numbers, addresses, familial relationships, and other personally identifiable details from public databases and opt-outs. This information was sold, traded, and thrown into machine learning training sets. It now returns as accurate copies or even fabrications and, most recently, as ‘placeholder’ phone numbers for any number of strangers.”

ClearNym researchers claim that the arrival of newer, more powerful AI models that are trained on even more data means the problem will likely get even worse, saying it could be one of the biggest privacy stories of the year. The lack of rules, such as the right to be forgotten legislation that allows a person to remove private information from results on search engines like Google, mean that victims also have very few resources to protect themselves.

“They cannot order AI to forget this information, they cannot go after all the data brokers feeding the algorithms, and there is no regulatory oversight,” the spokesperson said.

Google responds to AI doxxing concerns

When contacted by The Independent, a Google spokesperson said: “We have safeguards in place to prevent personal content from surfacing on Search AI features, along with dedicated tools to request its removal. We review all requests and take action when we have sufficient information to verify that the content appeared and that it violates our policies.”

A report last month from Virgin Media O2 found that millions of Brits have been served with fake customer service numbers via AI tools, with criminals now exploiting this issue by injecting their own phone numbers into LLM-powered systems in order to influence the results. By posing as trusted brands, they are able to steal data, perpetuate fraud, and lure victims into scams.

“Criminals know when people search for help, they’re often looking for a quick answer,” said Murray Mackenzie, director of fraud prevention at Virgin Media O2. “AI tools are creating new opportunities for fraudsters to create realistic-looking fake numbers that appear through search results or chatbots, putting people at risk of calling a criminal rather than their trusted provider.”

Pickt after-article banner — collaborative shopping lists app with family illustration

How scammers poison AI data

Scammers are able to do this by “seeding poisoned content” across the web in places like Yelp reviews or YouTube comments, according to separate research from AI security firm Aurascape. By including keywords like “official British Airways reservations number”, the fake phone numbers are picked up by AI web crawlers that are used to train the LLMs.

“Attackers are quietly rewriting the web that AI systems read,” said Qi Deng, lead security researcher at Aurascape’s Aura Labs. “When you ask an assistant how to call your airline, it does exactly what it was designed to do, but with a customer support and reservations number that leads straight to a scammer instead of the real company.”

Security experts say people can avoid falling victim to such scams by only using numbers listed on official company websites. But for those whose phone numbers end up in the answers of chatbot queries accidentally, there seems to be little that can be done to prevent it from happening.

“Standard support forms are a complete dead end,” the person whose number is being served up through Google’s Gemini and AI overviews said. “I submitted an official legal removal/ privacy request to Google, asking them to urgently blacklist my number from their LLM outputs. I haven’t received a single response, and the harassment continues daily.”

This difficulty of fixing an LLM when it has already been trained was evident this week when OpenAI was forced to acknowledge ChatGPT’s goblin obsession. Whether it’s hallucinations turned into harassment or poisoned data leading to scammers, there is currently no easy answer to this problem. While search engines can “forget”, AI models cannot simply unlearn.