AI Voice Clones More Understandable Than Real Humans, Study Finds

Our voices are nearly as unique as our fingerprints—but can you distinguish a real voice from one generated by artificial intelligence? AI voice clones represent a new generation of synthetic voices capable of recreating a person's speech from just a few seconds of recorded audio.

AI Clones Clearer Than Human Voices

A recent study from University College London found that these synthetic clones are clearer and easier to understand than the real individuals they mimic. The research team had expected voice clones to be poor representations of human voices, but the findings showed the opposite.

'I thought initially that voice clones would be less intelligible because they were unfamiliar,' said lead author Professor Patti Adank. 'I found they were up to 20 per cent more intelligible, which was quite shocking.'

How Voice Clones Work

In the past, voice assistants like Siri or satnav systems relied on 'synthetic voices' that required voice actors to spend hours in recording studios, meticulously sampling all necessary words and phrases. Voice clones, however, have revolutionised this process by using AI to digitally recreate speech patterns. These clones can be generated with as little as a few seconds of recorded audio, even using clips from social media or snippets of conversation as raw material.

This advance has raised concerns that criminals could use AI to impersonate friends, family, or colleagues in order to manipulate their targets. According to National Trading Standards, criminals are already using AI to clone voices and set up unauthorised direct debits over the phone.

Study Methodology

In the study, researchers created voice clones of human participants using just 120 pre-recorded sentences. Participants listened to 80 unique sentences—40 spoken by a real person and 40 by an AI voice clone. They were asked to transcribe exactly what they heard, allowing researchers to assess intelligibility. Participants also rated how clear the voice was, how strong the regional accent seemed, and whether they believed it was AI-generated.

To the scientists' surprise, AI-generated voices were consistently rated as easier to understand. This contradicted previous research, and the researchers were baffled as to why. Professor Adank noted: 'A small part of our paper is talking about that experiment, and then a large part is me and my collaborator frantically trying to find out what it is that makes those voice clones more intelligible.'

Further Experiments

The team repeated the experiment with elderly participants and with a filter mimicking the effect of a cochlear implant, to simulate hearing-impaired listening conditions. They also ran the test with American listeners to see whether British accents caused confusion. Regardless of the conditions, AI clones were consistently rated 13 per cent more intelligible than their human counterparts. Interestingly, participants were usually not fooled: they correctly identified the human voice 70.4 per cent of the time, yet still rated the AI voices as clearer.

After examining over 100 acoustic measurements, the researchers remain stumped. Professor Adank believes the only way to solve the mystery is to collaborate with engineers who build voice clones to understand how the AI systems truly work. 'I am now going to try and recreate [the effect] by studying how synthesisers work and how they use digital signal processing to generate those voices, just to get a bit of a handle on this,' she said.
