Inaudible Audio Can Hack Smart Speakers, Researchers Warn

Cybercriminals can exploit inaudible background sounds in audio and video files to hack smart speakers and AI assistants, gaining access to personal information, according to a new study. Modern voice assistants rely on large language models that integrate audio and text, making them susceptible to a novel threat called auditory prompt injection.

Audio Jailbreaks: A New Frontier in Hacking

Cleverly crafted prompts, known as "jailbreaks," can bypass the safety guidelines and ethical restrictions built into AI assistants. While text-based jailbreaks are well documented, audio jailbreaks have received far less scrutiny. Researchers from China and Singapore have now developed a method called Audiohijack, which uses imperceptible audio to hijack audio-based AI models such as those powering smart speakers.

High Success Rates and Potential Harm

Testing Audiohijack on 13 state-of-the-art audio-based AI models, the researchers achieved average success rates of 79 to 90 percent. The attack could induce misbehaviours ranging from simple prompt refusal to complex tool misuse, such as downloading malicious files or revealing user information via email. "In this work, we reveal a previously overlooked threat, auditory prompt injection," the researchers note in a yet-to-be-peer-reviewed study posted on arXiv.

Although audio jailbreaks are more constrained than text jailbreaks, the researchers warn they can be "potentially more harmful" due to their covert nature. The threat is particularly concerning as on-device AI integration becomes common in smartphones and smart speakers.

No Dedicated Defenses Exist

The study highlights fundamental vulnerabilities in how AI models integrate audio and text. "No dedicated defences exist for this new threat," the researchers caution. They call for future work to extend the evaluation to system-level applications and real devices to better assess the practical risks. As AI assistants become ubiquitous, the research underscores the urgent need for robust security measures against audio-based attacks.