{"id":1811,"date":"2025-02-07T05:03:43","date_gmt":"2025-02-07T05:03:43","guid":{"rendered":"https:\/\/serisec.com\/index.php\/2025\/02\/07\/ais-and-robots-should-sound-robotic-html\/"},"modified":"2025-02-07T05:03:43","modified_gmt":"2025-02-07T05:03:43","slug":"ais-and-robots-should-sound-robotic-html","status":"publish","type":"post","link":"https:\/\/serisec.com\/index.php\/2025\/02\/07\/ais-and-robots-should-sound-robotic-html\/","title":{"rendered":"AIs and Robots Should Sound Robotic"},"content":{"rendered":"\n<div>AIs and Robots Should Sound Robotic<\/div>\n<div>\n<p>Most people know that <a href=\"https:\/\/spectrum.ieee.org\/tag\/robots\">robots<\/a> no longer sound like tinny trash cans. They sound like <a href=\"https:\/\/spectrum.ieee.org\/tag\/siri\">Siri<\/a>, <a href=\"https:\/\/spectrum.ieee.org\/tag\/alexa\">Alexa<\/a>, and <a href=\"https:\/\/spectrum.ieee.org\/tag\/gemini\">Gemini<\/a>. They sound like the voices in labyrinthine customer support phone trees. And even those robot voices are being made obsolete by new <a href=\"https:\/\/spectrum.ieee.org\/chatgpt-multimodal\">AI-generated voices<\/a> that can mimic every vocal nuance and tic of human speech, down to specific regional accents. And with just a few seconds of audio, <a href=\"https:\/\/spectrum.ieee.org\/tag\/ai\">AI<\/a> can now <a href=\"https:\/\/spectrum.ieee.org\/digital-afterlife\">clone someone\u2019s specific voice<\/a>.<\/p>\n<p>This technology will replace humans in many areas. Automated customer support will save <a href=\"https:\/\/spectrum.ieee.org\/tag\/money\">money<\/a> by cutting staffing at <a href=\"https:\/\/spectrum.ieee.org\/tag\/call-centers\">call centers<\/a>. 
<a href=\"https:\/\/spectrum.ieee.org\/ai-agents\">AI agents<\/a> will make calls on our behalf, conversing with others in <a href=\"https:\/\/spectrum.ieee.org\/tag\/natural-language\">natural language<\/a>. All of that is happening, and will be commonplace soon.<\/p>\n<p>But there is something fundamentally different about talking with a bot as opposed to a person. A person can be a friend. An AI cannot be a friend, despite how people might treat it or react to it. AI is at best a tool, and at worst a means of manipulation. Humans need to know whether we\u2019re talking with a living, breathing person or a robot with an agenda set by the person who controls it. That\u2019s why robots should sound like robots.<\/p>\n<p>You can\u2019t just label AI-generated speech. It will come in many different forms. So we need a way to recognize AI that works no matter the modality. It needs to work for long or short snippets of audio, even just a second long. It needs to work for any language, and in any cultural context. At the same time, we shouldn\u2019t constrain the underlying system\u2019s sophistication or language complexity.<\/p>\n<p>We have a simple proposal: all talking AIs and robots should use a ring <a href=\"https:\/\/spectrum.ieee.org\/tag\/modulator\">modulator<\/a>. In the mid-twentieth century, before it was easy to create actual robotic-sounding speech synthetically, ring modulators were used to make actors\u2019 voices sound robotic. Over the last few decades, we have become accustomed to robotic voices, simply because text-to-speech systems were good enough to produce intelligible speech that was not human-like in its sound. 
Now we can use that same technology to make robotic speech that is indistinguishable from human speech sound robotic again.<\/p>\n<p>A ring modulator has several advantages: It is computationally simple, can be applied in real time, does not affect the intelligibility of the voice, and, most importantly, is universally \u201crobotic sounding\u201d because of its historical usage for depicting robots.<\/p>\n<p>Responsible <a href=\"https:\/\/spectrum.ieee.org\/tag\/ai-companies\">AI companies<\/a> that provide <a href=\"https:\/\/spectrum.ieee.org\/tag\/voice-synthesis\">voice synthesis<\/a> or AI <a href=\"https:\/\/spectrum.ieee.org\/tag\/voice-assistants\">voice assistants<\/a> in any form should add a ring modulator of some standard frequency (say, between 30 and 80 Hz) and of a minimum amplitude (say, 20 percent). That\u2019s it. People will catch on quickly.<\/p>\n<p>Here are some audio clips that illustrate what we\u2019re suggesting. The first clip is an AI-generated \u201cpodcast\u201d of this article made by <a href=\"https:\/\/g.co\/kgs\/FyCQAGX\">Google\u2019s NotebookLM<\/a> featuring two AI \u201chosts.\u201d Google\u2019s NotebookLM created the podcast script and audio given only the text of this article. 
The next two clips feature that same podcast with the AIs\u2019 voices modulated more and less subtly by a ring modulator:<\/p>\n<h5>Raw audio sample generated by Google\u2019s NotebookLM<\/h5>\n<p><audio style=\"width: 100%;\" controls=\"controls\"><source src=\"https:\/\/www.schneier.com\/wp-content\/uploads\/2025\/01\/robots-article.mp3\" type=\"audio\/mpeg\"><\/source>Your browser does not support the audio element.<\/audio><\/p>\n<h5>Audio sample with added ring modulator (30 Hz, 25%)<\/h5>\n<p><audio style=\"width: 100%;\" controls=\"controls\"><source src=\"https:\/\/www.schneier.com\/wp-content\/uploads\/2025\/01\/robots-article-30hz-25percent.mp3\" type=\"audio\/mpeg\"><\/source>Your browser does not support the audio element.<\/audio><\/p>\n<h5>Audio sample with added ring modulator (30 Hz, 40%)<\/h5>\n<p><audio style=\"width: 100%;\" controls=\"controls\"><source src=\"https:\/\/www.schneier.com\/wp-content\/uploads\/2025\/01\/robots-article-30hz-40percent.mp3\" type=\"audio\/mpeg\"><\/source>Your browser does not support the audio element.<\/audio><\/p>\n<p>We were able to generate the audio effect with a 50-line <a href=\"https:\/\/spectrum.ieee.org\/tag\/python\">Python<\/a> script written by <a href=\"https:\/\/claude.ai\/\">Anthropic\u2019s Claude<\/a>. Among the most well-known robot voices were those of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dalek\">the Daleks from Doctor Who<\/a> in the 1960s. Back then robot voices were difficult to synthesize, so the audio was actually an actor\u2019s voice run through a ring modulator. It was set to around 30 Hz, as in our example, with the modulation depth (amplitude) varied depending on how strong the robotic effect is meant to be. 
Our expectation is that the AI industry will test and converge on a good balance of such parameters and settings, and will use better tools than a 50-line Python script, but this highlights how simple it is to achieve.<\/p>\n<p>Of course there will also be nefarious uses of AI voices. <a href=\"https:\/\/spectrum.ieee.org\/tag\/scams\">Scams<\/a> that use <a href=\"https:\/\/spectrum.ieee.org\/tag\/voice-cloning\">voice cloning<\/a> have been getting easier every year, but they\u2019ve been possible for many years with the right know-how. Just like we\u2019re learning that we can no longer trust images and videos we see because they could easily have been AI-generated, we will all soon learn that someone who sounds like a family member urgently requesting money may just be a scammer using a voice-cloning tool.<\/p>\n<p>We don\u2019t expect scammers to follow our proposal: They\u2019ll find a way no matter what. But that\u2019s always true of <a href=\"https:\/\/spectrum.ieee.org\/tag\/security\">security<\/a> standards, and a rising tide lifts all boats. We think the bulk of the uses will be with popular voice <a href=\"https:\/\/spectrum.ieee.org\/tag\/apis\">APIs<\/a> from major companies\u2014and everyone should know that they\u2019re talking with a robot.<\/p>\n<p><em>This essay was written with Barath Raghavan, and originally appeared in <a href=\"https:\/\/spectrum.ieee.org\/audio-deepfake-fix\">IEEE Spectrum<\/a>.<\/em><\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Bruce Schneier<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/www.schneier.com\/blog\/archives\/2025\/02\/ais-and-robots-should-sound-robotic.html\">Go to bruce schneier<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AIs and Robots Should Sound Robotic Most people know that robots no longer sound like tinny trash cans. They sound like Siri, Alexa, and Gemini. 
They sound like the voices in labyrinthine customer support phone trees. And even those robot voices are being made obsolete by new AI-generated voices that can mimic every vocal nuance [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[167,57,1],"tags":[87],"class_list":["post-1811","post","type-post","status-publish","format-standard","hentry","category-ai","category-bruce-schneier","category-uncategorized","tag-bruce-schneier"],"_links":{"self":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/1811"}],"collection":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/comments?post=1811"}],"version-history":[{"count":0,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/posts\/1811\/revisions"}],"wp:attachment":[{"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/media?parent=1811"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/categories?post=1811"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/serisec.com\/index.php\/wp-json\/wp\/v2\/tags?post=1811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}