Researchers can hijack Siri with inaudible ultrasonic waves

AppleInsider · March 4, 2020 3:26PM

Security researchers have discovered a way to covertly hijack Siri and other smartphone digital assistants using ultrasonic waves, sounds that cannot be normally heard by humans.

The attack, inaudible to the human ear, can be used to read messages, make fraudulent phone calls or take pictures without a user's knowledge.

Dubbed SurfingAttack, the exploit uses high-frequency, inaudible sound waves to activate and interact with a device's digital assistant. While similar attacks have surfaced in the past, SurfingAttack focuses on the transmission of said waves through solid materials, like tables.

The researchers found that they could use a $5 piezoelectric transducer, attached to the underside of a table, to send these ultrasonic waves and activate a voice assistant without a user's knowledge.

Using these inaudible ultrasonic waves, the team was able to wake up voice assistants and issue commands to make phone calls, take pictures or read a message that contained a two-factor authentication passcode.

To further conceal the attack, the researchers first sent an inaudible command to lower a device's volume before recording the responses using another device hidden underneath a table.

SurfingAttack was tested on a total of 17 devices and found to be effective against most of them. Select Apple iPhones, Google Pixels and Samsung Galaxy devices are vulnerable to the attack, though the research didn't note which specific iPhone models were tested.

All digital assistants, including Siri, Google Assistant, and Bixby, are vulnerable.

Only the Huawei Mate 9 and the Samsung Galaxy Note 10+ were immune to the attack, though the researchers attribute that to the different sonic properties of their materials. They also noted the attack was less effective when used on tables covered by a tablecloth.

The technique relies on exploiting the nonlinearity of a device's MEMS microphone, a component that is used in most voice-controlled devices and includes a small diaphragm that can translate sound or light waves into usable commands.
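To illustrate why nonlinearity matters, here is a minimal sketch in Python. It assumes a simple microphone model with a linear term plus a small quadratic term; that model, the 25 kHz carrier, the 400 Hz stand-in for a voice command, and all constant names are assumptions for illustration, not details taken from the research. The sketch shows that an amplitude-modulated ultrasonic signal carries no audible content in the air, yet an audible tone appears once the signal passes through the nonlinear term.

```python
import math

FS = 192_000         # sample rate in Hz (assumed; high enough to represent the carrier)
CARRIER_HZ = 25_000  # ultrasonic carrier frequency (assumed value)
VOICE_HZ = 400       # stand-in for a voice command: a single audible tone
N = 19_200           # 0.1 s of samples, a whole number of cycles for every tone used


def tone_amplitude(signal, freq_hz, fs=FS):
    """Amplitude of one frequency component, via a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / fs) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# Amplitude-modulated ultrasound: the audible tone rides on the carrier.
transmitted = [
    (1 + 0.5 * math.cos(2 * math.pi * VOICE_HZ * n / FS))
    * math.cos(2 * math.pi * CARRIER_HZ * n / FS)
    for n in range(N)
]

# Assumed microphone model: linear response plus a small quadratic term.
NONLINEARITY = 0.1
received = [x + NONLINEARITY * x * x for x in transmitted]

audible_before = tone_amplitude(transmitted, VOICE_HZ)
audible_after = tone_amplitude(received, VOICE_HZ)
print(f"400 Hz content in the air:    {audible_before:.6f}")
print(f"400 Hz content after the mic: {audible_after:.6f}")
```

Under these assumptions the 400 Hz component is essentially zero before the microphone and roughly 0.05 after it, because squaring the modulated carrier shifts a copy of the modulating tone down into the audible band.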

While effective against smartphones, the team discovered that SurfingAttack doesn't work on smart speakers like Amazon Echo or Google Home devices. The primary risk appears to be covert devices, hidden in advance underneath coffee shop tables, office desks, and other similar surfaces.

The research was published by a multinational team of researchers from Washington University in St. Louis, Michigan State University, the Chinese Academy of Sciences, and the University of Nebraska-Lincoln. It was first presented at the Network and Distributed System Security Symposium on Feb. 24 in San Diego.

SurfingAttack is far from the first time that inaudible sound waves have been used to exploit vulnerabilities. The research piggybacks on several previous research projects, including the similarly named DolphinAttack.

mplsp · March 4, 2020 3:53PM

Impressive. I’m always surprised and impressed at the ingenuity that goes into these attacks (as opposed to showing a printed photo to fake-out facial recognition software...)

I'm actually surprised that the microphones could transduce at those frequencies. Given how much ultrasonic waves are attenuated at any air-solid interface, this attack would be limited to the example given above, where the phone is placed directly on a solid surface that conducts the sound waves. I also assume that many cases would effectively block the attack by reducing the transmission (the article implies this when it talks about the Mate 9 and Note 10+).

Presumably this could also be blocked universally by altering the phone software to recognize and block the frequencies used in the attack.
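On the attenuation point above, a rough back-of-the-envelope sketch may help. It uses the standard normal-incidence formula for the fraction of acoustic intensity that crosses a boundary between two media, T = 4·Z1·Z2 / (Z1 + Z2)², where Z is the characteristic acoustic impedance. The two impedance values below are round-number assumptions chosen for illustration (approximate room-temperature air, and an order-of-magnitude figure for a rigid solid); they are not measurements from the research.

```python
def intensity_transmission(z1, z2):
    """Fraction of acoustic intensity crossing a flat boundary at normal incidence."""
    return 4 * z1 * z2 / (z1 + z2) ** 2


Z_AIR = 4.1e2    # Pa*s/m, approximate value for room-temperature air
Z_SOLID = 1.0e7  # Pa*s/m, assumed order of magnitude for a rigid solid

air_to_solid = intensity_transmission(Z_AIR, Z_SOLID)
solid_to_solid = intensity_transmission(Z_SOLID, Z_SOLID)
print(f"air -> solid:   {air_to_solid:.2e}")
print(f"solid -> solid: {solid_to_solid:.2f}")
```

With these assumed values, only on the order of one part in ten thousand of the intensity crosses an air-solid boundary, while two matched solids in direct contact pass essentially all of it. That is consistent with the idea that a phone resting directly on the table is the favorable case for the attacker.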

arlotimetraveler · March 4, 2020 3:58PM

So, let me understand, they're not telling you which iPhone device, or which version of the IOS. Even though later iOS versions can recognize the differences between one individual voice to another. I thought when individual research occurs, they provide testing configurations to legitimize their conclusions.

edited March 2020

randominternetperson · March 4, 2020 4:01PM

ArloTimetraveler said:

So, let me understand, they're not telling you which iPhone device, or which version of the IOS. Even though later iOS versions can recognize the differences between one individual voice to another. I thought when individual research occurs, they provide testing configurations to legitimize their conclusions.

Right, I thought my iPhone only responds to my voice. So would this require a recording of my voice to activate? In that case, I'm not worried about the coffee shop scenario.

arlotimetraveler · March 4, 2020 4:03PM

randominternetperson said:

ArloTimetraveler said:

So, let me understand, they're not telling you which iPhone device, or which version of the IOS. Even though later iOS versions can recognize the differences between one individual voice to another. I thought when individual research occurs, they provide testing configurations to legitimize their conclusions.

Right, I thought my iPhone only responds to my voice. So would this require a recording of my voice to activate? In that case, I'm not worried about the coffee shop scenario.

"Coffee Shop Scenario" is the perfect comment.

edited March 2020

seanismorris · March 4, 2020 4:05PM

If you don’t use Bluetooth. Disable it.
If you don’t use WiFi. Disable it.
If you don’t use Siri. Disable it.

If I need any one of them, I turn it on... it only takes a second.

You never know when a new vulnerability will be found. There’s no reason to stress about it, but you might as well lock things down as much as possible.

lkrupp · March 4, 2020 4:05PM

So you would have to sneak into my home to clandestinely place the $5 piezoelectric transducer so you can issue commands to my iPhone or Echo Dot IF it happens to be lying on the same hard surface you stick the transducer to? As a single user I'm not too scared about this. It may/might have uses in industrial espionage or CIA spook stuff. So there's that.

ihatescreennames · March 4, 2020 4:44PM

Maybe I’m missing something here. I just used Siri to lower the volume to 0%, Siri replied, “Media volume set to 0%” at a volume I could hear. Then I asked Siri to read my most recent message and it was read back at a volume I could hear, even though volume is set to 0% (and, no, it was not a message with a two-factor code in it, ha ha, that just makes this more “scary”).

Assuming my phone was on a table, what good photos could this method take? It’s either gonna be the ceiling or the table. And then what? The photo is saved to my camera roll, I guess.

Still, I find this kind of research kinda neat but ultimately kinda useless (at this stage anyway).

randominternetperson · March 4, 2020 5:19PM

What's the over/under on how long it will take iOS to fix this (minor) vulnerability? It shouldn't be too difficult to add a frequency filter to Siri inputs.
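As a rough idea of what such a frequency check might look like, here is a minimal Python sketch. It uses the Goertzel algorithm to measure energy at a handful of ultrasonic probe frequencies and flags an audio frame when that energy makes up a large share of the total. The function names, probe frequencies, and threshold are all hypothetical, and the sketch assumes the software can see raw samples at a rate high enough to capture the carrier, which the article does not confirm. Note also that once a microphone's nonlinearity has demodulated a command into the audible band, filtering out the ultrasonic part alone would not remove it; a check like this could at best detect that a carrier is present.

```python
import math


def goertzel_power(samples, freq_hz, fs):
    """Squared DFT magnitude at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq_hz / fs)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def looks_ultrasonic(samples, fs, probes_hz=range(21_000, 41_000, 1_000), threshold=0.1):
    """Flag a frame whose energy at the probed ultrasonic frequencies is a large share of the total."""
    total = sum(x * x for x in samples)
    if total == 0:
        return False
    # Scale so that a pure tone sitting exactly on a probe frequency gives a ratio near 1.
    ultrasonic = sum(goertzel_power(samples, f, fs) for f in probes_hz) * 2 / len(samples)
    return ultrasonic / total > threshold


FS = 96_000  # assumed raw capture rate in Hz
N = 960      # 10 ms frame, so each probe frequency falls on an exact DFT bin

carrier_frame = [math.cos(2 * math.pi * 25_000 * n / FS) for n in range(N)]
speech_like_frame = [math.cos(2 * math.pi * 400 * n / FS) for n in range(N)]

carrier_flagged = looks_ultrasonic(carrier_frame, FS)
speech_flagged = looks_ultrasonic(speech_like_frame, FS)
print("25 kHz frame flagged:", carrier_flagged)
print("400 Hz frame flagged:", speech_flagged)
```

In this toy setup the 25 kHz frame is flagged and the 400 Hz frame is not. A real detector would need to handle carriers between probe frequencies, noise, and whatever sample rate the hardware actually exposes.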

soli · March 4, 2020 6:06PM

randominternetperson said:

ArloTimetraveler said:

So, let me understand, they're not telling you which iPhone device, or which version of the IOS. Even though later iOS versions can recognize the differences between one individual voice to another. I thought when individual research occurs, they provide testing configurations to legitimize their conclusions.

Right, I thought my iPhone only responds to my voice. So would this require a recording of my voice to activate? In that case, I'm not worried about the coffee shop scenario.

I would assume that any biometric, whether it's a fingerprint, face, retina, voice, etc, looks for specific elements that are familiar and if there are a certain amount that line up well enough it makes a determination that it's most likely the correct individual and then proceeds from there. I haven't seen any white papers on how Siri, Alexa, Cortana, or Google's Assistant verify a user's voice, but since these are done for ease of use, not secure authentication, I'd assume that they are considerably less strict than the algorithms for allowing a fingerprint or face to authenticate a device for access.

temperor · March 4, 2020 6:52PM

Not impressive, more impressive is a laser beam being shot from 100 meters or more through a window to activate digital assistants ... you can send an open door command through the window to your smart speaker, now that is cool

anantksundaram · March 4, 2020 7:20PM

Good thing Siri’s totally clueless so we don’t use it much anyway. Whew.

coolfactor · March 4, 2020 7:27PM

ihatescreennames said:

Maybe I’m missing something here. I just used Siri to lower the volume to 0%, Siri replied, “Media volume set to 0%” at a volume I could hear. Then I asked Siri to read my most recent message and it was read back at a volume I could hear, even though volume is set to 0% (and, no, it was not a message with a two-factor code in it, ha ha, that just makes this more “scary”).

Assuming my phone was on a table, what good photos could this method take? It’s either gonna be the ceiling or the table. And then what? The photo is saved to my camera roll, I guess.

Still, I find this kind of research kinda neat but ultimately kinda useless (at this stage anyway).

Yes, a video demonstration is needed to back this up. I had the same experience. The audio of Siri's responses doesn't change when you ask her to change the volume to 0. Only media playback volume is adjusted. Even asking her to turn on Do Not Disturb mode gives audible feedback that it was done.

While maybe not entirely realistic, all research teaches us things we didn't know before, so it's all beneficial.

libertyforall · March 4, 2020 8:17PM

I want to see a video of this in action. Also I want to download the audio files used to try and validate this myself.

dysamoria · March 4, 2020 8:32PM

Just another reason not to enable “hey Siri” or whatever it is... which I’ve never enabled (or I’ve disabled it, if it’s on by default).

ihatescreennames · March 4, 2020 8:36PM

dysamoria said:

Just another reason not to enable “hey Siri” or whatever it is... which I’ve never enabled (or I’ve disabled it, if it’s on by default).

I mean, I guess. Someone could throw a rock through a window or sliding glass door to gain entry to my house but that doesn't mean I've eliminated glass. Also, it would be a lot easier for someone to break a window than it would be to successfully pull this off.

filemakerfeller · March 5, 2020 12:01AM

temperor said:

Not impressive, more impressive is a laser beam being shot from 100 meters or more through a window to activate digital assistants ... you can send an open door command through the window to your smart speaker, now that is cool

https://www.xkcd.com/530/

mplsp · March 5, 2020 2:12AM

ArloTimetraveler said:

So, let me understand, they're not telling you which iPhone device, or which version of the IOS. Even though later iOS versions can recognize the differences between one individual voice to another. I thought when individual research occurs, they provide testing configurations to legitimize their conclusions.

If you take the time to read the article linked above you’ll find all the details you’re asking about. You kinda criticized AI for providing a summary that didn’t have all the details.

libertyforall said:

I want to see a video of this in action. Also I want to download the audio files used to try and validate this myself.

The attack used ultrasonic sound - an audio file won’t help you.

randominternetperson said:

ArloTimetraveler said:

So, let me understand, they're not telling you which iPhone device, or which version of the IOS. Even though later iOS versions can recognize the differences between one individual voice to another. I thought when individual research occurs, they provide testing configurations to legitimize their conclusions.

Right, I thought my iPhone only responds to my voice. So would this require a recording of my voice to activate? In that case, I'm not worried about the coffee shop scenario.

Yeah, except Siri gets activated on a regular basis by other people, the radio, etc, so any biometric identification used is pretty weak.

leehamm · March 5, 2020 4:06AM

"Security researchers have discovered a way to covertly hijack Siri and other smartphone digital assistants using ultrasonic waves, sounds that cannot be normally heard by humans. " These sound waves are inaudible by definition. This is like saying human skin may sustain damage when exposed to ultraviolet light, that is not normally visible to the human eye. So we don't know if the ultrasonics just wake up the device or if the phones' microphone is responsive directly to these sound waves. Most camera sensors [and snake eyes] have a different spectrum of response than the human eye. See, for example, infrared pictures taken with commercial camera sensors. It makes sense that most microphones respond to a different sound spectrum than the human ear. We know dogs can hear higher frequencies than humans.

soli · March 5, 2020 4:15AM

leehamm said:

"Security researchers have discovered a way to covertly hijack Siri and other smartphone digital assistants using ultrasonic waves, sounds that cannot be normally heard by humans. " These sound waves are inaudible by definition. This is like saying human skin may sustain damage when exposed to ultraviolet light, that is not normally visible to the human eye. So we don't know if the ultrasonics just wake up the device or if the phones' microphone is responsive directly to these sound waves. Most camera sensors [and snake eyes] have a different spectrum of response than the human eye. See, for example, infrared pictures taken with commercial camera sensors. It makes sense that most microphones respond to a different sound spectrum than the human ear. We know dogs can hear higher frequencies than humans.

By the layman definition it is, but that definition isn't complete as it will vary from person to person. The quoted sentence is correct, because the frequencies are not normally heard by humans (but there are definitely people that can hear sounds we usually associate with ultrasonic frequencies).

From a scientific standpoint it would probably be better if we define ultrasonic specifically from 20,000 Hertz to either infinity or some upper level where another term gets used. Note that medical ultrasounds usually operate far higher, in the megahertz range (roughly 2 to 18 MHz).

beowulfschmidt · March 5, 2020 1:01PM

randominternetperson said:

ArloTimetraveler said:

So, let me understand, they're not telling you which iPhone device, or which version of the IOS. Even though later iOS versions can recognize the differences between one individual voice to another. I thought when individual research occurs, they provide testing configurations to legitimize their conclusions.

Right, I thought my iPhone only responds to my voice. So would this require a recording of my voice to activate? In that case, I'm not worried about the coffee shop scenario.

Siri has speech recognition, but I don't think it has anything like what I think of when I use the term "voice recognition", i.e. the ability to discern an individual's voice from others' voices. It doesn't actually recognize my voice. My wife and I regularly invoke each other's Siri when we're in close proximity. Even watching TV or random YouTube videos sometimes triggers it.

Yes, there is the "training" that one does when setting up a new phone, but from what I've seen, that just sets Siri's expectations for speech patterns, not the voice itself.

edited March 2020
