Eavesdropping with Amazon Alexa

If you’re using an Amazon Echo, your life is undoubtedly made easier. Instead of searching on your phone the “old fashioned” way, you can simply ask Alexa what the weather is like, to play your favorite song, or to dim the lights. For the Echo, similar to the Google Home with voice assistant, listening is key. The device is continuously listening to catch you say its wake word (e.g. “Alexa”), so that it can give you what you need instantly. Without its continuous listening, such voice assistants would require activation buttons and would understandably not be the incredibly effortless helpers that they are today. However, with this device’s rise in popularity, one of today’s biggest fears in connection to such devices is privacy. Especially when it comes to a user’s fear of being unknowingly recorded. With this in mind, Maty Siman and Shimi Eshkenazi from the Checkmarx Research Lab decided to test the idea of turning their own Amazon Echo into a tapping device. The team’s first challenge was activating the Echo, given that the audio from Intelligent Personal Assistant (IPA) devices is only streamed to the cloud after the wake-up word is detected (“Alexa”). Therefore, the only option left for the team was to try to turn the device into a recording device after the wake-up word is detected. Afterwards, the wake-up word is detected and Alexa launches the requested capability or application (“Skill”), making the next step identifying how a harmless-looking “malicious” skill could be built, while secretly recording and transcribing what the user is saying, and then sending everything directly to the hacker. There were two challenges in the team’s way to reaching their goal:

  1. They had to ensure the Alexa recording session would stay alive after the user received a silent response from the device.
  2. They wanted the listening device to accurately transcribe the voice received by the skill.

First, they needed to find a way to keep the Alexa recording session alive after the user received a response from the benign part of the skill, and do so without providing any audial indication to disclose that the device was still “listening.” This was not completely straightforward, given the Echo device needs to be prompted by users between cycles, otherwise the session ends after each response to protect users’ privacy. Second, they needed to find a way to accurately transcribe the voice received by the skill application. Skills perform well when they are configured to accept a specific sentence format with placeholders (“slots”) for closed lists of values, such as colors, places or movie names (e.g. “What is the weather in {City}?”). Since they didn’t want to limit ourselves to specific conversations, we set out to find a way for the Echo to accept any text. The Checkmarx Research Lab disclosed this attack scenario to Amazon Lab126 and worked closely with their team to mitigate the risk. Some of the measures that were put in place are:

  1. Setting specific criteria to identify (and reject if necessary) eavesdropping skills during certification
  2. Detecting empty-reprompts and taking appropriate actions
  3. Detecting longer-than-usual sessions and taking appropriate actions

To discover more about how our security research team achieved what they did, continue reading with their full research paper, available here.

The following two tabs change content below.

Arden Rubens

Social Media Manager & Content Writer at Checkmarx
Arden is the social media manager and a content writer at Checkmarx. Her blogs focus on cyber security trends and the latest developments in the world of AppSec. She aims to educate and inspire developers, security professionals, and organizations to find the best defense against online threats.

Jump to Category