When you use the voice recognition software on your smartphone (like Siri on the iPhone), all of your voice commands are recorded and retained by the software manufacturer (like Apple, Samsung, or Google). You might already have known this, but what you probably didn’t realize is that those voice files aren’t just sitting on servers somewhere in Silicon Valley; some of them are being listened to by total strangers on the internet. In fact, I listened to almost 200 of them today.
This week, a Reddit post picked up steam when a user wrote that she had started a new job at a company called Walk N’Talk Technologies—a company neither she nor I was able to verify online. Her job was to listen to voice recordings to see if they match up with text in order to evaluate the accuracy of a voice-to-text program.
“At first, I thought these sound bites were completely random. Then I began to notice a pattern. Soon, I realized that I was hearing people’s commands given to their mobile devices,” FallenMyst wrote.
FallenMyst later told me the work she was doing was through CrowdFlower, a crowdsourced data mining company that pays low wages for people to do easy, repetitive data analysis for various third-party companies.
I signed up for CrowdFlower myself and tracked down the post enlisting workers to analyze the voice-to-text examples. The job paid one cent for every 10 voice recordings analyzed.
An example of some of the sound and text files.
While none of the clips had any specific identifying information, it seemed very likely they were obtained from individuals using their phone’s voice recognition. The clips ranged from questions like “what time is it?” and “what’s the weather like today?” to commands like “text Dakota: ‘Sorry, I’m bored.’” There were also plenty of examples of the silly, joking-around babble that many people spew at their phones, from “will you marry me?” to “show me some boobies.”
While the project page did not say where the voice recordings came from, many clips I listened to were addressed to Galaxy phones:
“Galaxy, can you turn on the music?”
“Hi Galaxy, who sold more rock records than anybody else?”
“Galaxy, will you have sex with me?”
Samsung’s terms of service state that the company may collect voice information “such as recordings of your voice that we make (and may store on our servers) when you use voice commands to control a service,” and that Samsung works with a third-party speech-to-text conversion service that may receive and store this information. I didn’t identify any Siri-specific voice clips, but Apple’s terms of service are similar when it comes to collecting voice data, although they do not mention sending that data to a third party.
Apple spokesperson Trudy Muller told Wired that the company strips personal information from voice recordings before storing them for analysis within Apple to improve the software. I reached out to both Apple and Samsung but did not hear back by the time this story was published. We will update this story if either company provides a response.
But while it may be within the legal limits of the companies to farm out these short, anonymous voice clips to strangers online, it’s certainly not a well-known practice, explained Christopher Soghoian, the principal technologist at the American Civil Liberties Union.
“Many Americans would probably be shocked to learn that the contents of what they’re saying is even being transmitted to Apple or Google or Samsung. I think many people probably think that Siri’s only on their phone,” Soghoian told me.
But Soghoian stopped short of calling it an invasion of privacy, telling me that the companies’ motivation seems to be improving their voice recognition software, not broadcasting personal information about their customers. He added that if we want voice recognition software to get better, the companies creating it will need to collect data and, until the software improves, a human will have to listen to that data.
The problem is that most people dictating to their phones don’t realize that another person might someday listen to the recording, Soghoian said.
“Consumers have a certain expectation about what’s happening when they interact with a company. People don’t like it when they think they’re talking to a computer and they’re not or vice versa,” Soghoian told me.
And as one of the people who did listen to these recordings, I can understand why. There was no personal information or any way to identify the voices in the brief, one-to-five-second snippets. But there’s something about hearing a total stranger say “love you. Send.” that feels transgressive.
“People text really sensitive things sometimes,” Soghoian said.
“Just because the text message doesn’t begin with ‘my name is and my social security number is’ doesn’t mean there isn’t going to be sensitive information in there.”