
The Parts of Speech

Like, literally, the parts of words that influence our cognition.

Not all mispronunciations are created equal: Some mistakes can be glossed right over. Others lead to an endless loop of repeating yourself to ever louder "what?"s and "huh?"s. And thanks to a team of Paris-based researchers, we now have a better idea which ones are which.

The researchers had 48 people listen to deliberately, calculatingly mispronounced two-syllable words and observed what the listeners could and could not understand. They found that, basically, if you're going to screw up a consonant, screw up the voicing; the manner and the place of articulation both matter more to being understood.

To parse the differences between all three of these, I emailed one of the researchers, Alexander Martin. He told me that "voicing" refers to whether or not the sound requires the speaker to vibrate the vocal folds—the difference between how we say "s" and how we say "z." For whatever reason, this turned out to be the feature listeners could most easily do without. The team published their results in the Journal of the Acoustical Society of America.

Martin said manner was what screwed people up the most when they were just listening to the recordings, without any visual cues. If you're wondering what manner is, it's "the degree of turbulence we cause the airstream to have when articulating the sound. A 'stop' consonant like /p/ is articulated by completely closing the vocal tract (with both lips in this case) while a 'fricative' consonant, like its name implies, allows air to pass, creating friction (like in the sound /s/)," he said.

When subjects could watch a video of the speaker, however, "place" caused the most problems. "Place of articulation is basically where in the vocal tract we articulate the sound," Martin said. "Labial" placement is on the lips, as opposed to "coronal," which is made with the tip of the tongue at the teeth, or "dorsal," which is in the middle of the mouth, where we say "k." If you're lip-reading on a video, I guess it makes sense that this would be disruptive.

Of course, any consonant can be described using all three features, Martin explained. "The sound /p/ is a voiceless (no vocal fold vibration) labial (articulated with the lips) stop (the airflow is temporarily cut off)," he gave as an example.
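
To make that three-way description concrete, here's a minimal sketch in Python (mine, not the researchers'): a tiny feature table for a few English consonants and a helper that reports which feature two consonants disagree on. The feature values follow standard phonetic descriptions like the /p/ example above; the table, names, and printed examples are purely illustrative and not code or data from the study.

```python
# Illustrative sketch only: each consonant is described by the three features
# Martin lists (voicing, place, manner). Feature values follow standard
# phonetics; this is not material from the study itself.

FEATURES = {
    # consonant: (voicing,     place,     manner)
    "p": ("voiceless", "labial",  "stop"),
    "b": ("voiced",    "labial",  "stop"),
    "t": ("voiceless", "coronal", "stop"),
    "d": ("voiced",    "coronal", "stop"),
    "k": ("voiceless", "dorsal",  "stop"),
    "g": ("voiced",    "dorsal",  "stop"),
    "s": ("voiceless", "coronal", "fricative"),
    "z": ("voiced",    "coronal", "fricative"),
}

FEATURE_NAMES = ("voicing", "place", "manner")

def differing_features(c1, c2):
    """Return the feature dimensions on which two consonants differ."""
    return [name for name, a, b
            in zip(FEATURE_NAMES, FEATURES[c1], FEATURES[c2]) if a != b]

# Single-feature mix-ups of the sort the article describes:
print(differing_features("s", "z"))  # ['voicing'] -> the easiest error to gloss over
print(differing_features("p", "t"))  # ['place']   -> most disruptive when watching video
print(differing_features("t", "s"))  # ['manner']  -> most disruptive when only listening
```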

Martin is working on a PhD in Cognitive Science at the École Normale Supérieure in Paris. An important part of understanding the brain is understanding how people mishear things, and then studying "how people are able to recover information that does not correspond to what they physically heard," Martin said.

"Phenomena like this have been studied extensively in psycholinguistics as they allow us to gain insight into the way speech sounds are processed," he wrote. "If as listeners we were completely dependent on perfect input, we would be unable to function in our very noisy world…In the end though, listeners are incredibly flexible and able to recover information with 'degraded' input."

Sometimes, the degradation is deliberate.

"When we whisper, we do not tense our vocal folds at all, so much acoustic information about voicing (especially in a language like French) is lost, yet we are able to recognize words all the same," Martin said.

It's a lesson in what's essential for hearing and understanding, a topic of interest not just for cognitive and linguistic scientists, but also for the world of tech. We're going to be talking to our computers more, and our computers are going to be talking back. Just last month, Wired declared that "voice control will force an overhaul of the whole internet," using GPUs that are modeled after the human brain. Yesterday, NPR's Marketplace reported on "NewsHedge Squawk," a new desktop app that "delivers market information" to traders "that's notable and relevant in real time, but it does so audibly—the method of receiving information that human beings react to fastest."

The fastest—if they hear it right, that is.