MIT Researchers Taught An Algorithm to Generate Sound Based on Silent Videos
Hit an object enough times and even an algorithm can tell you what it sounds like.
Only when the new algorithm sees an object being hit does it properly work to determine exactly what type of sound is being "When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it," explained CSAIL PhD candidate and research paper lead author Andrew Owens to MIT News. "An algorithm that simulates such sounds can reveal key information about objects' shapes and material types, as well as the force and motion of their interactions with the world." The program uses "drop learning," in which the artificial intelligence is taught pattern recognition while sorting through large amounts of data. In this case, that data consisted of approximately 1,000 videos covering about 46,000 different sounds. The database of sounds, titled "Greatest Hits," has been made freely available for other researchers.
"To then predict the sound of a new video, the algorithm looks at the sound properties of each frame of that video, and matches them to the most similar sounds in the database," Owens elaborated. "Once the system has those bits of audio, it stitches them together to create one coherent sound."
It's not perfect, though: Objects visibly hitting other things, or as the MIT students say it, "visually indicated sounds," are not the entirety of what we take in with our senses, of course. The algorithm is limited by the quality of both the "performance" and the video. A drum stick moving in a less patterned way can cause issues, and the algorithm obviously can't detect ambient noises. Still, this is a huge advancement in computers being able to show critical thinking abilities."A robot could look at a sidewalk and instinctively know that the cement is hard and the grass is soft, and therefore know what would happen if they stepped on either of them," Owens noted. "Being able to predict sound is an important first step toward being able to predict the consequences of physical interactions with the world."