Your Words Can Be Recovered From a Video of a Chip Bag Next to You

When you speak, your voice makes sound waves. Sound waves are just tiny pressure changes moving through the air around you, so when you speak, the objects around you also move, ever so slightly. A potted plant, a glass of water, or a bag of chips will vibrate in response to the pressure changes caused by your voice. Might it be possible to reconstruct what someone was saying from video of nearby objects alone? A team of MIT computer scientists has figured out how to do just that, turning a chip bag into a “visual microphone.”

This amazing video shows three demonstrations of sound being recovered through video: one where a melody is captured in the vibrations of leaves on a plant, another where speech is captured in the vibrations of a chip bag, and a third where a song is identified solely from video of the earbuds it was playing through.

The technique involves computations on pixel-to-pixel differences over time that effectively magnify small movements. Those worried about the potential eavesdropping or spying applications of the technique can take comfort in the fact that it works best with very high-frame-rate, memory-intensive video capture, but not too much comfort. As shown in the video, it is possible to get a much better than expected result with a regular consumer camera by taking advantage of artifacts resulting from a “rolling shutter” capture, in which the rows of each frame are recorded at slightly different times. So watch what you say if the cameras are rolling, or at least clean up your snack wrappers before you say it.
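For the curious, here is a toy sketch of the basic idea of turning pixel differences over time into a signal. It is not the researchers' method, which is considerably more sophisticated. It just sums each frame's pixel differences against the first frame. The "video" is synthetic, and the function names, frame rate, and tone frequency are all made up for illustration.

```python
import math

def synth_frames(n_frames, fps, tone_hz, width=64, amplitude=0.05):
    """Hypothetical test video: one row of pixels showing a soft edge
    that shifts by a fraction of a pixel in time with a pure tone."""
    frames = []
    for t in range(n_frames):
        shift = amplitude * math.sin(2 * math.pi * tone_hz * t / fps)
        centre = width / 2 + shift
        frames.append([1 / (1 + math.exp(-(x - centre))) for x in range(width)])
    return frames

def recover_signal(frames):
    """Sum each frame's pixel differences against the first frame.
    For tiny shifts this sum is roughly proportional to the displacement."""
    ref = frames[0]
    raw = [sum(p - r for p, r in zip(frame, ref)) for frame in frames]
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]  # remove any constant offset

def dominant_frequency(signal, fps):
    """Return the frequency in Hz of the strongest bin of a naive DFT."""
    n = len(signal)
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * fps / n

# Assumed numbers: a 100 Hz tone filmed at 2,000 frames per second.
fps, tone_hz = 2000, 100
frames = synth_frames(n_frames=200, fps=fps, tone_hz=tone_hz)
signal = recover_signal(frames)
print(dominant_frequency(signal, fps))
```

The edge here never moves more than a twentieth of a pixel, yet the summed differences still carry the tone's frequency. The sketch also hints at why frame rate matters: a video can only capture vibrations slower than half its frame rate, which is why ordinary cameras need the rolling shutter trick.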

There’s more at the pages of lead researchers Abe Davis and Michael Rubinstein. Rubinstein also has a fascinating TEDx talk about the motion-magnifying technique that shows how regular video can be transformed to show blood pumping behind skin, a baby’s breathing, or a wine glass pulsating to a singer’s voice.