A new artificial intelligence system can now lip read better than humans, according to New Scientist. Though films and pop culture usually show lip reading as some incredible tool that allows you to decode what anyone says, in practice, it’s fairly spotty: Even for experienced lip readers, one estimate puts the amount of speech you can interpret from someone’s lip movements at a mere 30 percent.
But artificial intelligence researchers from Google’s DeepMind and the University of Oxford’s engineering department have been working on a network that transcribes natural sentences purely from video of people talking, with no audio. It can also transcribe audio with no video. Their pre-publication paper is posted on arXiv [PDF].
The system recognizes syllables and short phrases, and was trained on a large dataset called “Lip Reading Sentences,” drawn from a half-dozen BBC programs and containing more than 100,000 sentences and 17,500 words. It can process audio and video independently, which helps it decode speech even if the audio stream is noisy or if the audio and video aren’t perfectly aligned.
In a comparative test, the model was significantly more accurate than professional lip readers. The experimenters commissioned professional lip readers from a company that provides transcription services, each with around 10 years of experience lip reading in situations as diverse as videos for court use and national events like the British royal wedding. These lip readers correctly deciphered just 12 percent of the words they saw, while the computer model correctly deciphered almost half. Aside from providing more accurate transcription services, “it is possible that research of this type could discern important discriminative cues that are beneficial for teaching lip reading to the hearing impaired,” the researchers write.
Try out your lip-reading skills with the video below:
[h/t New Scientist]