Scientists develop way to ‘listen’ to silent videos

5 Aug 2014

Researchers from MIT, Microsoft and Adobe have developed a method of extracting sound from silent video using high-speed cameras and specially designed algorithms.

By pointing a camera with a high frame rate, measured in frames per second (fps), at, say, a packet of crisps, the researchers can read the minute vibrations that surrounding sound causes in the packet, which are invisible to the naked eye.

Running the video feed through the program they have developed turns those vibrations into sound humans can understand. In the experiment, Mary Had a Little Lamb was played through a speaker beside the packet of crisps.

The camera itself was kept 15 feet (about 4.5 metres) away from the packet, behind a wall of soundproof glass, to make sure no sound whatsoever was reaching the camera. Judging by the video the team posted online, the recovered audio is relatively clear given the circumstances.

According to MIT’s official release on the breakthrough, reconstructing audio from video requires that the frequency of the video samples (the number of frames of video captured per second) be higher than the frequency of the audio signal. The cameras used to capture the sounds ran at 2,000-6,000fps.

That is far faster than the 60fps common in smartphones, and yet the researchers were also able to apply the same principles at that speed using regular commercial digital cameras, albeit at much lower quality.

They attribute this to a particular ‘quirk’ of the cameras’ sensors, which allowed the scientists to infer information about high-frequency vibrations from the lower-fps footage.
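To give a rough sense of why the frame rate matters, here is a minimal Python sketch. It is not the researchers' algorithm. It only applies the standard sampling rule (the Nyquist criterion), which is slightly stricter than the wording above: a camera that takes one plain sample per frame can faithfully capture a tone only if the tone's frequency is below half the frame rate. Anything higher 'folds' down to a false, lower frequency. The function name and the 440Hz test tone are illustrative assumptions, and the sketch deliberately ignores the sensor quirk that let the team get more out of 60fps footage.

```python
def apparent_frequency(tone_hz: float, frame_rate_fps: float) -> float:
    """Frequency a pure tone appears to have when sampled once per frame.

    Illustrative helper (not from the MIT work): tones below half the
    frame rate come back unchanged, higher tones fold into that band.
    """
    # Position of the tone within one frame-rate-wide band.
    folded = tone_hz % frame_rate_fps
    # Mirror the upper half of the band back onto the lower half.
    return min(folded, frame_rate_fps - folded)


if __name__ == "__main__":
    tone_hz = 440.0  # assumed test tone, chosen only for illustration
    for fps in (60.0, 2000.0, 6000.0):
        seen = apparent_frequency(tone_hz, fps)
        verdict = "captured as is" if seen == tone_hz else "distorted"
        print(f"{fps:>6.0f}fps: {tone_hz:.0f}Hz appears as {seen:.0f}Hz ({verdict})")
```

At 2,000fps and 6,000fps the tone comes through unchanged, while at 60fps it shows up as a false 20Hz tone. This is why ordinary footage needs an extra trick to recover anything useful.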

What are the possible uses?

While its uses may seem limited, one could theorise that it would be incredibly useful for spies looking to eavesdrop on a room. By focusing a camera from another building on, say, a plant inside the room, they would be able to hear the conversations taking place there.

Alexei Efros, an associate professor of electrical engineering and computer science at the University of California at Berkeley, said it’s a fantastic discovery.

“We’re scientists, and sometimes we watch these movies, like James Bond, and we think, ‘This is Hollywood theatrics. It’s not possible to do that. This is ridiculous.’ And suddenly, there you have it. This is totally out of some Hollywood thriller. You know that the killer has admitted his guilt because there’s surveillance footage of his potato chip bag vibrating.”

Colm Gorey was a senior journalist with Silicon Republic
