Nvidia Can Artificially Create Slow Motion That Is Better Than a 300,000 FPS Camera

Nvidia's software and Tesla V100 GPUs generate extra frames to fill in video gaps.

Ian Birnbaum

There's a real joy to capturing slow-motion video of anything, whether it's a great sports moment, a hilarious surprise, or a rock-solid punch. The trouble is knowing ahead of time that something slow-motion-worthy is about to happen, since converting regular video into slow-mo satisfaction is a choppy process.

Enter: the Nvidia research lab's super slow-motion AI framework. Researchers used a set of high-powered Tesla V100 GPUs and a deep-learning neural network to generate a nearly perfect, smooth slow-motion video out of any standard video clip.

A quick primer on slow-mo video: Most video you record on your phone, for example, is captured at 30 frames per second, or fps. That video is then played back at the same speed, resulting in "normal" video. If you used a high-speed camera rig to capture 240fps, then played that footage back at 30fps, every second of action would stretch across eight seconds of playback, and that gap gives you the slow-motion effect. In other words, slow motion comes from the ratio between the rate at which footage is recorded and the rate at which it is played back.
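The arithmetic in that primer can be sketched in a few lines of Python. The function name `slowdown_factor` is made up for illustration, and the numbers are simply the ones from the example above.

```python
def slowdown_factor(capture_fps: float, playback_fps: float) -> float:
    """Return how many times slower than real life the footage appears."""
    if capture_fps <= 0 or playback_fps <= 0:
        raise ValueError("frame rates must be positive")
    return capture_fps / playback_fps


# Footage shot at 240fps and played back at 30fps
print(slowdown_factor(240, 30))  # 8.0
```

A factor of 1.0 means normal speed; anything above that is slow motion.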

The difficult thing about converting regular video to slow-motion video, then, is finding extra frames to show. No matter what you do, a four-second video clip recorded at 30fps only has 120 frames in it. If you play them back really slowly, there's a fine line between footage that looks like slow motion and footage that stutters into a stop-motion slideshow.
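To put a rough number on the shortfall, here is a small Python sketch. The helper names and the 8x slowdown target are assumptions chosen for illustration; only the four-second, 30fps clip comes from the example above.

```python
def frames_needed(clip_seconds: float, slowdown: float, playback_fps: float) -> int:
    """Frames required to fill the stretched clip at the playback rate."""
    return round(clip_seconds * slowdown * playback_fps)


def missing_frames(clip_seconds: float, capture_fps: float,
                   slowdown: float, playback_fps: float) -> int:
    """Frames that would have to come from somewhere other than the camera."""
    recorded = round(clip_seconds * capture_fps)
    needed = frames_needed(clip_seconds, slowdown, playback_fps)
    return max(0, needed - recorded)


# A four-second clip shot at 30fps, slowed 8x and played back at 30fps
print(frames_needed(4, 8, 30))       # 960
print(missing_frames(4, 30, 8, 30))  # 840
```

Under those assumptions, seven of every eight frames on screen would have to be invented.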

Nvidia fixed that problem by asking its AI to create the missing frames based on its best guess. "The team trained their system on over 11,000 videos of everyday and sports activities shot at 240 frames-per-second," Nvidia's research blog writes. "Once trained, the convolutional neural network predicted the extra frames."
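For a sense of what "creating the missing frames" means, here is the crudest possible baseline in Python: a plain cross-fade between two neighboring frames. To be clear, this is not Nvidia's method, which uses a trained neural network. The function names are hypothetical, and each frame is assumed to be a flat list of pixel brightness values. A blend like this tends to leave ghostly double images on anything that moves.

```python
def blend_frames(frame_a, frame_b, t):
    """Cross-fade two frames; t=0 gives frame_a, t=1 gives frame_b."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must be the same size")
    return [(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


def interpolate(frame_a, frame_b, extra):
    """Generate `extra` evenly spaced in-between frames (endpoints excluded)."""
    return [blend_frames(frame_a, frame_b, i / (extra + 1))
            for i in range(1, extra + 1)]


# Going from 30fps to 240fps means 7 new frames between each recorded pair
dark = [0, 0, 0, 0]
bright = [80, 160, 240, 40]
in_between = interpolate(dark, bright, 7)
print(len(in_between))  # 7
print(in_between[0])    # [10.0, 20.0, 30.0, 5.0]
```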

It's a cool effect (especially when they take clips from The Slow Mo Guys and slow them down even further), but I can't help but sense a sinister edge to this kind of image manipulation. We're already well past the point where an innocent Instagram photo can be turned into porn without the subject's consent. Image tracking and voice synthesis, again using neural networks and deep learning, can create a false presidential speech and animate lips to match it. Is a video "real" if every tenth frame is computer generated? How about every third frame? Does it count as CGI if the computer is generating every other frame?

I'm just asking questions here, because I haven't got a damn clue about the answers.