
Facebook’s DeepFovea AI promises power-efficient VR foveated rendering

DeepFovea uses AI to reconstruct pixels in your peripheral vision rather than rendering them at lower fidelity, saving considerable processing time.
Image Credit: Facebook

Foveated rendering addresses a growing challenge for VR headsets by rendering sharp detail only for your eye’s visual sweet spot (the fovea) and a simpler, blurrier version for your peripheral vision. Now engineers at Facebook Reality Labs have come up with DeepFovea, an AI-assisted alternative that creates “plausible peripheral video” rather than actually rendering accurate peripheral imagery. Facebook calls the new process “foveated reconstruction,” and says it achieves more than 14 times compression on RGB video with no significant degradation in user-perceived quality.

When capturing a video stream, DeepFovea samples only 10% of the pixels in each video frame, concentrating largely but not exclusively on the area where the user’s eye is focused, represented by the lizard head above. By comparison, the peripheral area is sampled only by scattered dots that become less dense farther from the eye’s focus area. The system then uses trained generative adversarial neural networks to reconstruct each frame from those sparse samples, relying on the stream’s temporal and spatial content to fill in details in a stable rather than jittery manner.
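To make that sampling pattern concrete, here is a minimal sketch in Python/NumPy (not Facebook’s code) of how a gaze-dependent sampling mask like the one described above could be generated: roughly 10% of pixels are kept overall, densely near the gaze point and sparsely toward the periphery. The function name, the Gaussian falloff, and all parameter values are illustrative assumptions rather than details from the paper.

import numpy as np

def foveated_sample_mask(height, width, gaze_xy, keep_fraction=0.10,
                         sigma_frac=0.15, seed=0):
    """Boolean mask of pixels to sample; True pixels are rendered/kept."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:height, 0:width]
    gx, gy = gaze_xy
    # Eccentricity: distance from the gaze point, normalized by the frame diagonal.
    dist = np.hypot(xs - gx, ys - gy) / np.hypot(width, height)
    # Sampling probability falls off with eccentricity (Gaussian falloff is an assumption).
    prob = np.exp(-(dist / sigma_frac) ** 2)
    # Rescale so the expected kept fraction is roughly keep_fraction of the frame.
    prob *= keep_fraction * prob.size / prob.sum()
    return rng.random(prob.shape) < np.clip(prob, 0.0, 1.0)

mask = foveated_sample_mask(1080, 1200, gaze_xy=(600, 540))
print(f"sampled {mask.mean():.1%} of pixels")  # roughly 10%, densest at the gaze point

In DeepFovea’s pipeline, a generative network would then reconstruct the full frame from only the sampled pixels, drawing on neighboring frames to keep the filled-in detail temporally stable.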

As the images above show, the heavily but not fully sampled lizard head is essentially indistinguishable between the “reconstructed” and “reference” versions, while the adjacent tree bark in the “reconstructed” image isn’t as sharp or detailed as the “reference” pixels. But it’s not supposed to be. A traditional foveated rendering system would depict those peripheral pixels as low-resolution, flat-shaded blocks, whereas DeepFovea preserves, or more accurately approximates, more of the original shapes and colors.

The key reason DeepFovea matters is that it offers a better combination of power efficiency and image quality than standard foveated rendering. Facebook’s claim is that the 14x reduction in rendering work will let it deliver real-time, low-latency video to displays that depend on gaze detection, a necessary step in building lightweight VR and AR headsets that display high-resolution graphics rendered in the cloud. All-day wearable Oculus AR headsets are said to be impractical until mobile chip power consumption drops as dramatically for real-time 3D mapping as it is now doing for streamed video.

Facebook’s Michael Abrash first hinted at the concepts underlying DeepFovea last year at Oculus Connect 5, suggesting that at some point in the next five years, deep learning-based foveation and good eye tracking would come together to enable higher-resolution VR headsets such as the company’s prototype Half Dome. At Oculus Connect 6 this year, Abrash said the company will be testing next-generation Half Dome hardware in its own offices before deploying it to the public.

Rather than keeping DeepFovea solely to itself while it works on next-generation headsets, Facebook is releasing a sample version of the network architecture for researchers, VR engineers, and graphics engineers. The company is presenting the underlying research paper at SIGGRAPH Asia tonight and will make the samples available thereafter.