Ambient.ai is building an AI-powered video analysis system

A new startup is trying to upend the balance of power among AI companies with what it sees as a new approach to providing machine intelligence for video footage.

Ambient.ai launched today with the promise of giving developers a way to determine the contents of a video automatically by applying deep learning. The company’s technology processes video and then provides captions for its contents, ranging from broad context about what’s taking place in a chunk of footage (for example, “this is a busy street”) to specific actions, like “there is a man walking.”

All of that information is coming out of one algorithm, which means specific captions can benefit from the broader context, and vice versa.

“We think it’s the first functional equivalent to the human visual system,” Ambient.ai cofounder and CEO Shikhar Shrestha said in an interview. “Because we don’t detect chairs, and detect coffee cups, and detect people, then try to piece that together. We actually understand the scene more holistically, and we use the time dimension.”

While some video intelligence services determine the contents of videos on a frame-by-frame basis, Ambient.ai says its technology analyzes video natively, which the company claims lets it work faster and more accurately than other systems.

There is no shortage of challenges ahead for Ambient.ai, however. First and foremost, it’s hard to know whether the company’s product works as advertised. While Ambient.ai has videos showing the technology in action, the company was unwilling to provide a live demonstration.

Video intelligence is also one of the areas on which major technology companies are focusing their development efforts. Microsoft, Google, and Amazon all have cloud services available that provide insights based on the content of videos. Still, Shrestha said he thinks his company has a head start over the giants, thanks to its unified model and native processing of video.

The company has processed more than 20,000 hours of video from its customers, though it won’t disclose who those customers are. The data Ambient.ai uses to train its video processing algorithm is annotated using crowdsourcing services before being fed into the system.

Ambient.ai was founded by a pair of Stanford-trained AI experts. Shrestha holds a double Master of Science degree in engineering from the university, where he focused on neuroscience, imaging, robotics, and AI. He previously worked at Google helping the Project Tango team.

Vikesh Khanna, the company’s cofounder and CTO, holds an MS in computer science, focused on artificial intelligence and machine learning. He previously worked at Dropbox building that company’s large-scale analytics system. Ambient.ai grew out of the research the two did while at Stanford.

The company was part of Y Combinator’s latest class and has financial backing from YC, SV Angel, Stanford, Inevitable Ventures, Western Technology Investment, and others. Jyoti Bansal, one of Ambient.ai’s investors, said in an interview that he was drawn to the company because of its systematic approach.

Looking towards the future, Shrestha said that he believes his company will have a video intelligence algorithm that’s as accurate as a human being within five years.