Google’s AI learns how to explore environments from limited data

Carnegie Mellon, Google, and Stanford researchers write in a paper that they’ve developed a framework for using weak supervision — a form of AI training in which the model learns from large amounts of limited, imprecise, or noisy data — that enables robots to explore challenging environments efficiently. By directing robots toward only the parts of their surroundings that are relevant to the task at hand, the researchers say, their approach speeds up training on a range of robot manipulation tasks.
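
To make "weak supervision" concrete: in this setting a labeler never provides exact state annotations, only cheap pairwise comparisons between observations (for example, "in which of these two images is the object further to the left?"). Below is a minimal Python sketch of that label format; the names `FACTOR_NAMES` and `weak_label` are our own illustrations, not drawn from the paper's code.

```python
import numpy as np

# Hypothetical user-specified factors of variation (names are
# illustrative, not from the paper's code).
FACTOR_NAMES = ["object_x", "object_y", "door_angle"]

def weak_label(factors_a: np.ndarray, factors_b: np.ndarray) -> np.ndarray:
    """Cheap pairwise comparison: for each factor, is it larger in
    observation A than in observation B? No exact values are revealed."""
    return (factors_a > factors_b).astype(np.int32)

# Two simulated states; the labeler only answers greater/less questions.
state_a = np.array([0.3, 0.1, 0.0])
state_b = np.array([0.1, 0.4, 0.0])
labels = weak_label(state_a, state_b)
print({name: int(v) for name, v in zip(FACTOR_NAMES, labels)})
# {'object_x': 1, 'object_y': 0, 'door_angle': 0}
```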

The team’s framework — Weakly-Supervised Control (WSC) — learns a structured latent space with which a software agent can generate its own goals and explore in a directed way. It incorporates reinforcement learning, a form of training that spurs agents to accomplish goals via rewards. But unlike traditional reinforcement learning, which relies on hand-designed reward functions that are often expensive to specify and measure, WSC frames the problem so that supervision scales with the amount of data collected — and no labels are required inside the reinforcement learning loop.
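
At a high level, WSC runs in two phases: it first learns a latent space from the weak labels, then runs goal-conditioned reinforcement learning in that space, with the agent proposing its own latent goals and receiving reward based on latent distance. The sketch below shows how those pieces could fit together under stand-in components (the `Encoder`, random "policy," and random "environment" are all placeholders); a real implementation would use the paper's learned encoder and a proper off-policy RL algorithm.

```python
import numpy as np

class Encoder:
    """Stand-in for a disentangled encoder trained in phase one from
    weak pairwise labels. Here it is just a fixed linear map."""
    def __init__(self, obs_dim: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(obs_dim, latent_dim))

    def __call__(self, obs: np.ndarray) -> np.ndarray:
        return obs @ self.w

def latent_reward(z_state: np.ndarray, z_goal: np.ndarray) -> float:
    """Reward is negative distance in the learned latent space, so no
    hand-designed reward or human label is needed during RL."""
    return -float(np.linalg.norm(z_state - z_goal))

# Phase two: goal-conditioned RL with self-generated latent goals.
obs_dim, latent_dim = 8, 3
encoder = Encoder(obs_dim, latent_dim)
rng = np.random.default_rng(1)

for episode in range(3):
    # The agent proposes its own goal by sampling in the latent space
    # spanned by the user-specified factors.
    z_goal = rng.normal(size=latent_dim)
    obs = rng.normal(size=obs_dim)                   # stand-in for env.reset()
    for t in range(5):
        action = rng.normal(size=2)                  # stand-in for policy(obs, z_goal)
        obs = obs + 0.1 * rng.normal(size=obs_dim)   # stand-in for env.step(action)
        reward = latent_reward(encoder(obs), z_goal)
    print(f"episode {episode}: final latent reward {reward:.3f}")
```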

In experiments, the researchers sought to determine whether weak supervision was necessary for learning a disentangled state representation — i.e., one in which each user-specified factor of variation, such as an object’s position, is captured by its own feature. They evaluated several models on simulated, vision-based, goal-conditioned manipulation tasks of varying complexity. In one environment, agents had to move a specific object to a goal location; in another, they had to open a door to match a goal angle.
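
As a toy illustration of what "disentangled" means here, consider an encoder in which each factor maps to exactly one latent dimension: perturbing a single factor (say, the object's x position) should change only the corresponding coordinate of the encoding. This check is our own illustration, not the paper's evaluation protocol.

```python
import numpy as np

# Toy factors: object x, object y, door angle. In a perfectly
# disentangled encoder, each factor owns one latent dimension.
def disentangled_encode(factors: np.ndarray) -> np.ndarray:
    scales = np.array([2.0, -1.0, 0.5])  # arbitrary per-dimension scaling
    return scales * factors

base = np.array([0.2, 0.5, 0.1])
moved = base.copy()
moved[0] += 0.3                          # move only the object's x position

delta = disentangled_encode(moved) - disentangled_encode(base)
print(np.round(delta, 3))                # [0.6 0.  0. ] -> only dim 0 changed
```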

The coauthors report that WSC learned more quickly than prior state-of-the-art goal-conditioned reinforcement learning methods, particularly as the environments grew more complex. Moreover, they say WSC attained a higher correlation between latent goals and final states, indicating that it learned a more interpretable goal-conditioned policy.
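
That interpretability metric can be pictured as follows: log the latent goal commanded in each episode alongside the latent encoding of the state the agent actually reaches, then correlate the two across episodes. The snippet below computes a per-dimension Pearson correlation on synthetic data; the logging setup and numbers are hypothetical, used only to show the calculation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logs: the latent goal commanded in each of 100 episodes
# and the latent encoding of the final state the agent reached.
latent_goals = rng.normal(size=(100, 3))
final_states = latent_goals + 0.2 * rng.normal(size=(100, 3))  # toy outcomes

# Per-dimension Pearson correlation: values near 1 mean the policy
# reliably drives each factor toward the commanded goal.
corr = [np.corrcoef(latent_goals[:, i], final_states[:, i])[0, 1]
        for i in range(latent_goals.shape[1])]
print([f"{c:.2f}" for c in corr])
```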


However, the researchers concede that WSC isn’t without limitations. It requires a user to indicate which factors are relevant to downstream tasks, which may demand domain expertise, and it applies weak supervision only during pretraining, so the learned representations might not generalize to new interactions the agent encounters later. That said, in future work they hope to investigate other forms of weak supervision that can provide useful signals to agents, as well as other ways to leverage such labels.

“Given the promising results in increasingly complex environments, evaluating this approach with robots in real-world environments is an exciting future direction,” wrote the coauthors. “Overall, we believe that our framework provides a new perspective on supervising the development of general-purpose agents acting in complex environments.”