OpenAI's CoinRun tests the adaptability of reinforcement learning agents

Watch all the Transform 2020 sessions on-demand here.

Reinforcement learning (RL) — an artificial intelligence (AI) training technique that uses rewards or punishments to drive agents toward goals — has a problem: It doesn’t result in highly generalizable models. Trained agents struggle to transfer their experience to new environments. It’s a well-understood limitation of RL, but one that hasn’t prevented data scientists from benchmarking their systems within the environments on which they were trained. That makes overfitting — a modeling error that occurs when a function is too closely fit to a dataset — challenging to quantify.

Nonprofit AI research company OpenAI is taking a stab at the problem with an AI training environment — CoinRun — that provides a metric for an agent’s ability to transfer its experience to unfamiliar scenarios. It’s basically like a classic platformer, complete with enemies, objectives, and stages of varying difficulty,

It follows on the heels of the launch of OpenAI’s Spinning Up, a program designed to teach anyone deep reinforcement learning.

“CoinRun strikes a desirable balance in complexity: the environment is much simpler than traditional platformer games like Sonic the Hedgehog, but it still poses a worthy generalization challenge for state of the art algorithms,” OpenAI wrote in a blog post. “The levels of CoinRun are procedurally generated, providing agents access to a large and easily quantifiable supply of training data.”

June 5th: The AI Audit in NYC

Join us next week in NYC to engage with top executive leaders, delving into strategies for auditing AI models to ensure fairness, optimal performance, and ethical compliance across diverse organizations. Secure your attendance for this exclusive invite-only event.

As OpenAI explains, prior work in reinforcement learning environments has focused on procedurally generated mazes, community projects like the General Video Game AI framework, and games like Sonic the Hedgehog, with generalization measured by training and testing agents on different sets of levels. CoinRun, by contrast, offers agents a single reward at the end of each level.

OpenAI CoinRun

AI agents have to contend with stationary and moving obstacles, collision with which results in immediate death. It’s game over when the aforementioned coin is collected, or after 1,000 time steps.

As if that weren’t enough, OpenAI developed two additional environments to investigate overfitting: CoinRun-Platforms and RandomMazes. The first contains several coins randomly scattered across platforms, forcing agents to actively explore levels and occasionally do some backtracking. RandomMazes, meanwhile, is a simple maze navigation task.

To validate CoinRun, CoinRun-Platforms, and RandomMazes, OpenAI trained 9 agents, each with a different number of training levels. The first 8 trained on sets ranging from 100 to 16,000 levels, and the final agent trained on an unrestricted set of levels — roughly 2 million in practice — so that it never saw the same one twice.

The agents experienced overfitting at 4,000 training levels, and even at 16,000 training levels; the best-performing agents turned out to be the ones trained with the unrestricted set of levels. And in CoinRun-Platforms and RandomMazes, the agents strongly overfit in all cases.

The results provide valuable insight into the challenges underlying generalization in reinforcement learning, OpenAI said.

“Using the procedurally generated CoinRun environment, we can precisely quantify such overﬁtting,” the company wrote. “With this metric, we can better evaluate key architectural and algorithmic decisions. We believe that the lessons learned from this environment will apply in more complex settings, and we hope to use this benchmark, and others like it, to iterate towards more generalizable agents.”

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.

The AI insights you need to lead