A look back at some of AI’s biggest video game wins in 2018

OpenAI's Dota 2 battle arena.
Image Credit: OpenAI


For decades, games have served as benchmarks for artificial intelligence (AI).

In 1996, IBM famously set loose Deep Blue on chess, and it became the first program to win a game against a reigning world champion (Garry Kasparov) under regular time controls; it went on to win a full match against Kasparov the following year. But things really kicked into gear in 2013 — the year Google subsidiary DeepMind demonstrated an AI system that could play Pong, Breakout, Space Invaders, Seaquest, Beamrider, Enduro, and Q*bert at superhuman levels. In March 2016, DeepMind’s AlphaGo won a five-game match of Go against Lee Sedol, one of the highest-ranked players in the world. And the following year, a more general successor to the system (AlphaZero) handily defeated the strongest computer players at chess, a Japanese variant of chess called shogi, and Go.

These advances aren’t merely improving game design, according to folks like DeepMind cofounder Demis Hassabis. Rather, they’re informing the development of systems that might one day diagnose illnesses, predict complicated protein structures, and segment CT scans. “AlphaZero is a stepping stone for us all the way to general AI,” Hassabis told VentureBeat in a recent interview. “The reason we test ourselves and all these games is … that [they’re] a very convenient proving ground for us to develop our algorithms. … Ultimately, [we’re developing algorithms that can be] translat[ed] into the real world to work on really challenging problems … and help experts in those areas.”

With that in mind, and with 2019 fast approaching, we’ve taken a look back at some of 2018’s AI in games highlights. Here they are for your reading pleasure, in no particular order.


Montezuma’s Revenge

Above: Map of level one in Montezuma’s Revenge.

Image Credit: Wikimedia Foundation

In Montezuma’s Revenge, a 1984 platformer from publisher Parker Brothers for the Atari 2600, Apple II, Commodore 64, and a host of other platforms, players assume the role of intrepid explorer Panama Joe as he spelunks across Aztec emperor Montezuma II’s labyrinthine temple. The stages, of which there are 99 across three levels, are filled with obstacles like laser gates, conveyor belts, ropes, ladders, disappearing floors, and fire pits — not to mention skulls, snakes, spiders, torches, and swords. The goal is to reach the Treasure Chamber and rack up points along the way by finding jewels, killing enemies, and revealing keys that open doors to hidden stages.

Montezuma’s Revenge has a reputation for being difficult even for human players (the first level alone consists of 24 rooms), but AI systems have long had a particularly tough go of it. DeepMind’s groundbreaking deep Q-network — which in 2015 surpassed human experts on Breakout, Enduro, and Pong — scored 0 percent of the average human score of 4,700 in Montezuma’s Revenge.

Researchers peg the blame on the game’s “sparse rewards.” Completing a stage requires learning complex tasks with infrequent feedback. As a result, even the best-trained AI agents tend to maximize rewards in the short term rather than work toward a big-picture goal — for example, hitting an enemy repeatedly instead of climbing a rope close to the exit. But some AI systems this year managed to avoid that trap.
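
The trap is easy to see in a toy calculation (our own illustration, not from any of the papers): with a short planning horizon, a small repeatable reward beats a large reward that only pays out after many unrewarded steps.

```python
# Toy illustration of the sparse-reward trap: an agent can either farm a
# small repeatable reward ("hit an enemy") or take many unrewarded steps
# toward a large terminal reward ("reach the exit").

def best_return(horizon, steps_to_exit=10, enemy_reward=1, exit_reward=100):
    """Total reward for each of two fixed strategies over a planning horizon."""
    farm = horizon * enemy_reward  # hit the enemy once per step
    reach = exit_reward if horizon >= steps_to_exit else 0  # all or nothing
    return farm, reach

# A short-horizon agent sees farming as strictly better...
print(best_return(horizon=5))   # (5, 0)
# ...even though a longer horizon reveals the exit is worth far more.
print(best_return(horizon=20))  # (20, 100)
```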

DeepMind

In a paper published on the preprint server arXiv.org in May (“Playing hard exploration games by watching YouTube”), DeepMind described a machine learning model that could, in effect, learn to master Montezuma’s Revenge from YouTube videos. By “watching” clips of expert players and mapping both the video frames and its own game-state observations into a common embedding space, the system learned to imitate expert play, and it completed the first level with a score of 41,000.
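
One way to picture the idea — this is our own toy sketch, not DeepMind’s code — is an auxiliary reward that pays out whenever the agent’s embedded observation draws close to the next checkpoint along an expert video’s embedded trajectory:

```python
import math

# Toy sketch (ours, not DeepMind's method verbatim) of an imitation reward
# in an embedding space: the agent earns a bonus each time its embedded
# observation comes close to the next checkpoint taken from an expert's
# (e.g. YouTube) embedded trajectory.

def embed(obs):
    """Stand-in for a learned embedding; here, observations are 2-D points."""
    return obs

def imitation_reward(agent_traj, demo_checkpoints, threshold=0.5):
    """+1 for reaching each demo checkpoint, in order."""
    reward, next_cp = 0, 0
    for obs in agent_traj:
        if next_cp == len(demo_checkpoints):
            break                # all checkpoints already collected
        cx, cy = demo_checkpoints[next_cp]
        x, y = embed(obs)
        if math.hypot(x - cx, y - cy) < threshold:
            reward += 1
            next_cp += 1         # only the next unvisited checkpoint pays out
    return reward

demo = [(1, 0), (2, 0), (3, 0)]                     # expert's embedded path
agent = [(0, 0), (1.1, 0.2), (2.0, 0.1), (3.2, 0)]  # agent's embedded path
print(imitation_reward(agent, demo))  # 3
```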

In a second paper published online the same month (“Observe and Look Further: Achieving Consistent Performance on Atari“), DeepMind scientists proposed improvements to the aforementioned Deep-Q model that increased its stability and capability. Most importantly, they enabled the algorithm to account for reward signals of “varying densities and scales,” extending its agents’ effective planning horizon. Additionally, they used human demonstrations to augment agents’ exploration process.

In the end, it achieved a score of 38,000 on the game’s first level.
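
One ingredient for handling rewards of “varying densities and scales” is, as we understand the paper, a squashing function inside the transformed Bellman operator that compresses large reward magnitudes so a single network can learn across games. A minimal sketch of that function and its inverse:

```python
import math

# Sketch of the reward-squashing function h (and its inverse) from the
# transformed Bellman operator in "Observe and Look Further", as we
# understand it: h compresses large returns so one network can learn
# across reward scales, while a small linear term keeps it invertible.

EPS = 1e-2

def h(z):
    return math.copysign(math.sqrt(abs(z) + 1) - 1, z) + EPS * z

def h_inv(z):
    # Closed-form inverse, from solving the quadratic h(x) = z for x.
    inner = math.sqrt(1 + 4 * EPS * (abs(z) + 1 + EPS)) - 1
    return math.copysign((inner / (2 * EPS)) ** 2 - 1, z)

# Rewards spanning three orders of magnitude land on a similar scale:
print(h(1.0), h(1000.0))
# And the transform round-trips:
print(abs(h_inv(h(500.0)) - 500.0) < 1e-6)
```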

OpenAI

Above: An agent controlling the player character.

Image Credit: OpenAI

In June, OpenAI — a nonprofit, San Francisco-based AI research company backed by Elon Musk, Reid Hoffman, and Peter Thiel — shared in a blog post a method for training a Montezuma’s Revenge-beating AI system. The novel twist was its use of human demonstrations to “restart” agents: AI player characters began near the end of the game and, with each restart, started progressively earlier along human players’ trajectories. This exposed them to parts of the game which humans had already cleared, and helped them to achieve a score of 74,500.
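
A minimal sketch of that backward curriculum, with a stand-in agent and a toy ten-state “game” of our own invention:

```python
import random

# Toy sketch (not OpenAI's code) of restarting from demonstrations:
# episodes begin at states taken from late in a demo trajectory, and the
# start point moves backward once the agent succeeds from it reliably.

DEMO = list(range(10))  # a demo through states 0..9; state 9 is the goal

def rollout(start, p_step=0.9, max_steps=12):
    """Stand-in agent: advances one state with probability p_step."""
    state = start
    for _ in range(max_steps):
        if state == DEMO[-1]:
            return True
        if random.random() < p_step:
            state += 1
    return state == DEMO[-1]

def backward_curriculum(target=0.8, episodes=200):
    mastered = []
    start = len(DEMO) - 2               # begin just short of the goal
    while start >= 0:
        wins = sum(rollout(DEMO[start]) for _ in range(episodes))
        if wins / episodes < target:    # in the real system: keep training here
            break
        mastered.append(start)          # success from here: move the start earlier
        start -= 1
    return mastered

random.seed(0)
print(backward_curriculum())  # start states mastered, from latest to earliest
```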

In August, building on its previous work, OpenAI described in a paper (“Large-Scale Study of Curiosity-Driven Learning“) a model that could best most human players. The top-performing version found 22 of the 24 rooms in the first level, and occasionally discovered all 24.

What set it apart was a reinforcement learning technique called Random Network Distillation (RND), which used a bonus reward that incentivized agents to explore areas of the game map they normally wouldn’t have. RND also addressed another common issue in reinforcement learning schemes — the so-called noisy TV problem — in which an AI agent becomes stuck looking for patterns in random data.
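
The RND bonus itself is simple at heart: a predictor network is trained to match the output of a fixed, randomly initialized target network, and its prediction error — large on novel observations, small on familiar ones — is paid to the agent as an intrinsic reward. A sketch with linear maps standing in for the real networks:

```python
import numpy as np

# Sketch of Random Network Distillation's exploration bonus: a predictor
# is trained to match a fixed, randomly initialized target network, so its
# prediction error is large on novel observations and shrinks on familiar
# ones. Linear maps stand in for the real neural networks.

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM = 8, 4

W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))  # fixed random target network
W_pred = np.zeros((OBS_DIM, FEAT_DIM))           # trainable predictor

def bonus(obs):
    """Intrinsic reward: squared error between predictor and target features."""
    return float(np.sum((obs @ W_pred - obs @ W_target) ** 2))

def visit(obs, lr=0.01, steps=1000):
    """Repeatedly 'visit' an observation, fitting the predictor to the target."""
    global W_pred
    for _ in range(steps):
        err = obs @ W_pred - obs @ W_target
        W_pred -= lr * np.outer(obs, err)  # gradient step on the squared error

obs = rng.normal(size=OBS_DIM)
before = bonus(obs)  # novel observation: large bonus
visit(obs)
after = bonus(obs)   # familiar observation: bonus has collapsed
print(before > 10 * after)
```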

“Curiosity drives the agent to discover new rooms and find ways of increasing the in-game score, and this extrinsic reward drives it to revisit those rooms later in the training,” OpenAI explained in a blog post. “Curiosity gives us an easier way to teach agents to interact with any environment, rather than via an extensively engineered task-specific reward function that we hope corresponds to solving a task.”

On average, OpenAI’s agents scored 10,000 over nine runs with a best mean return of 14,500. A longer-running test yielded a run that hit 17,500.

Uber

OpenAI and DeepMind aren’t the only ones that managed to craft skilled Montezuma’s Revenge-playing AI this year. In a paper and accompanying blog post published in late November, researchers at San Francisco ride-sharing company Uber unveiled Go-Explore, a family of so-called quality diversity AI models capable of posting scores of over 2,000,000 and average scores over 400,000. In testing, the models were able to “reliably” solve the entire game up to level 159 and reach an average of 37 rooms.

To reach those sky-high numbers, the researchers implemented an innovative training method consisting of two parts: exploration and robustification. In the exploration phase, Go-Explore built an archive of distinct game states — cells — along with the trajectories that lead to them and the scores those trajectories earn. It repeatedly chose a cell, returned to it, explored from it, and, for every cell it visited along the way, swapped in the new trajectory whenever it was better (i.e., its score was higher).

This “exploration” stage conferred several advantages. Thanks to the aforementioned archive, Go-Explore was able to remember and return to “promising” areas for exploration. By first returning to cells (by loading the game state) before exploring from them, it avoided over-exploring easily reached places. And because Go-Explore was able to visit all reachable states, it was less susceptible to deceptive reward functions.
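
The exploration loop can be sketched on a one-dimensional toy “game” (our own construction; steps are biased toward progress so the toy converges quickly, and “better” here means a shorter trajectory rather than a higher score):

```python
import random

# Toy sketch of Go-Explore's exploration phase on a 1-D "game": an archive
# maps each discovered cell (here, just a position) to the best trajectory
# found so far. Steps are biased toward +1 so the toy converges quickly,
# and "better" means a shorter trajectory rather than a higher score.

GOAL = 15

def explore_phase(iterations=500, seed=0):
    random.seed(seed)
    archive = {0: []}                          # cell -> best known trajectory
    for _ in range(iterations):
        cell = random.choice(list(archive))    # 1. select a cell from the archive
        pos, traj = cell, list(archive[cell])  # 2. return to it (restore state)
        for _ in range(5):                     # 3. explore a few steps from there
            pos += random.choice([-1, 1, 1])
            traj.append(pos)
            # 4. archive an improved trajectory for every cell visited
            if pos not in archive or len(traj) < len(archive[pos]):
                archive[pos] = list(traj)
    return archive

archive = explore_phase()
print(GOAL in archive)  # the goal cell was eventually discovered
```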

The robustification step, meanwhile, acted as a shield against noise. If Go-Explore’s solutions were not robust to noise, it “robustified” them, distilling the brittle trajectories into a deep neural network with an imitation learning algorithm.

“Go-Explore’s max score is substantially higher than the human world record of 1,219,200, achieving even the strictest definition of ‘superhuman performance,'” the team said. “This shatters the state of the art on Montezuma’s Revenge both for traditional RL algorithms and imitation learning algorithms that were given the solution in the form of a human demonstration.”

Dota 2

Above: OpenAI employees gathered to watch a match.

Image Credit: OpenAI

Valve’s Dota 2 — a follow-up to Defense of the Ancients (DotA), a community-created mod for Blizzard’s Warcraft III: Reign of Chaos — debuted to great fanfare in 2013. It’s what’s known as a multiplayer online battle arena, or MOBA. Two groups of five players, each of which is given a base to occupy and defend, attempt to destroy a structure — the Ancient — at the opposing team’s base. Player characters (heroes) have a distinct set of abilities, and collect experience points and items which unlock new attacks and defensive moves.

It’s more complex than it sounds. The average match spans 80,000 individual frames, at each of which a hero can choose from as many as 170,000 possible actions. Heroes finish an average of 10,000 moves over the course of a game, and the observation describing the game’s state at any moment comprises more than 20,000 dimensions.

OpenAI’s been chipping away at the Dota 2 dilemma for a while now, and demoed an early iteration of a MOBA-playing bot — one which beat one of the world’s top players, Danil “Dendi” Ishutin, in a 1-on-1 match — in August 2017. But it kicked things up a notch in June with OpenAI Five, an improved system capable of playing five-on-five matches with top-ranking human opponents. It beat five groups of players — an OpenAI employee team, a team of audience members who watched the OpenAI employee match, a Valve employee team, an amateur team, and a semi-pro team — in early summer, and in August won two out of three matches against a team ranked in the 99.95th percentile.

To self-improve, OpenAI Five plays 180 years’ worth of games every day — 80 percent against itself and 20 percent against past selves — on 256 Nvidia Tesla P100 graphics cards and 128,000 processor cores on Google’s Cloud Platform. It’s made up of five single-layer, 1,024-unit long short-term memory (LSTM) recurrent neural networks, each assigned to a single hero and trained with a deep reinforcement learning model that rewards the “hero” networks for achieving goals like maximizing kills, minimizing deaths, and assisting teammates.

Fully trained OpenAI Five agents are surprisingly sophisticated. Despite being unable to communicate with each other (a “team spirit” hyperparameter value determines how much or how little each agent prioritizes individual rewards over the team’s reward), they’re masters of basic strategies like lane defense and farming, and even of advanced tactics like rotating heroes around the map and stealing runes from opponents.
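
The “team spirit” blending can be written down directly (the reward values here are hypothetical):

```python
# The "team spirit" blending: each hero's effective reward interpolates
# between its own reward and the team's mean reward. Reward values here
# are hypothetical.

def blended_rewards(rewards, team_spirit):
    """team_spirit = 0: purely selfish; team_spirit = 1: only the team matters."""
    team_mean = sum(rewards) / len(rewards)
    return [(1 - team_spirit) * r + team_spirit * team_mean for r in rewards]

# Five heroes: one got a kill (+5), one died (-3), the rest idled.
rewards = [5, -3, 0, 0, 0]
print(blended_rewards(rewards, team_spirit=0.0))  # [5.0, -3.0, 0.0, 0.0, 0.0]
print(blended_rewards(rewards, team_spirit=0.5))  # selfish rewards pulled toward the team mean
```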

“Games have really been the benchmark [in AI research],” OpenAI cofounder Greg Brockman told VentureBeat in an earlier interview. “These complex strategy games are the milestone that we … have all been working towards because they start to capture aspects of the real world.”

StarCraft II

Above: StarCraft II: Wings of Liberty launch

Image Credit: Blizzard

Blizzard’s StarCraft II was released in three parts over roughly five years. It’s a real-time strategy game that’s been hailed as one of the genre’s greatest (though it never gained the following the original built), owing in large part to its difficulty. In-game resources needed to construct and maintain units and buildings have to be constantly collected and protected, and while match objectives ultimately depend on the selected game type, effective StarCraft strategies typically require players to juggle not only unit quantities and movements but economics and upgrades.

It’s a lot for an AI system to handle, but Chinese tech giant Tencent made some progress in September. In a white paper, researchers at the company described two AI agents — TStarBot1 and TStarBot2 — trained to play one-on-one matches pitting players of the same race (Zerg) against each other. On a notoriously tricky map called Abyssal Reef, the AI system managed to defeat the game’s “Cheater AI,” which has full knowledge of resource and unit locations.

It took training. Lots of training. According to the paper’s authors, 1,920 parallel actors running on 3,840 processors across 80 machines generated replay transitions at roughly 16,000 frames per second, churning through billions of frames over days of training.

The results spoke for themselves. The TSTARBOTs — one of which kept track of the overall strategy while the other performed lower-level tasks like unit management — beat StarCraft II’s AI on the highest difficulty — level 10 — 90 percent of the time. Moreover, they held their own against human players who’d achieved the rank of Platinum and Diamond, the latter of which is two tiers below the highest (Grandmaster).

Quake III Arena

Quake III Arena, unlike StarCraft II and Dota 2, is a first-person shooter. It’s notable for its minimalist design; advanced locomotion features such as strafe-jumping and rocket-jumping; range of unique weapons; speedy pace of play; and emphasis on multiplayer gameplay. Up to 16 players face off against each other in arenas, or two battle it out head-to-head in tournament stages.

In a blog post in July, DeepMind shared the results of its research and experiments in Quake III. It revealed that it had taught an AI agent — cheekily dubbed “For the Win” (FTW) — to beat “most” human players it faced. After completing nearly 450,000 matches involving multiple agents (as many as 30, in some cases, and up to four games concurrently), FTW went undefeated against human-only teams in Capture the Flag and won 95 percent of games against teams in which humans played with a machine partner.

“We train agents that learn and act as individuals, but which must be able to play on teams with and against any other agents, artificial or human,” the paper’s authors wrote. “From a multi-agent perspective, [Capture the Flag] requires players to both successfully cooperate with their teammates as well as compete with the opposing team, while remaining robust to any playing style they might encounter.”

The AI agents weren’t provided the rules of the game beforehand, and the only reinforcement signal was a victory condition — i.e., capturing the most flags within five minutes. But over time, as DeepMind researchers varied parameters like terrain type, elevation, and locomotion, FTW began to learn strategies like home base defense, following a teammate, and camping out in an opponent’s base to tag them after a flag had been captured. It even got the hang of tagging — i.e., touching an opponent to send them back to their spawn point.

Bonus round: AI in game design

Above: An Nvidia system models digital environments on video footage.

Image Credit: Nvidia

This year’s state-of-the-art game-playing algorithms didn’t just beat the pants off of humans. They also demonstrated a knack for game design.

For instance, researchers at the Politecnico di Milano in Italy described a system which can automatically generate Doom levels.

To “teach” their two-GAN system how to create new stages, they sourced a public database containing all official levels from Doom and Doom 2 plus over 9,000 levels contributed by the community. From these, they produced (1) a set of images — one per level — capturing features including walls, objects, floor heights, and walkable areas, and (2) vectors representing in numerical form key level characteristics like size, area, and number of rooms.
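
As a toy sketch of those two representations (a hypothetical two-room grid of our own, not the paper’s actual encoding):

```python
# Toy sketch of the two level representations: per-feature image channels
# over a grid, plus a numeric vector of level characteristics. The tiny
# two-room map below is hypothetical, not the paper's actual encoding.

WALL, FLOOR = 1, 0
level = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 1],  # two rooms...
    [1, 0, 0, 0, 0, 0, 1],  # ...joined by a gap in the dividing wall
    [1, 1, 1, 1, 1, 1, 1],
]

def wall_channel(grid):
    """One image channel per feature: 1 where the feature (a wall) is present."""
    return [[1 if cell == WALL else 0 for cell in row] for row in grid]

def feature_vector(grid):
    """Numeric summary of the level: (width, height, walkable area)."""
    walkable = sum(cell == FLOOR for row in grid for cell in row)
    return (len(grid[0]), len(grid), walkable)

print(feature_vector(level))  # (7, 4, 9)
```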

After 36,000 iterations, the model was able to generate new levels that “captured [the] intrinsic structure of [handcrafted] Doom levels” — a possible step toward systems that might one day free up human designers to focus on “high-level features.”

“Our promising results, although preliminary, represent an excellent starting point for future improvements and highlight a viable alternative to classical procedural generation,” they wrote. “Most generated levels have proved to be interesting to explore and play due to the presence of typical features of Doom maps (like narrow tunnels and large rooms).”

They aren’t the only ones to achieve some success at AI level generation. In December, Nvidia took the wraps off of a system that’s capable of automatically crafting digital environments from video sources.

The development team accomplished the feat by training object classification algorithms to recognize specific objects in scenes, such as buildings, pedestrians, trees, and cars. Next, they used a GAN to model those objects virtually, in three dimensions.

“It’s a new kind of rendering technology, where the input is basically just a sketch, a high-level representation of objects and how they are interacting in a virtual environment,” Nvidia vice president of applied deep learning Bryan Catanzaro told VentureBeat in a phone interview. “Then the model actually takes care of the details, elaborating the textures, and the lighting, and so forth, in order to make a fully rendered image.”

Such a model promises to take a load off of game developers’ shoulders. Currently, blockbusters like Red Dead Redemption 2 and Grand Theft Auto V take teams of hundreds of people years — sometimes close to a decade — to create.