
Amazon researchers’ method adds classes to AI classifiers more quickly




Classifiers are a staple of modern-day machine learning. Simply put, they categorize input data — photos, videos, objects, and recordings — by type, and do it very efficiently. However, problems arise when a classifier needs a new class — that is, a new category. Adding even one new class is traditionally arduous and involves lots of data collection and model retraining.

But scientists at Amazon’s Alexa research division say it doesn’t have to be that way.

In a new blog post and accompanying paper (“Transfer Learning for Sequence Labeling using Source Model and Target Data”), researchers at Amazon’s Alexa division describe a method for updating a classifier using only training data for the new class. This, they say, demonstrates that it’s possible to transfer an AI system and its learned parameters — the values used to control certain properties of the model — into a new system trained to identify an additional class.

“The problem of adapting an existing network to new classes of data is an interesting one in general, but it’s particularly important to Alexa,” wrote Alessandro Moschitti, principal scientist on the Alexa Search team. “Alexa scientists and engineers have poured a great deal of effort into Alexa’s core functionality, but through the Alexa Skills Kit, we’ve also enabled third-party developers to build their own Alexa skills — 70,000 and counting. The type of adaptation — or ‘transfer learning’ — that we study in the new paper would make it possible for third-party developers to make direct use of our in-house systems without requiring access to in-house training data.”




In the course of their research, the team set about adding a class to a neural network — layers of mathematical functions modeled after neurons in the brain — trained to identify people and organizations in online news articles. They kept the original classifier, but passed its output through a separate network — a “neural adapter” — the outputs of which they fed into a second, parallel classifier trained on data for the new class. Finally, they trained the adapter and new classifier together.
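The wiring described above — a frozen source classifier whose outputs pass through a trainable adapter before reaching a new classifier — can be sketched roughly as follows. This is a minimal illustration, not the paper’s exact architecture: the dimensions, the tanh adapter, and the rule for combining the encoder state with the adapted signal are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: encoder hidden size, the original label set
# (person/organization tags), and the expanded set adding "location".
HIDDEN, SRC_LABELS, TGT_LABELS = 64, 5, 7

# Frozen output layer of the already-trained source classifier.
W_src = rng.normal(size=(HIDDEN, SRC_LABELS))

# Neural adapter: a small trainable layer that maps the source
# classifier's predictions into the target classifier's input space.
W_adapt = rng.normal(size=(SRC_LABELS, HIDDEN)) * 0.01

# New target classifier, trained (with the adapter) on target data.
W_tgt = rng.normal(size=(HIDDEN, TGT_LABELS)) * 0.01

def forward(h):
    """h: (batch, HIDDEN) encoder states for a batch of tokens."""
    src_logits = h @ W_src                   # frozen source predictions
    adapted = np.tanh(src_logits @ W_adapt)  # adapter transforms them
    # The target classifier sees both the encoder state and the adapted
    # source signal; summing them is one simple way to combine the two.
    return (h + adapted) @ W_tgt

batch = rng.normal(size=(3, HIDDEN))
logits = forward(batch)
print(logits.shape)  # one score per token per label, new class included
```

In training, only `W_adapt` and `W_tgt` would receive gradient updates, which is what lets the source model’s knowledge carry over without retraining it.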

The result: a network that can also classify locations, built on top of the people- and organization-categorizing network and its learned parameters.

The team tested two network architectures, one of which had a conditional random field (CRF) — a class of statistical modeling methods often applied in pattern recognition and used for structured prediction. Additionally, they tried two different transfer learning methods: one that relied on the aforementioned neural adapter, and another that expanded the size of the trained classifier’s output layer of functions and the layers of functions immediately beneath it.
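The second transfer method — growing the output layer to cover the new labels — can be sketched as below. This is a hedged illustration of the general technique, not the paper’s code: the dimensions and initialization scale are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

HIDDEN, OLD_LABELS, NEW_LABELS = 64, 5, 2  # illustrative sizes

# Trained output-layer weights and biases from the source classifier.
W_old = rng.normal(size=(HIDDEN, OLD_LABELS))
b_old = rng.normal(size=OLD_LABELS)

# Expand: keep the learned columns intact, append freshly initialized
# columns for the new labels, then fine-tune the layer on target data.
W_new = np.concatenate(
    [W_old, rng.normal(size=(HIDDEN, NEW_LABELS)) * 0.01], axis=1)
b_new = np.concatenate([b_old, np.zeros(NEW_LABELS)])

h = rng.normal(size=(4, HIDDEN))          # a batch of encoder states
scores = h @ W_new + b_new                # now covers old + new labels
print(scores.shape)
```

Because the original columns are copied unchanged, predictions for the old classes start out identical to the source model’s, and only fine-tuning moves them.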

In the end, they found that the AI system with the CRF retained high accuracy on the original data — 91.08 percent — and achieved 90.73 percent accuracy on the new data.

“Our experiments … show that … the learned knowledge in the source model can be effectively transferred when the target data contains new categories,” Moschitti and colleagues wrote, “and … our neural adapter further improves such transfer.”