Amazon today announced Inferentia, a chip designed by AWS especially for the deployment of large AI models, due out next year.
Inferentia will work with major frameworks like TensorFlow and PyTorch and is compatible with EC2 instance types and Amazon’s machine learning service SageMaker.
“You’ll be able to have on each of those chips hundreds of TOPS; you can band them together to get thousands of TOPS if you want,” AWS CEO Andy Jassy said onstage today at the annual re:Invent conference.
Inferentia will also work with Elastic Inference, a way to accelerate deployment of AI with GPU chips that was also announced today.
Elastic Inference offers a range of 1 to 32 teraflops of compute. Inferentia detects when a major framework is being used with an EC2 instance, then identifies which parts of the neural network would benefit most from acceleration and moves those portions to Elastic Inference to improve efficiency.
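AWS has not published how that partitioning works, but the idea can be illustrated with a small sketch. Everything below (the layer names, the speedup estimates, the threshold) is hypothetical, intended only to show what "moving the parts that benefit most" could mean in practice:

```python
# Hypothetical illustration (all names and numbers invented): a runtime
# could score each layer by its estimated speedup on an accelerator and
# offload only the layers that clear a threshold, leaving cheap
# elementwise ops on the CPU.

def partition_layers(layers, threshold=2.0):
    """Split (name, estimated_speedup) pairs into accelerated and CPU lists."""
    accelerated, cpu = [], []
    for name, est_speedup in layers:
        (accelerated if est_speedup >= threshold else cpu).append(name)
    return accelerated, cpu

# Toy model: convolutions gain a lot from acceleration; ReLU and softmax don't.
layers = [("conv1", 6.0), ("relu1", 1.1), ("conv2", 5.5), ("softmax", 1.0)]
accelerated, cpu = partition_layers(layers)
# accelerated -> ["conv1", "conv2"]; cpu -> ["relu1", "softmax"]
```

The real system presumably works at the level of a framework's computation graph rather than a flat layer list, but the benefit-scoring idea is the same.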
Launching AI models today involves two major processes, training and inference, and inference accounts for nearly 90 percent of costs, Jassy said.
“We think that the cost of operation on top of the 75 percent savings you can get with Elastic Inference, if you layer Inferentia on top of it, that’s another 10x improvement in costs, so this is a big game changer, these two launches across inference for our customers,” he said.
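Taken at face value, those figures compound. A quick back-of-the-envelope check, assuming the savings stack multiplicatively on a nominal baseline:

```python
# Back-of-the-envelope arithmetic for Jassy's claim, assuming the two
# savings apply multiplicatively to a nominal baseline cost of 100 units.
baseline = 100.0
after_elastic_inference = baseline * (1 - 0.75)  # 75% savings -> 25.0
after_inferentia = after_elastic_inference / 10  # further 10x -> 2.5
print(after_inferentia)  # 2.5
```

Under that reading, the combined claim amounts to roughly a 40x reduction in inference cost versus the baseline.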
The release of Inferentia follows Monday's debut of an AWS chip purpose-built for generalized workloads.
The debut of Inferentia and Elastic Inference was one of several AI-related announcements made today, alongside the launch of an AWS marketplace where developers can sell their AI models and the introduction of the DeepRacer League and AWS DeepRacer car, which runs on AI models trained with reinforcement learning in a simulated environment.
A number of services that require no prior knowledge of how to build or train AI models were also made available in preview today, including Textract for extracting text from documents, Personalize for customer recommendations, and Amazon Forecast, a service that generates private forecasting models.