Back in March, Amazon’s Amazon Web Services (AWS) division announced that it would tap Nvidia’s Tesla T4 graphics chips for AI inference, making up to eight of them available per customer via G4 instances in Amazon Elastic Compute Cloud (Amazon EC2). Today, it made good on that promise with the general availability launch of those G4 instances, which it describes as optimized to accelerate machine learning and graphics-intensive workloads.
Starting today, customers can launch G4 instances, available as on-demand, reserved, or spot instances, using Windows, Linux, or AWS Marketplace AMIs from Nvidia with Nvidia Quadro Virtual Workstation software preinstalled. The instances are offered in the US East (N. Virginia, Ohio), US West (Oregon, N. California), Europe (Frankfurt, Ireland, London), and Asia Pacific (Seoul, Tokyo) regions, with additional regions to follow; a bare metal version will arrive in the coming months.
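For developers who prefer the API over the console, launching a G4 instance looks much like launching any other EC2 instance. Here is a minimal sketch using boto3; the AMI ID is a placeholder (substitute a Deep Learning AMI or an Nvidia Marketplace AMI), and g4dn.xlarge is the smallest G4 size, with a single T4 GPU.

```python
# Minimal sketch: launching one on-demand G4 instance with boto3.
# The AMI ID below is a placeholder; swap in a Deep Learning AMI or an
# Nvidia Quadro Virtual Workstation AMI from the AWS Marketplace.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="g4dn.xlarge",       # smallest G4 size: one T4 GPU
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```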
“We focus on solving the toughest challenges that hold our customers back from taking advantage of compute intensive applications,” said AWS compute services VP Matt Garman in a statement. “AWS offers the most comprehensive portfolio to build, train, and deploy machine learning models powered by Amazon EC2’s broad selection of instance types optimized for different machine learning use cases. With new G4 instances, we’re making it more affordable to put machine learning in the hands of every developer. And with support for the latest video decode protocols, customers running graphics applications on G4 instances get superior graphics performance over G3 instances at the same cost.”
In addition to Nvidia’s T4 chips, which each pack 2,560 CUDA cores and 320 Tensor cores, the new instances offer up to 100 Gbps of networking throughput and pair custom 2nd Generation Intel Xeon Scalable (Cascade Lake) processors with up to 1.8 TB of local NVMe storage. They deliver up to 65 TFLOPS of mixed-precision performance (one TFLOPS being one trillion floating-point operations per second), according to Amazon, and they offer up to 1.8 times the graphics performance and up to 2 times the video transcoding capability of the previous-generation G3 instances.
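That mixed-precision figure comes from running math in 16-bit floating point on the T4’s Tensor Cores. As a rough sketch of what that looks like from a framework’s point of view, here is how PyTorch’s autocast feature might be used for FP16 inference on a G4 instance (the ResNet-50 model and batch size are illustrative, not anything Amazon specifies):

```python
# Sketch: FP16 inference with PyTorch autocast, which routes matrix math
# through the T4's Tensor Cores when run on a GPU instance.
import torch
import torchvision.models as models

model = models.resnet50(pretrained=True).eval().cuda()
batch = torch.randn(8, 3, 224, 224, device="cuda")  # illustrative batch

with torch.no_grad(), torch.cuda.amp.autocast():
    logits = model(batch)  # matmuls and convolutions execute in float16
print(logits.shape)  # torch.Size([8, 1000])
```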
Amazon says the G4 instances are well suited to building and running graphics-intensive applications, such as remote graphics workstations, video transcoding, photorealistic design, and game streaming in the cloud. That’s in addition to AI inference tasks like adding metadata to an image, object detection, recommender systems, automated speech recognition, and language translation. To that end, the instances support Amazon SageMaker and the AWS Deep Learning AMIs, including popular machine learning frameworks such as Google’s TensorFlow, Nvidia’s TensorRT, MXNet, Facebook’s PyTorch and Caffe2, Microsoft’s Cognitive Toolkit, and Chainer. In the coming weeks, they’ll also work with Amazon Elastic Inference, which Amazon says will allow developers to reduce the cost of inference by up to 75%.
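In SageMaker terms, pointing an inference workload at a G4 instance is a matter of choosing the instance type at deployment. The sketch below uses the SageMaker Python SDK; the S3 artifact, IAM role, entry point script, and framework version are all illustrative placeholders, not values from Amazon’s announcement.

```python
# Sketch: hosting a trained PyTorch model on a G4-backed SageMaker endpoint.
# model_data, role, entry_point, and framework_version are placeholders.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",              # placeholder artifact
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role
    entry_point="inference.py",                            # placeholder script
    framework_version="1.2.0",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # SageMaker's G4 instance type
)
```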
The G4 instances join AWS’ P3 instances, which feature Nvidia V100 Tensor Core chips aimed at machine learning training in the cloud. In a related development, Amazon last year unveiled Inferentia, a chip that works with AWS’ Elastic Inference feature, which can automatically detect when an AI framework is being used and identify which parts of a model would benefit most from acceleration. Inferentia is expected to become available in EC2 instance types and Amazon’s SageMaker machine learning service this year.