
MLPerf: Google’s Cloud TPUs and Nvidia’s Tesla V100 break AI training records

Nvidia
Image Credit: Khari Johnson / VentureBeat



Nvidia and Google Cloud set records for AI training time, according to the latest round of results from the MLPerf benchmark consortium. Benchmarks help AI practitioners adopt common standards for measuring the performance and speed of hardware used to train AI models.

MLPerf v0.6 examines the training performance of machine learning acceleration hardware in six popular usage categories. Among the results announced today: an Nvidia DGX SuperPOD built on Tesla V100 Tensor Core GPUs completed on-premises training of the ResNet-50 image classification model in 80 seconds. By contrast, the same task took 8 hours on a DGX-1 station in 2017. Reinforcement learning training with Minigo, an open source implementation of the AlphaGo Zero model, finished in 13.5 minutes, also a new record.

At Nvidia, the latest training benchmark results stem primarily from advances in software.

“In just a matter of seven months on the same DGX-2 station, our customers can now enjoy up to 80% more performance, and that’s due to all the software improvements, all the work that our ecosystem is doing,” a company spokesperson said in a phone call.




Google Cloud’s TPU v3 Pods also set records, training the Transformer model for English-to-German machine translation in 51 seconds. TPU Pods also achieved record performance on the ResNet-50 image classification benchmark with the ImageNet data set, and completed model training in an object detection category in 1 minute and 12 seconds.

Google Cloud TPU v3 Pods capable of harnessing the power of more than 1,000 TPU chips were first made available in public beta in May.

Intel, Google, and Nvidia submitted results to the latest round of training benchmark tests. When MLPerf shared its first training benchmark results in December 2018, Nvidia and Google demonstrated that they make some of the world’s fastest hardware for training AI models.

This news follows the launch of MLPerf’s inference benchmarks for computer vision and language translation last month. Results of the inaugural MLPerf inference benchmark will be reviewed in September and shared publicly in October, MLPerf Inference Working Group cochair David Kanter told VentureBeat in a phone interview.

MLPerf is a group of 40 organizations that play key roles in the AI hardware and model creation space, such as Amazon, Arm, Baidu, Google, and Microsoft.