Scale and nuTonomy release nuScenes, a self-driving dataset with over 1.4 million images

Datasets are the lifeblood of machine learning algorithms — they “teach” artificial intelligence (AI) facts about the world, in a manner of speaking. And in domains such as autonomous driving, it’s vitally important they’re of the highest quality.

That’s why nuTonomy today released a self-driving dataset called nuScenes that it claims surpasses in size and accuracy public datasets like KITTI, Baidu’s ApolloScape, and the Udacity Self-Driving Car library. Scale, a San Francisco-based data labeling startup, provided annotations.

“We’re proud to provide the annotations … as the most robust open source multi-sensor self-driving dataset ever released,” said Scale CEO Alexandr Wang. “We believe this will be an invaluable resource for researchers developing autonomous vehicle systems, and one that will help to shape and accelerate their production for years to come.”

NuTonomy compiled more than 1,000 scenes containing 1.4 million images, 400,000 sweeps of lidars (laser-based systems that judge the distance between objects), and 1.1 million three-dimensional bounding boxes (objects detected with a combination of RGB cameras, radar, and lidar). They’ve been meticulously labeled through Scale’s Sensor Fusion Annotation API, which taps AI and teams of humans for data annotation, and they are open-sourced starting this week.

Self-driving car datasets aren’t exactly a rare commodity — just this summer, Oregon-based Flir Systems released 10,000 labeled photos captured by its thermal camera system, Mapillary published 25,000 street-level images, and the University of California Berkeley uploaded 100,000 video sequences captured by RGB cameras. But Scale and nuTonomy claim that nuScenes is more comprehensive than any similar dataset that’s come before it.

As the nuScenes website explains, nuTonomy used a combination of six cameras, one lidar, five radars, GPS, and an inertial measurement sensor to capture the data. Driving routes in Singapore and Boston were specifically chosen to showcase “challenging” locations, times, and weather conditions.

Scale, which competes against the likes of Mighty AI, Appen, Cloud Factory, Samasource, and Amazon’s Mechanical Turk, has labeled more than 200,000 million miles for clients that include Lyft, Voyage, General Motors, Zoox, and Embark since its founding in 2016. It recently expanded its work into robotics, drones, virtual assistants, and “other solutions” that depend heavily on AI, and in August Scale announced an $18 million funding round led by Index Ventures, with participation from Accel and Y Combinator.

The startup has raised $22.7 million to date and reports that revenue grew 15 times over the past year.
