How federated learning could shape the future of AI in a privacy-obsessed world

You may not have noticed, but two of the world’s most popular machine learning frameworks — TensorFlow and PyTorch — have taken steps in recent months toward privacy with solutions that incorporate federated learning.

Instead of gathering users’ data in the cloud to train models, federated learning trains AI models on mobile devices in large batches, then transfers what was learned back to a global model without the data ever leaving the device.
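To make the idea concrete, here is a minimal sketch of one common flavor of federated learning, in which a server averages weights trained on each device. It is an illustration under stated assumptions, not how TensorFlow Federated or PySyft implement it: the one-parameter model, the learning rate, the function names, and the three hypothetical devices are all invented for the example.

```python
from typing import List, Tuple

# (x, y) pairs that stay on one device; hypothetical toy data type.
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient descent steps on one device's private data."""
    w = weight
    for _ in range(epochs):
        # Gradient of mean squared error for the toy model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_weight: float, clients: List[Dataset]) -> float:
    """One round: every device trains locally, the server averages the results."""
    total = sum(len(d) for d in clients)
    # Only the trained weight leaves each device, weighted by its example count.
    return sum(local_update(global_weight, d) * len(d) for d in clients) / total


def train(clients: List[Dataset], rounds: int = 50) -> float:
    """Repeat federated rounds starting from a weight of zero."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Assumed example data: three devices, each holding points from y = 3x.
CLIENTS = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
```

Calling train(CLIENTS) should move the shared weight close to 3, even though the server in this sketch never reads a single (x, y) pair.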

As part of the latest release of Facebook’s popular deep learning framework PyTorch last month, the company’s AI Research group rolled out Secure and Private AI, a free two-month Udacity course on the use of methods like encrypted computation, differential privacy, and federated learning. The first course began last week and is being taught by Andrew Trask, a senior research scientist at Google’s DeepMind. He’s also the leader of OpenMined, a privacy-focused open source AI community that in March released PySyft to bring PyTorch and federated learning together.

“It’s not just Facebook, I think the [AI] field in general is looking at this direction pretty seriously,” PyTorch creator Soumith Chintala told VentureBeat in an interview. “Yeah, I think you will absolutely see more effort, more direction, [and] more packages, both in terms of PyTorch and others, coming in this direction for sure.”

As privacy becomes a selling point, federated learning is poised to grow in popularity among both tech giants and industries where privacy protection is required, like health care.

Building privacy into AI

Google AI researchers first introduced federated learning in 2017, and the paper has since been cited more than 300 times by research scientists, according to arXiv. In March, Google released TensorFlow Federated to make federated learning easier to perform with its popular machine learning framework.

At the Google I/O conference in May 2019, CEO Sundar Pichai talked about federated learning as part of his pitch to the world that Google is serious about privacy for all, alongside features like Incognito Mode in Google Maps and using your Android phone as a security key for two-step verification. Speed improvements with on-device machine learning will also make Google Assistant up to 10 times faster in the coming months.

Back in 2017, Gboard, the Android device keyboard, began to use federated learning to learn new words from users and predict the next word or emoji to use.

“It’s still very early, but we are excited about the progress and the potential of federated learning across many more of our products,” Pichai said onstage during the 2019 keynote address.

Above: Federated learning depiction shared during Google I/O keynote address

Beyond giving Android users a smarter keyboard, Google is exploring the use of federated learning to improve security, Google head of account security Mark Risher told VentureBeat AI staff writer Kyle Wiggers in a recent phone interview. Federated learning would enable malicious third parties to test against on-device anti-phishing security models, so it’s not a great fit for security yet, but the company is working toward that goal, Risher said.

Federated learning still faces challenges, though, including an inability to inspect training examples, bandwidth constraints, the need for a Wi-Fi connection, and the need for labels to be inferred naturally from user interactions.

Why federated learning improves privacy

Updates sent from devices can still contain some personal data or reveal something about a person, so differential privacy is used to add Gaussian noise to the data that devices share, Google AI researcher Brendan McMahan said in a 2018 presentation.
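As a rough sketch of what adding Gaussian noise to an update can look like, the snippet below first clips an update to a maximum size and then adds noise scaled to that limit. The clipping step, the function names, and the max_norm and noise_multiplier parameters are assumptions made for illustration; the sketch does not calculate a formal privacy guarantee, and real systems may add the noise on the server rather than on the device.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]


def privatize(
    update: List[float],
    max_norm: float = 1.0,          # assumed clipping bound
    noise_multiplier: float = 1.0,  # assumed ratio of noise to clipping bound
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Clip an update, then add independent Gaussian noise to each value."""
    if rng is None:
        rng = random.Random()
    clipped = clip_update(update, max_norm)
    # Noise is scaled to the clipping bound, so no single update can dominate it.
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]
```

Clipping limits how much any one device can influence the result, which is what lets a fixed amount of noise mask an individual contribution. A larger noise_multiplier hides more but makes the averaged model less accurate.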

Distributing model training and predictions to devices instead of sharing data in the cloud also saves battery and bandwidth, since you would have to download the model on Wi-Fi, he said.

Use of federated learning, for example, led to a 50x decrease in the number of rounds of communication necessary to get a reasonably accurate CIFAR convolutional neural net for computer vision.

Looking at things in the aggregate means the server doesn’t need very much data from devices, McMahan said.

“In fact, all the server really needs to know is the average of the updates or the sum of those updates. It doesn’t care about any individual update,” he said in the presentation. “Wouldn’t it be great if Google could not see those individual updates and only got that aggregate?”
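One way to picture a server that sees only the aggregate is pairwise masking, a building block of secure aggregation schemes. In the sketch below, each pair of devices shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. This is a toy under stated assumptions, not Google's protocol: the function names are hypothetical, a single seeded generator stands in for secrets each pair would agree on privately, and real protocols use modular integer arithmetic and handle devices that drop out.

```python
import random
from typing import Dict


def mask_updates(updates: Dict[str, float], seed: int = 0) -> Dict[str, float]:
    """Hide each client's value behind pairwise masks that sum to zero."""
    # Assumption: one seeded generator stands in for per-pair shared secrets.
    rng = random.Random(seed)
    names = sorted(updates)
    masked = dict(updates)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mask = rng.uniform(-1000.0, 1000.0)
            masked[a] += mask  # client a adds the shared mask
            masked[b] -= mask  # client b subtracts the same mask
    return masked


def server_sum(masked: Dict[str, float]) -> float:
    """The server adds up what it receives; the masks cancel in the total."""
    return sum(masked.values())


# Hypothetical one-number updates from three devices.
UPDATES = {"phone_a": 0.5, "phone_b": -1.25, "phone_c": 2.0}
MASKED = mask_updates(UPDATES)
```

Each entry in MASKED looks like noise on its own, yet server_sum(MASKED) matches the sum of the original updates up to floating-point rounding.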

McMahan was coauthor of the influential 2017 research paper introducing federated learning to the world. A team of Google AI researchers including McMahan and Ian Goodfellow also authored a heavily cited 2016 paper titled “Deep Learning with Differential Privacy.” Goodfellow left Google in 2019 to be director of a machine learning special projects group at Apple.

In 2016, a year before Google introduced federated learning and differential privacy for Gboard, Apple did the same for QuickType and emoji suggestions in iOS 10.

Applications for protected data

Federated learning’s ability to mask data has led to exploration of its applications in industries like health care. The technique is powering a platform from Owkin, a company backed by GV. The platform helps medical professionals conduct tests and experiments to predict disease evolution and drug toxicity. In recent months, AI researchers from Harvard University, MIT’s CSAIL, and Tsinghua University’s Academy of Arts and Design devised a method to analyze electronic medical records with federated learning.

Training models with encrypted or protected data isn’t altogether new. For example, Microsoft AI researchers applied neural networks to encrypted data for their CryptoNets model back in 2016.

However, federated learning and approaches that deliver machine intelligence without collection of raw data will likely grow in popularity as people care more about privacy and more device manufacturers turn to on-device machine learning.
