Google today announced the release of a large corpus of visual deepfakes produced in collaboration with Jigsaw, the tech giant’s internal technology incubator. It’s been incorporated into the Technical University of Munich and the University Federico II of Naples’ new FaceForensics benchmark — an effort that Google co-sponsors — where it’s freely available to researchers for use in developing synthetic video detection techniques.
The release follows on the heels of a speech corpus containing phrases spoken by the Mountain View company’s text-to-speech models, which Google contributed to the ASVspoof 2019 competition to develop systems that can distinguish between real and computer-generated speech. Google says that corpus has been downloaded by more than 150 research and industry organizations to date.
“Since [the] first appearance [of deepfakes] in late 2017, many open-source deepfake generation methods have emerged, leading to a growing number of synthesized media clips,” Google Research scientist Nick Dufour and Jigsaw technical research manager Andrew Gully wrote in a blog post. “While many are likely intended to be humorous, others could be harmful to individuals and society.”
According to Google, compiling the data set required working with paid and consenting actors to record “hundreds” of videos. The company and its partners, including the team behind FaceForensics, then produced “thousands” of deepfakes from the videos, resulting in the collection of real and fake samples.
Google says it’ll add to the corpus as deepfake technology evolves over time, and that it will continue to work with partners in the space. “We firmly believe in supporting a thriving research community around mitigating potential harms from misuses of synthetic media, and today’s release of our deepfake dataset in the FaceForensics benchmark is an important step in that direction,” added Dufour and Gully.
AI systems that can be used to generate misleading media have come under increased scrutiny recently. In September, members of Congress sent a letter to National Intelligence director Dan Coats requesting a report from intelligence agencies about the potential impact of deepfakes — videos made using AI that digitally grafts faces onto other people’s bodies — on democracy and national security. Members of Congress speaking with Facebook COO Sheryl Sandberg and Twitter CEO Jack Dorsey also expressed concern about the potential impact of manipulative deepfake videos in a Congressional hearing in late 2018.
There are plenty of reasons to be concerned. Chinese deepfakes-generating app ZAO went viral earlier this year, around the same time reports emerged of what might have been the first-ever use of a synthetic voice to impersonate the CEO of a major corporation.
Fortunately, the fight against deepfakes appears to be ramping up. Last summer, members of DARPA’s Media Forensics program tested a prototype system that could automatically detect AI-generated videos in part by looking for cues like unnatural blinking. Startups like Truepic, which raised an $8 million funding round in July, are experimenting with deepfake “detection-as-a-service” business models. And earlier this month, Facebook, together with the Partnership on AI, Microsoft, and academics, launched the Deepfake Detection Challenge, which will offer up to $10 million in grants and awards to spur the development of deepfake-detecting systems.