
Amazon Transcribe can now automatically redact personally identifiable data

Attendees at Amazon's annual cloud computing conference walk past the AWS logo in Las Vegas, November 30, 2017.
Image Credit: Reuters / Salvador Rodriguez/File Photo


Amazon is adding a new privacy-focused feature to its business transcription service, one that automatically redacts personally identifiable information (PII), such as names, social security numbers, and credit card credentials.

Amazon Transcribe is part of Amazon’s AWS cloud unit and became generally available in 2018. An automatic speech recognition (ASR) service, Transcribe enables enterprise customers to convert speech into text, which can, for example, make audio content searchable in a database. Contact centers can also use the tool to mine call data for insights and sentiment analysis. However, privacy issues have cast a spotlight on how technology companies store and manage consumers’ data.

Privacy

Speech-to-text services produce transcripts that can be searched for keywords and sentiment at a later date, but phone calls often feature significant private data that may be transcribed by Amazon and stored in a searchable database, even if that information is not necessary for analysis. Meanwhile, regulations are springing up around the world to protect consumer data, including Europe’s General Data Protection Regulation (GDPR) and the recently implemented California Consumer Privacy Act (CCPA).

Against this backdrop, Amazon Transcribe will now enable companies to automatically redact personal data, including credit/debit card numbers, expiration dates, CVV codes, PINs, social security numbers, bank account numbers, customer names, email addresses, phone numbers, and postal addresses. It’s worth noting that Google Cloud Platform offers a data loss prevention API that could be used in conjunction with its speech-to-text service to identify and redact sensitive data. But building automated redaction directly into Amazon Transcribe should make the process a lot easier to implement.



Companies using Amazon Transcribe can use automatic redaction as they see fit and can choose which PII elements they wish to obfuscate. The transcribed text will then display a [PII] tag in place of the sensitive information, and the corresponding timestamps mean anyone with sufficient system access will still be able to locate the necessary PII in the original audio file. This may also prove useful if a company wants to carry out extra audio processing to fully redact the information in the original recording.
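To illustrate how the [PII] tag and its timestamps could be used together, here is a minimal Python sketch that collects the time ranges of redacted words from a transcript. It assumes the transcript is a JSON document with a `results.items` list in which each word carries `start_time`, `end_time`, and an `alternatives[0].content` field; that layout and the sample data are assumptions made for illustration, and the helper name `pii_time_ranges` is hypothetical.

```python
def pii_time_ranges(transcript: dict, tag: str = "[PII]") -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for every item whose text is the redaction tag."""
    ranges = []
    for item in transcript.get("results", {}).get("items", []):
        alternatives = item.get("alternatives", [])
        if not alternatives or alternatives[0].get("content") != tag:
            continue
        # Items without both timestamps (e.g. punctuation) cannot be located in the audio.
        if "start_time" in item and "end_time" in item:
            ranges.append((float(item["start_time"]), float(item["end_time"])))
    return ranges


# Hypothetical sample transcript in the assumed layout: "my name is [PII]."
sample = {
    "results": {
        "items": [
            {"start_time": "0.5", "end_time": "0.8", "type": "pronunciation",
             "alternatives": [{"content": "my"}]},
            {"start_time": "0.8", "end_time": "1.2", "type": "pronunciation",
             "alternatives": [{"content": "name"}]},
            {"start_time": "1.2", "end_time": "1.4", "type": "pronunciation",
             "alternatives": [{"content": "is"}]},
            {"start_time": "2.1", "end_time": "3.4", "type": "pronunciation",
             "alternatives": [{"content": "[PII]"}]},
            {"type": "punctuation", "alternatives": [{"content": "."}]},
        ]
    }
}

redacted_spans = pii_time_ranges(sample)
```

A company that wants to silence those segments in the original recording could pass the resulting ranges to whatever audio tool it already uses.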

Amazon Transcribe supports 31 languages, six of which are available for real-time transcription, though for now the automated redaction feature is limited to U.S. English. The feature is billed monthly at a rate of $0.00004 per second of content.
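For a sense of scale, the quoted rate can be turned into a cost estimate with a few lines of Python. This sketch assumes the $0.00004 per-second rate applies uniformly, with no pricing tiers, minimum charges, or rounding rules, none of which are stated here; the function name `redaction_cost` is hypothetical.

```python
from decimal import Decimal

# Rate quoted above, in U.S. dollars per second of content.
RATE_PER_SECOND = Decimal("0.00004")


def redaction_cost(seconds: int) -> Decimal:
    """Estimated redaction charge in dollars, assuming a flat per-second rate."""
    return RATE_PER_SECOND * seconds


one_hour = redaction_cost(3600)                # one hour of audio
thousand_hours = redaction_cost(1000 * 3600)   # 1,000 hours of audio
```

At this rate, one hour of audio comes to $0.144 and 1,000 hours to $144.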