
Google launches TensorFlow.Text library for language AI models


Google today introduced TensorFlow.Text, a library for preprocessing text for language models built with TensorFlow. TensorFlow, the open source machine learning framework created by the Google Brain team, has been downloaded more than 41 million times.

TensorFlow.Text can be installed with pip and comes with tokenizers that break text apart into units such as words, numbers, and punctuation for analysis.

At launch, TensorFlow.Text can tokenize by whitespace, by Unicode script, and by predetermined sequences of word fragments, such as suffixes or prefixes, that Google calls wordpieces. Wordpieces are commonly used in approaches like BERT, a pretraining technique for language models that Google open-sourced last fall.
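To make the wordpiece idea concrete, here is a minimal plain-Python sketch of greedy longest-match-first splitting, the general approach used by BERT-style tokenizers. It is not TensorFlow.Text's implementation: the function name, the toy vocabulary, and the `[UNK]` fallback token are assumptions chosen for illustration, and the `##` prefix follows the BERT convention for marking word-internal pieces.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into vocabulary pieces, longest match first.

    Illustrative sketch only; not the TensorFlow.Text implementation.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # BERT-style marker for a piece that continues a word.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits, so the whole word maps to the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"un", "##break", "##able", "token", "##s"}

print(wordpiece_tokenize("unbreakable", TOY_VOCAB))
print(wordpiece_tokenize("tokens", TOY_VOCAB))
```

With this toy vocabulary, "unbreakable" splits into a prefix, a stem, and a suffix, which is the kind of word-fragment sequence the article describes.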

The library also comes with ops for normalization, n-grams, and sequence constraints for labeling, according to a Medium post announcing the news.
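For readers unfamiliar with n-grams, the following plain-Python sketch shows what such an op computes: every run of n consecutive tokens in a sequence. The function name and its parameters are hypothetical and do not reflect the TensorFlow.Text API.

```python
def word_ngrams(tokens, n, separator=" "):
    """Return all runs of n consecutive tokens, each joined into one string.

    Illustrative sketch only; not the TensorFlow.Text op.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["the", "quick", "brown", "fox"]
print(word_ngrams(tokens, 2))  # bigrams
print(word_ngrams(tokens, 3))  # trigrams
```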




TensorFlow.Text’s tokenizers use RaggedTensors, a new kind of tensor that can hold sequences of differing lengths, such as sentences with different numbers of words. RaggedTensors and Unicode support for TensorFlow were first detailed by Google engineer Mark Omernick earlier this year at the TensorFlow Dev Summit.
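Tokenized sentences rarely have the same number of tokens, which is why a ragged structure is useful. The plain-Python sketch below illustrates one common way to store rows of different lengths: a single flat list of values plus a list of row boundaries. The helper names are hypothetical, and this is a simplified illustration of the idea rather than TensorFlow's implementation.

```python
def to_ragged(rows):
    """Flatten rows of varying length into (values, row_splits).

    Row i occupies values[row_splits[i]:row_splits[i + 1]].
    Illustrative sketch only; not TensorFlow's RaggedTensor class.
    """
    values = []
    row_splits = [0]
    for row in rows:
        values.extend(row)
        row_splits.append(len(values))
    return values, row_splits


def ragged_row(values, row_splits, i):
    """Recover row i from the flat representation."""
    return values[row_splits[i]:row_splits[i + 1]]


# Three "sentences" with three, one, and zero tokens.
sentences = [["What", "is", "this"], ["Hello"], []]
values, row_splits = to_ragged(sentences)
print(values)
print(row_splits)
print(ragged_row(values, row_splits, 1))
```

No padding is needed in this layout, so an empty row costs only one repeated boundary value.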

The news comes just days after the beta release of TensorFlow 2.0. The latest version of Google’s open source framework was released in alpha in March at the TensorFlow Dev Summit. TensorFlow 2.0 features fewer APIs, deeper Keras integration, and runtime improvements for eager execution.

TensorFlow.Text is the latest dedicated library introduced by Google in the past few months to help people accomplish specific tasks with machine learning. TensorFlow Graphics was released last month and is designed to bring more deep learning to graphics and 3D models.

Perhaps the most popular of these libraries is TensorFlow Lite for embedded devices, which is now used on more than 2 billion devices, Google said earlier this year. Google uses TensorFlow Lite to power things like speech detection on Gboard and edge detection in Google Photos.

In March, Google launched TensorFlow Privacy as well as TensorFlow Federated, a framework for federated learning, an on-device machine learning method that can offer better user privacy protections. The company sees both as a way to make privacy a priority for developers. TensorFlow.js and Swift for TensorFlow, versions of the framework for JavaScript and iOS developers, also received upgrades this spring.