Generative adversarial networks (GANs) — two-part AI models consisting of a generator that creates samples and a discriminator that attempts to differentiate between the generated samples and real-world samples — have been applied to tasks ranging from video, artwork, and music synthesis to drug discovery and misleading media detection. They’ve also made their way into ecommerce, as Amazon revealed in a blog post this morning. Scientists at the company describe a GAN that generates clothing images to match product descriptions, which they say could be used to refine customer text queries. For instance, if a shopper searched for “women’s black pants” and then added the word “petite” followed by the word “capri,” the on-screen images would adjust accordingly with each new word.
It’s not unlike the GAN model commercialized by startup Vue.ai, which susses out clothing characteristics and learns to produce realistic poses, skin colors, and other features. From snapshots of apparel, it can generate on-model images in every size, up to five times faster than a traditional photo shoot.
Amazon’s proposed system, ReStGAN, adapts an existing architecture called StackGAN that splits image generation into two stages. A first GAN generates a low-resolution image directly from text, after which a second GAN upsamples it into a higher-resolution version with textures and natural coloration. The GANs are paired with a long short-term memory (LSTM) model that processes sequential inputs in order, enabling them to refine images as successive words are added to the input. And to make synthesis from descriptions easier, the system is restricted to three product classes — pants, jeans, and shorts — for which the training images are standardized (i.e., the backgrounds are removed and the images are cropped and resized so that they’re alike in shape and scale).
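To make the two-stage, LSTM-conditioned design concrete, here is a minimal sketch in PyTorch. It is not Amazon’s published architecture: the module names, layer sizes, and the word-by-word refinement loop are all illustrative assumptions, and a real implementation would add discriminators and adversarial training.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Encodes a product title one token at a time with an LSTM, so the
    conditioning vector can be updated as each new query word arrives."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); the final hidden state summarizes
        # every word seen so far.
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]                                    # (batch, hidden_dim)

class StageOneGenerator(nn.Module):
    """Maps noise plus the text encoding to a coarse 32x32 image."""
    def __init__(self, noise_dim=100, text_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + text_dim, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),    # 32x32
            nn.Tanh(),
        )

    def forward(self, z, text_code):
        return self.net(torch.cat([z, text_code], dim=1))

class StageTwoGenerator(nn.Module):
    """Upsamples the coarse image to 64x64, again conditioned on the text
    encoding to add textures and natural coloration."""
    def __init__(self, text_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + text_dim, 64, 3, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 64x64
            nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, low_res, text_code):
        # Broadcast the text code over the image and append it as extra
        # channels before refining.
        b, _, h, w = low_res.shape
        cond = text_code[:, :, None, None].expand(b, -1, h, w)
        return self.net(torch.cat([low_res, cond], dim=1))

encoder = TextEncoder(vocab_size=10_000)
g1, g2 = StageOneGenerator(), StageTwoGenerator()

# Reusing the same noise vector while the query grows is one simple way to
# keep earlier visual features stable as new words are appended.
z = torch.randn(1, 100)
for seq_len in (3, 4, 5):  # "women's black pants" -> "... petite" -> "... capri"
    query = torch.randint(0, 10_000, (1, seq_len))  # stand-in for tokenized text
    code = encoder(query)
    image = g2(g1(z, code), code)
    print(image.shape)                              # torch.Size([1, 3, 64, 64])
```

In the paper’s setup, the LSTM is what lets the generators emit a consistent sequence of images as the description is extended word by word; the fixed noise vector above plays that stabilizing role in this toy version.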
The research team trained the system in an unsupervised fashion, meaning the training data consisted of product titles and images that didn’t require any additional human annotation. The team increased the system’s stability using an auxiliary classifier that categorized images generated by the model according to three properties: apparel type (pants, jeans, or shorts), color, and whether they depicted men’s, women’s, or unisex clothing. The researchers also grouped colors in the LAB representational space, which is designed so that the distance between points corresponds to perceived color differences; this grouping forms the basis for a lookup table that maps visually similar colors to the same features of the textual descriptions.
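To illustrate how such a LAB-based lookup table might be built, here is a small Python sketch; the color names and CIELAB (L*, a*, b*) coordinates below are rough illustrative values, not Amazon’s actual palette.

```python
import numpy as np

# A few canonical color bins with approximate CIELAB (L*, a*, b*) values.
CANONICAL = {
    "black": (0.0, 0.0, 0.0),
    "white": (100.0, 0.0, 0.0),
    "red":   (53.2, 80.1, 67.2),
    "blue":  (32.3, 79.2, -107.9),
}

# Fine-grained colors as they might appear in product titles, again with
# approximate LAB coordinates (illustrative assumptions).
PRODUCT_COLORS = {
    "jet black": (3.0, 0.2, -0.5),
    "charcoal":  (18.0, -0.5, -1.5),
    "ivory":     (95.0, 0.0, 6.0),
    "crimson":   (47.0, 71.0, 33.0),
    "navy":      (13.0, 47.5, -64.7),
}

def nearest_bin(lab):
    """Assign a LAB triple to its nearest canonical bin. Because Euclidean
    distance in LAB approximates perceived color difference, this groups
    visually similar shades together."""
    names = list(CANONICAL)
    pts = np.array([CANONICAL[n] for n in names])
    dists = np.linalg.norm(pts - np.asarray(lab), axis=1)
    return names[int(np.argmin(dists))]

# Build the lookup table: visually similar colors map to the same feature.
lookup = {name: nearest_bin(lab) for name, lab in PRODUCT_COLORS.items()}
print(lookup)
# {'jet black': 'black', 'charcoal': 'black', 'ivory': 'white',
#  'crimson': 'red', 'navy': 'blue'}
```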
According to the researchers, one novelty of the system is its ability to retain old visual features while adding new ones; the other is the color model, which yields images whose colors better match the textual inputs. In experiments, the team reports that ReStGAN improved the classification of generated images by apparel type by up to 22% and by gender by up to 27% compared with the previous best-performing models based on the StackGAN architecture, while color matching improved by 100%.