On November 16, 2015, Lyst, an online fashion marketplace based in the United Kingdom, launched its first advertising campaign. Featuring a series of ironic headlines (one simply says “Rip-off”) etched over beautiful images, the campaign emphasizes the company’s identity as a “challenger brand,” whose success, in the words of CEO Chris Morton, “has been driven by marrying insights from data science with the emotional nature of fashion.”
Lyst provides fashion consumers with a central platform where they can mix and match millions of products from 11,500 different brands. In this context, data science serves as a virtual personal shopper, recommending products to users based upon insights from their behavior using the site. One might think these recommendations are powered by collaborative filtering, but the world of fashion is far too transient and fickle to support data models matching similar users. Matrices are sparse and inventory drains rapidly (consider flash sales sites like Gilt). Instead, recommendation algorithms align consumer behavior with product features. And as the world of fashion is one dominated by image and appearance, fashion data science has a lot to do with image analysis.
We interviewed Eddie Bell, Lyst’s lead data scientist, about his team’s current efforts to use deep learning to analyze images and personalize recommendations to consumers. We talked about his past, his team’s present, and the fashion industry’s future.
What is Lyst and what is its business model?
Lyst is a London-based online fashion marketplace similar to companies like Asos or Net-A-Porter. We don’t own any fashion products ourselves, but resell them on behalf of affiliated brands and retailers. We actually started off as a purely affiliate marketplace, where retailers rewarded us for driving web traffic to their sites. We scraped the web for semi-structured information about fashion products and then applied machine learning to present this information cleanly and elegantly for consumers from one central website. To make this experience more seamless, we eventually added our own online checkout system where users could build a look combining different brands and then purchase everything directly from Lyst. We then send transaction information to individual retailers for shipping.
What’s your background and how did you end up working in fashion?
I studied machine learning and pure mathematics in graduate school, but was always drawn to applied and practical problems, to the possibility of building systems that seemed like magic to end users. I started my career working on financial trading systems, but was intrigued by the creativity of working with image data, which comprises the bulk of data we work with in the fashion and retail space. I’ve been at Lyst for the past three years.
How does your data team provide personalized product recommendations to Lyst users?
We started off with a collaborative filtering approach, basing recommendations to a given user on items liked and purchased by similar users in the past. Unfortunately, this didn’t work well for our data set. As Lyst sells 11 million items that constantly go in and out of stock, our item-user interaction matrix was sparse (i.e. most elements are zero), and we could never be sure that a given item would be available for the next user.
So we needed a better approach. My colleague Maciej Kula led research and engineering efforts to build a new recommendation model called LightFM that incorporates item and user metadata into a matrix factorization algorithm. LightFM represents each user and item as the sum of the latent representations of their features, allowing recommendations to generalize to new items (via item features) and to new users (via user features). If a registered Lyst user likes ten items with certain features, we recommend other items with similar features.
We also modulate the frequency and density of recommendations to match user engagement with the site. For example, if a user signs in and looks at three different styles of Wellington boots, we’ll serve up multiple images of similar boots during that browsing session and quickly decay the frequency in future sessions.
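As an editorial illustration of the LightFM idea described in this answer, here is a minimal sketch of how a user and an item can each be represented as the sum of their features' latent vectors and then scored with a dot product. It is not Lyst's code and does not use the LightFM library's API. The feature names and two-dimensional embeddings are hypothetical and hand-picked, and the sketch omits bias terms and training entirely.

```python
# Illustrative sketch only: hand-picked toy embeddings, not a trained model
# and not the LightFM library's API.

def sum_embeddings(features, table):
    """Represent a user or item as the sum of its features' latent vectors."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for name in features:
        for i, value in enumerate(table[name]):
            total[i] += value
    return total


def score(user_features, item_features, user_table, item_table):
    """Dot product of the summed user and item representations."""
    u = sum_embeddings(user_features, user_table)
    v = sum_embeddings(item_features, item_table)
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-dimensional latent vectors, one per feature.
USER_EMBEDDINGS = {
    "viewed:boots": [1.0, 0.0],
    "viewed:luxury": [0.0, 1.0],
}
ITEM_EMBEDDINGS = {
    "category:boots": [0.9, 0.1],
    "category:dress": [-0.5, 0.2],
    "brand:luxury": [0.1, 0.8],
}

user = ["viewed:boots", "viewed:luxury"]
# Both items are new (no interaction history), but their features are known.
new_boot = ["category:boots", "brand:luxury"]
new_dress = ["category:dress", "brand:luxury"]

boot_score = score(user, new_boot, USER_EMBEDDINGS, ITEM_EMBEDDINGS)
dress_score = score(user, new_dress, USER_EMBEDDINGS, ITEM_EMBEDDINGS)
```

Because the score depends only on feature embeddings, an item that has never been interacted with can still be ranked, which is the property that makes this approach attractive when inventory turns over quickly.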
What about recommendations for consumers who don’t have a registered Lyst account?
We deal with this often given traffic to our site from Google advertisements or search results. Demographic segmentation across brands gives us a good place to start: if a user lands on a Gucci bag, we have a general idea that he or she is interested in luxury products, so we won’t show Zara street wear. Ultimately, however, the value of this social, demographic segmentation is limited. Someone who likes Gucci bags may prefer Prada dresses and Jimmy Choo shoes – one key value at Lyst is that we can aggregate recommendations to curate taste across different brands.
What machine learning techniques do you use to recognize features in fashion product images?
One of the key challenges in image processing for fashion is to recognize duplicates of the same item, as you don’t want to serve multiple images of the same product in different contexts (one Gucci dress on or off the model) or colors (a Uniqlo shirt in white, black, or orange) to the same user. Old-school visual processing techniques analyzed images to look for areas of gradient change, which often correlated with product features. The algorithms converted points of interest into a vector, and we compared vectors from different images to find matches that signal duplicates. For example, we could match a picture of a woman wearing an earring with an image of that earring on a blank background. This approach worked for certain stark patterns or shapes, but didn’t perform well with features like the local texture of a pair of blue jeans. These techniques were also entirely unsupervised, so we couldn’t fine-tune them.
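The vector comparison step described above can be sketched roughly as follows. This is an editorial illustration, not Lyst's pipeline: the interview does not name the feature detector or the distance measure, so the cosine similarity, the 0.95 threshold, and the four-element descriptor vectors are all assumptions chosen for readability.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(desc_a, desc_b, threshold=0.95):
    """Flag two images as showing the same product when their descriptors nearly align."""
    return cosine_similarity(desc_a, desc_b) >= threshold


# Hypothetical descriptors; real ones would be much longer and computed from the image.
earring_on_model = [0.9, 0.1, 0.4, 0.0]
earring_on_blank = [0.8, 0.2, 0.5, 0.0]
blue_jeans = [0.1, 0.9, 0.0, 0.6]

same_earring = is_duplicate(earring_on_model, earring_on_blank)
earring_vs_jeans = is_duplicate(earring_on_model, blue_jeans)
```

A fixed threshold like this has no labels to learn from, which reflects the limitation mentioned above: an unsupervised matcher cannot be tuned toward the cases it gets wrong.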
That was the old school approach. Have you started using deep learning methods for image analysis?
We’re actually in the process of training the Lyst convolutional neural network now, and predict it will provide a massive boost in accuracy and performance on feature detection and product duplicate identification. Employing supervised learning, we’re training the deep representation model on 5 million labeled images and fine-tuning the network to boost accuracy. Our labeling team combines workers from Amazon’s Mechanical Turk, remote contract moderators, and internal full-time moderators. We have a consensus process whereby each moderator gets a score proportional to their agreement with other moderators. The higher a moderator’s score, the more weight their labels carry. At Lyst, we have a lot of models that use the human-in-the-loop methodology. Ultimately, our consensus process becomes encoded as intelligence in the model and we need fewer moderators to have confident labels.
We are building a network that has one “deep” representation of each product, metaphorically the network’s master concept of a given product. Like a natural language concept, we need a many-to-one as opposed to a one-to-one relationship between images and representations. Take a Gucci dress as an example. The Lyst data set may have an image of the dress on the hanger in the store, on the runway, on a static model, on people on the streets, or even written product descriptions of varying length. We want to use all this data to create one representation: “Gucci dress.”
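The moderator consensus process mentioned in this answer could look something like the sketch below. This is an editorial guess at one simple scheme, not Lyst's implementation: the scoring rule (fraction of pairwise agreements on shared images), the moderator names, and the labels are all hypothetical.

```python
from collections import defaultdict


def agreement_scores(labels):
    """Score each moderator by how often they agree with others on shared images.

    labels maps moderator -> {image_id: label}.
    """
    scores = {}
    for moderator, own in labels.items():
        agree = total = 0
        for other, theirs in labels.items():
            if other == moderator:
                continue
            for image_id, label in own.items():
                if image_id in theirs:
                    total += 1
                    agree += label == theirs[image_id]
        scores[moderator] = agree / total if total else 0.0
    return scores


def weighted_label(image_id, labels, scores):
    """Pick the label with the largest total moderator weight for one image."""
    votes = defaultdict(float)
    for moderator, own in labels.items():
        if image_id in own:
            votes[own[image_id]] += scores[moderator]
    return max(votes, key=votes.get)


# Hypothetical moderators and labels.
labels = {
    "ann": {"img1": "dress", "img2": "boot", "img3": "shirt", "img4": "sandal"},
    "bob": {"img1": "dress", "img2": "boot", "img3": "shirt"},
    "cat": {"img1": "skirt", "img2": "boot", "img3": "dress", "img4": "sneaker"},
}

scores = agreement_scores(labels)
img1_label = weighted_label("img1", labels, scores)
img4_label = weighted_label("img4", labels, scores)
```

In this toy data, "img4" has one vote for each label, so a plain majority vote would tie. The weighting breaks the tie in favor of the moderator with the stronger agreement record.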
What’s one challenge you’ve encountered while building your deep learning model and how did you address it?
One challenge is that we have different quantities of text and image data for different items. To address this, we found additional fashion data from the internet, and are playing with the idea of synthetically generating labeled data to train our network. This hack was inspired by talks I saw at the 2015 ICLR (the International Conference on Learning Representations) in San Diego. As it can be cost-prohibitive for small research groups to acquire data sets with millions of labeled images, people have started to synthetically increase the size of their data sets through multisampling (where each image is cropped in multiple ways, or flipped horizontally and vertically) or by introducing noise into the input every time a piece of data is shown to the network (which improves performance on rotated or zoomed-in images).
Another technique we’re using to improve duplicate detection is called a region-based convolutional neural network, or R-CNN. We try to estimate bounding boxes around items or features in images to isolate garments or parts of items instead of looking at the images globally.
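To make the multisampling and noise techniques mentioned in this answer concrete, here is a small editorial sketch that expands one image into several training variants. It treats an image as a plain grid of numbers and assumes corner crops, Gaussian noise, and a noise scale of 0.01. None of these specifics come from the interview, and a real pipeline would use an image library instead of nested lists.

```python
import random


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows."""
    return [row[:] for row in reversed(image)]


def crop(image, top, left, height, width):
    """Cut a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def add_noise(image, scale, rng):
    """Add independent Gaussian noise to every pixel."""
    return [[pixel + rng.gauss(0.0, scale) for pixel in row] for row in image]


def augment(image, crop_size, rng, noise_scale=0.01):
    """Return the four corner crops, each also flipped both ways, all with fresh noise."""
    height, width = len(image), len(image[0])
    variants = []
    for top in (0, height - crop_size):
        for left in (0, width - crop_size):
            patch = crop(image, top, left, crop_size, crop_size)
            variants.extend([patch, flip_horizontal(patch), flip_vertical(patch)])
    return [add_noise(variant, noise_scale, rng) for variant in variants]


# A hypothetical 4x4 grayscale image; every variant keeps the original label.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
augmented = augment(image, 3, random.Random(0))
```

Each of the twelve variants would be paired with the label of the source image, so one labeled example becomes twelve without any additional moderation work.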
When we built Pictograph, our deep learning prototype, we chose to manage low confidence labels by leveraging semantic relationships in WordNet, which is the backbone for ImageNet. Are you employing anything similar to manage low confidence labels in your network?
Yes, we do use analogous techniques. We have a fashion ontology affiliated with the Lyst network. This forms a semantic tree with men and women at the top of the tree, followed by item types (e.g. shirts, shoes, or jewelry) as branches, and specific items (e.g. leather wedge sandals) as leaves. The network is trained on each individual, specific level as well as the full path through the tree. So if there is low confidence in classifying an item as “leather wedge sandal,” there may be higher confidence in saying it’s a women’s shoe, not a men’s suit. Our human-in-the-loop consensus method also helps train the network across these semantic lines.
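One way to picture the fallback described above is to walk the ontology path from root to leaf and stop at the deepest label the classifier is still confident about. The sketch below is an editorial illustration under assumptions: the confidence values and the 0.8 threshold are hypothetical, and the interview does not say how Lyst actually combines the levels.

```python
def most_specific_confident_label(path, confidences, threshold=0.8):
    """Return the deepest label on the root-to-leaf path that clears the threshold.

    Returns None when even the root label is below the threshold.
    """
    chosen = None
    for label in path:
        if confidences.get(label, 0.0) < threshold:
            break
        chosen = label
    return chosen


# Hypothetical path through the ontology and per-level classifier confidences.
path = ["women", "shoes", "leather wedge sandals"]
confidences = {"women": 0.97, "shoes": 0.91, "leather wedge sandals": 0.42}

label = most_specific_confident_label(path, confidences)
```

Here the leaf prediction is too uncertain to use, so the result backs off to "shoes", which is still enough to avoid recommending a men's suit.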
What will the fashion industry look like in the future?
The key question is whether traditional retailers will incorporate technology into operations and strategy or derive value from data and technologies through partnerships with companies like Lyst. Our partnership team is working hard to educate high-end, traditional brands about the value modern data science techniques can provide to their business. A key obstacle is their fear that they would lose control over their branding strategy: a high-end retailer like Prada doesn’t want its image mixed up with H&M or Zara on a central platform like Lyst. But we’ve seen tremendous adoption by consumers who like to have their own control to mix and match products and create their own look. With capabilities like deep learning really taking off, we’re excited to see what comes next!