May 25, 2016 · interview
Human-Machine Algorithms: Interview with Eric Colson
Therefore render unto Caesar the things that are Caesar’s, and unto God the things that are God’s.
– Matthew, 22:21
We tend to think that recommender systems are old hat. eCommerce platforms like Amazon have used techniques like collaborative filtering for years to help shoppers navigate vast catalogues by inferring consumer taste from past behavior. And yet, we’ve all experienced the limitations of these approaches (that time you bought a toilet plunger, only to be flooded with recommendations for strange bathroom accessories). This may be a nuisance for consumers, but it doesn’t jeopardize eCommerce business models: only 35% of Amazon’s sales, for example, are driven by recommendations.
But what would happen if the stakes were higher? What kind of algorithmic creativity would it take to build a company with revenue based entirely on recommendations?
This is the challenge faced by Eric Colson’s team at Stitch Fix. Stitch Fix is an online “personal styling service” that selects clothing for customers (they also have a solid technical blog). Instead of recommending items for shoppers to choose from, Stitch Fix goes a step further and simply ships customers the items its algorithms and stylists choose. The key to making this work, according to Colson, is to optimize the division of labor between human and machine.
Colson spoke about the value of human-machine collaborations at the March Data Driven NYC Meetup. We interviewed him to learn more about the approach.
Let’s start with your background. How did you become a data scientist?
My father was a physicist, and physics features the most exotic math out there. Apparently – although this was too early for me to remember – he used to tape handwritten exotic math symbols inside my crib when I was a baby. I suppose this early exposure shaped my future obsession with microeconomics, and particularly graphs. Graphs are so pure, so elegant: I love how the smooth lines clearly show the theoretical relationship between things like price and demand (even if it’s always more complex in practice). I was lucky, as I entered the job market just when it was becoming viable to do things like price elasticity in the tech industry. You could record transactions and, with a bit of statistics, actually capture elasticity curves. Once in business, I had to learn how to get the data required to build these graphs. That meant learning the engineering behind data warehouses and ETL. It meant learning more sophisticated stats. At Yahoo, it eventually meant teasing relationships out of big data to optimize insights for human decision making. And, finally, at Netflix, it meant building systems for machine consumption – that is, automated decision making, or ‘algorithms’. You could even say I became a data scientist as a natural evolution of my sole pursuit: to optimize microeconomic graphs.
How did you choose your title? Chief Algorithms Officer isn’t something you see every day.
I’m not the only one, trust me. Udi Manber was the former algorithms officer at Amazon. I came to Stitch Fix to assume an officer role, and played with different words. I consider the title to play a recruiting function to attract the right talent. Chief Analytics Officer sounded too much like BI. Chief Data Officer sounded like a CIO, connoting data storage. I did consider Chief Data Scientist, and I love the word, but think there’s still confusion about what it means – it’s hard to capture a role that combines statistics, engineering, and business problems, and it’s getting more complicated now that artificial intelligence is used to describe machine learning algorithms. At any rate, I chose the title because the essence of what we do is to work with algorithms.
What inspired you to leave Netflix for Stitch Fix?
Katrina Lake, Stitch Fix’s Founder and CEO, engaged me in early 2012 as an advisor. I had no intention of joining at the time, but was drawn to the work because it sought to mix algorithmic decision making with human decision making in a way I thought merited exploration. At Netflix, I started to appreciate the limitations of a purely algorithmic approach for complex business tasks with an algorithm we built to predict the popularity of our content. This was critical when we shifted from the DVD model to streaming. With DVDs, you just buy and manage the inventory, and you can always buy more if needed; with streaming, you license content from studios and pay high-dollar fixed costs over a set time period. That means, if no one watches a movie, Netflix still pays. It was therefore important to pick movies and shows many people would watch to cover costs. We gathered data from every source we could to build these models. They usually did well, but there were anomalies where non-mainstream films with average ratings performed way better than expected. If we just looked at the website, we immediately knew why: the box shots would have sexy women in provocative positions, or even branding with subliminal effects (like the yellow National Geographic covers). Humans see this immediately, but it was not a feature we thought to engineer at the time (deep learning was not what it is today). So we built a little app where Netflix employees could indicate whether a movie had a sexually appealing image on it – we added human input to our models where human judgment performs better than algorithms. That kind of judgment is, of course, key when working with fashion imagery, and at Stitch Fix the entire company’s success was predicated on the algorithms working well. After six months of advising Stitch Fix, I decided to join full time.
How does human-machine collaboration work at Stitch Fix?
The machine learning happens first, and we combine all sorts of algorithms for different subtasks – neural nets, collaborative filtering, mixed effects models, naïve Bayes, etc. – to do a first pass at recommending styles for individual customers. Machines are far more efficient than humans, and we leverage them for the rote calculations in our process. We leave the other types of activities – like synthesizing ambient information, improvising, fostering a relationship with the customer, applying empathy – to humans. The final step is logistics to manage delivery. It’s a division of labor modeled after Daniel Kahneman’s two systems of thinking in Thinking, Fast and Slow. The machines take the calculations and probabilities; the humans take the intuition. But there are overlapping tasks they share.
For example?
When it comes to fashion, customers have a hard time saying what they like. They’d rather show what they like. The vast majority of customers pin examples of styles they like on Pinterest, and we can process those pins in two ways. Machines can vectorize the pinned image and compare it to our inventory using convolutional neural nets (AlexNet) to find items that are similar, with similarity measured as Euclidean distance. Once we have this short list of items, we pass it to humans, who process ambient information, recognize if the recommended items are too similar to the pinned image, or even judge whether the images are more aspirational than literal and modify suggestions accordingly. Take something like a leopard print dress. Machines are very pedantic: they can distinguish leopard print from cheetah print, but don’t have the social sense to know that a woman who likes leopard print would very likely also like cheetah print.
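The machine half of this workflow can be sketched in a few lines. Assume each image has already been run through a convolutional net and reduced to an embedding vector (the vectors, item count, and function name below are all hypothetical placeholders, not Stitch Fix’s actual pipeline); the short list is then just a nearest-neighbor search under Euclidean distance:

```python
import numpy as np

def top_k_similar(pinned_vec, inventory_vecs, k=5):
    """Return indices of the k inventory items whose embeddings are
    closest to the pinned image, measured by Euclidean distance."""
    dists = np.linalg.norm(inventory_vecs - pinned_vec, axis=1)
    return np.argsort(dists)[:k]

# Toy example: 4-dimensional embeddings for 6 hypothetical inventory items.
rng = np.random.default_rng(0)
inventory = rng.normal(size=(6, 4))
pinned = inventory[2] + 0.01 * rng.normal(size=4)  # a pin very close to item 2

short_list = top_k_similar(pinned, inventory, k=3)
```

The short list would then go to a stylist, who applies the kind of judgment (cheetah vs. leopard, aspirational vs. literal) that distance in embedding space can’t capture.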
It can take a long time and a lot of data to train a convolutional neural net. Does your team have any creative techniques to speed up the process?
We’ve experimented with generative adversarial networks for our image processing. A generative net learns to assemble the building blocks of style into new images, while a “discriminative” net is trained to distinguish real images from machine-generated ones. Because each network pushes the other to improve, it’s pretty quick to train them together.
Visualization of samples from a generative adversarial net in a 2014 article by Goodfellow et al.
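The two-network game Colson describes is the minimax objective from the Goodfellow et al. paper: the discriminator \(D\) tries to assign high probability to real images and low probability to generated ones, while the generator \(G\) tries to fool it:

\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]
\]

At equilibrium, the generator’s samples are indistinguishable from real data, which is why the approach is attractive when labeled training data is scarce.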
What about improving the performance of the human stylists?
Humans are classic overfitters. As Kahneman explains, we don’t naturally think statistically, but tend to build narratives around the details most prominent in our consciousness. Say a stylist receives a tall customer, suggests some item she thinks would be good for tall people, and receives positive feedback from the customer. She’ll then reinforce the association between tallness and the product feature from this one win, often overlooking the hundreds of other instances where this didn’t work. Statistics are therefore a wonderful (and necessary) support to mitigate associative fallacies. The other challenge with our system is that there’s often a long delay between a stylist’s decision to pick a certain item and feedback from the customer on whether they like it (or it’s the right size, fit, etc.). Humans work best with instantaneous, unambiguous feedback: they forget why they did something by the time we have input from customers. So we built a parallel, “labs” version of the interface to give stylists real-time feedback. We use historical shipment data to simulate how a customer would react to give this quick feedback.
What percent of your recommendations are informed by collaborative filtering?
It’s true that matrix factorization is the most intuitive place to start for product recommendations, but it tends to fall short in the fashion industry because trends change so fast. The stuff we have now will be gone next month, so we have really sparse matrices. Most clients are getting boxes only on a monthly – or even less frequent – basis. This means we only get feedback on about 5 items per month per client on average. As such, it’s better to use other features to inform our recommendations, and our techniques run the gamut from neural networks to mixed effects models. That said, there are customers who get more items more frequently and where we have enough information to compare them with others. In those cases, collaborative filtering works.
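To make the sparsity point concrete, here is a minimal matrix-factorization sketch (not Stitch Fix’s implementation; the toy matrix, rank, and hyperparameters are illustrative). Rows are clients, columns are items, and zeros mark the “no feedback yet” entries that dominate in practice; SGD fits latent factors only on observed entries, and the factor product then fills in predicted affinities for the rest:

```python
import numpy as np

# Toy feedback matrix: 0 means "no feedback yet" (the sparse case).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def factorize(R, k=2, steps=3000, lr=0.01, reg=0.02, seed=0):
    """Learn client factors P and item factors Q so that P @ Q.T
    approximates the observed (nonzero) entries of R."""
    rng = np.random.default_rng(seed)
    n_clients, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_clients, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    clients, items = np.nonzero(R)  # train only on observed feedback
    for _ in range(steps):
        for u, i in zip(clients, items):
            err = R[u, i] - P[u] @ Q[i]
            # Simultaneous regularized SGD update of both factor rows.
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return P, Q

P, Q = factorize(R)
pred = P @ Q.T  # zeros in R are now filled with predicted affinities
```

With only ~5 observations per client per month and fast inventory turnover, most of `R` stays empty, which is why Colson’s team leans on side features (client measurements, item attributes) instead of factorization alone.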
How do you organize your team to support this diversity of algorithms and workflows?
Five groups report to me, and they are designed for autonomy. They are kept as small as possible and have to build their own pipelines: they do the math, learn the algorithms, communicate with the business, manage projects. I think this is better because it’s faster (there’s no coordination cost, so you can iterate faster), attracts top talent, and supports long-term systems, as data scientists understand things deeply and can make gradual tweaks to code. Three teams – merchandise algorithms, styling algorithms, and client algorithms – align with current business functions. The Merchandise Algorithms team works with merchants on procurement. The Styling Algorithms team works with stylists – it’s the recommendation workflow. The Client Algorithms team works with marketing on acquisition and retention: are clients happy? What’s the right action to take with them? Next, the data platform team is responsible for the infrastructure required to make the different algorithms run. We don’t have data engineers writing ETL, on purpose. The final team is the labs team, which handles experimental stuff that hasn’t yet made it into the product (they are exploring the use of NLP and computer vision algorithms). Labs also manages our tech branding, like the Multithreaded blog.
What’s next for Stitch Fix?
One exciting piece of news is that we now have five proprietary, data-driven clothing lines. The items are like Frankenstyles, combining attributes (sleeve length, flare, etc.) that work well for a particular demographic but may not yet coexist in a single piece. We refine the items based on feedback, tweaking until it works. These will be shipped soon!