May 25, 2016 · interview

Human-Machine Algorithms: Interview with Eric Colson

Therefore render unto Caesar the things that are Caesar’s, and unto God the things that are God’s.
– Matthew, 22:21

We tend to think that recommender systems are old hat. E-commerce platforms like Amazon have been using techniques like collaborative filtering for years to help shoppers navigate vast catalogues by inferring consumer taste from past behavior. And yet, we’ve all experienced the limitations of these approaches (that time you bought a toilet bowl plunger, only to be flooded with recommendations for strange bathroom accessories). This may be a nuisance for consumers, but it doesn’t jeopardize e-commerce business models: only 35% of Amazon’s sales, for example, are driven by recommendations.

But what would happen if the stakes were higher? What kind of algorithmic creativity would it take to build a company with revenue based entirely on recommendations?

This is the challenge faced by Eric Colson’s team at Stitch Fix. Stitch Fix is an online “personal styling service” that selects apparel for its customers (they also have a solid technical blog). Instead of recommending items for shoppers to choose from, Stitch Fix goes a step further and simply ships the items its algorithms and stylists choose. The key to making this work, according to Colson, is to optimize the division of labor between human and machine.

Colson spoke about the value of human-machine collaborations at the March Data Driven NYC Meetup. We interviewed him to learn more about the approach.

Let’s start with your background. How did you become a data scientist?

My father was a physicist, and physics features the most exotic math out there. Apparently – although this was too early for me to remember – he used to tape handwritten exotic math symbols inside my crib when I was a baby. I suppose this early exposure shaped my future obsession with microeconomics, and particularly graphs. Graphs are so pure, so elegant: I love how the smooth lines clearly show the theoretical relationship between things like price and demand (even if it’s always more complex in practice). I was lucky, as I entered the job market just when it was becoming viable to do things like price elasticity in the tech industry. You could record transactions and, with a bit of statistics, actually capture elasticity curves. Once in business, I had to learn how to get the data required to build these graphs. That meant learning the engineering behind data warehouses and ETL. It meant learning more sophisticated stats. At Yahoo, it eventually meant teasing relationships out of big data to optimize insights for human decision making. And, finally, at Netflix, it meant building systems for machine consumption – that is, automated decision making, or ‘algorithms’. You could even say I became a data scientist as a natural evolution of my sole pursuit: to optimize microeconomic graphs.
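To make the elasticity idea concrete, here is a minimal sketch of how one might estimate a constant-elasticity demand curve from transaction records with ordinary least squares on logs. The data and numbers are purely illustrative, not anything from Colson’s work:

```python
import math

# Illustrative transaction records: (price, units sold). Under a
# constant-elasticity demand curve Q = A * P^(-e), the relationship is
# linear in logs: ln Q = ln A - e * ln P.
observations = [(10.0, 980), (12.0, 770), (15.0, 560), (20.0, 380), (25.0, 270)]

xs = [math.log(p) for p, q in observations]
ys = [math.log(q) for p, q in observations]

# Ordinary least squares for the slope and intercept of ln Q on ln P.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

elasticity = -slope  # demand elasticity is the negated log-log slope
print(f"estimated elasticity: {elasticity:.2f}")
```

With real transaction data the regression would need controls for confounders (promotions, seasonality), but the log-log slope is the core of the curve-fitting Colson describes.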

How did you choose your title? Chief Algorithms Officer isn’t something you see every day.

I’m not the only one, trust me. Udi Manber was formerly the chief algorithms officer at Amazon. I came to Stitch Fix to assume an officer role, and played with different words. I consider the title to play a recruiting function to attract the right talent. Chief Analytics Officer sounded too much like BI. Chief Data Officer sounded like a CIO, connoting data storage. I did consider Chief Data Scientist, and I love the word, but I think there’s still confusion about what it means – it’s hard to capture a role that combines statistics, engineering, and business problems, and it’s getting more complicated now that artificial intelligence is used to describe machine learning algorithms. At any rate, I chose the title because the essence of what we do is to work with algorithms.

What inspired you to leave Netflix for Stitch Fix?

Katrina Lake, Stitch Fix’s Founder and CEO, engaged me in early 2012 as an advisor. I had no intentions of joining at the time, but was drawn to the work because it sought to mix algorithmic decision making with human decision making in a way I thought merited exploration. At Netflix, I started to appreciate the limitations of a purely algorithmic approach for complex business tasks with an algorithm we built to predict the popularity of our content. This was critical when we shifted from the DVD model to streaming. With DVDs, you just buy and manage the inventory, and you can always buy more if needed; with streaming, you license content from studios, and pay high-dollar fixed costs over a set time period. That means, if no one watches a movie, Netflix still pays. It was therefore important to pick movies and shows many people would watch to cover costs. We gathered data from every source we could to build these models. They usually did well, but there were anomalies where non-mainstream films with average ratings performed way better than expected. But if we just looked at the website, we immediately knew why: the box shots would have sexy women in provocative positions, or even branding with subliminal effects (like the yellow National Geographic covers). Humans see this immediately, but it was not a feature we thought to engineer at the time (deep learning was not what it is today). So we built a little app where Netflix employees could indicate whether a movie had a sexually appealing image on it – we added human input to our models where human judgment performs better than algorithms. That kind of judgment, of course, is key when working with fashion image data, and at Stitch Fix the entire company’s success is predicated on the algorithms working well. After six months of advising Stitch Fix, I decided to join full time.

How does human-machine collaboration work at Stitch Fix?

The machine learning happens first, and we combine all sorts of algorithms for different subtasks – neural nets, collaborative filtering, mixed effects models, naïve Bayes, etc. – to do a first pass at recommending styles for individual customers. Machines are far more efficient than humans, and we leverage them for the rote calculations in our process. We leave the other types of activities – like synthesizing ambient information, improvising, fostering a relationship with the customer, applying empathy – to humans. The final step is logistics to manage delivery. It’s a division of labor modeled after Daniel Kahneman’s two systems of thinking in Thinking, Fast and Slow. The machines take the calculations and probabilities; the humans take the intuition. But there are overlapping tasks they share.

For example?

When it comes to fashion, customers have a hard time saying what they like. They’d rather show what they like. The vast majority of customers pin examples of styles they like on Pinterest, which we can process in two ways. Machines can vectorize the pinned image and compare it to our inventory using convolutional neural nets (AlexNet) to find items that are similar, with similarity measured as Euclidean distance. Once we have this shortlist of items, we pass it to humans, who process ambient information, recognize if the recommended items are too similar to the pinned image, or even judge whether the images are more aspirational than literal, and modify suggestions accordingly. Take something like a leopard print dress. Machines are very pedantic: they can distinguish leopard print from cheetah print, but they don’t have the social sense to know that a woman who likes leopard print would very likely also like cheetah print.
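The machine half of that workflow – rank inventory by Euclidean distance to the pinned image’s embedding – can be sketched in a few lines. Everything below is hypothetical: the embeddings stand in for CNN feature vectors, and the item names and `shortlist` helper are made up for illustration:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical CNN embeddings: in production these would come from a model
# like AlexNet; here they are tiny made-up vectors keyed by item id.
inventory = {
    "leopard_dress": [0.90, 0.10, 0.30],
    "cheetah_dress": [0.85, 0.15, 0.35],
    "plain_tee":     [0.10, 0.90, 0.20],
    "striped_skirt": [0.20, 0.60, 0.80],
}

def shortlist(pinned_embedding, inventory, k=2):
    """Return the k inventory items closest to the pinned image."""
    ranked = sorted(inventory,
                    key=lambda item: euclidean(pinned_embedding, inventory[item]))
    return ranked[:k]

pinned = [0.88, 0.12, 0.32]  # embedding of the customer's pinned image
print(shortlist(pinned, inventory))  # → ['leopard_dress', 'cheetah_dress']
```

The shortlist then goes to a stylist, who applies the judgment the machine lacks – e.g., that a cheetah-print near-miss is probably just as welcome as the leopard-print match.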

It can take a long time and a lot of data to train a convolutional neural net. Does your team have any creative techniques to speed up the process?

We’ve experimented with generative adversarial networks for our image processing. A generative net learns to put building-block elements of style together to produce realistic images, while a “discriminative” net is trained to distinguish between real and machine-generated images. Pitting the two against each other makes training surprisingly quick.

Visualization of samples from a generative adversarial net in a 2014 article by Goodfellow et al.
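The adversarial loop is easiest to see in a toy setting. The sketch below is nothing like the image-scale GANs discussed above: it uses a one-parameter-pair linear generator and a logistic discriminator on 1-D data, with gradients written out by hand, purely to show the alternating generator/discriminator updates. All names and numbers are invented for illustration:

```python
import math
import random

random.seed(0)
REAL_MU, REAL_SIGMA = 4.0, 0.5   # the "real data" distribution (made up)
BATCH, LR, STEPS = 16, 0.05, 3000

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-x))

# Generator G(z) = wg*z + bg; discriminator D(x) = sigmoid(wd*x + bd).
wg, bg = 1.0, 0.0
wd, bd = 0.0, 0.0

for _ in range(STEPS):
    zs = [random.gauss(0, 1) for _ in range(BATCH)]
    reals = [random.gauss(REAL_MU, REAL_SIGMA) for _ in range(BATCH)]
    fakes = [wg * z + bg for z in zs]

    # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
    g_wd = g_bd = 0.0
    for xr, xf in zip(reals, fakes):
        dr, df = sigmoid(wd * xr + bd), sigmoid(wd * xf + bd)
        g_wd += (1 - dr) * xr - df * xf
        g_bd += (1 - dr) - df
    wd += LR * g_wd / BATCH
    bd += LR * g_bd / BATCH

    # Generator step: ascent on the non-saturating objective log D(fake).
    g_wg = g_bg = 0.0
    for z, xf in zip(zs, fakes):
        df = sigmoid(wd * xf + bd)
        g_wg += (1 - df) * wd * z
        g_bg += (1 - df) * wd
    wg += LR * g_wg / BATCH
    bg += LR * g_bg / BATCH

# After training, generated samples should center near the real mean.
samples = [wg * random.gauss(0, 1) + bg for _ in range(1000)]
mean = sum(samples) / len(samples)
print(f"generated mean ~ {mean:.2f} (real mean {REAL_MU})")
```

The same alternating structure scales up to the convolutional generators and discriminators of Goodfellow et al.; only the networks and the data change.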

What about improving the performance of the human stylists?

Humans are classic overfitters. As Kahneman explains, we don’t naturally think statistically, but tend to build narratives around the details most prominent in our consciousness. Say a stylist receives a tall customer, suggests some item she thinks would be good for tall people, and receives positive feedback from the customer. She’ll then reinforce the association between tallness and the product feature from this one win, often overlooking the hundreds of other instances where this didn’t work. Statistics are therefore a wonderful (and necessary) support to mitigate associative fallacies. The other challenge with our system is that there’s often a long delay between a stylist’s decision to pick a certain item and feedback from the customer on whether they like it (or it’s the right size, fit, etc.). Humans work best with instantaneous, unambiguous feedback: they forget why they did something by the time we have input from customers. So we built a parallel, “labs” version of the interface to give stylists real-time feedback. We use historical shipment data to simulate how a customer would react to give this quick feedback.
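One simple way to turn historical shipments into instant feedback – and this is only a guess at the idea, with entirely made-up data and names – is to look up the historical keep rate for a pick like the one the stylist just made:

```python
from collections import defaultdict

# Hypothetical historical shipments: (client_segment, item_style, kept?).
history = [
    ("petite", "maxi_dress", False),
    ("petite", "maxi_dress", False),
    ("petite", "cropped_jacket", True),
    ("tall", "maxi_dress", True),
    ("tall", "maxi_dress", True),
    ("tall", "cropped_jacket", False),
]

# Tally keep rates per (segment, style) from the historical record.
counts = defaultdict(lambda: [0, 0])  # [times kept, times shipped]
for segment, style, kept in history:
    counts[(segment, style)][0] += int(kept)
    counts[(segment, style)][1] += 1

def simulated_feedback(segment, style):
    """Immediate feedback for a stylist's pick: the historical keep rate."""
    kept, total = counts.get((segment, style), (0, 0))
    if total == 0:
        return None  # no history for this combination
    return kept / total

print(simulated_feedback("petite", "maxi_dress"))  # → 0.0
print(simulated_feedback("tall", "maxi_dress"))    # → 1.0
```

The point of such a simulator is exactly the one Colson makes: it closes the feedback loop while the stylist still remembers why she made the pick, counteracting the one-win narratives humans overfit to.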

What percent of your recommendations are informed by collaborative filtering?

It’s true that matrix factorization is the most intuitive place to start for product recommendations, but it tends to fall short in the fashion industry because trends change so fast. The stuff we have now will be gone next month, so we have really sparse matrices. Most clients are getting boxes only on a monthly – or even less frequent – basis. This means we only get feedback on about 5 items per month per client on average. As such, it’s better to use other features to inform our recommendations, and our techniques run the gamut from neural networks to mixed effects models. That said, there are customers who get more items more frequently, where we have enough information to compare them with others. In those cases, collaborative filtering works.
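For readers unfamiliar with the baseline being dismissed here: matrix factorization learns a low-dimensional vector for each client and each item so that their dot product predicts observed feedback. A minimal SGD version on a toy, deliberately sparse ratings list (all data invented) looks like this:

```python
import random

random.seed(1)

# Toy observed feedback: (client, item, rating) – most cells are missing,
# mirroring the sparsity problem Colson describes.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 1.0), (2, 2, 5.0),
]
n_clients, n_items, k = 3, 3, 2

# Random low-dimensional factors for clients (P) and items (Q).
P = [[random.uniform(0, 1) for _ in range(k)] for _ in range(n_clients)]
Q = [[random.uniform(0, 1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(P[u][f] * Q[i][f] for f in range(k))

lr, reg = 0.02, 0.01
for epoch in range(500):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)  # SGD with L2 shrinkage
            Q[i][f] += lr * (err * pu - reg * qi)

rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(f"training RMSE: {rmse:.3f}")
```

With roughly five observations per client per month and inventory that turns over before the matrix fills in, the learned item factors go stale quickly – which is why Stitch Fix leans on other features instead.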

How do you organize your team to support this diversity of algorithms and workflows?

Five groups report to me, and they are designed for autonomy. Each team relies on as few other people as possible and builds its own pipelines: they do the math, learn the algorithms, communicate with the business, manage projects. I think this is better because it’s faster (there’s no coordination cost, so you can iterate quickly), attracts top talent, and supports long-term systems, as data scientists understand things deeply and can make gradual tweaks to code. Three teams – merchandise algorithms, styling algorithms, and client algorithms – align with current business functions. The Merchandise Algorithms team works with merchants on procurement. The Styling Algorithms team works with stylists – it’s the recommendation workflow. The Client Algorithms team works with marketing on acquisition and retention: are clients happy? What’s the right action to take with them? Next, the data platform team is responsible for the infrastructure required to make the different algorithms run. We don’t have data engineers writing ETL, on purpose. The final team is the labs team, which handles experimental stuff that hasn’t yet made it into the product (they are exploring the use of NLP and computer vision algorithms). Labs also manages our tech branding, like the Multithreaded blog.

What’s next for Stitch Fix?

One exciting piece of news is that we now have five proprietary, data-driven clothing lines. The items are like Frankenstyles, combining attributes (sleeve length, flare, etc.) that work well for a particular demographic but may not yet coexist in a single piece. We refine the items based on feedback, tweaking until they work. These will be shipped soon!

Colson’s talk at FirstMark’s Data Driven NYC.
