Updates from Cloudera Fast Forward on new research, prototypes, and exciting developments

Welcome to Cloudera Fast Forward Labs’ February newsletter. Three exciting things this month.

We’re open sourcing a whole raft of applied ML applications.
We’re hiring.
Recommended reading.

Applied Machine Learning Prototypes

Deep in the underground (virtual) Fast Forward lab, we’ve been very busy working away in secrecy for months. We’re super excited to announce ten open source Applied ML Prototypes.

These prototypes (we call ‘em AMPs) are inspired by both our own research and our front row seats to the challenges enterprises face when developing ML applications. We hope they serve as useful inspiration, learning materials, and templates to kick start your own applications. They can be deployed automatically inside the Cloudera Machine Learning platform with just a few clicks. Even if you are not a Cloudera customer, the code is open source and ready for your tinkering!

Check out the code and read more here: Kickstart AI Use Cases With New Applied Machine Learning Prototypes.

Guitar Amps

Photo: Amps, not AMPs. Credit: Thomas Litangen (crop).

We’re Hiring

We’re hiring for a research engineer to join our team! At Cloudera Fast Forward Labs, we take the incredible research developments we find in academia and industry, and work to bridge the gap to products and processes that are useful for practitioners. If you’re thoughtful about applications of machine learning in industry, and enjoy building things and writing about them, we’d love to hear from you!

A description of the role, and details about how to apply can be found here: Research Engineer.

Recommended Reading

Connecting Text and Images

Can we learn robust image representations of images using natural language supervision? The answer is yes! In their January 2021 paper, OpenAI introduce CLIP - a model trained on 400M image and text pairs scraped from the internet with very interesting results. To illustrate its value, representations learned by CLIP are at par with a fully supervised ResNET50 trained on ImageNet and significantly outperforms all other models on challenging datasets (e.g. adversarial datasets, sketches etc). CLIP which is based on transformers (transformer based text encoder, and transformer based image encoder) demonstrate how transformers are helping unify the ML problem space. - Victor

Related Posts
Predictive Modeling: A Retrospective

An essay by Shreya Shankar about her experiences with predictive modeling in the industry. Quite an engaging read for an ML practitioner that discusses various aspects of what applied ML is like in practice - managing ML pipelines, dealing with algorithmic bias, the gaps between ML research and production, debugging models in production and others. It gives a glimpse of what it is like to be in a company where the infrastructure is nicely abstracted from the practitioner, and re-iterates why ML in production is hard. Building an ML model on a toy dataset, even if the data is derived from the real world, doesn’t mean it will perform and scale well during production! The essay discusses how building end-to-end ML pipelines is complex and so is managing people who have to effectively collaborate on these pipelines. Explaining model predictions could be a nightmare, for example, how do you rationalize a model’s prediction when it deviates from the true label? All in all, a thought provoking piece that could help people interested in pursuing ML or data science careers. - Nisha
End-to-End Object Detection with Transformers

In 2020, Facebook AI released a novel approach to the task of object detection that combines a standard CNN backbone with a modern Transformer architecture called DEtection TRansformer (DETR). DETR simplifies the detection pipeline by dropping several hand-designed components of traditional detection architectures that demand prior knowledge of the problem space like spatial anchors and non-maximal suppression (NMS). Despite heavier training requirements, the authors demonstrate comparable performance to a competitive baseline encouraging continued research on this simplistic design from the object detection community. - Andrew
Generally Intelligent

I’m not a big podcast listener but I stumbled across this new series last week and was pleasantly surprised! Dubbing themselves a “podcast for deep learning researchers,” the series is 0-indexed (love it!) and focuses on interviews with machine learning and deep learning scientists from leading think tanks such as DeepMind, OpenAI, Google AI, and more. The interviews are more like conversations that explore the hunches and processes of the deep learning research cycle while simultaneously imparting practical advice for other researchers without being overly technical. -Melanie
From local explanations to global understanding with explainable AI for trees

This paper introduces the TreeExplainer method to SHAP. SHAP can be used in a model agnostic fashion, being capable of explaining arbitrary, black box predictors with a Kernel estimation. However, this is extremely computationally costly. By restricting to only tree-based algorithms, TreeExplainer can work much faster. The paper also presents visual methods for aggregating the local, non-linear explanations into global feature importances. I love that the authors are thoughtful not only about the algorithmic properties of explanations, but how those explanations are ultimately presented. - Chris