Blog
Machine learning research and strategy
Latest posts
Popular posts
Oct 30, 2019 · newsletter
Nov 14, 2018 · post
Apr 10, 2018 · post
Oct 4, 2017 · post
Aug 22, 2016 · whitepaper
Feb 24, 2016 · post
Reports
In-depth guides to specific machine learning capabilities
FF24
Text Style Transfer
Text style transfer (TST) is the NLP task of automatically controlling the style attributes of a piece of text while preserving its content, an important capability for making NLP more user-centric. In this report, we explore text style transfer through an applied use case: neutralizing subjectivity bias in free text. Along the way, we describe our sequence-to-sequence modeling approach leveraging HuggingFace Transformers, and present a set of custom, reference-free evaluation metrics for quantifying model performance. Finally, we conclude with a discussion of ethics centered around our prototype: Exploring Intelligent Writing Assistance.
FF22
Inferring Concept Drift Without Labeled Data
Concept drift occurs when the statistical properties of a target domain change over time, causing model performance to degrade. Drift detection is generally achieved by monitoring a performance metric of interest and triggering a retraining pipeline when that metric falls below some designated threshold. However, this approach assumes ample labeled data is available at prediction time, an unrealistic constraint for many production systems. In this report, we explore various approaches for dealing with concept drift when labeled data is not readily accessible.
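To make the conventional, label-dependent approach concrete, here is a minimal sketch of a threshold-based monitor. It is not taken from the report: the class name, the rolling-window design, and the default values are all assumptions chosen for illustration.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Hypothetical monitor: rolling accuracy over the last `window` labeled predictions."""

    def __init__(self, window=100, threshold=0.8):
        # Both defaults are illustrative assumptions, not recommended values.
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def accuracy(self):
        """Return rolling accuracy, or None if nothing has been recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def update(self, prediction, label):
        """Record one labeled prediction; return True if retraining should be triggered."""
        self.outcomes.append(1 if prediction == label else 0)
        # Only judge the metric once the window is full, to avoid noisy early alarms.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold
```

Note that `update` needs the true label for every prediction, which is exactly the requirement that many production systems cannot meet.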
FF19
Session-based Recommender Systems
Being able to recommend an item of interest to a user (based on their past preferences) is a highly relevant problem in practice. A key trend over the past few years has been session-based recommendation algorithms that provide recommendations solely based on a user’s interactions in an ongoing session, and which do not require the existence of user profiles or their entire historical preferences. This report explores a simple, yet powerful, NLP-based approach (word2vec) to recommend a next item to a user. While NLP-based approaches are generally employed for linguistic tasks, here we exploit them to learn the structure induced by a user’s behavior or an item’s nature.
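The core idea of treating each session as a "sentence" and each item as a "word" can be sketched without any ML library. The example below is not word2vec and is not the report's implementation: it swaps in plain co-occurrence counting as a much simpler stand-in for learned embeddings, and the function names and the `window` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(sessions, window=2):
    """Count how often two items appear within `window` positions of each other in a session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i, item in enumerate(session):
            lo = max(0, i - window)
            hi = min(len(session), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[item][session[j]] += 1
    return counts


def recommend_next(counts, current_item, k=3):
    """Return up to k items most often seen near `current_item` (empty list if unseen)."""
    neighbors = counts.get(current_item, Counter())
    return [item for item, _ in neighbors.most_common(k)]
```

For example, with `sessions = [["a", "b", "c"], ["a", "b", "d"]]`, calling `recommend_next(cooccurrence_counts(sessions, window=1), "a", k=1)` suggests `"b"`. A word2vec model would replace the raw counts with dense item vectors learned from the same session "sentences" and recommend nearest neighbors in that vector space.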
FF18
Few-Shot Text Classification
Text classification can be used for sentiment analysis, topic assignment, document identification, article recommendation, and more. While dozens of techniques now exist for this fundamental task, many of them require massive amounts of labeled data in order to be useful. Collecting annotations for your use case is typically one of the most costly parts of any machine learning application. In this report, we explore how latent text embeddings can be used with few (or even zero) training examples and provide insights into best practices for implementing this method.
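As a rough illustration of classifying with zero training examples, the sketch below embeds both the text and the label names in the same vector space and picks the closest label by cosine similarity. Everything here is an assumption for illustration: the three-dimensional vectors are made up, and a real system would use a pretrained embedding model rather than this toy lookup table.

```python
import math

# Hypothetical word vectors, invented for this example only.
TOY_VECTORS = {
    "goal": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "sports": [1.0, 0.0, 0.0],
    "election": [0.1, 0.9, 0.0],
    "vote": [0.0, 0.8, 0.2],
    "politics": [0.0, 1.0, 0.0],
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(text, labels):
    """Return the label whose embedding is closest to the text, or None if nothing can be embedded."""
    text_vec = embed(text)
    if text_vec is None:
        return None
    best_label, best_score = None, -1.0
    for label in labels:
        label_vec = embed(label)
        if label_vec is None:
            continue  # skip labels the toy vocabulary cannot represent
        score = cosine(text_vec, label_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

No labeled examples are needed because the label names themselves act as the only "training signal"; with a handful of labeled examples, the same similarity scores could instead be refined by a small learned mapping.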
Prototypes
Machine learning prototypes and interactive notebooks
Notebook
ASR with Whisper
Explore the capabilities of OpenAI's Whisper for automatic speech recognition by creating your own voice recordings!
A TensorFlow 2.0 notebook that explains and visualizes a HuggingFace BERT model for question answering.
Notebooks
NLP for Question Answering
Ongoing posts and code documenting the process of building a question answering model.
Cloudera Fast Forward Labs
Making the recently possible useful.
Cloudera Fast Forward Labs is an applied machine learning research group. Our mission is to empower enterprise data science practitioners to apply emergent academic research to production machine learning use cases in practical and socially responsible ways, while also driving innovation through the Cloudera ecosystem. Our team brings thoughtful, creative, and diverse perspectives to deeply researched work. In this way, we strive to help organizations make the most of their ML investment as well as educate and inspire the broader machine learning and data science community.
©2022 Cloudera, Inc. All rights reserved.