Updates from Cloudera Fast Forward on new research, prototypes, and exciting developments

Welcome to the September Cloudera Fast Forward Labs newsletter.


New research!

We like to build things. Over the past couple of months we’ve been hard at work on a new prototype, and we’ve been wrapping up a few loose threads from the research directions we’ve touched on so far this year. Our efforts have culminated in a fun new application and several blog posts.

Summarize.

Automatic summarization is a task in which a machine distills a large amount of data into a subset (the summary) that retains the most relevant and important information from the whole. While traditionally applied to text, automatic summarization can also be applied to other formats, such as images or audio. In this post we cover the main approaches to automatic text summarization, talk about what makes for a good summary, and introduce Summarize. – a summarization prototype we built that showcases both extractive and abstractive automatic summarization models.
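To make the "subset" idea concrete, here is a minimal sketch of extractive summarization in Python. It is not the model behind Summarize.; it is a simple word-frequency baseline, and the function names (`split_sentences`, `extractive_summary`) and the scoring rule are assumptions chosen purely for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document act as a crude importance signal.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in keep]
```

Calling `extractive_summary(article_text, num_sentences=3)` returns three sentences copied verbatim from the input. That is the defining property of an extractive approach: the summary is a literal subset of the source, whereas an abstractive model generates new text.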


Extractive Summarization with SentenceBERT

In this post, we dive deeper into how we trained a SentenceBERT model to perform extractive summarization, from model architecture to considerations for training and inference. You can interact with this model in the Summarize. prototype!

How (and when) to enable early stopping for Gensim’s Word2Vec

The Gensim library is a staple of the NLP stack and provides what is likely the best-known implementation of Word2Vec. In this post, we cover how to train Word2Vec for non-language use cases (like learning item embeddings) and explain when you should and shouldn’t use early stopping.


Fast Forward Live!

Check out replays of livestreams covering some of our research from this year.

Deep Learning for Automatic Offline Signature Verification

Session-based Recommender Systems

Few-Shot Text Classification

Representation Learning for Software Engineers


Our research engineers share their favourite reads of the month: