 
      Welcome to the April edition of the Cloudera Fast Forward Labs newsletter. This month we share some exciting team news, open the FFL research vault, and talk about a recent initiative.
Research Engineer (US/UK/Canada - Remote)
As a Research Engineer on the Cloudera Fast Forwards Labs team, much of your time will be spent developing new applied research and sharing it with the community via reports and blog posts. You’ll also have the opportunity to hone your engineering skills by contributing to and supporting our catalog of applied machine learning prototypes. And as an ML advocate within Cloudera, you’ll be able to influence the shape of our product features to ensure their amenability for ML and data science practitioners.
We’ve opened up the vault again to re-release this fabulous report from 2019, still highly relevant today!
The application of deep learning to natural language processing (NLP) enables the development of language **models, which process language as it occurs naturally — as a sequence of symbols, preserving order and structure and incorporating context. These models are powerful enough to automatically answer questions, translate between languages, detect emotion, and even generate human-like language. But they can be complex and unwieldy. Often the cost to incorporate them into real-world systems and products is quite high and requires lots of data, skilled human experts, and specialized infrastructure.

This report dives into transfer learning, a method of training language models that reduces these problems through knowledge reuse. While initial language models are still difficult to build, once built, they can be reused and repurposed by anyone, often for tasks that they were not originally trained on. Transfer learning not only improves the accuracy and robustness of language models but, more importantly, also reduces the cost of leveraging these advanced techniques for practical use cases.
Will you be attending the Open Data Science Conference in Boston next week? If so, stop by the Cloudera booth at the AI Expo and say hi to the Fast Forward Labs team as well as other Clouderan experts! We’ll be showing off a handful of our Applied Machine Learning Prototypes and we love to talk shop (machine learning, that is).
Check out replays of livestreams covering some of our research from last year.
Deep Learning for Automatic Offline Signature Verification
Session-based Recommender Systems
Representation Learning for Software Engineers
Normally we close our newsletter with some reading suggestions from the team. This month, we thought we’d share a bit about another initiative the CFFL team has been working on behind the scenes.
Our applied machine learning prototypes (AMPs) provide reference examples of machine learning projects in Cloudera’s data science products, but the content in these repos can often be explored in any environment. They range from in-depth Jupyter Notebook examples to full-scale web or Streamlit applications, and everything in between.
As of earlier this month, all of our AMPs had been built to run on Python 3.6 in order to provide accessibility for all of Cloudera’s customers as well as the DS/ML community. But support for Python 3.6 ended last December so we knew it was time to upgrade! We’ve reworked nearly all of our AMPs to now be compatible with Python 3.9. While the changes aren’t always major, there have been some differences.
For example, our Few-Shot Text Classification AMP includes a Streamlit application. Upgrading to Python 3.9 allowed us to jump to the latest version of Streamlit, which includes new caching features that make our Few-Shot application faster and more efficient! If you haven’t yet taken any of our AMPs for a spin, now would be a great time to start.
And don’t worry, upgrading our AMPs is a labor of love that hasn’t held us back from continuing our research into Text Style Transfer. We’ll have more on that next month. Stay tuned!