Updates from Cloudera Fast Forward on new research, prototypes, and exciting developments

Welcome to the June edition of Cloudera Fast Forward’s monthly newsletter. This month, alongside our regular recommended reading, we have two exciting research announcements!


New research: NLP for Question Answering

Here in the Fast Forward lab, we’re always asking ourselves a lot of questions. Now we’re asking BERT a lot of questions too! Our current research focus is question answering systems. Rather than publishing a single report with all our learnings once the research concludes, we’re inviting you to follow along as we explore building a question answering system using modern neural architectures. We just released the third blog post in the series, and you can check out each of them below:

Intro to Automated Question Answering

This introductory post discusses what QA is and isn’t, where this technology is being employed, and what techniques are used to accomplish this natural language task.

Building a QA System with BERT on Wikipedia

Follow along with this post to build a working Information Retrieval-based QA system, with BERT as the document reader and Wikipedia’s search engine as the document retriever. This is a fun toy model that hints at potential real-world use cases.

Evaluating QA: Metrics, Predictions, and the Null Response

In this post, we look at how to assess the quality of a BERT-like model for Question Answering. We cover what metrics are used to quantify quality, how to evaluate your model using the Hugging Face framework, and the importance of the “null response” – questions that don’t have answers – for both improved performance and more realistic QA output.
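To make the metrics concrete: SQuAD-style evaluation typically reports exact match (EM) and token-level F1 between the predicted answer span and the ground-truth answer. Here is a minimal sketch of those two metrics in pure Python; the text normalization follows the common SQuAD convention (lowercasing, stripping punctuation and articles), though the official evaluation script handles additional details such as multiple reference answers.

```python
import re
import string
from collections import Counter

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and
    the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    """Token-level F1: harmonic mean of precision and recall over
    the bag of normalized tokens shared by prediction and truth."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

# A partially-overlapping answer gets partial F1 credit but zero EM:
print(exact_match("the tower in Paris", "Eiffel Tower"))  # 0
print(f1_score("the tower in Paris", "Eiffel Tower"))     # 0.4
```

For unanswerable questions (the “null response”), the same metrics apply with the empty string as the reference: a model scores 1.0 only if it also predicts no answer.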


New report: Causality for Machine Learning

Our latest research report — Causality for Machine Learning — is live, and the webinar is available on demand!

Causality is an emerging area of focus in data science practice, especially when we want to make decisions based on our models. Causality provides a framework for understanding which statistical relationships are true, and which only appear to be true in some circumstances. Our report provides guidance on when and how we need to think about causality.

Even when a problem does not require causal reasoning, we can greatly improve the robustness and generalizability of our machine learning models by taking some lessons from causality. The report outlines techniques that enable machine learning models to perform well across diverse environments, including those they were not trained on. In particular, there are applications in natural language processing and computer vision, which we demonstrate in the accompanying prototype, Scene.