We end 2018 with a round-up of some of the research, talks, sci-fi, visualizations/art, and a grab bag of other stuff we found particularly interesting, enjoyable, or influential this year (and we’re going to be a bit fuzzy about the definition of “this year”)!
“The Alchymist, In Search of the Philosopher’s Stone, Discovers Phosphorus, and prays for the successful Conclusion of his operation, as was the custom of the Ancient Chymical Astrologers” (great title!) by Joseph Wright of Derby, now in Derby Museum and Art Gallery, Derby, UK (Wikipedia)
At NIPS in December 2017, Ali Rahimi (and Ben Recht) delivered an address that asserted that modern deep learning is more like alchemy than science. We won’t attempt to paraphrase their short talk, but many of us found it compelling, and it’s certainly worth watching or reading. This led to much discussion in the deep learning community, and the appearance of a subdiscipline that treats deep learning as an observational science (see e.g. How AI Training Scales and How does batch normalization help optimization?).
We published our own research on interpretability in 2017. The area we focused on has developed rapidly, with tools like Anchors and SHAP (see this back issue of our newsletter for more) now the state of the art in black box interpretability. And to the extent interpretability will help deep learning become more scientific and less alchemical, we loved The Building Blocks of Interpretability. But our favorite interpretability work of 2018 questions the entire premise of this family of methods. Cynthia Rudin’s Please Stop Explaining Black Box Models for High Stakes Decisions is a bracing and highly recommended read.
The final theme we found exciting in 2018, and will be keeping an eye on in 2019, is transfer learning applied to NLP. Sebastian Ruder’s NLP’s ImageNet moment has arrived sets the scene really well for 2019, and highlights some of the projects we’re most interested in, which are well covered in Thomas Wolf’s The Current Best of Universal Word Embeddings and Sentence Embeddings. We also wrote a blog post about transfer learning, and a newsletter about its application to NLP in particular.
Ex-Clouderan Josh Wills’s ten-minute talk on Visibility and Monitoring for Machine Learning Models was our favorite talk of 2018. The highlight of the talk was the koan-like “You should deploy [a model] never or prepare to deploy it over and over and over and over and over again, repeatedly forever, ad infinitum”.
Josh Wills (Image credit: Launch Darkly and the Test In Production Meetup)
Hillel Wayne’s Beyond Unit Tests: Taking Your Testing to the Next Level was an engaging, opinionated and slightly mind-bending view of the relationship between traditional unit/integration testing and formal methods.
Highlighting a talk from 2015 feels a little like cheating, but in a year that saw the implementation of the GDPR, and our own research into federated learning, we rewatched Maciej Ceglowski’s 2015 Strata keynote Haunted by Data. His “don’t collect it, don’t store it, don’t keep it” takeaways feel like better advice than ever. Suresh Venkatasubramanian’s 2018 blog post on regulation of the tech industry vs ethical education is an interesting addendum to Ceglowski’s talk.
Art and sci-fi
In 2018 the machine learning community rediscovered the well-trodden issue of authorship in modern art thanks to Christie’s auction house and the Obvious Collective.
Mario Klingemann’s BigGAN explorations (Image credit: twitter.com/quasimondo)
Published in 1986, The Making of the Atomic Bomb by Richard Rhodes is perhaps not as cutting edge as some of the other things on this list. But we found it worthwhile for two reasons: first, it’s an engaging story about the management of research in a non-academic context, which is a topic we can’t get enough of at Cloudera Fast Forward Labs. And second, it’s a sobering look at the way researchers attempt (and in many cases fail) to grasp and control the impact of their inventions. The relevance to machine learning research is obvious.
Our favorite periodical was (and is!) Logic. If you’re a follower of Cloudera Fast Forward Labs, you’ll certainly enjoy this interview with an anonymous data scientist from their 2017 debut issue, but everything they’ve published since has been equally worthwhile, and relevant to anyone working in tech.
Finally, this was the best Halloween costume.
Onwards to 2019!