We’re excited to release the latest research from our machine intelligence R&D team!
This report and prototype explore probabilistic programming, an emerging programming paradigm that makes it easier to construct and fit Bayesian inference models in code. It’s advanced statistics, simplified for data scientists looking to build models fast.
Bayesian inference has been popular in scientific research for a long time. The statistical technique allows us to encode expert knowledge into a model by stating prior beliefs about what we think our data looks like. These prior beliefs are then updated in light of new data, providing not one prediction, but a full distribution of likely answers with baked-in confidence rates. This allows us to asses the risk of our decisions with more nuance.
Bayesian methods lack widespread commercial use because they’re tough to implement. But probabilistic programming reduces what used to take months of thorny statistical sampling into an afternoon of work.
This will further expand the utility of machine learning. Bayesian models aren’t black boxes, a criterion for regulated industries like healthcare. Unlike deep learning networks, they don’t require large, clean data sets or large amounts of GPU processing power to deliver results. And they bridge human knowledge with data, which may lead to breakthroughs in areas as diverse as anomaly detection and music analysis.
Our work on probabilistic programming includes two prototypes and a report that teaches you:
- How Bayesian inference works and where it’s useful
- Why probabilistic programming is becoming possible now
- When to use probabilistic programming and what the code looks like
- What tools and languages exist today and how they compare
- Which vendors offer probabilistic programming products
Finally, as in all our research, we predict where this technology is going, and applications for which it will be useful in the next couple of years.
Probabilistic Real Estate Prototype
One powerful feature of probabilistic programming is the ability to build hierarchical models, which allow us to group observations together and learn from their similarities. This is practical in contexts like user segmentation: individual users often shares tastes with other users of the same sex, age group, or location, and hierarchical models provide more accurate predictions about individuals by leveraging learnings from the group.
We explored using probabilistic programming for hierarchical models in our Probabilistic Real Estate prototype. This prototype predicts future real estate prices across the New York City boroughs. It enables you to input your budget (say $1.6 million) and shows you the probability of finding properties in that price range across different neighborhoods and future time periods.
Hierarchical models helped make predictions in neighborhoods with sparse pricing data. In our model, we declared that apartments are in neighborhoods and neighborhoods are in boroughs; on average, apartments in one neighborhood are more similar to others in the same location than elsewhere. By modeling this way, we could learn about the West Village not only from the West Village, but also from the East Village and Brooklyn. That means, with little data about the West Village, we could use data from the East Village to fill in the gaps!
Many companies suffer from imperfect, incomplete data. These types of inferences can be invaluable to improve predictions based on real-world dependencies.
Play around with the prototype! You’ll see how the color gradients give you an intuitive sense for what probability distributions look like in practice.
How to Access our Reports & Prototypes
We’re offering our research on probabilistic programming in a few ways:
- Single Report & Prototype (digital and physical copies)
- Annual Research Subscription (access to all our research)
- Subscription & Advising (research & time with our team)
- Special Projects (dedicated help to build a great data product)
Write to us at email@example.com if you’d like to learn more!
More from the Blog
Jan 11 2017
with — A rolling regression with PyMC3: instead of the regression coefficients being constant over time (the points are daily stock prices of 2 stocks), this model assumes they follow a random-walk and can thus slowly adapt them over time to fit the data best. Probabilistic programming is coming of age. While normal programming languages denote procedures, probabilistic programming languages denote...
with Thomas Wiecki
Jan 25 2017
with — Our mission at Fast Forward Labs is to commercialize artificial intelligence research, to clearly explain how new technologies work and help enterprises apply them in practical products. As our business model is rare, we’re always keen to connect with and learn from organizations that straddle this same line between research and application. That’s why we’re delighted to feature Galois, a Portl...
with David Archer
Aug 15 2017
by — The Tabula Rogeriana, a world map created by Muhammad al-Idrisi through traveler interviews in 1154. The Wikipedia corpus is one of the favorite datasets of the machine learning community. It is often used for experimenting, benchmarking and providing how-to examples. These experiments are generally presented separate from the Wikipedia user interface, however, which has remained true to the...