Working with over 30 enterprise clients in industries like financial services, insurance, publishing, and retail, the Fast Forward Labs team has had ample opportunity to observe the challenges of doing data science in practice. By now, most organizations have moved beyond the traditional waterfall software development process to adopt more risk-tolerant, agile methodologies. But applying agile directly to data science can create friction: data products require more leeway for experimentation and exploration, as well as open communication between business, science, and engineering teams.
With so many data teams looking for guidance, we’re excited to see O’Reilly’s new (free!) eBook, Development Workflows for Data Scientists (PDF), which features insights from our own Friederike Schuur. The book includes guidance on structuring teams, designing workflows, optimizing processes to learn from previous work, documenting outcomes, and communicating results to non-technical colleagues. Friederike, for example, contrasts the value of documentation in standard software development with its value in experimental data product development:
In data science and machine learning you’re doing so many things before you know what actually works. You can’t just document the working solution. It’s equally valuable to know the dead ends. Otherwise, someone else will take the same approach.
Check out the book, and feel free to contact us at firstname.lastname@example.org with questions about your own data science processes!