On January 11, SF Big Analytics held a meetup at the Yelp office in San Francisco dedicated to DevOps for Data Science: Lifecycle of Big Data Analytics Services.
“The next breakthrough in data analysis may not be in individual algorithms, but in the ability to rapidly combine, deploy, and maintain existing algorithms.” (Hazy: Making it Easier to Build and Maintain Big-data Analytics)
Provisioning a Hadoop/Spark cluster is not a big deal anymore. But how do you go beyond a basic proof of concept to collecting and preparing data, building machine learning models, and producing analytics reports? As data scientists and engineers, we should realize that there are many challenges to solve after the PoC iteration before big data can deliver sustainable value and decision support to businesses. In this talk we discussed how to move data exploration and research projects into operations, and how to continuously monitor, update, and add new features reliably. We covered the cultural and technical challenges of establishing a DevOps process for big data analytics teams, then continued with code samples and a live demo.