DevOps for Data Science - Part 1
Published Sep 22 2020 09:06 AM
Microsoft

Data scientists have often worked in a bit of a “silo”: off to the side in an organization, maybe not even part of the Information Technology (IT) function. But that is changing. As data science projects are adopted into the mainstream, there is a need for structure. I’ve explained a modern data science structure for integration, called the Team Data Science Process (TDSP). It’s similar to ITIL or the Microsoft Operations Framework (MOF), but it is designed to handle the processes involved in machine learning, artificial intelligence, and other advanced analytics.

 

Developer Operations, or DevOps, is not a framework for “doing Information Technology”. It’s really three things: People, Process, and Products. I’ll explain more about DevOps in a later article, but the point is that DevOps overlays the TDSP nicely, and is certainly something you need to think about from the outset. To distill the thought a bit, DevOps can be thought of as a “shift-left” mentality. That means that at the very start of the project, you think about the outcomes of every step that comes later: coding, building, testing, deployment, security, and patching.

 

Seems difficult, doesn’t it? It’s actually not. Yes, there is work involved, but once you start, it simply becomes part of the process. And like all good habits, it requires a little effort and maintenance to keep it going. I’ll show you how to implement DevOps in Data Science as we go, but for now, know that it is essential to your data science projects. Essential? Why?

 

Because security. Because maintenance. Because testing. Because constant technical debt. For these reasons and many more that will become apparent, you need to start thinking about not only the TDSP as you structure your projects, but also DevOps. In this series I’ll show you how.

 

For Data Science, I find this progression works best, taking these one step at a time and building on the previous step. This is the series I’ll create:

  1. Defining DevOps for Data Science
  2. Infrastructure as Code (IaC)
  3. Continuous Integration (CI) and Automated Testing
  4. Continuous Delivery (CD)
  5. Release Management (RM)
  6. Application Performance Monitoring
  7. Load Testing and Auto-Scale

In the articles that follow in this series, I’ll help you implement each of these in turn.

 

(If you’d like to implement DevOps, Microsoft has a site to assist. You can even get a free offering for Open-Source and other projects: https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/)

 

Last update: Feb 10 2021 04:24 AM