DevOps for Data Science - Part 2 - Defining DevOps
Published Oct 27 2020 06:17 AM 3,471 Views

I’m wading into treacherous waters here in my series on DevOps for Data Science. Computing terms often defy explanation, especially newer ones. While “DevOps” or Developer Operations has been around for a while, it’s still not as mature a term as, say, “Relational Database Management System (RDBMS)”. That term is well known, understood, and accepted. (It wasn’t when it came out). Whatever definition I give you for DevOps will be contested – and I’m OK with that. Nothing brings out a good flame-war like defining a new technical term.


Regardless of the danger, we have to define the terms we’re using. Andrew Shafer and Patrick Debois used the term first, from what I can tell,  in 2008 at a conference on Agile – Agile being a newer term as well.  They posited in their talk the breaking down of barriers between developers, operations, and other departments. Since then, the term DevOps has come to mean much more.


Think about getting software in a user’s hands (or another system’s…er, hands). Working sequentially, the process looks something like this:


Design -> Environment Setup -> Code -> Build -> Test -> Package -> Release -> Monitor


With a few exceptions, that’s how software is done. Data Science is usually somewhere in there during the Code phase.


In most cases, there are clearly defined boundaries for what gets done by whom. For instance, developers write the code after the business sends over requirements. The deployment team handles packaging and releasing. And the operations team (Ops) handles monitoring and updating. Maybe it’s a little different in your organization, but in general, each team has an area they are responsible for. And that’s mostly all they focus on.


We’re all busy. I barely have enough time in my day to write code and the commensurate documentation, much less think about other parts of the process. But we have to think about the phases of software development that follow our own. 


Imagine if Equifax, as the business owners were requesting the software to be written, had said “And remember, we need to build right into the software things that require the right security to be in place. And let’s make sure we have a plan for when things go wrong.” Imagine if the developers had included a patch-check for the frameworks they use to ensure everything was up to date. Imagine if the Ops team cared that proper security testing is done way back in the development stage. Your identity might still be safe. 


And that’s my definition of DevOps: At its simplest, DevOps is including all parties involved in getting an application deployed and maintained to think about all the phases that follow and precede their part of the solution. That means the developer needs to care about monitoring. Business owners need to care about security. Deployment teams need to care about testing. And everyone needs to talk, build the process into their tools, and follow processes that involve all phases of the release and maintenance of software solutions.


That also means DevOps isn’t a tool or even a team – it’s a thought process. Sure, there are tools and teams that help implement it, but if only a few people are part of DevOps, then you don’t have DevOps.


In this series, I’ll cover more about the intersection of DevOps and Data Science, and in particular, the things you need to be careful about in implementing DevOps for Data Science. Use the references below to inform yourself, as a Data Scientist, what DevOps is. I’ll show you how to integrate it into your projects as we go.


For Data Science, I find this progression works best – taking these one step at a time, and building on the previous step – the entire series is listed here - I'll be working on these articles throughout this series:

  1. Infrastructure as Code (IaC)
  2. Continuous Integration (CI) and Automated Testing
  3. Continuous Delivery (CD)
  4. Release Management (RM)
  5. Application Performance Monitoring
  6. Load Testing and Auto-Scale

In the articles in this series that follows, I’ll help you implement each of these in turn.

(If you’d like to implement DevOps, Microsoft has a site to assist. You can even get a free offering for Open-Source and other projects:

Version history
Last update:
‎Oct 27 2020 06:17 AM
Updated by: