What Is A DevOps Engineer?
Written by Nick Otter.
DevOps Engineer Fundamentals
The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations, by Gene Kim, Jez Humble, Patrick Debois, John Willis and Nicole Forsgren, offers solid definitions of DevOps and its responsibilities. Using that as a foundation, combined with my own humble experience, here’s what I think.
A “DevOps Engineer” should be defined as follows:
- DevOps is a methodology, not an individual task
- A DevOps Engineer has a responsibility to deliver that methodology
“DevOps” should:
- Abstract a technical solution from key business milestones
- Manage and deliver the technical solution for key business milestones
- Prioritise business milestones far ahead of technical milestones
- Distill the technical side for Functional teams
DevOps is measured by:
- Deployment frequency
- Change lead time
- Change failure rate
- Time to restore services
Systems that have been improved by DevOps might be:
- Scalable
- Highly Available
- Secure
- Reliable
- Automated
- Observable
- Built using Agile
- Supported by very fast feedback loops
As a DevOps Engineer, you should solve problems by:
- Being an enabler, not a problem solver
- Creating a unified understanding of the problem for everyone
- Putting forward different options
- Enabling teams
A “DevOps Engineer” will have similar responsibilities to:
- Site Reliability Engineer
- Platform Engineer
- Cloud Engineer
- Scrum Master
And will brush up against the work of:
- Solutions Architect
- System Administrator
- Linux Engineer
- Operations
A DevOps Engineer’s aim should be never to “touch” Production.
DevOps Engineer Metrics
The performance of a company’s DevOps movement should be analysed using DORA metrics (first discussed in Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations).
- Deployment frequency: How often a software team pushes changes to production
- Change lead time: The time it takes to get committed code to run in production
- Change failure rate: The share of all deployments that result in an incident, rollback, or failure
- Time to restore service: The time it takes to restore service in production after an incident
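To make the four metrics above concrete, here is a minimal Python sketch of how they might be calculated from a list of deployment records. The `Deployment` record, its field names and the `dora_metrics` function are hypothetical and invented for illustration; they are not part of any DORA tooling. It also assumes that an incident starts at the moment of the failed deployment, which is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it started running in production
    failed: bool = False                    # led to an incident, rollback, or failure
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Summarise the four DORA metrics over a period of `period_days` days."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: the incident begins at the failed deployment itself.
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample data: four deployments in one week, one of which failed.
sample = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 13),
               failed=True, restored_at=datetime(2024, 1, 2, 14)),
    Deployment(datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 10)),
    Deployment(datetime(2024, 1, 4, 9), datetime(2024, 1, 4, 12)),
]
metrics = dora_metrics(sample, period_days=7)
print(metrics)
```

Medians are used rather than means so that one unusually slow change or long outage does not dominate the summary; that is a design choice of this sketch, not a requirement of the metrics.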
DevOps Sources
Thanks. This was written by Nick Otter.