Recently, I delivered a talk called "Clearing the Way for SRE in the Enterprise" at SREcon 2018 Europe. The presentation sparked some spirited hallway conversation. It turns out that many companies are having a difficult time with SRE!
My point: Unless you address the forces that currently undermine your Operations organization, your SRE transformation will likely be in name only.
First, I looked at the three core principles of SRE (as defined by the folks at Google who coined the term Site Reliability Engineering):
- SRE needs Service Level Objectives, with consequences
- SREs have time to make tomorrow better than today
- SRE teams have the ability to regulate their workload
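
To make "Service Level Objectives, with consequences" a little more concrete, here is a minimal sketch of an error budget check. It was not part of the talk. The function names, the 99.9% target, and the "freeze releases" consequence are all assumptions chosen for illustration, and a real policy would be whatever Dev and Ops agree on.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is a success ratio such as 0.999 (hypothetical value).
    """
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


def release_policy(remaining):
    """Map the remaining budget to an agreed consequence (illustrative only)."""
    if remaining > 0:
        return "ship features"
    return "freeze releases and fix reliability"


if __name__ == "__main__":
    # Assumed numbers: 1,000,000 requests in the window against a 99.9% SLO.
    for failures in (250, 1500):
        remaining = error_budget_remaining(0.999, 1_000_000, failures)
        print(failures, round(remaining, 2), release_policy(remaining))
```

The point of the sketch is the second function: the SLO only has teeth if missing it changes what the team is allowed to do next.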
Next, I looked at the four forces that will undermine any enterprise operations organization:
- Excessive Toil
- Low Trust
Then I went through the SRE principles again and discussed how the undermining forces prevent those principles from being realized.
Finally, I discussed some of the ways to eliminate or reduce the undermining forces:
- Lean on Lean to find what to fix
- Get rid of as many Silos as possible
- Focus on creating shared responsibility across Dev and Ops
- Stay out of SREs' way
- Turn remaining handoffs into pull-based self-service interfaces
- Use tickets only for what they are good for
- Shift left the ability to take action
- Start a book club
Of course, given my work at Rundeck, Self-Service Operations received a fair amount of discussion (covering both what it is and some of the popular use cases for SRE).
Here are the full slides:
And here is the full video:
See the SREcon site for more info on the conference and upcoming dates. It is shaping up to be a great conference series.