In this edition of SRE Anti-Patterns, I'm highlighting the common enterprise problem of disjointed access. Often the people responding to an incident are blocked from taking the required recovery actions even though they have the first-hand knowledge and experience needed to know what to do.
They don't have access to the correct services or infrastructure for any number of reasons. Sometimes, security or compliance concerns get in the way. Other times the lack of access could be just a byproduct of a siloed organization or political turf concerns.
In any case, the problem is the same: those who have the full context of the problem that needs to be solved (usually urgently) aren't able to take action. They end up opening tickets (introducing new costs) or trying to track down, by IM or phone, a colleague who can help them.
From an SRE perspective, this way of working is expensive for two reasons:
- After you have gone through the trouble of hiring people with a valuable SRE skillset, you are putting roadblocks in front of them doing their job.
- You are introducing harmful task-switching and delays into the organization. Every escalation interrupts the person it lands on and leaves everyone else waiting.
Instead, we can take a self-service approach and give SREs responding to incidents the access they need to get their job done quickly, effectively, and safely.
Those who are experts can define operational procedures that others can safely execute. With a tool like Rundeck, that is quick and simple to do since any combination of existing tools or scripting languages can be used to define those procedures.
Rundeck's access control policies make it easy to hand out fine-grained access to run the right procedures in the right environments. Combined with Rundeck's logging, they will satisfy most security and compliance concerns, because access is given only to take specific actions in specific locations.
These access control capabilities are also helpful in cases where the experts who define the response procedures, and who take action when problems arise, are outside of the traditional operations team (like developers in a "you build it, you run it" org model).
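To make the access model described above more concrete, here is a minimal Python sketch of the idea: a grant ties a group to one specific procedure in one specific environment, and every attempt is logged whether or not it is allowed. This is not Rundeck's API or policy format; `Grant`, `ProcedureRunner`, the group `sre-oncall`, and the procedure `restart-web` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission: one group may run one procedure in one environment."""
    group: str
    procedure: str
    environment: str


@dataclass
class ProcedureRunner:
    """Checks grants before running a procedure and records every attempt."""
    grants: frozenset
    audit_log: list = field(default_factory=list)

    def run(self, user, groups, procedure, environment):
        # Allowed only if one of the user's groups holds a grant for exactly
        # this procedure in exactly this environment.
        allowed = any(
            Grant(group, procedure, environment) in self.grants for group in groups
        )
        # Log the attempt whether or not it is allowed.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "procedure": procedure,
            "environment": environment,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(
                f"{user} may not run {procedure} in {environment}"
            )
        # Placeholder for invoking the expert-defined script or tool.
        return f"ran {procedure} in {environment}"


runner = ProcedureRunner(
    grants=frozenset({Grant("sre-oncall", "restart-web", "production")})
)
print(runner.run("alice", ["sre-oncall"], "restart-web", "production"))
```

In this sketch an on-call responder can restart the web tier in production without a ticket, while any procedure or environment that has not been explicitly granted raises `PermissionError` and still leaves an audit entry behind.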
Self-Service Operations enables you to empower teams to take action that they couldn't take before while reducing the constant interruptions and escalations that interfere with the deliverables of other groups.
If you want to discuss how Rundeck and the Operations as a Service design pattern can help you get rid of your "I Could Fix It, If I Could Get To It" roadblocks, don't hesitate to contact us.