Short answer: With runbook automation, engineers can standardize operating procedures, define automated jobs incorporating other existing automation, and safely delegate these processes as APIs and self-service requests to other stakeholders.
Now end users and team members can perform tasks that previously only subject matter experts could perform.
Popular runbook automation use cases include incident management, service requests, business continuity, and simply spreading the operational load among your colleagues.
Longer answer: Keep reading...
What will runbook automation do for your operations?
- Less waiting and quicker turnaround times: Replace "open a ticket and wait" or "dig through the wiki to find the runbook" with "here's the button to do it yourself."
- Fewer interruptions and escalations: Cut down on the repetitive requests that disrupt your already overworked subject matter experts and delay other work.
- Higher operational quality: Standardize operations to reduce operational errors.
Life before runbook automation
You already have the tools, scripts, and manual commands that will copy artifacts, manipulate files, and call APIs.
The problem? The knowledge needed to invoke and leverage those tools, scripts, and manual commands lives in the heads of only a few people.
That leaves everyone else in your organization with only a few unsatisfactory options when they need an ops task completed:
- Brave the wiki — Search for the correct docs and try to decipher what the writer intended (likely wondering the whole time if it is up-to-date and accurate). Of course, this assumes they have been granted the access to the environments in question.
- Dive into ad-hoc script/tool usage — Look in previously agreed-upon locations for shared scripts and hope your knowledge of the correct usage and environment details is current.
- Escalate! — The most likely option. Open a ticket and send disruptive interruptions deep into your organization. Then bide your time while you wait for a response.
A lack of up-to-date knowledge, or insufficient access privileges, blocks others from participating directly in operations activity.
Consequently, everything (provisioning, incident management, diagnostics, maintenance, reporting, and more) falls to a few already overworked and bottlenecked subject matter experts.
This inability to allow more people to participate in operations leads to expensive and painful problems:
- Bottlenecks form around your subject matter experts.
- Incidents are longer than they need to be because only a limited number of people can take action.
- Escalations are rampant, causing more disruption and interruption, which in turn crowds out planned business improvement work.
Runbook automation to the rescue
Runbook automation is essential to your operations because:
- Operations is more than executing a single command: You are routinely dealing with multi-step procedures that span multiple command-line or graphical interfaces.
- Knowledge transfer is difficult and expensive: You have to convey what to do, the correct sequence, and how to evaluate the output at each step.
- The pace of change is skyrocketing: Under the pressures of Digital Transformation, DevOps, and Cloud-Native architectures, the pace of change (and complexity) has increased sharply and will continue to do so. Timely and accurate knowledge transfer via meetings or written text is increasingly infeasible.
Runbook automation helps support the demands of DevOps and Digital Transformation by enabling anyone to safely execute self-service operations tasks that previously only subject matter experts could perform.
Need to perform a restart or other action during an incident? Use an automated runbook and ensure the most up-to-date procedures are executed.
Want to refresh an environment or have new resources provisioned? Don't fill out a ticket; serve yourself.
Keep getting interrupted to check the health or performance of a production service? Create an automated runbook so others can check the health or performance themselves.
Runbook automation enables you to easily translate expert operations knowledge into automated procedures that anyone in your organization can execute on demand (assuming they have the required access privileges).
Runbook automation leverages your existing skills and investments
The role of runbook automation is not to replace your existing tools, scripts, API calls, or manual commands.
The role of runbook automation is to automate the workflows that span and invoke your existing automation and manual commands.
Runbook automation quickly becomes the human-to-tool interface for your operations procedures.
ROI of runbook automation
Calculating the ROI of runbook automation depends on the activity. There are two general categories: incident response and service requests.
- Less waiting: Stay out of people's way. Your team spends less time filling out tickets and waiting for others to do something.
- Fewer interruptions: Protect the limited capacity of subject matter experts. Avoiding interruptions from repetitive requests gives your subject matter experts more time to work on the projects that move the needle for your business.
- Shorter incidents: Incidents cause lost revenue, opportunity cost, and damage to your reputation. When you respond more quickly and enable a broader set of colleagues to respond, incidents are resolved sooner and potential damage is reduced.
- Fewer escalations: Your people are your most expensive assets. Adding runbook automation to your incident response enables people closer to the issue to diagnose and resolve it, avoiding highly disruptive escalation chains that interrupt other work for subject matter experts.
- Fewer incident response hours: Where does your organization spend its time? With shorter incidents and fewer escalations, your teams spend fewer total hours responding to incidents and more time on project work or initiatives that move the company forward.
Critical capabilities of a runbook automation solution
At its technical core, runbook automation is an interface to a workflow that connects people to tools and infrastructure. However, a successful solution needs a few essential capabilities.
Automation Harness: A universal hub that connects any scripts, tools, or APIs into a workflow. It works with any scripting language or tool and allows you to leverage your organization's existing skills and investments. If one team loves Ansible, drop in their playbooks. If another team is all PowerShell, drop in those scripts. The Automation Harness lets you plug in what you've already got (including manual system commands), and then use simple configuration to define the desired workflow.
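To make the harness idea concrete, here is a minimal sketch in Python. It assumes a workflow is just an ordered list of commands and that a non-zero exit code should stop the run. The names `Step`, `StepResult`, `run_workflow`, and the `restart_service` workflow are hypothetical and do not come from any particular product; the Python one-liners stand in for your real scripts, playbooks, or CLI calls.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    argv: list[str]  # any executable: a shell script, a playbook runner, a CLI...


@dataclass
class StepResult:
    name: str
    returncode: int
    output: str


def run_workflow(steps: list[Step]) -> list[StepResult]:
    """Run steps in order and stop at the first one that exits non-zero."""
    results = []
    for step in steps:
        proc = subprocess.run(step.argv, capture_output=True, text=True)
        results.append(StepResult(step.name, proc.returncode, proc.stdout.strip()))
        if proc.returncode != 0:
            break  # later steps never run after a failure
    return results


# Hypothetical workflow: each command is a placeholder for an existing script.
restart_service = [
    Step("drain", [sys.executable, "-c", "print('drained')"]),
    Step("restart", [sys.executable, "-c", "print('restarted')"]),
    Step("verify", [sys.executable, "-c", "print('healthy')"]),
]

if __name__ == "__main__":
    for result in run_workflow(restart_service):
        print(result.name, result.returncode, result.output)
```

Because each step is only an argument list, the harness does not care which language or tool sits behind it. A real solution would add the configuration, logging, and error handling described in the rest of this section.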
Guardrails: Features that give users safe, controlled access to smart choices. These "Guardrail" features generally fall into two categories: access control and usability. Access control features constrain what users are allowed to do and provide a clear audit trail. Usability features focus on guiding users and reducing training requirements. Examples include dynamic options, user input validation, output formatting/processing, error handling, and conditional notifications.
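As a rough illustration of how the two guardrail categories fit together, the sketch below checks a user's role (access control), validates inputs against a fixed set of permitted values (usability), and records every decision in an audit trail. All names here (`Job`, `AuditLog`, `request_run`, the roles and options) are hypothetical and chosen for the example only; a real product would enforce these rules through its own policy and option configuration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Job:
    name: str
    allowed_roles: set[str]
    options: dict[str, set[str]]  # option name -> permitted values


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, user: str, job: str, outcome: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "job": job,
            "outcome": outcome,
        })


def request_run(job: Job, user: str, roles: set[str],
                inputs: dict[str, str], audit: AuditLog) -> bool:
    """Return True only if the user may run the job with these inputs."""
    # Access control: the user needs at least one allowed role.
    if not (roles & job.allowed_roles):
        audit.record(user, job.name, "denied: role")
        return False
    # Input validation: every declared option must be supplied...
    missing = job.options.keys() - inputs.keys()
    if missing:
        audit.record(user, job.name, f"rejected: missing {', '.join(sorted(missing))}")
        return False
    # ...and every supplied value must be one of the permitted choices.
    for option, value in inputs.items():
        if value not in job.options.get(option, set()):
            audit.record(user, job.name, f"rejected: {option}={value}")
            return False
    audit.record(user, job.name, "accepted")
    return True
```

Every request leaves an audit entry whether it is accepted or not, so the log can answer "who tried to run what, and what happened" later.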
Dynamic Infrastructure Map: Today's infrastructure and software components are continuously in motion. Whether you are responding to an incident or completing a provisioning task, you need to know the location and the state of the things you care about. A Dynamic Infrastructure Map keeps track of the details by integrating with other "sources of truth" in your environment (CMDBs, config management, cloud/VM managers, monitoring tools, and more). As a result, both the targets of your automation and the variables it uses stay up to date automatically.
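One way to picture this is as a merge of node records from several sources, queried at execution time rather than hard-coded into scripts. The sketch below assumes each source is a function that returns records keyed by a node name. The functions `cmdb_source`, `monitoring_source`, `build_map`, and `select`, along with the sample records, are hypothetical stand-ins for real integrations.

```python
from typing import Callable, Iterable

Node = dict[str, str]
Source = Callable[[], Iterable[Node]]


def cmdb_source() -> list[Node]:
    # Placeholder for a CMDB or cloud inventory lookup.
    return [
        {"name": "web-1", "env": "prod", "owner": "payments"},
        {"name": "web-2", "env": "staging", "owner": "payments"},
    ]


def monitoring_source() -> list[Node]:
    # Placeholder for a monitoring tool's current view of node state.
    return [
        {"name": "web-1", "status": "healthy"},
        {"name": "web-2", "status": "degraded"},
    ]


def build_map(sources: list[Source]) -> dict[str, Node]:
    """Merge records from every source into one attribute set per node."""
    merged: dict[str, Node] = {}
    for source in sources:
        for record in source():
            merged.setdefault(record["name"], {}).update(record)
    return merged


def select(nodes: dict[str, Node], **filters: str) -> list[str]:
    """Return the sorted names of nodes whose attributes match every filter."""
    return sorted(
        name for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in filters.items())
    )


if __name__ == "__main__":
    nodes = build_map([cmdb_source, monitoring_source])
    print(select(nodes, env="prod"))
```

Calling `build_map` each time a job starts means the targets reflect whatever the sources report at that moment, which is the property the map is meant to provide.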
Runbook automation for DevOps
DevOps-inspired ways of working encourage the delegation of operations work to those who are outside of the traditional boundaries of "Operations." For example, Developers can be allowed to deploy, investigate, and fix their applications in production under a "you build it, you run it" model.
Runbook automation helps DevOps ways of working in several ways, including:
- Give Developers safe, self-service access to do the "run it" part of "you build it, you run it."
- Make it easy to hand off operations procedures in a fast-moving or continuous delivery environment.
- Make security and compliance teams comfortable with significantly expanding the number of people doing operations work in production, by providing a secure, auditable platform through which all human-to-tool interaction takes place.
Runbook automation for SRE
SRE (Site Reliability Engineering) is a significant change in how Operations work gets done. SRE emphasizes using software engineering practices to manage and improve the reliability, scalability, and performance of business-critical systems.
Runbook automation helps SRE practices in several ways, including:
- Turning what would previously have been written documentation into executable code managed through a software development lifecycle.
- Self-service that enables operations activity to be distributed throughout an organization, reducing "toil" (the repetitive, manual operational work that SRE practices aim to limit).
- Building and collaborating on automated checklists that improve the speed of diagnosing and resolving incidents.
Runbook automation for legacy environments
Life in enterprise Operations will always be a mixture of "the old" and "the new." Responding to incidents or doing a provisioning activity will — more often than not — require you to work across multiple generations of technology.
Runbook automation helps you operate in legacy environments in several ways, including:
- Capture standard operating procedures for all services, ensuring quick and reliable access for anyone who responds to incidents or needs something provisioned.
- Go faster, but maintain ITSM standards, by replacing the need to open tickets for standard changes with self-service automation (that can still keep records in the ticket systems if needed).
- Ensure that the operations actions executed were the same as those previously agreed to during change advisory/review (with audit logs that allow you to review what ran, who ran it, and the output/results).
Ready to learn how to put runbook automation to work for you?