Short answer: With runbook automation, engineers can standardize
operating procedures, define automated jobs
incorporating other existing automation, and
safely delegate these processes as APIs and
self-service requests to other
Now end users and team members can perform
tasks that previously only subject matter
experts could perform.
Popular runbook automation use cases include
incident management, service requests, business
continuity, or just spreading the operational
load amongst your colleagues.
Longer answer: Keep reading...
What will runbook automation do for your
Less waiting and quicker turnaround times
— Replace "open a ticket and
wait" or dig through the wiki to find the
runbook with "here's the button to do it
Fewer interruptions and escalations— Cut down on the repetitive
requests that disrupt your already overworked
subject matter experts and delay other work.
Enhance operational quality— Standardize operations to reduce operational
Life before runbook automation
You already have the tools, scripts, and manual
commands that will copy artifacts, manipulate
files, and call APIs.
The problem? The knowledge needed to invoke and
leverage those tools, scripts, and manual commands
lives in the heads of only a few people.
That leaves everyone else in your organization
with only a few unsatisfactory options when they
need an ops task completed:
Brave the wiki — Search for the correct docs
and try to decipher what the writer intended
(likely wondering the whole time if it is
up-to-date and accurate). Of course, this
assumes they have been granted the access to the
environments in question.
Dive into ad-hoc script/tool usage — Look in previously
agreed-upon locations for shared scripts and
hope your knowledge of the correct usage and
environment details is current.
Escalate! — The most likely option. Open
a ticket and send disruptive interruptions deep
into your organization. Then bide your time
while you wait for a response.
A lack of up-to-date knowledge, or insufficient
access privileges, blocks others from
participating directly in operations activity.
Consequently, everything (provisioning, incident
management, diagnostics, maintenance, reporting,
and more) falls to a few already overworked and
bottlenecked subject matter experts.
This inability to allow more people to participate
in operations leads to expensive and painful
Bottlenecks form around your subject matter
Incidents are longer than they need to be
because only a limited number of people can take
Escalations are rampant, causing more disruption
and interruption, which in turn crowds out
planned business improvement work.
Runbook automation to the rescue
Runbook automation is essential to your operations
Operations is more than executing a single
command — You are routinely dealing
with multi-step procedures that span multiple
command line or graphical interfaces.
Knowledge transfer is difficult and expensive
— You have to convey what to
do, the correct sequence, and how to evaluate
the output at each step.
The pace of change is skyrocketing — Under the pressures of
Digital Transformation, DevOps, and Cloud-Native
architectures, the pace of change (and
complexity) has increased exponentially and will
continue to do so. Timely and accurate knowledge
transfer via meetings or written text is
Runbook automation helps support the demands of
DevOps and Digital Transformation by enabling
anyone to safely execute self-service operations
tasks that previously only subject matter experts
Need to perform a restart or other action during
an incident? Use an automated runbook and ensure
the most up-to-date procedures are executed.
Want to refresh an environment or have new
resources provisioned? Don't fill out a ticket;
Keep getting interrupted to check the health or
performance of a production service? Create an
automated runbook so others can check the health
or performance themselves.
Runbook automation enables you to easily translate
expert operations knowledge into automated
procedures that anyone in your organization can
execute on-demand (assuming they have the access
Runbook automation leverages your existing skills
The role of runbook automation is not to replace your
existing tools, scripts, API calls, or manual
The role of runbook automation is to automate the
workflows that span and invoke your existing
automation and manual commands.
Runbook automation quickly becomes the
human-to-tool interface for your operations procedures.
ROI of runbook automation
How you calculate the ROI of runbook automation
depends on the activity. There are two general
categories: incident response and service
Less Waiting — Stay out of
people's way. Your team is spending less time
filling out tickets and waiting for others to do
Fewer Interruptions — Protect
the limited capacity of subject matter experts.
Avoiding interruptions from repetitive requests
gives your subject matter experts more time to
work on the projects that move the needle for
Shorter incidents — Incidents
cause lost revenue, opportunity cost, and damage
to your reputation. Responding more quickly and
enabling a broader set of colleagues to respond
resolves incidents sooner and reduces potential
damage.
Fewer Escalations — Your people
are your most expensive assets. Adding runbook
automation to your incident response enables
people closer to the issue to diagnose and
resolve the issue, avoiding highly disruptive
escalation chains that interrupt other work for
subject matter experts.
Fewer Incident Response Hours —
Where does your organization spend its time?
With shorter incidents and fewer escalations,
your teams are spending fewer total hours
responding to incidents and more time on project
work or initiatives that move the company
Critical capabilities of a runbook automation
At the technical core, runbook automation is an
interface to a workflow that connects people to
tools and infrastructure. However, there are a few
essential capabilities needed for a successful
Automation Harness — A universal hub that
connects any scripts, tools, or APIs into a
workflow. Works with any scripting language or
tool and allows you to leverage your
organization's existing skills and investments. If
one team loves Ansible, drop in their playbooks.
If another team is all PowerShell, drop in those
scripts. The Automation Harness lets you plug in
what you've already got (including manual system
commands), and then use simple configuration to
define the desired workflow.
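
As a rough illustration of the harness idea, here is a minimal Python sketch in which a workflow is just configuration: an ordered list of named steps, each wrapping whatever command a team already uses. The `Step` class, `run_workflow` function, and the step names are hypothetical and not taken from any particular product; the Python interpreter stands in for real tools only so the sketch runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    """One workflow step: a name plus the command line that runs it."""
    name: str
    command: list[str]


def run_workflow(steps: list[Step]) -> list[dict]:
    """Run steps in order, stopping at the first failure.

    Returns one result record per executed step.
    """
    results = []
    for step in steps:
        proc = subprocess.run(step.command, capture_output=True, text=True)
        results.append({
            "step": step.name,
            "exit_code": proc.returncode,
            "output": proc.stdout.strip(),
        })
        if proc.returncode != 0:
            break  # later steps are skipped when a step fails
    return results


# Hypothetical workflow. Each command could just as well be
# ["ansible-playbook", "restart.yml"] or ["pwsh", "-File", "check.ps1"];
# the Python interpreter is used here only so the sketch is portable.
WORKFLOW = [
    Step("check-health", [sys.executable, "-c", "print('healthy')"]),
    Step("restart-service", [sys.executable, "-c", "print('restarted')"]),
]

if __name__ == "__main__":
    for record in run_workflow(WORKFLOW):
        print(record)
```

The design point is that the harness does not care what language a step is written in. It only needs a command to invoke and an exit code to decide whether the workflow continues.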
Providing users with safe and controlled access to
smart choices. These "Guardrail" features
generally fall into two categories: access control
and usability. Access control features constrain
what users are allowed to do and provide a clear
audit trail. Usability features are focused on
guiding users and reducing training requirements.
Usability feature examples include dynamic
options, user input validation, output
formatting/processing, error handling, and
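
To make the two guardrail categories concrete, the following is a small sketch, assuming a hypothetical job definition that lists which roles may run it and which input values are valid. The names `JOB`, `AUDIT_LOG`, and `request_run` are illustrative only. It shows access control and input validation applied before anything runs, with every attempt recorded for audit.

```python
from datetime import datetime, timezone

# Hypothetical job definition: who may run it and which inputs are valid.
JOB = {
    "name": "restart-service",
    "allowed_roles": {"sre", "developer"},
    "options": {"environment": {"staging", "production"}},
}

# In-memory audit trail for the sketch; a real system would persist this.
AUDIT_LOG: list[dict] = []


def request_run(job: dict, user: str, role: str, inputs: dict) -> bool:
    """Apply access control and input validation, and record the attempt."""
    problems = []
    if role not in job["allowed_roles"]:
        problems.append(f"role '{role}' may not run {job['name']}")
    for option, allowed in job["options"].items():
        if inputs.get(option) not in allowed:
            problems.append(f"invalid value for '{option}'")
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "job": job["name"],
        "inputs": dict(inputs),
        "allowed": not problems,
        "problems": problems,
    })
    return not problems
```

Rejected attempts are logged as well as accepted ones, so the audit trail shows who tried to run what, not only what actually ran.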
Dynamic Infrastructure Map — Today's infrastructure and
software components are continuously in motion.
Whether you are responding to an incident or
completing a provisioning task, you need to know
the location and the state of the things you care
about. A Dynamic Infrastructure Map keeps track of
the details by integrating with other "sources of
truth" in your environment (CMDBs, config
management, cloud/VM managers, monitoring tools,
and more). Now, the targeting of your automation
and variables in your automation automatically
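
One way to picture a dynamic infrastructure map is as a merge of node records from several sources of truth, queried at run time to decide where automation should execute. The sketch below assumes two hypothetical sources (`cmdb_source` and `monitoring_source`) that return hard-coded records; real sources would query a CMDB, a cloud API, or a monitoring tool.

```python
def build_node_map(*sources):
    """Merge node records from several sources, keyed by hostname.

    Later sources override or extend attributes from earlier ones.
    """
    nodes = {}
    for source in sources:
        for record in source():
            nodes.setdefault(record["hostname"], {}).update(record)
    return nodes


def select_targets(nodes, **filters):
    """Return sorted hostnames whose attributes match every filter."""
    return sorted(
        host for host, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in filters.items())
    )


# Hypothetical sources with hard-coded data, for illustration only.
def cmdb_source():
    return [
        {"hostname": "web-1", "role": "web", "env": "production"},
        {"hostname": "web-2", "role": "web", "env": "production"},
        {"hostname": "db-1", "role": "db", "env": "production"},
    ]


def monitoring_source():
    return [
        {"hostname": "web-1", "state": "up"},
        {"hostname": "web-2", "state": "down"},
        {"hostname": "db-1", "state": "up"},
    ]


if __name__ == "__main__":
    node_map = build_node_map(cmdb_source, monitoring_source)
    print(select_targets(node_map, role="web", state="up"))
```

Because targets are resolved from the merged map each time a job runs, a runbook that says "all healthy web servers" does not need to be edited when hosts are added, removed, or change state.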
Runbook automation for DevOps
DevOps-inspired ways of working are encouraging
the delegation of operations work to those who are
outside of the traditional boundaries of
"Operations." For example, allowing Developers to
deploy, investigate, and fix their applications in
production under a "you build it, you run it"
Runbook automation helps DevOps ways of working in
several ways, including:
Enable Developers with safe, self-service access
to do the "run it" part of "you build it, you
Make it easy to hand-off operations procedures
in a fast-moving or continuous delivery
Provide a secure, auditable platform through
which all human-to-tool interaction takes place,
so that security and compliance teams are
comfortable with significantly expanding the
number of people doing operations work in
Runbook automation for SRE
SRE (Site Reliability Engineering) is a
significant change in how Operations work gets
done. SRE emphasizes using software engineering
practices to manage and improve the reliability,
scalability, and performance of business-critical
Runbook automation helps SRE practices in several
Turning what would previously have been written
documentation into executable code managed
through a software development lifecycle.
Self-service that enables operations activity to
be distributed throughout an organization,
reducing "toil" (a key SRE mechanism for
Building and collaborating on automated
checklists that improve the speed of diagnosing
and resolving incidents.
Runbook automation for legacy environments
Life in enterprise Operations will always be a
mixture of "the old" and "the new." Responding to
incidents or doing a provisioning activity will —
more often than not — require you to work across
multiple generations of technology.
Runbook automation helps when operating in legacy
environments in several ways, including:
Capture standard operating procedures for all
services, ensuring quick and reliable access for
anyone responding to incidents or who need
Go faster, but maintain ITSM standards, by
replacing the need to open tickets for standard
changes with self-service automation (that can
still keep records in the ticket systems if
Ensure that the operations actions executed were
the same as those previously agreed to during
change advisory/review (with audit logs that
allow you to review what ran, who ran it, and
Ready to learn how to put runbook automation to
work for you?