Use Case

Runbook Automation for Incident Management

The Problem: Longer Incidents and Frequent Disruptive Escalations

From minor issues to major outages, incidents are a fact of life in operations. When responding to an incident, we all feel the pressure and want to work quickly. But what gets in the way? Well, it’s hard to diagnose a problem and even harder to fix it!




Tool sprawl, lack of context and knowledge, disconnected access, changing environments: each time we respond, we have to jump back into the mud and go through the same rigamarole all over again. To make matters worse, these incidents take us away from the project work we are supposed to be doing. It’s stressful and frustrating.

What if we could set up self-service procedures that make it easier for us to diagnose and fix problems, and then allow others to use those procedures as well? Unfortunately, the same problems make creating effective self-service appear too difficult or cost-prohibitive: the sprawl of tools and scripts, disconnected knowledge, and security and compliance requirements that prevent access.


The Rundeck Solution

Rundeck makes it easy to create self-service operations procedures that help you diagnose and resolve issues more quickly.

Collaboratively define best practices, create standard operating procedures, and solve access issues — all while raising your security and compliance posture.

Use Rundeck’s access control features to enable your initial responders to take action with automated diagnostic and repair procedures. Incidents will be resolved quicker and with fewer disruptive escalations.



  • Quickly create workflows that span your existing tools, scripts, system commands, and API calls
  • Collaboratively define steps of the workflows that capture how-to knowledge and best practices from the various subject matter experts in your organization
  • Use Rundeck’s built-in access control features to define fine-grained permissions that determine who can run, modify, or view which jobs (use your existing AD/LDAP or SSO for authentication).
  • Make it safe and easy for anyone to execute procedures by building “guardrails” using features like smart option handling (defaults, constraints, pick lists, dependent options, etc.), secure key/password store, data passing between steps, log filters, notifications, error-handling, and more.
  • Safely execute (or hand off to colleagues) both diagnostic procedures (e.g., health checks, debug, or validation) and remediation procedures (e.g., restarts, resetting connections, deploying configurations, scaling, database procedures, or other tasks).
  • Use read-only access combined with logging and notification features to give broad visibility into operations activity across your organization.
  • Rundeck’s resource model learns details about your environment from multiple sources (Rundeck plugin points) so your automation can be parameterized and kept up to date.

ROI Tips

  • Determine a baseline by multiplying the number of repetitive interruptions by the amount of time spent per interruption (including the cost of context switching away from and back to the work that was interrupted).
  • Determine the savings by identifying which repetitive interruptions can be eliminated through self-service with Rundeck, and how much faster the remaining interruptions can be handled using self-service jobs in Rundeck.
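
The two tips above can be sketched as simple arithmetic. The Python below is only an illustration: the interruption names, counts, durations, and speedup factor are made-up assumptions rather than Rundeck data, and it assumes that a faster self-service job shortens hands-on time while the context-switching cost stays the same.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    """One kind of repetitive interruption (all values are hypothetical)."""
    name: str
    per_month: int                 # how often it occurs each month
    minutes_each: float            # hands-on time per occurrence
    context_switch_minutes: float  # time lost switching away and back
    eliminated: bool = False       # True if self-service removes it entirely
    speedup: float = 1.0           # hands-on time is divided by this factor


def baseline_minutes(items):
    """Baseline: occurrences times (hands-on time plus context switching)."""
    return sum(
        i.per_month * (i.minutes_each + i.context_switch_minutes)
        for i in items
    )


def remaining_minutes(items):
    """Time still spent once eliminated items are gone and the rest are faster."""
    return sum(
        i.per_month * (i.minutes_each / i.speedup + i.context_switch_minutes)
        for i in items
        if not i.eliminated
    )


def monthly_savings(items):
    """Savings: baseline minus what remains after self-service."""
    return baseline_minutes(items) - remaining_minutes(items)


# Illustrative figures only.
example = [
    Interruption("restart service", 20, 15, 20, eliminated=True),
    Interruption("disk space check", 10, 30, 20, speedup=2.0),
]

print("baseline minutes/month:", baseline_minutes(example))
print("savings minutes/month:", monthly_savings(example))
```

Multiplying the saved minutes by a loaded hourly cost turns the result into a monetary estimate; that rate is another input you would need to supply.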




More about Rundeck’s ROI