How to Improve Support Capacity and Reduce Escalations in Dev Operations
  • Safely empower L1 teams to act quicker and more effectively
  • Protect the capacity of L2/L3 teams by reducing escalations
  • Reduce MTTR and overall support costs

Labor capacity shortages are a fact of life in the enterprise. This is especially common in Operations. It’s typical for teams to be faced with more work than there is time in the day, or people available to do it. Managers have come to rely on teams putting in long hours and frequent heroics in order to just get by -- a status quo that is unpleasant and ultimately unsustainable.

Self-Service Operations is a design pattern that helps Operations teams decrease support costs while improving agility and responsiveness.

The Problem

The interrupt-driven nature of operations work is at the heart of the capacity challenges. Unplanned and planned work mix like oil and water. Interruptions from the unplanned work -- and the costly context switching that comes with it -- prevent teams from completing the planned project work that the business needs to push forward.

These interruptions have a ripple effect throughout the organization in the form of cascading escalations (e.g. from L1 to L3) and compounding schedule slippage (Project A is delayed by Outage A, which in turn delays Project B because it depends on Project A, which then delays Project C because it needs people from Project B and deliverables from Project A).




Current business and technology trends (e.g. Digital, DevOps, Cloud, Containers, etc.) are causing a sharp increase in both the pace and complexity of IT work. The pressure on Operations to respond in kind and gain unprecedented speed and reliability is mounting. These pressures further exacerbate the labor capacity shortages. Without that capacity, Operations organizations aren't able to get to the improvement work. Technology organizations, and especially their Operations organizations, need to get a handle on this capacity problem before the problem becomes terminal.


Self-Service Improves Support Capacity and Reduces Escalations


Self-Service Operations enables you to safely shift activity to where your workforce can be best utilized. This includes moving the ability to take action closer to the problem or party in need.

The key to self-service operations is that it allows IT operations to divide and distribute the essential parts of an automated procedure: the definition of the automated procedure, the ability to execute that automated procedure, and control over the security and management policies governing that automated procedure.
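The three parts described above can be sketched in a few lines of Python. This is only an illustration of the separation, not Rundeck's API: the names Procedure, Policy, and execute, the "l1-support" role, and the "web-01" target are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Procedure:
    """The definition: what the automated procedure does (owned by its author)."""
    name: str
    run: Callable[[str], str]


@dataclass
class Policy:
    """The policy: which roles may execute which procedures (owned by Operations)."""
    allowed: Dict[str, Set[str]] = field(default_factory=dict)

    def permits(self, role: str, procedure_name: str) -> bool:
        # Deny by default: a role with no entry may run nothing.
        return procedure_name in self.allowed.get(role, set())


def execute(procedure: Procedure, policy: Policy, role: str, target: str) -> str:
    """The execution: anyone the policy permits can run the procedure."""
    if not policy.permits(role, procedure.name):
        raise PermissionError(f"{role} may not run {procedure.name}")
    return procedure.run(target)


# Hypothetical example: a specialist defines "restart", Operations grants it to L1.
restart = Procedure("restart", lambda target: f"restarted {target}")
policy = Policy({"l1-support": {"restart"}})

print(execute(restart, policy, "l1-support", "web-01"))
```

Because the definition, the permission, and the act of running are separate objects, each can be owned and changed by a different team without touching the other two.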

The most common usages for the Self-Service Operations design pattern are:

  1. Improve the capacity of individual teams involved in operational support
  2. Enable operational support teams to take on more specialized and advanced work
  3. Enable other teams to safely, securely, and effectively do operational support work
  4. Reduce the cost of transitioning support knowledge to operations support teams




Let’s take a look at how Rundeck can play a key role in each:


1. Improve the capacity of individual teams involved in operational support

Operations support teams can use Rundeck to create standard operating procedures for known problems and expected events. Rundeck’s ability to consume almost any scripting language and connect to any tool makes it easy for teams to collaborate on creating a catalog of best practices and utilities.

By using Rundeck to catalog and collaborate on automated procedures, teams reduce manual effort and reduce variability (a key to reducing errors and rework). At an organizational level, Rundeck can be used to define a standard set of “operating verbs” for all applications and create a job for each (e.g. Status, Restart, Reconfigure, Clear Cache, Reset DB Connection, Deploy, etc.).
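One way to picture a standard set of operating verbs is as a checklist applied to every application. The sketch below is a plain Python illustration under assumed names (the "app/verb" naming scheme, the billing and search applications, and both helper functions are hypothetical); it does not use or describe Rundeck's job format.

```python
from typing import Iterable, List, Set

# A subset of the verbs mentioned above, written as job-name fragments.
STANDARD_VERBS = ("status", "restart", "reconfigure", "clear-cache", "deploy")


def job_names(applications: Iterable[str],
              verbs: Iterable[str] = STANDARD_VERBS) -> List[str]:
    """List the job every application is expected to have, one per verb."""
    verb_list = list(verbs)
    return [f"{app}/{verb}" for app in applications for verb in verb_list]


def missing_jobs(expected: Iterable[str], existing: Set[str]) -> List[str]:
    """Return the expected jobs that nobody has written yet."""
    return [name for name in expected if name not in existing]


# Hypothetical inventory: two applications, three jobs written so far.
expected = job_names(["billing", "search"])
existing = {"billing/status", "billing/restart", "search/status"}

for name in missing_jobs(expected, existing):
    print("missing:", name)
```

Treating the verbs as a fixed list makes gaps visible: any application without a job for a given verb shows up as a concrete item of automation work.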

Operations support teams quickly become more efficient and effective as they improve their use of automation.





2. Enable operational support teams to take on more specialized and advanced work

In any enterprise, there are key personnel and specialists who get pulled into a disproportionate share of incidents. The result is bottlenecks and delays even when it looks, on paper, like there should be plenty of labor capacity to go around.

Rundeck helps you protect the capacity of these internal experts. You need them to be able to focus on work that moves the company forward, rather than be swamped with repetitive requests. Capture this repetitive work as Rundeck jobs and safely delegate those tasks to others. Operations support teams will be able to handle more and more requests on their own using this self-service model. This directly cuts down on escalations and protects the capacity of L2 / L3 resources who need to focus on project work. L1 teams can respond to incidents quicker when they don’t have to escalate.

Rundeck also helps an organization capture specialized knowledge in a reusable and shareable way. In addition to the labor and time savings noted above, this knowledge capture ability reduces the risk of key resources not being available when needed (or being lost altogether).





3. Enable other teams to safely, securely, and effectively do operational support work

DevOps and new organizational operating models are calling for unprecedented delivery speed and access to production environments. Development and delivery teams are looking for self-service access to tasks that were formerly entrusted only to a handful of people in Operations. These delivery teams are streamlining how they work in order to get into the tightest feedback loops possible so they can speed up their lifecycle. Operations needs to support this streamlined and rapid pace of working -- all while maintaining the high level of quality, security, and reliability that is expected of them.

Rundeck’s fine-grained role-based access control enables operations to safely provide self-service access to anyone who needs to monitor or execute jobs. With Rundeck, operations can take a more open stance while being in total control over who can do what and where they can do it. This allows organizations to both dramatically improve lead times and shift operations work to available labor in other parts of the organization (freeing up capacity in Operations).
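The idea of controlling "who can do what and where" can be illustrated with a small deny-by-default check. This is a conceptual sketch only: the Rule fields, the "read" and "run" actions, and the role and environment names are assumptions made for the example, and none of it reflects Rundeck's actual access control policy format.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Rule:
    role: str         # who
    action: str       # what: "read" (monitor) or "run" (execute)
    job: str          # which job
    environment: str  # where


def is_allowed(rules: Set[Rule], role: str, action: str,
               job: str, environment: str) -> bool:
    """Allow only what a rule explicitly grants; everything else is denied."""
    return Rule(role, action, job, environment) in rules


# Hypothetical grants: developers may run deploys in staging,
# but may only monitor them in production.
rules = {
    Rule("developer", "run", "deploy", "staging"),
    Rule("developer", "read", "deploy", "staging"),
    Rule("developer", "read", "deploy", "production"),
    Rule("operations", "run", "deploy", "production"),
}

print(is_allowed(rules, "developer", "run", "deploy", "staging"))
print(is_allowed(rules, "developer", "run", "deploy", "production"))
```

Keeping the action and the environment as separate fields is what lets Operations open up monitoring broadly while granting execution narrowly.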





4. Reduce the cost of transitioning support knowledge to operations support teams

Handoffs between delivery and operations support teams are a necessary, but often messy, fact of life in most enterprises. It is commonly understood that handing off manually documented procedures is slow and error-prone. Following newer industry advice by handing off code (scripts or tools) that the receiving party can execute is, in theory, a better method -- but can also be cumbersome and error-prone. Often this method exposes the recipient to complicated or unpolished technology with very little context. Not to mention that this approach demands a tool proficiency and scripting language uniformity that isn’t reality across organizational lines in a sizable enterprise.

Use Rundeck jobs as a lightweight, easy-to-use tool for capturing procedures. Let teams use the scripting languages with which they are comfortable. Use plugins to provide easy integration with other automation tools and information systems. Rundeck functions well as a place where Dev can contribute procedures and Ops can vet and accept them. This allows the organization to move quicker and remove the bottlenecks, long delays, and self-inflicted errors that come with traditional delivery-to-operations support handoffs.





In future posts, we'll be covering the ROI benefits of Self-Service Operations, as well as looking at these design patterns in depth. 

If you are interested in self-service operations or improving the capacity of your operations organization, we'd like to talk to you. Feel free to contact us to set up a discussion.

Or view our comprehensive guide to Operations as a Service, a method that can guide your team out of the situations above and into smoother, more efficient IT operations.
