In this edition of SRE Anti-Patterns, I'm highlighting one of the more substantial shortcomings of written documentation — it is difficult to get people to read it!
Change is a constant in enterprise operations. Services, configuration, and underlying infrastructure are constantly changing. Procedures are frequently updated. How do you effectively convey those changes so mistakes aren't made?
While a lot of our industry's focus has gone into how to get people to write documentation, less attention has been paid to the problem of getting people to read that documentation. How do you get people to stop and see if the procedure has changed or the environment is not what they expected? Getting people to stop and read is difficult enough during project work, and it becomes even harder during emergencies.
Also working against documentation efforts is the classic "relax, I've done this before" syndrome. The more times that someone performs a task, the more they believe that they understand what to do and expect underlying conditions and results to be the same. The more routine and seemingly mundane the task, the more likely it is that the person performing the task will approach it with confidence and not feel the need to look for instructions.
Expecting people to stop, especially in an emergency, and look at the documentation (be it a wiki, static site, man pages, or other tools) to see if anything has changed just isn't realistic. Of course, this common quirk of human nature has led to many self-inflicted outages.
Perhaps relying on documentation isn't the best way to convey information, especially changes to operations procedures. Email notices or Slack blasts aren't very helpful either, because the messages get lost in the day-to-day communication noise of much more pertinent information.
Communicating through code and automated procedures is much more effective. As a side benefit of your company moving to a Self-Service Operations model, the infrastructure is there to quickly enable subject matter experts to define and update automated procedures as the need arises.
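To make the idea concrete, here is a minimal sketch of a procedure captured as code rather than as a wiki page. It is not Rundeck's API or job format. The `Step` class, the `run_procedure` function, the disk-space threshold, and the service name are all hypothetical, chosen only for illustration. The point is that when a subject matter expert changes the procedure, they change this definition, and whoever runs it next gets the current version without having to read anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One named step of an operations procedure (hypothetical structure)."""
    name: str
    action: Callable[[Dict], str]  # receives shared context, returns a status message


def run_procedure(steps: List[Step], context: Dict) -> List[str]:
    """Run each step in order and stop at the first failure."""
    log = []
    for step in steps:
        try:
            result = step.action(context)
            log.append(f"OK   {step.name}: {result}")
        except Exception as exc:
            log.append(f"FAIL {step.name}: {exc}")
            break  # do not continue past a failed precondition
    return log


def check_disk(context: Dict) -> str:
    # Assumed precondition: the expert who owns this procedure decided
    # that a restart is unsafe with less than 10% disk free.
    if context["disk_free_pct"] < 10:
        raise RuntimeError("less than 10% disk free, aborting restart")
    return f"{context['disk_free_pct']}% free"


def restart_service(context: Dict) -> str:
    # Placeholder: a real step would call your init system or orchestrator.
    return f"restarted {context['service']}"


# The procedure itself. Updating the procedure means editing this list.
RESTART_PROCEDURE = [
    Step("check disk space", check_disk),
    Step("restart service", restart_service),
]

if __name__ == "__main__":
    for line in run_procedure(RESTART_PROCEDURE, {"service": "web", "disk_free_pct": 42}):
        print(line)
```

The precondition check is the part that would otherwise live in a paragraph nobody reads at 3 a.m. Encoded as a step, it runs every time, whether or not the operator remembers it exists.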
Now when it is 3 a.m. and you roll out of bed to respond to that alert, you won't be asking yourself, "Are these still the correct commands? Am I passing the right options to these scripts? Has anything changed?"
Documentation is essential and has its place. But when it comes to communicating operations procedures, do it through code.
Rundeck is the easiest way to get started as you can plug in all of your existing commands, tools, and scripts. The Rundeck platform does the rest of the heavy lifting for you.
If you want to discuss how Rundeck and Self-Service Operations can help you capture and convey key knowledge in your organization, don't hesitate to contact us today.
Other editions of the "SRE Anti-Patterns" series: