Our industry has spent the past 7-8 years proclaiming the need for better integration of Dev and Ops to improve flow and quality. Despite this work — or perhaps because of it — there is a new rift forming between Dev and Ops.
Once upon a time, Developers had to be convinced that they should even care about operational concerns. But now, here we are in the middle of 2018, and there is a growing segment of Devs who proclaim that Ops is a thing of the past, won't exist in the future, and good riddance. "Ops is dead." "Containers and Serverless make Ops unnecessary." "Just give us a login and get out of our way."
Of course — like everything else in our industry — the tooling, the tasks, the organizational boundaries, and even the name of Operations are changing. But these assertions about the demise of Operations as a distinct craft and professional role are unrealistic and somewhat naive.
No one can predict the future, so what am I basing this on? History and experience. I've spent the past ten years working with a diverse mix of enterprises (from web-scale hotshots to lumbering global enterprises), long enough to see what Operations does and why it will survive any shift in platforms and skills or any reorganization.
However, I'm aware that it is difficult in our industry to argue about the future when your position is seen as pouring reality on the hype.
My view that "Operations isn't going anywhere, nor do you really want it to" often gets me characterized as a buzzkill hanging on to the past.
So, I've started a list of reasons why the impending demise of Ops is greatly exaggerated -- and probably the opposite is true. This list isn't exhaustive, but I think it's a good start.
1. Abstractions are leaky (and so are Ops abstractions)
I give credit to Cindy Sridharan for pointing this out in her excellent article, "Everyone is not Ops." In Cindy's argument that Ops is a varied and necessary field, she brings in Joel Spolsky's famous point that all non-trivial abstractions are leaky. The fact that all abstractions leak means that to expertly use them you are going to have to know the details about what goes on beneath the abstraction.
Cindy points out that operations tools, platforms, and automation are abstractions on top of a broad set of technologies. These are non-trivial; these will leak. Operations professionals are the experts who will understand the nuances and wrestle with the complexity beneath the abstractions so that their developer colleagues can be productive.
In any organization of significant size, the scale and complexity of these operational concerns require specialists (no matter if the abstraction is made in-house, or externally by a cloud provider).
2. Serverless doesn't get rid of Operations; it just makes it look different
Building on the theme of leaky abstractions, Patrick Debois' ongoing self-examination of his use of cloud-native services and serverless platforms is eye-opening.
Patrick, the person who coined the term "DevOps" and kicked off the movement, has been tweeting and speaking about his experiences building and running a significant business using mostly serverless technologies in the cloud.
Patrick's journey reinforces the point about abstractions being leaky. He is continuously required to explore the known and unknown behaviors of the myriad services he uses. That is operations work. And you can tell from Patrick's commentary alone that it is an ongoing and time-consuming job. It is also a real-time job, not just a design-time job. Services unexpectedly change, SLAs are broken, performance degrades, and things just don't do what you expect them to do. Someone is going to have to respond on the fly, triage, and adapt.
"if you had to start 100 ec2 instances in an ecs cluster every morning so they would be up for 1 hour - how confident would you be ? are there gotchas? #aws" (Patrick Debois, @patrickdebois, May 3, 2018)
Scale this up to enterprise level with hundreds of engineers working on multiple business lines, and you will need dedicated specialists. Those are Operations specialists doing operations work.
3. Deployment is just one part of Operations
Developers have historically held a reductionist view that deployment equals operations. In this view, deployment is the finish line, and if there is a problem you just deploy again with a different version. To be fair, in the smallest of organizations (e.g., a handful of devs working in cloud infrastructure) or the largest of organizations (a siloed development team building a single component of a much larger system), this is the daily view of the developer.
However, spend some time in larger enterprises, and you will find a broad range of necessary day-to-day operations activities that aren't code deployments. It is a huge list that includes responding to alerts, investigating performance, capacity planning, responding to ad-hoc business requests, managing caches, managing CDNs, configuring DNS services, managing SSL certs, managing proxies, managing firewalls/networks, running message systems, and more. As with your application platforms, it doesn't really matter if these run on servers you control or are APIs to another provider. Someone has to expertly and coherently manage all of these. That is operations work, and you are going to need Operations specialists.
When confronted with the depths of expertise needed for operations at scale, I'd wager that most application developers DON'T want to deal with it. Let's take just one slice of the operations domain and try it as a thought exercise. Here is a typically insightful post by Jessie Frazelle. Give it a quick read and then think about how many of your organization's developers really want to spend their days digging into the details behind that simple operations abstraction they currently see.
4. Legacy is inescapable
Legacy is the historical record of an organization's success. The longer you are in business, the more you will accumulate legacy code, platforms, processes, skills, and people.
I recently heard an insightful comment from Ron Forrester, "the moment you hit commit, your code is legacy." Someone (and it could be you) is going to come along tomorrow and try to make something new that works with it.
Enterprises are a web of legacy. Very little lives in isolation. Everything of significance has to hang together at runtime to keep the business going. No matter how well you think you know this complex web of dependencies, the behavior will be unexpected. Someone has to holistically care for and feed all of these disparate systems composed of different technologies (from mainframes to serverless in some cases), built from different points of view, and built by different people (often via acquisition) who are no longer there. That is operations work, and you are going to need Operations specialists.
5. AI and automation won't save us
We can't forget that we are dealing with a complex system. In fact, an enterprise is two complex systems interacting to form an even more complex system.
One is the complex technical system (interactions between hardware, software, network, user traffic) and the other is a complex system of people working on those underlying technical components.
John Allspaw has been doing an excellent service for our industry by mapping the lessons from managing complex systems in traditional high-consequence domains (aviation, healthcare, manufacturing, etc.) to IT. Coming from a place of both deep practitioner experience and academic research, he has repeatedly pointed to bodies of knowledge that show that we can't stop failure at design-time and that automation has its limits (and often leads to unintended negative consequences). We need humans and we need those humans to be experts at operating complex systems.
If these other industries have spent billions of dollars and decades of scientific research and still can't build sufficiently autonomous operations of complex systems, it would be pure hubris to think we are going to do so now in IT.
6. Google does it
Ok, this is admittedly the weakest argument of all. But you have to admit that there is a certain irony that so many of the "I don't need Ops" crowd are the same ones chasing the cloud-native paradigm that emulates how Google operates. Yes, the same Google that just wrote the wildly popular book laying out the inner workings of their large, proud — and certainly not going anywhere — operations teams.
A lot is changing and a lot is staying the same
So, yes, Ops is changing. The skills required to do operations work are changing. The platforms and tools involved are evolving (but don't forget the decades of legacy that isn't!). Organizational silos are breaking up, and developers and operators are commingling as peer engineers. This is an exciting time for everyone in IT. However, let's not get carried away, my friends in Development. As a craft and a professional role, Operations isn't going anywhere. And trust me, you'll be glad it isn't.