Rundeck performance dashboard in Grafana
We all know that modern systems and software are here to make our lives easier and better, but we have to make sure they stay healthy and resilient enough to support day-to-day demands.
Rundeck is a popular runbook automation product that started as an open source project. It can be used as a virtual machine provisioner for on-premises datacenters, a cloud infrastructure manager, a creator for auto-services, a task or service scheduler, and much more. The Rundeck Community version is a server application you host on a system in your datacenter. Because a Rundeck instance may run critical jobs, it is important to ensure the system is functioning optimally.
As a lead technical resource for our automation team, I work on environments with intensive usage and a high number of workloads executed by Rundeck. I researched ways to monitor Rundeck instances and found an old project on GitHub that aimed to expose Rundeck metrics in the Prometheus metrics format. Unfortunately, that project didn't work as expected.
Without a good existing solution, I decided to develop a new Rundeck Exporter that gets metrics and information about Rundeck server instances through the Rundeck API and exposes them so that Prometheus can read them and dashboards can be built in Grafana.
How it Works
The exporter makes requests to the Rundeck API System Info and Metrics endpoints, transforms the responses into the Prometheus metrics format, and starts a simple HTTP server that exposes that data, on port 9620 by default.
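To make that flow concrete, here is a minimal sketch of the transform-and-expose step. It is not the exporter's actual code. It assumes the Metrics endpoint returns a Dropwizard-style JSON document with `gauges` and `counters` sections. The sample payload, the metric names in it, and the helper names (`to_metric_name`, `to_prometheus_text`, `make_handler`, `serve`) are all hypothetical and exist only for illustration.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample shaped like a Dropwizard-style metrics payload.
# A real exporter would fetch this from the Rundeck API instead.
SAMPLE_METRICS = {
    "gauges": {"rundeck.scheduler.quartz.runningExecutions": {"value": 3}},
    "counters": {"rundeck.scheduler.quartz.scheduledJobs": {"count": 12}},
}


def to_metric_name(raw):
    """Replace characters that Prometheus metric names do not allow."""
    return re.sub(r"[^a-zA-Z0-9_:]", "_", raw)


def to_prometheus_text(metrics):
    """Render numeric gauges and counters in the Prometheus text format."""
    samples = []
    for name, body in sorted(metrics.get("gauges", {}).items()):
        samples.append((name, body.get("value")))
    for name, body in sorted(metrics.get("counters", {}).items()):
        samples.append((name, body.get("count")))

    lines = []
    for name, value in samples:
        # Skip non-numeric values (booleans are excluded explicitly).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        metric = to_metric_name(name)
        # Everything is exposed as a gauge here, since a Dropwizard-style
        # counter may go down as well as up.
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


def make_handler(fetch_metrics):
    """Build a request handler that serves fetch_metrics() at /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = to_prometheus_text(fetch_metrics()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(port=9620, fetch_metrics=lambda: SAMPLE_METRICS):
    """Start the HTTP server; blocks until interrupted."""
    HTTPServer(("", port), make_handler(fetch_metrics)).serve_forever()
```

Calling `serve()` starts the server, and Prometheus can then be pointed at `/metrics` on port 9620. In a real setup, `fetch_metrics` would be replaced by a function that calls the Rundeck API with a valid token.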
With the metrics in a Prometheus instance, it's now possible to get creative and build dashboards to monitor Rundeck instances. Users can see resource consumption, which jobs fail most often, which jobs are running, which jobs take the longest, and more. Grafana is a great tool for creating those dashboards.
After creating a dashboard, it was easier to understand how the Rundeck server instances were performing (e.g. the memory and CPU consumption of the instances, and which jobs were taking longer than expected to execute). By configuring alerts in Prometheus Alertmanager, we could proactively notify the teams responsible for problematic jobs, and our team could scale the resources consumed by the Rundeck server instances more efficiently.
Getting into the Project
All the project code and documentation can be found on GitHub, and the project is also listed on the Prometheus Exporters page. Additionally, read this article in the Rundeck docs to learn how to set up and configure the Rundeck exporter.
Feel free to contribute, make suggestions, and open issues to help the project evolve.
Message from Rundeck
If you have some Rundeck code you'd like to share with the community, please post it in the Rundeck category of the PagerDuty Community. To ask questions, sign up for the Rundeck Discuss mailing list in Google Groups.