Note from the Editor: the following is the author's point of view related to the topic of managing monitoring systems.
As organizations move toward a new generation of distributed systems and microservice architecture, the DevOps world finds it increasingly difficult to keep up with the hybrid needs of today's application monitoring and the alerts it generates. To manage this aspect of IT infrastructure, DevOps professionals are turning to up-and-coming serverless methodologies.
The software implementing this process ranges from commercial to open source, and expensive to free. Let's start by looking at the problem itself. What makes managing monitoring and alerts so difficult?
Managing monitoring
Managing monitoring and alerts becomes complicated when different organizations, working in different regions, each choose different communication mediums to make their employees and customers comfortable.
Let’s understand this issue a bit more through an example. Take a company which:
- Has many products that live on various cloud and non-cloud platforms.
- Uses chat and email services for internal communication.
- Has support professionals working in different time zones.
Now, if an issue comes up with any of this company's products, the response team should act before the customer (and the company) experiences negative effects. There won’t be much of a problem if the response team is already there to jump on the issue, but if they are not, someone has to notify them in some way to limit the scope of functional and possible financial losses.
Here's the problem. People are not able to notice and respond to issues all the time. If you send the response team an email or text message, there is a real chance that no one on the team will see it before the issue causes significant financial loss. Also, the response team might already be receiving so many email alerts that, even if they are available, they find it difficult to spot the high-impact issues among the smaller ones. In this situation, you should send someone from the response team a distinct alert, such as a phone call or a pager message. However, if you decide to call, you need to know who is actually available. Otherwise, you might have to call multiple people until you find the response team member who is ready to jump on a ringing phone at that very moment, which can take even longer if your call comes at an odd time for their location.
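To make the "who is actually available" question concrete, here is a minimal sketch in Python. It assumes a hypothetical roster in which each responder has a fixed UTC offset and working hours; the names, offsets, and hours are invented for illustration, and fixed offsets ignore daylight saving time. It also shows one simple way to reserve a distinct channel for high-impact issues.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical roster: the names, UTC offsets, and working hours are
# illustrative assumptions. Fixed offsets ignore daylight saving time.
ROSTER = [
    {"name": "responder-india", "utc_offset_hours": 5.5, "start": time(9), "end": time(18)},
    {"name": "responder-uk", "utc_offset_hours": 0, "start": time(9), "end": time(18)},
    {"name": "responder-us-west", "utc_offset_hours": -8, "start": time(9), "end": time(18)},
]


def available_responders(now_utc, roster=ROSTER):
    """Return the names of responders whose local time is within working hours."""
    available = []
    for person in roster:
        local_zone = timezone(timedelta(hours=person["utc_offset_hours"]))
        local_time = now_utc.astimezone(local_zone).time()
        if person["start"] <= local_time < person["end"]:
            available.append(person["name"])
    return available


def choose_channel(severity):
    """Send high-impact alerts over a distinct channel; everything else by email."""
    return "phone" if severity == "critical" else "email"
```

Calling `available_responders(datetime.now(timezone.utc))` returns the people worth calling right now, and `choose_channel("critical")` picks the phone over the already crowded inbox. A real alerting tool would also track holidays, handoffs, and acknowledgements, which this sketch leaves out.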
Instead, what you need is a tool that not only monitors your systems but also intelligently manages the alert process for the quickest results possible. A popular commercial option is OpsGenie, and in this article, we will talk about open source alternatives to this proprietary option.
What we want from OpsGenie
OpsGenie is a paid alerting tool that helps organizations achieve a smart alerting and notification process. In addition to on-call rotation management, OpsGenie currently supports notifications to and from a wide range of existing systems, paid and free. There are many other reasons it is nice to have in a DevOps environment that relies on extensive automation, chatbot integrations, and on-call rotation. The need for technical support during an outage is one of the more important reasons to consider OpsGenie.
We will focus only on the essential features of open source alerting tools in our comparison with OpsGenie. In many environments, that involves connecting teams by managing the following:
- Alerts to teams who rely on the service.
- A dashboard to view system status.
- Integrations with chat tools and automated response.
Note from the Editor: at the time of publishing, OpsGenie does have a free offering within certain usage limits. Visit their site for the most up-to-date details related to their services.
Open source alerting tools
Several open source tools cover the parts of OpsGenie that I believe are essential for managing monitoring systems.
Cabot
Cabot provides all of the necessary features to get a complete monitoring picture of your infrastructure. Cabot supports alerts through phone, email, SMS, HipChat, and Slack. It is written in Python and mostly uses the Django framework. Cabot does not depend on Java or other memory-hungry processes, which makes it a stable choice.
Nagios
Nagios Core is free and open source, but its support and some plugins have a cost. Thankfully, Nagios Core on its own is a great option for infrastructure monitoring and alerting. It supports notifications via email and has a few other options as integrations. It also supports user-defined notification mechanisms. If you have an API that can process alerts and send custom notifications to one or more mediums (such as Slack, HipChat, or SMS), this tool could be a good fit for you.
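As a rough illustration of a user-defined notification mechanism, the sketch below is a small Python script that a Nagios notification command could invoke. It assumes the command definition passes a Slack incoming webhook URL followed by the standard macros $NOTIFICATIONTYPE$, $HOSTNAME$, $SERVICEDESC$, $SERVICESTATE$, and $SERVICEOUTPUT$ as arguments. The function names and message layout are my own choices, not part of Nagios or Slack.

```python
import json
import sys
import urllib.request


def build_slack_payload(notification_type, host, service, state, output):
    """Format the alert fields as the JSON body of a Slack incoming webhook."""
    text = f"{notification_type}: {service} on {host} is {state}\n{output}"
    return {"text": text}


def send_to_slack(webhook_url, payload):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def main(argv):
    # Assumed argument order: webhook URL, then the five macro values.
    webhook_url, *fields = argv
    return send_to_slack(webhook_url, build_slack_payload(*fields))


# Only send when the script receives exactly the six expected arguments.
if __name__ == "__main__" and len(sys.argv) == 7:
    main(sys.argv[1:])
```

Keeping the message formatting in `build_slack_payload` separate from the network call makes it easy to swap Slack for another medium, or to fan the same alert out to several of them.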
ngDesk
ngDesk can handle your on-call rotation, automatically escalate alerts when there is no response, and offers a ticketing tool as well. ngDesk is still working on the complete package, so stay tuned to this up-and-coming project.
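To show what automatic escalation means in practice, here is a minimal Python sketch. It is not ngDesk's implementation or API; the policy steps, delays, and role names are hypothetical. The idea is simply that the longer an alert goes unacknowledged, the further up the chain it is sent.

```python
# Hypothetical escalation policy: each step is
# (minutes without acknowledgement, who gets notified at that point).
ESCALATION_POLICY = [
    (0, "primary on-call"),
    (10, "secondary on-call"),
    (30, "team lead"),
]


def notified_so_far(minutes_unacknowledged, policy=ESCALATION_POLICY):
    """Return everyone who should have been notified after the given wait."""
    return [who for delay, who in policy if minutes_unacknowledged >= delay]
```

With this policy, an alert that has been ignored for 15 minutes has reached both the primary and the secondary on-call, and after 30 minutes the team lead is notified as well.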
Open Distro for Elasticsearch
Open Distro for Elasticsearch is a recent addition to the monitoring and alerting landscape. This project supports almost all chatbots, email, and a variety of other alert mechanisms. A complete, pluggable monitoring and alerting module, Open Distro for Elasticsearch is a combination of many tools. With it, you can view alerts in Kibana, so there’s no need to use a separate tool, and you can get notified the way you want with supported integrations and receivers. Authentication support has been added to Kibana, Elasticsearch, and the other tools grouped in this combo, free of cost, so you can specify who has view access to which parts of your Elastic Stack.
OpenDuty
Another alerting tool providing big competition to the paid alternatives is OpenDuty. While still in beta, this project already supports SMS, phone calls, email, Slack, HipChat, and various other paid and open source integrations for sending alerts. Integrations with other alerting tools like Nagios are also supported, along with compatibility with the paid alerting tool PagerDuty, most likely to help people migrate.
Prometheus Alertmanager
Alertmanager works from alert definitions and routes alerts that match specific definitions to integrations that are easy to set up. These integrations can then broadcast the alerts to endpoint devices, and admins can silence alerts if needed. Despite its limitations, Alertmanager is still a very good tool for sending push notifications to chat platforms and cell phones.
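The routing and silencing idea can be sketched in a few lines of Python. This is a simplified model for illustration, not Alertmanager's code or configuration format: the route table, receiver names, and labels are hypothetical, and matching is plain equality on labels, with the first matching route winning.

```python
# Hypothetical routing table: the first route whose labels all match wins.
ROUTES = [
    {"match": {"severity": "critical"}, "receiver": "on-call-phone"},
    {"match": {"team": "database"}, "receiver": "db-chat"},
]
DEFAULT_RECEIVER = "ops-email"


def matches(labels, matcher):
    """True when every key/value pair in the matcher appears in the alert's labels."""
    return all(labels.get(key) == value for key, value in matcher.items())


def route(alert_labels, silences=()):
    """Return the receiver for an alert, or None if a silence suppresses it."""
    if any(matches(alert_labels, silence) for silence in silences):
        return None
    for candidate in ROUTES:
        if matches(alert_labels, candidate["match"]):
            return candidate["receiver"]
    return DEFAULT_RECEIVER
```

Here a critical alert goes to the phone receiver, a database warning goes to the team's chat channel, and anything else falls back to email. Passing `silences=[{"alertname": "DiskFull"}]` mutes every alert carrying that label, which is roughly what an admin does when silencing a known issue.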
Wrapping Up
If budget or using only open source software is a top concern, there are plenty of response team alerting options available. Start by taking a look at the weaknesses in your existing setup and pinpointing where your organization drops the ball on IT issues and lets them escalate into real problems. Doing so makes it easier to choose which tool, or combination of tools, you should implement to best address these gaps. It is okay to use more than one if it helps you get a complete picture of managing your monitoring infrastructure.