Alert Manager Setup
The following guide outlines the steps to set up Alertmanager to send alerts when a Tangle node or DKG is being disrupted. If you do not have a Tangle node set up yet, please review the Tangle Node Quickstart guide here.
In this guide we will configure the following modules to send alerts from a running Tangle node.
- Alert Manager listens to Prometheus metrics and pushes an alert as soon as a threshold is crossed (CPU % usage for example).
What is Alert Manager?
The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts. To learn more about Alertmanager, please visit the official docs site here.
Getting Started
Start by downloading the latest release of Alertmanager.
This guide assumes the user has root access to the machine running the Tangle node, runs the steps below on that machine, and has already configured Prometheus on it.
1. Download Alertmanager
AMD64 version:
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
ARM version:
wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-arm64.tar.gz
2. Extract the Downloaded Files:
Run the following command:
tar xvf alertmanager-*.tar.gz
3. Copy the Extracted Files into /usr/local/bin:
Note: The example below uses the linux-amd64 installation; update the paths to match the target system you installed.
Copy the alertmanager binary and amtool:
sudo cp ./alertmanager-*.linux-amd64/alertmanager /usr/local/bin/ &&
sudo cp ./alertmanager-*.linux-amd64/amtool /usr/local/bin/
4. Create a Dedicated User:
Now we want to create a dedicated user for the Alertmanager module we have installed:
sudo useradd --no-create-home --shell /usr/sbin/nologin alertmanager
5. Create Directories for Alertmanager:
sudo mkdir /etc/alertmanager &&
sudo mkdir /var/lib/alertmanager
6. Change the Ownership for all Directories:
We need to give our user permissions to access these directories:
alertManager:
sudo chown alertmanager:alertmanager /etc/alertmanager/ -R &&
sudo chown alertmanager:alertmanager /var/lib/alertmanager/ -R &&
sudo chown alertmanager:alertmanager /usr/local/bin/alertmanager &&
sudo chown alertmanager:alertmanager /usr/local/bin/amtool
7. Finally, let's clean up these directories:
rm -rf ./alertmanager*
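To confirm the binaries were copied correctly, you can check their versions:

alertmanager --version
amtool --version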
Great! You have now installed Alertmanager and set up your environment. The next series of steps covers configuring the service.
Configuration
For implementation examples, refer to our GitHub.
Prometheus
The first thing we need to do is add a rules.yml file to our Prometheus configuration.
Let's create the rules.yml file that will hold the rules for Alertmanager:
sudo touch /etc/prometheus/rules.yml
sudo nano /etc/prometheus/rules.yml
We are going to create two basic rules that will trigger an alert if the instance is down or CPU usage crosses 80%. You can create all kinds of rules that can be triggered; refer to our full list.
Add the following lines and save the file:
groups:
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 5 minutes."
      - alert: HostHighCpuLoad
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
The criteria for triggering an alert are set in the expr: part. You can customize these triggers as you see fit.
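For example, if node_exporter memory metrics are available on the node, the same pattern can cover memory pressure. The rule below is only an illustrative sketch to append under the same rules: list; the HostOutOfMemory name and the 10% threshold are arbitrary choices, not part of this guide's standard rule set:

      - alert: HostOutOfMemory
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Host out of memory (instance {{ $labels.instance }})"
          description: "Available memory is below 10%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"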
Then, check the rules file:
promtool check rules /etc/prometheus/rules.yml
And finally, check the Prometheus config file:
promtool check config /etc/prometheus/prometheus.yml
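For Prometheus to load these rules and forward firing alerts to Alertmanager, prometheus.yml also needs to reference the rules file and the Alertmanager endpoint. A minimal sketch, assuming prometheus.yml lives in /etc/prometheus/ next to rules.yml and Alertmanager will listen on the default port 9093 configured later in this guide:

rule_files:
  - "rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

After saving, re-run the promtool config check above and restart Prometheus (for example, sudo systemctl restart prometheus.service if it runs as a systemd service) so the changes take effect.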
Gmail setup
We can use a Gmail address to send the alert emails. For that, we will need to generate an app password from our Gmail account.
Note: we recommend using a dedicated email address for your alerts. Review Google's own guide for proper setup.
Slack notifications
We can also send the alerts through Slack notifications. For that, we need a dedicated Slack channel to send the notifications to, and to install the Incoming WebHooks Slack application.
To do so, navigate to:
- Administration > Manage Apps.
- Search for "Incoming Webhooks"
- Install into your Slack workspace.
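Once the app is installed and pointed at your channel, Slack generates an Incoming Webhook URL. As a quick sanity check, you can post a test message to it with curl (the URL below is a placeholder; use the one Slack generated for you):

curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Alertmanager webhook test"}' \
  https://hooks.slack.com/services/XXX/YYY/ZZZ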
Alertmanager
The Alertmanager config file is used to set the external service that will be called when an alert is triggered. Here, we are going to use the Gmail and Slack notifications set up previously.
Let's create the file:
sudo touch /etc/alertmanager/alertmanager.yml
sudo nano /etc/alertmanager/alertmanager.yml
And add the Gmail configuration to it and save the file:
global:
  resolve_timeout: 1m

route:
  receiver: 'gmail-notifications'

receivers:
  - name: 'gmail-notifications'
    email_configs:
      - to: 'EMAIL-ADDRESS'
        from: 'EMAIL-ADDRESS'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'EMAIL-ADDRESS'
        auth_identity: 'EMAIL-ADDRESS'
        auth_password: 'APP-PASSWORD'
        send_resolved: true
# ********************************************************************************************************************************************
# Alert Manager for Slack Notifications                                                                                                      *
# ********************************************************************************************************************************************
global:
  resolve_timeout: 1m
  slack_api_url: 'INSERT SLACK API URL'

route:
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: 'channel-name'
        send_resolved: true
        icon_url: https://avatars3.githubusercontent.com/u/3380462
        title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} for {{ .CommonLabels.job }}
          {{- if gt (len .CommonLabels) (len .GroupLabels) -}}
            {{" "}}(
            {{- with .CommonLabels.Remove .GroupLabels.Names }}
              {{- range $index, $label := .SortedPairs -}}
                {{ if $index }}, {{ end }}
                {{- $label.Name }}="{{ $label.Value -}}"
              {{- end }}
            {{- end -}}
            )
          {{- end }}
        text: >-
          {{ range .Alerts -}}
          *Alert:* {{ .Annotations.title }}{{ if .Labels.severity }} - `{{ .Labels.severity }}`{{ end }}
          *Description:* {{ .Annotations.description }}
          *Details:*
            {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}
Of course, you have to replace the email addresses and set auth_password to the app password generated from Google previously.
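Before wiring the configuration into a service, it is worth validating it with the amtool binary copied to /usr/local/bin earlier:

amtool check-config /etc/alertmanager/alertmanager.yml

This parses the file and reports any syntax or schema errors before Alertmanager tries to load it.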
Service Setup
Alert manager
Create and open the Alert manager service file:
sudo tee /etc/systemd/system/alertmanager.service > /dev/null << EOF
[Unit]
Description=AlertManager Server Service
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file /etc/alertmanager/alertmanager.yml \
  --storage.path /var/lib/alertmanager \
  --web.external-url=http://localhost:9093 \
  --cluster.advertise-address='0.0.0.0:9093'

[Install]
WantedBy=multi-user.target
EOF
Starting the Services
Reload the systemd daemon so it takes the new service into account:
sudo systemctl daemon-reload
Next, we want to start the Alertmanager service:
alertManager:
sudo systemctl start alertmanager.service
And check that it is working correctly:
alertManager:
sudo systemctl status alertmanager.service
If everything is working adequately, enable the service so it starts automatically on boot!
alertManager:
sudo systemctl enable alertmanager.service
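Once the service is enabled, you can confirm Alertmanager is reachable on the port set in the service file above, for example:

curl http://localhost:9093/api/v2/status

You can also open http://localhost:9093 in a browser to view the Alertmanager web UI.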
Amazing! We have now successfully added alert monitoring for our Tangle node!