..
  This file is part of GNU TALER.
  Copyright (C) 2014-2023 Taler Systems SA

  TALER is free software; you can redistribute it and/or modify it under the
  terms of the GNU Affero General Public License as published by the Free Software
  Foundation; either version 2.1, or (at your option) any later version.

  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with
  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>

  @author Javier Sepulveda

Prometheus alerts
#################

.. contents:: Table of Contents
  :depth: 1
  :local:

Prometheus periodically evaluates the queries defined in the alert rule files.
When one of these conditions is met, the Alertmanager sends a notification (e.g. email)
either directly to the specified contact points or to a specific group of contact points (these are named notification policies).

Very basic concepts of the Prometheus alerting system are explained below. Please check the `Prometheus official
documentation <https://prometheus.io/docs/alerting/latest/overview/>`_ or the `Grafana documentation <https://grafana.com/docs/grafana/latest/alerting/fundamentals/>`_ if you need additional information.

* Alert rules: One or more queries (expressions) to measure (e.g. disk space, memory, or CPU usage).

  - Each alert rule contains a condition with a specific threshold.
  - Each alert rule can contain a precise contact point to send the notifications to.
  - Within the same alert rule, you can specify multiple alert instances.

* Contact points: The notification message itself, together with the specific address to send the notification to.

* Notification policies: This feature allows you to group different contact points under the same label name.

Install Prometheus alert manager
================================

.. code-block:: console

   # apt install prometheus-alertmanager
   # systemctl start prometheus-alertmanager
   # systemctl status prometheus-alertmanager

Edit the Prometheus configuration file
======================================

To make Prometheus talk to the alerting system, you need to
specify it in the main Prometheus configuration file.

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # Add this at the end of the YAML file
   # Alertmanager configuration
   alerting:
     alertmanagers:
     - static_configs:
       - targets: ["localhost:9093"]

Alert rules configuration file
==============================

- Create your first rule file, ``first_rule.yml``.

.. note::

   The code shown below is just an example for CPU, disk and memory usage.

.. code-block:: yaml

   # Path: /etc/prometheus/first_rule.yml
   groups:
   - name: node_exporter_alerts
     rules:
     - alert: HighCPULatency
       expr: sum(rate(node_cpu_seconds_total{mode="system"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High CPU Latency detected"
         description: "CPU latency is above 80% for more than 1 minute."

     - alert: LowDiskSpace
       expr: (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 < 10
       for: 1m
       labels:
         severity: critical
       annotations:
         summary: "Low Disk Space detected"
         description: "Disk space is below 10% for more than 1 minute."

     - alert: HighMemoryUsage
       expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High Memory Usage detected"
         description: "Memory usage is above 80% for more than 1 minute."


Configure SMTP
==============

.. code-block:: yaml

   # Path: /etc/prometheus/alertmanager.yml

   global:
     smtp_smarthost: 'smtp.example.com:587'
     smtp_from: 'alertmanager@example.com'
     smtp_auth_username: 'yourusername'
     smtp_auth_password: 'yourpassword'

   route:
     receiver: 'email'

   receivers:
   - name: 'email'
     email_configs:
     - to: 'recipient@example.com'
       send_resolved: true


Add your alert rules to Prometheus
==================================

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # List your alert rule files here
   rule_files:
     - "first_rule.yml"
     # - "second_rule.yml"


Edit the alertmanager systemd service file
============================================

.. code-block:: systemd

   # Path: /usr/lib/systemd/system/prometheus-alertmanager.service

   [Unit]
   Description=Alertmanager for prometheus
   Documentation=https://prometheus.io/docs/alerting/alertmanager/

   [Service]
   Restart=on-failure
   User=prometheus
   EnvironmentFile=/etc/default/prometheus-alertmanager
   # Add the --cluster.advertise-address option, as otherwise it won't work.
   # systemd does not allow a comment at the end of a command line.
   ExecStart=/usr/bin/prometheus-alertmanager \
     --cluster.advertise-address="ip:9093"
   ExecReload=/bin/kill -HUP $MAINPID
   TimeoutStopSec=20s
   SendSIGKILL=no

   [Install]
   WantedBy=multi-user.target


.. code-block:: console

   # systemctl daemon-reload
   # systemctl restart prometheus-alertmanager
   # systemctl restart prometheus

Check
=====

You can check both your rules (http://ip:9090/rules) and your alerts (http://ip:9090/alerts) from your web browser,
where ``ip`` is the address of your Prometheus server.
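
The alert states shown in the web interface can also be read from the Prometheus HTTP API, which is convenient for scripted checks.
The following is only a minimal sketch: it assumes that Prometheus is reachable at ``http://localhost:9090``,
and the names ``PROMETHEUS_URL``, ``fetch_alerts`` and ``summarize_alerts`` are hypothetical helpers introduced for illustration.

.. code-block:: python

   import json
   import urllib.request
   from collections import Counter

   # Assumption: Prometheus listens on this address; adjust it to your setup.
   PROMETHEUS_URL = "http://localhost:9090"


   def fetch_alerts(base_url=PROMETHEUS_URL, timeout=5):
       """Download the decoded JSON body of the /api/v1/alerts endpoint."""
       url = base_url + "/api/v1/alerts"
       with urllib.request.urlopen(url, timeout=timeout) as response:
           return json.load(response)


   def summarize_alerts(payload):
       """Count the alerts of an /api/v1/alerts response by their state."""
       if payload.get("status") != "success":
           raise ValueError("unexpected API status: %r" % payload.get("status"))
       alerts = payload.get("data", {}).get("alerts", [])
       # Each alert carries a "state" such as "pending" or "firing".
       return Counter(alert.get("state", "unknown") for alert in alerts)

Calling ``summarize_alerts(fetch_alerts())`` on the monitoring host should return a counter of alert states.
An empty counter means that no alert is currently pending or firing.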