..
  This file is part of GNU TALER.
  Copyright (C) 2014-2023 Taler Systems SA

  TALER is free software; you can redistribute it and/or modify it under the
  terms of the GNU Affero General Public License as published by the Free Software
  Foundation; either version 2.1, or (at your option) any later version.

  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
  A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with
  TALER; see the file COPYING. If not, see <http://www.gnu.org/licenses/>

  @author Javier Sepulveda

Prometheus alerts
#################

The Prometheus alerting system periodically evaluates the queries defined in the alert rule files.
When any of these conditions is met, the alerting system sends a notification (e.g. an email)
directly to the specified contact points, or to a specific group of contact points (these are called notification policies).

The basic concepts of the Prometheus alerting system are explained below. Please check the `official Prometheus
documentation <https://prometheus.io/docs/alerting/latest/overview/>`_ or the `Grafana documentation <https://grafana.com/docs/grafana/latest/alerting/fundamentals/>`_ if you need additional information.

* Alert rules: One or more queries (expressions) to measure (e.g. disk space, memory, or CPU usage).

  - Each alert rule contains a condition with a specific threshold.
  - Each alert rule can contain a precise contact point to send the notifications to.
  - Within the same alert rule, you can specify multiple alert instances.

* Contact points: This is the message notification itself, in conjunction with the specific address to send the notification to.

* Notification policies: This feature allows you to gather a group of different contact points under the same label name.

Install Prometheus alert manager
================================

.. code-block:: console

   # apt install prometheus-alertmanager
   # systemctl start prometheus-alertmanager
   # systemctl status prometheus-alertmanager

Edit the Prometheus configuration file
======================================

To make Prometheus talk to the alerting system, you need to
specify it in the main Prometheus configuration file.

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # Add this at the end of the YAML file
   # Alertmanager configuration
   alerting:
     alertmanagers:
     - static_configs:
       - targets: ["localhost:9093"]

Alert rules configuration file
==============================

- Create your very first first_rule.yml file.

.. note::

   The code shown below is just an example for CPU, disk and memory usage.

.. code-block:: yaml

   # Path: /etc/prometheus/first_rule.yml
   groups:
   - name: node_exporter_alerts
     rules:
     - alert: HighCPULatency
       expr: sum(rate(node_cpu_seconds_total{mode="system"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High CPU Latency detected"
         description: "CPU latency is above 80% for more than 1 minute."

     - alert: LowDiskSpace
       expr: (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 < 10
       for: 1m
       labels:
         severity: critical
       annotations:
         summary: "Low Disk Space detected"
         description: "Disk space is below 10% for more than 1 minute."

     - alert: HighMemoryUsage
       expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High Memory Usage detected"
         description: "Memory usage is above 80% for more than 1 minute."


Configure SMTP
==============

.. code-block:: yaml

   # Path: /etc/prometheus/alertmanager.yml

   global:
     smtp_smarthost: 'smtp.example.com:587'
     smtp_from: 'alertmanager@example.com'
     smtp_auth_username: 'yourusername'
     smtp_auth_password: 'yourpassword'

   route:
     receiver: 'email'

   receivers:
   - name: 'email'
     email_configs:
     - to: 'recipient@example.com'
       send_resolved: true


Add your alert rules to Prometheus
==================================

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # Add your alert rule files here
   rule_files:
     - "first_rule.yml"
     # - "second_rule.yml"


Edit the alertmanager systemd service file
============================================

.. code-block:: systemd

   # Path: /usr/lib/systemd/system/prometheus-alertmanager.service

   [Unit]
   Description=Alertmanager for prometheus
   Documentation=https://prometheus.io/docs/alerting/alertmanager/

   [Service]
   Restart=on-failure
   User=prometheus
   EnvironmentFile=/etc/default/prometheus-alertmanager
   # Add the --cluster.advertise-address option below (replace "ip" with the address of your server), as otherwise it won't work
   ExecStart=/usr/bin/prometheus-alertmanager \
     --cluster.advertise-address="ip:9093"
   ExecReload=/bin/kill -HUP $MAINPID
   TimeoutStopSec=20s
   SendSIGKILL=no

   [Install]
   WantedBy=multi-user.target


.. code-block:: console

   # systemctl daemon-reload
   # systemctl restart prometheus-alertmanager
   # systemctl restart prometheus

Check
=====

You can check both your rules (http://ip:9090/rules) and alerts (http://ip:9090/alerts) from your web browser.
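
The same checks can also be done from the command line. The following is only a minimal sketch. It assumes that the ``promtool`` and ``amtool`` utilities were installed together with the Prometheus and Alertmanager packages (this may differ between distributions), that both services listen on their default ports on ``localhost``, and that the rule file is the ``first_rule.yml`` listed under ``rule_files``.

.. code-block:: console

   # promtool check config /etc/prometheus/prometheus.yml
   # promtool check rules /etc/prometheus/first_rule.yml
   # amtool check-config /etc/prometheus/alertmanager.yml
   # curl -s http://localhost:9090/api/v1/rules
   # curl -s http://localhost:9093/-/healthy

The first three commands only parse the configuration and rule files, so they are safe to run before restarting the services. The two ``curl`` requests query the running services: the first returns the loaded rule groups as JSON, the second reports whether the Alertmanager is up.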