..
  This file is part of GNU TALER.
  Copyright (C) 2014-2023 Taler Systems SA

  TALER is free software; you can redistribute it and/or modify it under the
  terms of the GNU Affero General Public License as published by the Free Software
  Foundation; either version 2.1, or (at your option) any later version.

  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with
  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>

  @author Javier Sepulveda

Prometheus alerts
#################

.. contents:: Table of Contents
  :depth: 1
  :local:

Prometheus periodically evaluates the queries defined in the alert rule files.
When one of these conditions is met, it hands the alert to the Alertmanager, which sends a notification (e.g. an email)
either directly to the specified contact points or to a group of contact points (a notification policy).

The basic concepts of the Prometheus alerting system are explained below. Please check the `official Prometheus
documentation <https://prometheus.io/docs/alerting/latest/overview/>`_ or the `Grafana documentation <https://grafana.com/docs/grafana/latest/alerting/fundamentals/>`_ if you need additional information.

* Alert rules: One or more queries (expressions) that measure something (e.g. disk space, memory, or CPU usage).

  - Each alert rule contains a condition with a specific threshold.
  - Each alert rule can name a specific contact point to send the notifications to.
  - A single alert rule can produce multiple alert instances.

* Contact points: The notification message itself, together with the address to send it to.

* Notification policies: A way to group different contact points under the same label name.

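To illustrate how a condition with a threshold turns into an alert, the following Python sketch
mimics the way a rule with a ``for`` duration moves between the states *inactive*, *pending* and *firing*.
It is a simplified model, not Prometheus code: the function name ``alert_states`` and the sample
values are made up for this example.

.. code-block:: python

   def alert_states(samples, threshold, hold_seconds, interval_seconds):
       """Return the alert state after each evaluation of ``value > threshold``.

       samples          -- one measured value per evaluation (hypothetical data)
       hold_seconds     -- plays the role of the rule's ``for`` duration
       interval_seconds -- time between two evaluations
       """
       states = []
       active_since = None  # time at which the condition first became true
       for step, value in enumerate(samples):
           now = step * interval_seconds
           if value > threshold:
               if active_since is None:
                   active_since = now
               # The alert only fires once the condition has held long enough.
               if now - active_since >= hold_seconds:
                   states.append("firing")
               else:
                   states.append("pending")
           else:
               # The condition no longer holds, so the alert is reset.
               active_since = None
               states.append("inactive")
       return states


   if __name__ == "__main__":
       # CPU usage in percent, sampled every 30 seconds (made-up numbers).
       cpu = [50, 85, 90, 95, 70, 88]
       print(alert_states(cpu, threshold=80, hold_seconds=60, interval_seconds=30))

In this model a short spike above the threshold only reaches the *pending* state, so no notification
would be sent for it. Only a condition that stays true for the whole ``for`` duration becomes *firing*.
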
Install Prometheus alert manager
================================

.. code-block:: console

   # apt install prometheus-alertmanager
   # systemctl start prometheus-alertmanager
   # systemctl status prometheus-alertmanager

Edit the Prometheus configuration file
======================================

To make Prometheus talk to the alerting system, you need to
specify it in the main Prometheus configuration file.

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # Add this at the end of the YAML file
   # Alertmanager configuration
   alerting:
     alertmanagers:
       - static_configs:
           - targets: ["localhost:9093"]

Alert rules configuration file
==============================

- Create your first rule file, ``first_rule.yml``.

  .. note::

     The code shown below is just an example for CPU, disk and memory usage.

.. code-block:: yaml

   # Path: /etc/prometheus/first_rule.yml
   groups:
   - name: node_exporter_alerts
     rules:
     # Average system-mode CPU time across all CPUs, as a percentage
     - alert: HighCPULatency
       expr: sum(rate(node_cpu_seconds_total{mode="system"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High CPU Latency detected"
         description: "CPU latency is above 80% for more than 1 minute."

     # Percentage of free space per filesystem
     - alert: LowDiskSpace
       expr: (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 < 10
       for: 1m
       labels:
         severity: critical
       annotations:
         summary: "Low Disk Space detected"
         description: "Disk space is below 10% for more than 1 minute."

     # Percentage of memory that is not available
     - alert: HighMemoryUsage
       expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High Memory Usage detected"
         description: "Memory usage is above 80% for more than 1 minute."


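The disk and memory expressions in the example rules are plain arithmetic on node exporter metrics.
As a rough illustration, the Python sketch below repeats the same calculations with made-up byte
values. The function names are hypothetical and exist only for this example.

.. code-block:: python

   GIB = 1024 ** 3


   def free_disk_percent(free_bytes, size_bytes):
       """(node_filesystem_free_bytes / node_filesystem_size_bytes) * 100"""
       return free_bytes / size_bytes * 100


   def used_memory_percent(available_bytes, total_bytes):
       """(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100"""
       return (1 - available_bytes / total_bytes) * 100


   if __name__ == "__main__":
       # Made-up sample values: 4 GiB free of 50 GiB, 1 GiB available of 8 GiB.
       disk = free_disk_percent(free_bytes=4 * GIB, size_bytes=50 * GIB)
       mem = used_memory_percent(available_bytes=1 * GIB, total_bytes=8 * GIB)
       print(f"free disk:   {disk:.1f}% -> LowDiskSpace condition met: {disk < 10}")
       print(f"used memory: {mem:.1f}% -> HighMemoryUsage condition met: {mem > 80}")

With these sample values both conditions are met. In Prometheus the disk expression is evaluated
once per filesystem, so a single rule can produce several alert instances.
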
Configure SMTP
==============

.. code-block:: yaml

   # Path: /etc/prometheus/alertmanager.yml

   global:
     smtp_smarthost: 'smtp.example.com:587'
     smtp_from: 'alertmanager@example.com'
     smtp_auth_username: 'yourusername'
     smtp_auth_password: 'yourpassword'

   route:
     receiver: 'email'

   receivers:
     - name: 'email'
       email_configs:
         - to: 'recipient@example.com'
           send_resolved: true


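One possible way to check the mail setup without waiting for a real alert is to post a test alert
to the Alertmanager HTTP API (``/api/v2/alerts``). The Python sketch below assumes that the
Alertmanager listens on ``localhost:9093``. The alert name ``SmtpDeliveryTest`` and both function
names are made up for this example.

.. code-block:: python

   import json
   import urllib.request
   from datetime import datetime, timedelta, timezone


   def build_test_alert(minutes=5):
       """Return a one-element alert list in the Alertmanager API v2 format."""
       now = datetime.now(timezone.utc)
       return [{
           # "alertname" and "severity" are ordinary labels (hypothetical values).
           "labels": {"alertname": "SmtpDeliveryTest", "severity": "warning"},
           "annotations": {"summary": "Test alert to verify e-mail delivery"},
           "startsAt": now.isoformat(),
           # After this time the Alertmanager treats the alert as resolved.
           "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
       }]


   def post_alert(alerts, base_url="http://localhost:9093"):
       """POST the alerts to the Alertmanager and return the HTTP status code."""
       request = urllib.request.Request(
           base_url + "/api/v2/alerts",
           data=json.dumps(alerts).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )
       with urllib.request.urlopen(request, timeout=10) as response:
           return response.status


   # Example (needs a running Alertmanager):
   #   print(post_alert(build_test_alert()))

With the single route shown above, every alert goes to the ``email`` receiver, so the test alert
should result in a message to the configured recipient. Because ``send_resolved`` is enabled,
a second message can be expected once the test alert is resolved.
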
Add your alert rules to Prometheus
==================================

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # List your alert rule files here
   rule_files:
     - "first_rule.yml"
     # - "second_rule.yml"


Edit the alertmanager systemd service file
==========================================

.. code-block:: systemd

   # Path: /usr/lib/systemd/system/prometheus-alertmanager.service

   [Unit]
   Description=Alertmanager for prometheus
   Documentation=https://prometheus.io/docs/alerting/alertmanager/

   [Service]
   Restart=on-failure
   User=prometheus
   EnvironmentFile=/etc/default/prometheus-alertmanager
   # Add the --cluster.advertise-address option, as otherwise it won't work.
   # Replace "ip" with the address of this host.
   # systemd only treats "#" as a comment at the start of a line, so this
   # note has to stay above ExecStart and not after the option.
   ExecStart=/usr/bin/prometheus-alertmanager \
     --cluster.advertise-address="ip:9093"
   ExecReload=/bin/kill -HUP $MAINPID
   TimeoutStopSec=20s
   SendSIGKILL=no

   [Install]
   WantedBy=multi-user.target


.. code-block:: console

   # systemctl daemon-reload
   # systemctl restart prometheus-alertmanager
   # systemctl restart prometheus

Check
=====

You can check both your rules (http://ip:9090/rules) and your alerts (http://ip:9090/alerts) from your web browser.

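The same information is also available from the Prometheus HTTP API, which can be handy on a
server without a browser. The Python sketch below reads ``/api/v1/alerts`` and lists each alert
with its state. The function names are made up for this example, and ``ip`` is the same
placeholder as in the URLs above.

.. code-block:: python

   import json
   import urllib.request


   def summarize_alerts(payload):
       """Return (alertname, state) pairs from an /api/v1/alerts response."""
       if payload.get("status") != "success":
           raise ValueError("unexpected response status: %r" % payload.get("status"))
       return [
           (alert["labels"].get("alertname", "?"), alert["state"])
           for alert in payload["data"]["alerts"]
       ]


   def fetch_alerts(base_url):
       """Download and decode the list of active alerts from Prometheus."""
       with urllib.request.urlopen(base_url + "/api/v1/alerts", timeout=10) as response:
           return json.load(response)


   # Example (needs a running Prometheus):
   #   for name, state in summarize_alerts(fetch_alerts("http://ip:9090")):
   #       print(name, state)

An empty result means that no alert is currently pending or firing.
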