..
  This file is part of GNU TALER.
  Copyright (C) 2014-2023 Taler Systems SA

  TALER is free software; you can redistribute it and/or modify it under the
  terms of the GNU Affero General Public License as published by the Free Software
  Foundation; either version 2.1, or (at your option) any later version.

  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with
  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>

  @author Javier Sepulveda

Prometheus alerts
#################

Prometheus periodically evaluates the queries defined in the alert rule files.
When any of these conditions is met, the alert manager sends a notification (e.g. by email)
either directly to the specified contact points or to a group of contact points (called a notification policy).

The basic concepts of the Prometheus alerting system are explained below. For more information, see the `official Prometheus
documentation <https://prometheus.io/docs/alerting/latest/overview/>`_ or the `Grafana documentation <https://grafana.com/docs/grafana/latest/alerting/fundamentals/>`_.

* Alert rules: One or more queries (expressions) that measure a value (e.g. disk space, memory, or CPU usage).

  - Each alert rule contains a condition with a specific threshold.
  - Each alert rule can name a specific contact point to send the notifications to.
  - Within the same alert rule, you can specify multiple alert instances.

* Contact points: The notification message itself, together with the address to send it to.

* Notification policies: A way to group different contact points under the same label name.

Install Prometheus alert manager
================================

.. code-block:: console

   # apt install prometheus-alertmanager
   # systemctl start prometheus-alertmanager
   # systemctl status prometheus-alertmanager

Edit the Prometheus configuration file
======================================

To make Prometheus talk to the alerting system, you need to
specify it in the main Prometheus configuration file.

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # Add this at the end of the YAML file
   # Alertmanager configuration
   alerting:
     alertmanagers:
       - static_configs:
           - targets: ["localhost:9093"]

Alert rules configuration file
==============================

- Create your first rule file, first_rule.yml.

  .. note::

     The code shown below is just an example for CPU, disk and memory usage.

.. code-block:: yaml

   # Path: /etc/prometheus/first_rule.yml
   groups:
   - name: node_exporter_alerts
     rules:
     - alert: HighCPULatency
       expr: sum(rate(node_cpu_seconds_total{mode="system"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High CPU Latency detected"
         description: "CPU latency is above 80% for more than 1 minute."

     - alert: LowDiskSpace
       expr: (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 < 10
       for: 1m
       labels:
         severity: critical
       annotations:
         summary: "Low Disk Space detected"
         description: "Disk space is below 10% for more than 1 minute."

     - alert: HighMemoryUsage
       expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
       for: 1m
       labels:
         severity: warning
       annotations:
         summary: "High Memory Usage detected"
         description: "Memory usage is above 80% for more than 1 minute."


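To make the arithmetic behind the disk and memory expressions concrete, the following
minimal Python sketch recomputes the two percentages and mimics the ``for: 1m`` clause.
The sample byte counts, the 15-second evaluation interval, the function names and the
simplified inactive/pending/firing model are assumptions chosen for illustration;
Prometheus evaluates the real expressions itself.

.. code-block:: python

   def disk_free_percent(free_bytes, size_bytes):
       """Mirror of (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100."""
       return free_bytes / size_bytes * 100


   def memory_used_percent(available_bytes, total_bytes):
       """Mirror of (1 - (MemAvailable / MemTotal)) * 100."""
       return (1 - available_bytes / total_bytes) * 100


   def alert_state(condition_samples, interval_s, for_s):
       """Simplified model of the "for" clause (an assumption, not Prometheus code).

       condition_samples: booleans, one per rule evaluation, oldest first.
       Returns "inactive", "pending" or "firing" after the last evaluation.
       """
       held_s = None  # seconds the condition has held without interruption
       for met in condition_samples:
           if not met:
               held_s = None
           elif held_s is None:
               held_s = 0
           else:
               held_s += interval_s
       if held_s is None:
           return "inactive"
       return "firing" if held_s >= for_s else "pending"


   GIB = 1024 ** 3
   # Hypothetical samples: 8 GiB free of 100 GiB disk, 3 GiB available of 16 GiB RAM.
   disk = disk_free_percent(8 * GIB, 100 * GIB)
   mem = memory_used_percent(3 * GIB, 16 * GIB)
   print(f"disk free: {disk:.2f}% -> LowDiskSpace condition met: {disk < 10}")
   print(f"memory used: {mem:.2f}% -> HighMemoryUsage condition met: {mem > 80}")
   # Hypothetical evaluations every 15 s; the condition must hold for 60 s.
   print(alert_state([True] * 3, interval_s=15, for_s=60))
   print(alert_state([True] * 5, interval_s=15, for_s=60))

In this model a rule whose condition has only just become true stays pending, and a
notification is only due once the condition has held for the whole ``for`` period.
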
Configure SMTP
==============

.. code-block:: yaml

   # Path: /etc/prometheus/alertmanager.yml

   global:
     smtp_smarthost: 'smtp.example.com:587'
     smtp_from: 'alertmanager@example.com'
     smtp_auth_username: 'yourusername'
     smtp_auth_password: 'yourpassword'

   route:
     receiver: 'email'

   receivers:
     - name: 'email'
       email_configs:
         - to: 'recipient@example.com'
           send_resolved: true


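Before relying on the alert manager for delivery, it can help to confirm that the SMTP
account works on its own. The sketch below is one way to do that with Python's standard
``smtplib``. It assumes that the smarthost offers STARTTLS on the configured port; the
function names are made up for this example, and the host, user and addresses are the
placeholder values from the configuration above.

.. code-block:: python

   import smtplib
   from email.message import EmailMessage


   def build_test_message(sender, recipient):
       """Build a small plain-text message for a manual SMTP check."""
       msg = EmailMessage()
       msg["From"] = sender
       msg["To"] = recipient
       msg["Subject"] = "Alertmanager SMTP test"
       msg.set_content("If you can read this, the SMTP settings work.")
       return msg


   def send_test_message(host, port, username, password, msg):
       """Send msg through the smarthost, assuming STARTTLS is offered."""
       with smtplib.SMTP(host, port, timeout=10) as smtp:
           smtp.starttls()
           smtp.login(username, password)
           smtp.send_message(msg)


   # Example call with the placeholder values from alertmanager.yml
   # (left commented out so that nothing is sent by accident):
   # send_test_message("smtp.example.com", 587, "yourusername", "yourpassword",
   #                   build_test_message("alertmanager@example.com",
   #                                      "recipient@example.com"))

If the call raises an authentication or connection error, the same credentials are
unlikely to work from the alert manager either.
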
Add your alert rules to Prometheus
==================================

.. code-block:: yaml

   # Path: /etc/prometheus/prometheus.yml
   # List your alert rule files here
   rule_files:
     - "first_rule.yml"
     # - "second_rule.yml"


Edit the alertmanager systemd service file
============================================

.. code-block:: systemd

   # Path: /usr/lib/systemd/system/prometheus-alertmanager.service

   [Unit]
   Description=Alertmanager for prometheus
   Documentation=https://prometheus.io/docs/alerting/alertmanager/

   [Service]
   Restart=on-failure
   User=prometheus
   EnvironmentFile=/etc/default/prometheus-alertmanager
   # Add the --cluster.advertise-address option (replace "ip" with the
   # address of your host), as otherwise it won't work
   ExecStart=/usr/bin/prometheus-alertmanager \
     --cluster.advertise-address="ip:9093"
   ExecReload=/bin/kill -HUP $MAINPID
   TimeoutStopSec=20s
   SendSIGKILL=no

   [Install]
   WantedBy=multi-user.target


.. code-block:: console

   # systemctl daemon-reload
   # systemctl restart prometheus-alertmanager
   # systemctl restart prometheus

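After the restart, a quick probe of the ``/-/healthy`` endpoint that both Prometheus and
the alert manager expose can confirm that the services came back up. The following is a
minimal sketch using only the Python standard library; the helper names are hypothetical,
and the URLs in the example call assume both services listen locally on ports 9090 and 9093.

.. code-block:: python

   import urllib.error
   import urllib.request


   def health_url(base_url):
       """Return the health endpoint for a Prometheus or Alertmanager base URL."""
       return base_url.rstrip("/") + "/-/healthy"


   def is_healthy(base_url, timeout=5):
       """Return True if the health endpoint answers with HTTP 200."""
       try:
           with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
               return resp.status == 200
       except (urllib.error.URLError, OSError):
           # Connection refused, timeout or an HTTP error status.
           return False


   # Example call (assumes both services listen on the local ports used above):
   # for name, url in [("prometheus", "http://localhost:9090"),
   #                   ("alertmanager", "http://localhost:9093")]:
   #     print(name, "healthy" if is_healthy(url) else "NOT healthy")
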
Check
=====

You can check both your rules (http://ip:9090/rules) and your alerts (http://ip:9090/alerts) from your web browser.

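The same information is also available from the Prometheus HTTP API, which is convenient
for scripted checks. The sketch below summarizes the response of ``/api/v1/alerts``; the
helper names and the sample payload are made up for illustration, and the payload only
contains the fields this example reads.

.. code-block:: python

   import json
   import urllib.request


   def summarize_alerts(payload):
       """Turn an /api/v1/alerts response into 'name [state] severity=...' lines."""
       if payload.get("status") != "success":
           raise ValueError("unexpected API status: %r" % payload.get("status"))
       lines = []
       for alert in payload["data"]["alerts"]:
           labels = alert.get("labels", {})
           lines.append("%s [%s] severity=%s" % (
               labels.get("alertname", "?"),
               alert.get("state", "?"),
               labels.get("severity", "?"),
           ))
       return lines


   def fetch_alerts(base_url, timeout=5):
       """Fetch and decode the active alerts from a Prometheus server."""
       url = base_url.rstrip("/") + "/api/v1/alerts"
       with urllib.request.urlopen(url, timeout=timeout) as resp:
           return json.load(resp)


   # Hand-written sample in the shape of the API response, for illustration only.
   sample = {
       "status": "success",
       "data": {
           "alerts": [
               {
                   "labels": {"alertname": "LowDiskSpace", "severity": "critical"},
                   "annotations": {"summary": "Low Disk Space detected"},
                   "state": "firing",
               }
           ]
       },
   }
   for line in summarize_alerts(sample):
       print(line)

Against a live server, ``summarize_alerts(fetch_alerts("http://localhost:9090"))`` would
produce the same kind of listing, assuming Prometheus listens on the local port 9090.
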