taler-docs

Documentation for GNU Taler components, APIs and protocols

commit b050c2ee73ea87e73284d41fafb338b0def5ff77
parent 3ab0aa73532d859bf495d6c98d68b4cba23a94cb
Author: Javier Sepulveda <javier.sepulveda@uv.es>
Date:   Thu, 11 Jul 2024 10:26:09 +0200

Grafana backup,promtail,loki proxy,nginx proxy,prometheus alerts

Diffstat:
A system-administration/grafana-backup.rst | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M system-administration/grafana-loki.rst | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M system-administration/grafana-promtail.rst | 26 +++++++++++++++++++++-----
M system-administration/index.rst | 2 ++
M system-administration/nginx-prometheus-exporter.rst | 38 +++++++++++++++++++++++++++++++++++++-
A system-administration/prometheus-alerts.rst | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 379 insertions(+), 7 deletions(-)

diff --git a/system-administration/grafana-backup.rst b/system-administration/grafana-backup.rst
@@ -0,0 +1,61 @@
+..
+  This file is part of GNU TALER.
+  Copyright (C) 2014-2023 Taler Systems SA
+
+  TALER is free software; you can redistribute it and/or modify it under the
+  terms of the GNU Affero General Public License as published by the Free Software
+  Foundation; either version 2.1, or (at your option) any later version.
+
+  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
+  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
+  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.
+
+  You should have received a copy of the GNU Affero General Public License along with
+  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>
+
+  @author Javier Sepulveda
+
+Grafana Backup
+##############
+
+.. contents:: Table of Contents
+   :depth: 1
+   :local:
+
+Backup the Grafana configuration file
+=====================================
+
+.. code-block:: console
+
+   # mkdir grafana-backup
+   # cp /etc/grafana/grafana.ini grafana-backup
+
+Backup the Grafana plugin files
+===============================
+
+We do not currently use any Grafana plugins, but the plugin
+folder is included in our backup script anyway.
+
+.. code-block:: console
+
+   # mkdir -p grafana-backup/plugins
+   # cp -r /var/lib/grafana/plugins grafana-backup/plugins
+
+Backup the database
+===================
+
+- For grafana.taler.net, we are currently using SQLite.
+
+.. code-block:: console
+
+   # cp /var/lib/grafana/grafana.db /root/grafana-backup/grafana.db-june-27
+
+Backup all files with Borgbackup
+================================
+
+We use Borgbackup for our backups.
+All of the above steps are included in our database backup script, which runs with Borg.
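The manual steps above could be collected into one helper. A minimal sketch (the paths are the Debian defaults, and the `backup_grafana` function name is hypothetical; for a fully consistent database copy you would stop Grafana first or use SQLite's `.backup` command):

```shell
#!/bin/sh
# Hypothetical helper collecting the manual backup steps above.
# Arguments: <grafana-etc-dir> <grafana-lib-dir> <destination-dir>
backup_grafana() {
    src_etc="$1"; src_lib="$2"; dest="$3"
    mkdir -p "$dest/plugins"
    # 1. Main configuration file
    cp "$src_etc/grafana.ini" "$dest/"
    # 2. Plugin folder (cp needs -r for directories)
    cp -r "$src_lib/plugins/." "$dest/plugins/"
    # 3. SQLite database, with a date suffix as in the manual step
    cp "$src_lib/grafana.db" "$dest/grafana.db-$(date +%F)"
}

# Example with the Debian default paths:
# backup_grafana /etc/grafana /var/lib/grafana /root/grafana-backup
```

The destination directory produced this way is exactly what the Borg script then archives.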
+
+* `More information <https://grafana.com/docs/grafana/latest/administration/back-up-grafana/>`_
diff --git a/system-administration/grafana-loki.rst b/system-administration/grafana-loki.rst
@@ -20,6 +20,8 @@ Grafana Loki
 ############
 
 Loki is an aggregation system really similar to Prometheus, but instead of reading metrics, it reads logs (via push).
+Please check the `official documentation website <https://grafana.com/docs/loki/latest/get-started/>`_ for additional information.
+
 .. contents:: Table of Contents
    :depth: 1
@@ -125,7 +127,82 @@ Refresh systemd and restart
 Check
 -----
 
-http://147.87.255.218:3100
+http://ip:3100
+
+Close the 3100 port with Nginx
+==============================
+
+.. code-block:: nginx
+
+   # Path: /etc/nginx/sites-available/loki.conf
+
+   upstream loki {
+       server 127.0.0.1:3100;  # Loopback
+       keepalive 15;
+   }
+
+   server {
+       listen 80;
+       listen [::]:80;
+
+       server_name loki.taler-ops.ch;
+       root /dev/null;
+
+       # LOKI
+
+       location / {
+           proxy_read_timeout 1800s;
+           proxy_connect_timeout 1600s;
+           proxy_pass http://loki;
+       }
+
+       location /ready {
+           proxy_pass http://loki;
+           proxy_http_version 1.1;
+           proxy_set_header Connection "Keep-Alive";
+           proxy_set_header Proxy-Connection "Keep-Alive";
+           proxy_redirect off;
+           auth_basic off;
+       }
+   }
+
+- Enable the new virtual host file in Nginx:
+
+.. code-block:: console
+
+   # ln -s /etc/nginx/sites-available/loki.conf /etc/nginx/sites-enabled/loki.conf
+   # nginx -t
+   # systemctl reload nginx
+
+Edit the Loki configuration file
+================================
+
+.. code-block:: yaml
+
+   # Path: /etc/loki/config.yml
+   server:
+     http_listen_port: 3100
+     grpc_listen_port: 9096
+     # Add this to close the 3100 port
+     http_listen_address: 127.0.0.1
+
+Refresh systemd and restart
+===========================
+
+.. code-block:: console
+
+   # systemctl restart nginx
+   # systemctl restart loki
+   # systemctl restart prometheus
+
+Check
+-----
+
+Check that port 3100 is now closed to the public by opening http://ip:3100 in your Web browser. At
+the same time, change the server URL field of your Loki data source in the Grafana control panel,
+so that it connects through the subdomain specified in the Nginx virtual host file (in our test case,
+loki.taler-ops.ch).
 
 Grafana control panel
 =====================
diff --git a/system-administration/grafana-promtail.rst b/system-administration/grafana-promtail.rst
@@ -18,7 +18,9 @@
 Grafana Promtail
 ################
 
-Promtail can read two different types of logs. A regular log file, or information from a systemd journal unit.
+The official documentation can be found `here <https://grafana.com/docs/loki/latest/send-data/promtail/>`_.
+
+Promtail can read two different types of logs: regular log files, or information from the systemd journal.
 
 .. contents:: Table of Contents
    :depth: 1
@@ -66,6 +68,8 @@ Promtail configuration file
    clients:
      - url: http://localhost:3100/loki/api/v1/push
 
+   # Example for a log file
+
    scrape_configs:
      - job_name: system
        static_configs:
@@ -73,8 +77,20 @@ Promtail configuration file
            - localhost
          labels:
            job: nginx
-           __path__: /var/log/nginx/*log
+           __path__: /var/log/nginx/*log  # List your log files here
 
+   # Example for the systemd journal
+
+   scrape_configs:
+     - job_name: journal
+       journal:
+         json: true
+         path: /var/log/journal
+         labels:
+           job: systemd-journal
+       relabel_configs:
+         - source_labels: ['__journal__systemd_unit']
+           target_label: 'unit'
 
 Promtail systemd service file
 =============================
@@ -121,8 +137,8 @@ Promtail temporary files
 Grafana control panel
 =====================
 
-To check if Promtail is reading properly either your log files or the journald units, you can go to the "Explore" section in the Grafana
-control panel, choose the right Loki connector, choose your desired log file or journald unit, and execute the query.
-If that works, you can convert this temporary query into a real Grafana "dashboard", to continue working afterwards with additional filtering options.
+To check whether Promtail is properly reading your log files or systemd journal units, click on the "Explore" section in the Grafana control panel, choose the right Loki connector, choose your desired log file or journal unit, and execute the query.
+If you see that this is working (the chunks of the big log files are visible),
+you can convert this temporary query into a real Grafana "dashboard", to continue working later with additional filtering options.
diff --git a/system-administration/index.rst b/system-administration/index.rst
@@ -27,8 +27,10 @@ System Administration tutorials
    taler-monitoring-infrastructure
    borgbackup-tutorial
    prometheus
+   prometheus-alerts
    nginx-prometheus-exporter
    prometheus-node-exporter
    prometheus-postgres-exporter
    grafana-loki
    grafana-promtail
+   grafana-backup
diff --git a/system-administration/nginx-prometheus-exporter.rst b/system-administration/nginx-prometheus-exporter.rst
@@ -140,6 +140,42 @@ Restart everything
 
    # systemctl restart prometheus.service
 
+Close the 9113 port with Nginx
+==============================
+
+* Change the nginx-exporter.socket file:
+
+.. code-block:: console
+
+   # Path: /etc/systemd/system/nginx-exporter.socket
+
+   [Unit]
+   Description=NGINX Prometheus Exporter
+
+   # Add the loopback IP address here
+   [Socket]
+   ListenStream=127.0.0.1:9113
+
+   [Install]
+   WantedBy=sockets.target
+
+Add a new server block to Nginx
+===============================
+
+.. code-block:: nginx
+
+   server {
+       listen 80;
+       listen [::]:80;
+
+       server_name ip_address | subdomain.tld;
+
+       location /nginx-exporter {
+           proxy_pass http://localhost:9113;
+       }
+   }
+
 Create a new dashboard in Grafana
 =================================
@@ -147,4 +183,4 @@ You can now go to the `Grafana dashboards <a href="https://grafana.taler.net/das
 add the new dashboard for the Nginx Exporter program. Please make sure you choose the right Prometheus data source.
 
 * Dashboard Id: 11199
-* Dashboard URL: https://grafana.com/grafana/dashboards/11199-nginx/
+* Dashboard URL: https://grafana.com/grafana/dashboards/11199-nginx/
diff --git a/system-administration/prometheus-alerts.rst b/system-administration/prometheus-alerts.rst
@@ -0,0 +1,180 @@
+..
+  This file is part of GNU TALER.
+  Copyright (C) 2014-2023 Taler Systems SA
+
+  TALER is free software; you can redistribute it and/or modify it under the
+  terms of the GNU Affero General Public License as published by the Free Software
+  Foundation; either version 2.1, or (at your option) any later version.
+
+  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
+  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
+  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.
+
+  You should have received a copy of the GNU Affero General Public License along with
+  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>
+
+  @author Javier Sepulveda
+
+Prometheus Alerting
+###################
+
+.. contents:: Table of Contents
+   :depth: 1
+   :local:
+
+The Prometheus Alertmanager periodically evaluates the queries defined in the alert rules files.
+If any of these conditions is met, the alerting system sends a notification (e.g. email)
+directly to the specified contact points, or to a specific group of contact points (these are called notification policies).
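+To illustrate how contact points and notification policies fit together, here is a minimal sketch of an Alertmanager routing tree (all receiver names and addresses are invented examples, not part of our setup):

```yaml
# Sketch only: receiver names and addresses are examples.
route:
  receiver: 'default-email'   # fallback contact point
  group_by: ['alertname']     # batch alerts that share the same name
  routes:
    # Notification policy: critical alerts go to a different contact point
    - matchers:
        - severity = "critical"
      receiver: 'oncall-email'

receivers:
  - name: 'default-email'
    email_configs:
      - to: 'team@example.com'
  - name: 'oncall-email'
    email_configs:
      - to: 'oncall@example.com'
```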
+
+The very basic concepts of the Prometheus alerting system are explained below. Please check the `Prometheus official
+documentation <https://prometheus.io/docs/alerting/latest/overview/>`_ or the `Grafana documentation <https://grafana.com/docs/grafana/latest/alerting/fundamentals/>`_ if you need additional information.
+
+* Alert rules: One or more queries (expressions) to measure something (e.g. disk space, memory, or CPU usage).
+
+  - Each alert rule contains a condition with a specific threshold.
+  - Each alert rule can contain a precise contact point to send the notifications to.
+  - Within the same alert rule, you can specify multiple alert instances.
+
+* Contact points: The notification message itself, together with the specific address to send the notification to.
+
+* Notification policies: This feature allows you to gather a group of different contact points under the same label name.
+
+Install the Prometheus Alertmanager
+===================================
+
+.. code-block:: console
+
+   # apt install prometheus-alertmanager
+   # systemctl start prometheus-alertmanager
+   # systemctl status prometheus-alertmanager
+
+Edit the Prometheus configuration file
+======================================
+
+To make Prometheus talk to the alerting system, you need to
+specify this in the main Prometheus configuration file.
+
+.. code-block:: yaml
+
+   # Path: /etc/prometheus/prometheus.yml
+   # Add this at the end of the yml file
+   # Alertmanager configuration
+   alerting:
+     alertmanagers:
+       - static_configs:
+           - targets: ["localhost:9093"]
+
+Alert rules configuration file
+==============================
+
+- Create your very first rule file, ``first_rule.yml``.
+
+  .. note::
+
+     The code shown below is just an example for CPU, disk, and memory usage.
+
+.. code-block:: yaml
+
+   # Path: /etc/prometheus/first_rule.yml
+   groups:
+     - name: node_exporter_alerts
+       rules:
+         - alert: HighCPUUsage
+           expr: sum(rate(node_cpu_seconds_total{mode="system"}[1m])) / count(node_cpu_seconds_total{mode="system"}) * 100 > 80
+           for: 1m
+           labels:
+             severity: warning
+           annotations:
+             summary: "High CPU usage detected"
+             description: "System CPU usage is above 80% for more than 1 minute."
+
+         - alert: LowDiskSpace
+           expr: (node_filesystem_free_bytes / node_filesystem_size_bytes) * 100 < 10
+           for: 1m
+           labels:
+             severity: critical
+           annotations:
+             summary: "Low Disk Space detected"
+             description: "Disk space is below 10% for more than 1 minute."
+
+         - alert: HighMemoryUsage
+           expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
+           for: 1m
+           labels:
+             severity: warning
+           annotations:
+             summary: "High Memory Usage detected"
+             description: "Memory usage is above 80% for more than 1 minute."
+
+Configure SMTP
+==============
+
+.. code-block:: yaml
+
+   # Path: /etc/prometheus/alertmanager.yml
+
+   global:
+     smtp_smarthost: 'smtp.example.com:587'
+     smtp_from: 'alertmanager@example.com'
+     smtp_auth_username: 'yourusername'
+     smtp_auth_password: 'yourpassword'
+
+   route:
+     receiver: 'email'
+
+   receivers:
+     - name: 'email'
+       email_configs:
+         - to: 'recipient@example.com'
+           send_resolved: true
+
+Add your alert rules to Prometheus
+==================================
+
+.. code-block:: yaml
+
+   # Path: /etc/prometheus/prometheus.yml
+   # List your alert rules files here
+   rule_files:
+     - "first_rule.yml"
+     # - "second_rule.yml"
+
+Edit the alertmanager systemd service file
+==========================================
+
+.. code-block:: systemd
+
+   # Path: /usr/lib/systemd/system/prometheus-alertmanager.service
+
+   [Unit]
+   Description=Alertmanager for prometheus
+   Documentation=https://prometheus.io/docs/alerting/alertmanager/
+
+   [Service]
+   Restart=on-failure
+   User=prometheus
+   EnvironmentFile=/etc/default/prometheus-alertmanager
+   # Add the --cluster.advertise-address flag, as otherwise it won't work
+   ExecStart=/usr/bin/prometheus-alertmanager \
+       --cluster.advertise-address="ip:9093"
+   ExecReload=/bin/kill -HUP $MAINPID
+   TimeoutStopSec=20s
+   SendSIGKILL=no
+
+   [Install]
+   WantedBy=multi-user.target
+
+.. code-block:: console
+
+   # systemctl daemon-reload
+   # systemctl restart prometheus-alertmanager
+   # systemctl restart prometheus
+
+Check
+=====
+
+You can check both your rules (http://ip:9090/rules) and alerts (http://ip:9090/alerts) from your web browser.
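+The same rule file format extends naturally to other metrics. A common additional example (a sketch, not part of the configuration above) fires when a scrape target disappears:

```yaml
# Example rule file in the same format as first_rule.yml above.
# Group and alert names are illustrative.
groups:
  - name: availability_alerts
    rules:
      - alert: InstanceDown
        # "up" is 1 while Prometheus can scrape a target, 0 otherwise
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} has been unreachable for more than 2 minutes."
```

Rule files like this one can be validated with ``promtool check rules <file>`` (shipped with Prometheus) before restarting the service.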