..
  This file is part of GNU TALER.

  Copyright (C) 2014-2023 Taler Systems SA

  TALER is free software; you can redistribute it and/or modify it under the
  terms of the GNU Affero General Public License as published by the Free Software
  Foundation; either version 2.1, or (at your option) any later version.

  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with
  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>

  @author Javier Sepulveda
.. _taler-merchant-monitoring:

GNU Taler monitoring
####################

.. image:: images/taler-monitoring-infrastructure.png

To check the availability of our server infrastructure, we use two monitoring
programs: Grafana and Uptime Kuma.

On the one hand, Grafana enables us to see server resource consumption
*graphically*, and can even alert us about specific situations. On the other
hand, with a more basic tool such as Uptime Kuma (which mostly performs ping
and HTTPS checks), we get the very first status information, as a very first
line of defense.

Grafana
=======

- Our Grafana instance can be reached at https://grafana.taler.net.
- Our Grafana instance is installed on the (TUE) server.

User accounts
-------------

We have only two main user accounts:

- One "admin" account for server administrators.
- One general "read-only" account for the rest of the team.

How to install Grafana
----------------------

Please refer to the official Grafana website for installation instructions for
your specific operating system. For the specific case of the GNU/Linux
distribution Debian 13 (trixie), you can use the following instructions.

.. code-block:: console

   # apt-get install -y apt-transport-https
   # apt-get install -y software-properties-common wget
   # wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
   # echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | tee -a /etc/apt/sources.list.d/grafana.list
   # apt-get update
   # apt-get install grafana
   # systemctl daemon-reload
   # systemctl enable --now grafana-server

.. note::

   If you want to deploy Grafana automatically and have access to the private
   Git repository "migration-exercise-stable.git", please clone it and execute
   the grafana.sh script from the Grafana subfolder. This script will install
   Grafana for you and leave it up and running on port 3000 of your server.

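To quickly verify the installation, you can query Grafana's ``/api/health``
endpoint on the default port 3000; it should report the backing database as
"ok".

.. code-block:: console

   $ curl -s http://localhost:3000/api/health
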
Grafana Dashboards
------------------

Since creating tailored Grafana dashboards is very time consuming, and on top
of that requires considerable proficiency, we use the available pre-built
`Grafana dashboards <https://grafana.com/grafana/dashboards/>`_, which we can
eventually tweak a little to fit our needs.

Node Exporter
+++++++++++++

- More information can be found on the `Node Exporter <https://grafana.com/grafana/dashboards/1860-node-exporter-full/>`_ website.
- Dashboard ID: 1860

.. note::

   If you want to deploy Node Exporter automatically and have access to the
   private Git repository "migration-exercise-stable.git", please clone it and
   execute the taler.net/grafana/node-exporter.sh script from the subfolder.
   This script will install Node Exporter for you and leave it running on port
   9100. It will also create a new service, start it, and enable it on reboot.

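The dashboard only shows data once Prometheus scrapes the exporter. Below is a
minimal sketch of the corresponding ``scrape_configs`` entry, assuming
Prometheus runs on the same host and reads the stock
``/etc/prometheus/prometheus.yml``; the job name is just an example.

.. code-block:: yaml

   # Excerpt from /etc/prometheus/prometheus.yml (sketch):
   scrape_configs:
     - job_name: node                    # example name
       static_configs:
         - targets: ['localhost:9100']   # Node Exporter default port
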
Postgres Exporter
+++++++++++++++++

- More information can be found on the `PostgreSQL exporter <https://grafana.com/grafana/dashboards/12485-postgresql-exporter/>`_ website.
- Dashboard ID: 12485

.. image:: images/grafana-postgres-exporter.png

.. note::

   If you want to deploy Postgres Exporter automatically and have access to
   the private Git repository "migration-exercise-stable.git", please clone it
   and execute the taler.net/grafana/postgres-exporter.sh script from the
   subfolder. This script will install Postgres Exporter for you and leave it
   running on port 9187.

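To check that the exporter can actually reach the database, you can inspect
its metrics endpoint; the ``pg_up`` gauge should read ``1`` when the
connection to PostgreSQL works.

.. code-block:: console

   $ curl -s http://localhost:9187/metrics | grep '^pg_up'
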
Uptime Kuma from Grafana
++++++++++++++++++++++++

This is an easy way to integrate all websites monitored by Uptime Kuma into
Grafana. Thus, from the same place (Grafana), you can also check the status of
the websites and the expiration dates of the certificates.

- More information can be found on the `Uptime Kuma for Grafana <https://grafana.com/grafana/dashboards/18278-uptime-kuma/>`_ website.
- Dashboard ID: 18278

.. image:: images/uptime-kuma-from-grafana.png

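For this dashboard to work, Prometheus has to scrape Uptime Kuma's
``/metrics`` endpoint, which is served on the same port as its web interface
and requires authentication. What follows is a sketch of the scrape job,
assuming a local installation on port 3001 and an API key created in the
Uptime Kuma settings (the key is passed as the basic-auth password):

.. code-block:: yaml

   # Excerpt from /etc/prometheus/prometheus.yml (sketch):
   scrape_configs:
     - job_name: uptime-kuma
       basic_auth:
         password: uk1_...               # API key from Uptime Kuma, elided here
       static_configs:
         - targets: ['localhost:3001']   # Uptime Kuma web/metrics port
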
Grafana Data Sources
--------------------

As a data source connector we use Prometheus.

Prometheus
++++++++++

More information can be found in the `Grafana and Prometheus <https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/>`_ documentation.

.. note::

   If you want to deploy Prometheus automatically and have access to the
   private Git repository "migration-exercise-stable.git", please clone it and
   execute the taler.net/grafana/prometheus.sh script from the subfolder. This
   script will install Prometheus for you and leave it running on port 9090.

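The data source can be added by hand in the Grafana web interface, but it can
also be provisioned from a file so that automated deployments pick it up. A
minimal sketch, assuming Prometheus listens on localhost:9090 and the file is
placed under Grafana's standard provisioning directory:

.. code-block:: yaml

   # /etc/grafana/provisioning/datasources/prometheus.yaml (sketch):
   apiVersion: 1
   datasources:
     - name: Prometheus
       type: prometheus
       access: proxy              # Grafana proxies the queries server-side
       url: http://localhost:9090
       isDefault: true
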
Managing logs
-------------

In order to manage logs, we use Loki + Promtail (Debian packages), which are
very easy to integrate with Grafana and Prometheus.

.. code-block:: console

   # apt-get install loki promtail    # install the packages
   # systemctl start loki promtail    # start the services
   # systemctl enable loki promtail   # enable the services on reboot

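Promtail still needs to be told which log files to ship and where Loki
listens. Below is a sketch of what such a configuration could look like,
assuming the Debian package reads ``/etc/promtail/config.yml`` and Loki runs
locally on port 3100; the paths and labels are illustrative only.

.. code-block:: yaml

   # /etc/promtail/config.yml (sketch):
   server:
     http_listen_port: 9080       # Promtail's own HTTP port
   positions:
     filename: /var/lib/promtail/positions.yaml   # where read offsets are kept
   clients:
     - url: http://localhost:3100/loki/api/v1/push   # Loki push endpoint
   scrape_configs:
     - job_name: system
       static_configs:
         - targets: [localhost]
           labels:
             job: varlogs
             __path__: /var/log/*.log   # glob of files to tail
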
Loki and Promtail services in Grafana
-------------------------------------

1) Make sure you have Prometheus running on port 9090.
2) Make sure you have Loki running on port 3100.

.. code-block:: console

   systemctl status prometheus loki

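Both services also expose simple readiness endpoints over HTTP, which make for
a quick check that they do not just run but actually answer requests:

.. code-block:: console

   $ curl http://localhost:9090/-/healthy   # Prometheus health check
   $ curl http://localhost:3100/ready       # Loki readiness check
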
.. note::

   We do not yet have Loki and Promtail installed in production (taler.net),
   nor configured to track specific log files.

Grafana Alerting
----------------

#. In order to use Grafana alerting rules, you first need a working SMTP
   service on your server.
#. Once you have made the necessary changes to the Grafana configuration file
   (a sketch of the relevant section follows this list), restart or reload the
   "grafana-server" service with the systemctl command as usual.
#. Then go to the Grafana admin panel under Alerting -> Contact points and,
   for the email address you are using for this purpose, check whether SMTP is
   indeed working by pressing the "Test" button.
#. If that works, you will receive an email in your mailbox with the Grafana
   logo, confirming that the server can send email messages.

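What follows is a minimal sketch of the ``[smtp]`` section of
``/etc/grafana/grafana.ini``; the host and addresses are placeholders and
depend on your local mail setup.

.. code-block:: ini

   ; Excerpt from /etc/grafana/grafana.ini (sketch):
   [smtp]
   enabled = true
   host = localhost:25                ; placeholder: your SMTP relay
   from_address = grafana@example.org ; placeholder sender address
   from_name = Grafana
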
Uptime Kuma
===========

- URL: https://uptimekuma.anastasis.lu (main)
- Users: one single administration account with full privileges.
- Installation: without Docker; everything lives in the user home folder
  /home/uptime-kuma (see the installation sketch below).
- Monitors almost all our servers, websites and certificate expiration dates.

- URL: https://uptimekuma.taler.net
- Users: one single administration account with full privileges.
- Installation: without Docker; everything lives in the user home folder
  /home/uptime-kuma.
- Monitors the "main" Uptime Kuma installation, to make sure it is up and
  running and doing the monitoring properly.

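A sketch of such a Docker-less installation, based on the upstream Uptime Kuma
README and assuming Node.js, npm and git are already present; the port is an
example:

.. code-block:: console

   $ git clone https://github.com/louislam/uptime-kuma.git /home/uptime-kuma
   $ cd /home/uptime-kuma
   $ npm run setup                       # installs dependencies and builds the frontend
   $ node server/server.js --port=3001   # or keep it running with a process manager
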
.. image:: images/kuma.png

.. note::

   1) The main Uptime Kuma installation runs on the anastasis.lu server.
   2) The second Uptime Kuma installation, described above, is installed on
      gv.taler.net.

Kuma monitor types
------------------

Kuma offers quite a few monitor types, such as HTTPS, TCP port or ping. In our
case we mainly use HTTPS requests and pings, as a first check that our servers
are responsive.

Another handy feature of Kuma is its "Certificate Expiry Notification", which
we also use and which eventually warns us about upcoming certificate
expiration dates.

So, in brief, on our main Kuma server we use these three monitor types (ping,
HTTPS, certificate expiration) for each website that we monitor.

Exceptionally, for high-priority notifications about essential services, and
specifically due to the importance of the Taler Operations production server,
we additionally use SMS notifications (ClickSend provider). This way, should
the main Uptime Kuma detect the unavailability of the Taler Operations server,
or of any other essential service such as Git, an SMS message is sent to the
system administrator, and eventually to other team members of the deployment
and operations department, for urgent action.

How to edit notifications:

.. image:: images/uptime-kuma-edit.png