..
  This file is part of GNU TALER.

  Copyright (C) 2014-2023 Taler Systems SA

  TALER is free software; you can redistribute it and/or modify it under the
  terms of the GNU Affero General Public License as published by the Free Software
  Foundation; either version 2.1, or (at your option) any later version.

  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with
  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>

  @author Javier Sepulveda

.. _taler-merchant-monitoring:

GNU Taler monitoring
####################

.. image:: images/taler-monitoring-infrastructure.png

To check the availability of our server infrastructure, we use two monitoring tools: Grafana and Uptime Kuma.

Grafana shows the resource consumption of our servers *graphically* and can alert us to specific situations.
Uptime Kuma is a more basic tool (it mostly performs ping and HTTPS checks)
that gives us a first indication of status, as a first countermeasure.

Grafana
=======

- Our Grafana instance can be reached at https://grafana.taler.net
- Our Grafana instance is installed on the (TUE) server

User accounts
-------------

We have only two main user accounts:

- One "admin" account for server administrators.
- One general "read-only" account, for the rest of the team.

How to install Grafana
----------------------

Please refer to the Grafana official website for installation instructions for your specific operating system.
For the specific case of the GNU/Linux distribution Debian 13 (trixie), you can use the following instructions.

.. code-block:: console

   # apt-get install -y apt-transport-https
   # apt-get install -y software-properties-common wget
   # wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
   # echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | tee -a /etc/apt/sources.list.d/grafana.list
   # apt update
   # apt-get install grafana
   # systemctl daemon-reload
   # systemctl enable --now grafana-server

.. note::

   If you want to deploy Grafana automatically and have access to the private Git repository "migration-exercise-stable.git",
   clone it and run the grafana.sh script from the Grafana subfolder. The script installs Grafana and leaves it up and running on port 3000 of your server.

Grafana Dashboards
------------------

Creating tailored Grafana dashboards is very time-consuming and requires considerable proficiency,
so we use the pre-built `Grafana dashboards <https://grafana.com/grafana/dashboards/>`_, which we can tweak a little to fit our needs.

Node Exporter
+++++++++++++

- More information can be found on the `Node Exporter <https://grafana.com/grafana/dashboards/1860-node-exporter-full/>`_ dashboard page.
- Dashboard ID: 1860

.. note::

   If you want to deploy Node Exporter automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
   and run the script taler.net/grafana/node-exporter.sh. The script installs Node Exporter and leaves it running on port 9100.
   It also creates and starts a new service and enables it on reboot.
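
After installation, a quick manual check can confirm that Node Exporter is answering.
The following is only a minimal sketch: it assumes that ``curl`` is installed and that
Node Exporter listens on ``localhost`` port 9100 as described above. The function names
``looks_like_metrics`` and ``check_metrics_endpoint`` are hypothetical helpers and are
not part of the deployment scripts.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper: succeed if standard input contains at least one
   # "# HELP" line, as found in the Prometheus text exposition format.
   looks_like_metrics() {
     grep -q '^# HELP'
   }

   # Hypothetical helper: fetch the given URL and check that the response
   # looks like metrics output.
   check_metrics_endpoint() {
     # -f: fail on HTTP errors, -sS: quiet but report errors,
     # --max-time: do not hang on an unresponsive host.
     body=$(curl -fsS --max-time 5 "$1") || return 1
     printf '%s\n' "$body" | looks_like_metrics
   }

   # Assumption: Node Exporter runs locally on port 9100.
   if check_metrics_endpoint "http://localhost:9100/metrics"; then
     echo "Node Exporter is serving metrics"
   else
     echo "Node Exporter is not reachable on port 9100" >&2
   fi

The same function can be pointed at another exporter by changing the URL.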

Postgres Exporter
+++++++++++++++++

- More information can be found on the `PostgreSQL exporter <https://grafana.com/grafana/dashboards/12485-postgresql-exporter/>`_ dashboard page.
- Dashboard ID: 12485

.. image:: images/grafana-postgres-exporter.png

.. note::

   If you want to deploy Postgres Exporter automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
   and run the script taler.net/grafana/postgres-exporter.sh. The script installs Postgres Exporter and leaves it running on port 9187.

Uptime Kuma from Grafana
++++++++++++++++++++++++

This is an easy way to integrate all websites monitored by Uptime Kuma into Grafana.
From the same place (Grafana), you can then also check the status of each website and the expiration dates of its
certificates.

- More information can be found on the `Uptime Kuma for Grafana <https://grafana.com/grafana/dashboards/18278-uptime-kuma/>`_ dashboard page.
- Dashboard ID: 18278

.. image:: images/uptime-kuma-from-grafana.png

Grafana Data Sources
--------------------

As a data source connector we use Prometheus.

Prometheus
++++++++++

More information can be found on the `Grafana and Prometheus <https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/>`_ documentation page.

.. note::

   If you want to deploy Prometheus automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
   and run the script taler.net/grafana/prometheus.sh. The script installs Prometheus and leaves it running on port 9090.

Managing logs
-------------

To manage logs, we use Loki and Promtail (Debian packages), which are very easy to integrate with Grafana and Prometheus.

.. code-block:: console

   # Install
   # apt-get install loki promtail
   # Start services
   # systemctl start loki promtail
   # Enable services on reboot
   # systemctl enable loki
   # systemctl enable promtail

Loki and Promtail services in Grafana
-------------------------------------

1) Make sure you have Prometheus running on port 9090
2) Make sure you have Loki running on port 3100

.. code-block:: console

   systemctl status prometheus loki


.. note::

   Loki and Promtail are not yet installed in production (taler.net), nor are they
   configured to track specific log files.

Grafana Alerting
----------------

#. To use Grafana alerting rules, you first need to configure a working SMTP service on your server.
#. Once you have made the necessary changes to the Grafana configuration file, restart or reload the "grafana-server" service with the systemctl command as usual.
#. Then go to Alerting -> Contact points in the Grafana admin panel and, for the email address you are using for this purpose, press the "test" button to check that SMTP is indeed working.
#. If that works, you will receive an email with the Grafana logo in your mailbox, confirming that the server can send email messages.


Uptime Kuma
===========

- URL: https://uptimekuma.anastasis.lu (main)
- Users: One single administration account with full privileges.
- Installation: Without Docker. All within the user home folder /home/uptime-kuma
- Monitors almost all our servers, websites, and certificate expiration dates.

- URL: https://uptimekuma.taler.net
- Users: One single administration account with full privileges.
- Installation: Without Docker.
  All within the user home folder /home/uptime-kuma
- Monitors the "main" Uptime Kuma installation, to make sure it is up and running and monitoring properly.

.. image:: images/kuma.png

.. note::

   1) The main Uptime Kuma installation runs on the server anastasis.lu.
   2) The second Uptime Kuma installation, which monitors the main one, is installed on gv.taler.net.

Kuma monitor types
------------------

Kuma offers quite a few monitor types, such as HTTPS, TCP port, or ping. We mainly use HTTPS requests
and pings as a first check that our servers are responsive.

Another handy Kuma feature that we also use is "Certificate Expiry Notification", which warns us about upcoming certificate
expiration dates.

In brief, on our main Kuma server we use these three checks (ping, HTTPS, certificate expiration) for each website that we monitor.

For high-priority notifications about essential services, and specifically because of the importance of the Taler Operations production
server, we additionally use SMS notifications (provider: ClickSend). If the main Uptime Kuma instance detects that the Taler Operations server, or any other essential service such as Git, is unavailable,
an SMS message is sent to the system administrator, and possibly to other members of the deployment and operations team, for urgent action.


How to edit notifications:

.. image:: images/uptime-kuma-edit.png
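
For troubleshooting, the three checks listed under "Kuma monitor types" (ping, HTTPS, certificate
expiration) can be approximated by hand. The following is only a rough sketch and not how
Uptime Kuma itself works: it assumes GNU ``date``, ``ping``, ``curl``, and ``openssl`` are
available, and all function names are hypothetical.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helpers approximating the three checks by hand.
   # Assumes GNU date (for -d), ping, curl, and openssl are installed.

   # Print the whole days between "now" ($2, epoch seconds) and an
   # expiry date string ($1), for example "Jan 11 00:00:00 2024 GMT".
   days_until() {
     expiry_epoch=$(date -u -d "$1" +%s) || return 1
     echo $(( (expiry_epoch - $2) / 86400 ))
   }

   # Print the notAfter date of the certificate served on port 443 of host $1.
   cert_not_after() {
     echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null \
       | openssl x509 -noout -enddate | cut -d= -f2
   }

   # Succeed if host $1 answers a single ping.
   host_pings() {
     ping -c 1 "$1" >/dev/null 2>&1
   }

   # Succeed if host $1 answers an HTTPS request without an HTTP error.
   https_ok() {
     curl -fs --max-time 10 -o /dev/null "https://$1/"
   }

   # Run the checks only when a host name is given as the first argument.
   if [ "$#" -ge 1 ]; then
     host="$1"
     if host_pings "$host"; then echo "ping: ok"; else echo "ping: failed"; fi
     if https_ok "$host"; then echo "https: ok"; else echo "https: failed"; fi
     not_after=$(cert_not_after "$host")
     echo "certificate: $(days_until "$not_after" "$(date +%s)") days left"
   fi

Saved under a hypothetical name such as ``manual-check.sh``, the script could be run as
``sh manual-check.sh taler.net`` to get a quick second opinion when a Kuma alert arrives.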