Diffstat (limited to 'taler-monitoring-infrastructure.rst')
-rw-r--r-- taler-monitoring-infrastructure.rst | 197
1 file changed, 197 insertions, 0 deletions
diff --git a/taler-monitoring-infrastructure.rst b/taler-monitoring-infrastructure.rst
new file mode 100644
index 00000000..3b809fb3
--- /dev/null
+++ b/taler-monitoring-infrastructure.rst
@@ -0,0 +1,197 @@
+..
+ This file is part of GNU TALER.
+
+ Copyright (C) 2014-2023 Taler Systems SA
+
+ TALER is free software; you can redistribute it and/or modify it under the
+ terms of the GNU Affero General Public License as published by the Free Software
+ Foundation; either version 2.1, or (at your option) any later version.
+
+ TALER is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
+ A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
+
+ You should have received a copy of the GNU Affero General Public License along with
+ TALER; see the file COPYING. If not, see <http://www.gnu.org/licenses/>
+
+ @author Javier Sepulveda
+.. _taler-merchant-monitoring:
+
+GNU Taler monitoring
+####################
+
+.. image:: images/taler-monitoring-infrastructure.png
+
+To check the availability of our server infrastructure, we use two monitoring programs: Grafana and Uptime Kuma.
+
+Grafana lets us see server resource consumption *graphically* and can alert us to specific situations.
+Uptime Kuma is a more basic tool (it mostly performs ping and HTTPS checks)
+that gives us the very first status information and serves as our first countermeasure.
+
+Grafana
+=======
+
+- Our Grafana instance can be reached at https://grafana.taler.net
+
+User accounts:
+--------------
+
+We have only two main user accounts:
+
+- One "admin" account for server administrators.
+- One general "read-only" account, for the rest of the team.
+
+How to install Grafana
+----------------------
+
+Please refer to the official Grafana website for installation instructions for your specific operating system. For the
+specific case of the GNU/Linux distribution Debian 12 (bookworm), you can use the following instructions.
+
+.. code-block:: console
+
+ # apt-get install -y apt-transport-https
+ # apt-get install -y software-properties-common wget
+ # wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
+ # echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | tee -a /etc/apt/sources.list.d/grafana.list
+ # apt update
+ # apt-get install grafana
+ # systemctl daemon-reload
+ # systemctl enable --now grafana-server
+
+.. note::
+
+ If you want to deploy Grafana automatically and have access to the private Git repository "migration-exercise-stable.git",
+ clone it and execute the grafana.sh file from the Grafana subfolder. This script installs Grafana and leaves it running on port 3000 of your server.
+
+Grafana Dashboards
+------------------
+
+Creating tailored Grafana dashboards is very time-consuming and, on top of that,
+requires considerable proficiency, so we use the available pre-built `Grafana dashboards <https://grafana.com/grafana/dashboards/>`_, which we can also tweak a little to fit our needs.
+
+Node Exporter
+++++++++++++++
+
+- More information can be found on the `Node Exporter <https://grafana.com/grafana/dashboards/1860-node-exporter-full/>`_ website.
+- Dashboard ID: 1860
+
+.. note::
+
+ If you want to deploy Node Exporter automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
+ and execute the script taler.net/grafana/node-exporter.sh. This script installs Node Exporter and leaves it running on port 9100.
+ It also creates and starts a new service, and enables it on reboot.
+
+Postgres Exporter
++++++++++++++++++
+
+- More information can be found on the `PostgreSQL exporter <https://grafana.com/grafana/dashboards/12485-postgresql-exporter/>`_ website.
+- Dashboard ID: 12485
+
+.. image:: images/grafana-postgres-exporter.png
+
+.. note::
+
+ If you want to deploy Postgres Exporter automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
+ and execute the script taler.net/grafana/postgres-exporter.sh. This script installs Postgres Exporter and leaves it running on port 9187.
+
+Uptime Kuma from Grafana
+++++++++++++++++++++++++
+
+This is an easy way to integrate all websites monitored by Uptime Kuma into Grafana. Thus,
+from the same place (Grafana), you can also check the status of each website and the expiration date of its
+certificates.
+
+- More information can be found on the `Uptime Kuma for Grafana <https://grafana.com/grafana/dashboards/18278-uptime-kuma/>`_ website.
+- Dashboard ID: 18278
+
+.. image:: images/uptime-kuma-from-grafana.png
+
+Grafana Data Sources
+---------------------
+As a data source connector we use Prometheus.
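+
+The data source can be registered through the Grafana web interface or from a provisioning file.
+The following is a minimal sketch of such a file, not our production configuration: it assumes that Prometheus
+listens on port 9090 of the same host, and the file name ``/etc/grafana/provisioning/datasources/prometheus.yaml`` is only an example.
+
+.. code-block:: yaml
+
+ # Hypothetical provisioning file; adjust the URL to your Prometheus host.
+ apiVersion: 1
+ datasources:
+   - name: Prometheus
+     type: prometheus
+     access: proxy
+     url: http://localhost:9090
+     isDefault: true
+
+Grafana reads provisioning files at startup, so the "grafana-server" service has to be restarted afterwards.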
+
+Prometheus
+++++++++++
+More information can be found on the `Grafana and Prometheus <https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/>`_ website.
+
+.. note::
+
+ If you want to deploy Prometheus automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
+ and execute the script taler.net/grafana/prometheus.sh. This script installs Prometheus and leaves it running on port 9090.
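+
+For Prometheus to collect data from Node Exporter and Postgres Exporter, each exporter has to be listed as a scrape target.
+The following is a minimal sketch of the relevant part of ``prometheus.yml``, assuming both exporters run on the same host as Prometheus
+on the ports mentioned above (9100 and 9187). The job names are arbitrary examples.
+
+.. code-block:: yaml
+
+ scrape_configs:
+   # Node Exporter: host metrics (CPU, memory, disk, network)
+   - job_name: node
+     static_configs:
+       - targets: ["localhost:9100"]
+   # Postgres Exporter: database metrics
+   - job_name: postgres
+     static_configs:
+       - targets: ["localhost:9187"]
+
+After editing the file, restart or reload the Prometheus service so that the new targets are picked up.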
+
+Managing logs
+-------------
+
+To manage logs, we use Loki and Promtail (Debian packages), which are very easy to integrate with Grafana and Prometheus.
+
+.. code-block:: console
+
+ # Install
+ # apt-get install loki promtail
+ # Start services
+ # systemctl start loki promtail
+ # Enable services on reboot
+ # systemctl enable loki
+ # systemctl enable promtail
+
+Loki and Promtail services in Grafana
+----------------------------------------------
+
+1) Make sure Prometheus is running on port 9090.
+2) Make sure Loki is running on port 3100.
+
+.. code-block:: console
+
+ # systemctl status prometheus loki
+
+
+.. note::
+
+ Loki and Promtail are not yet installed in production (taler.net), nor are they
+ configured to track specific log files.
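+
+As an illustration of what tracking log files could look like, here is a minimal Promtail configuration sketch.
+It is not a deployed configuration: it assumes Loki runs on port 3100 of the same host, and the job label,
+the positions file, and the log path ``/var/log/*log`` are placeholders to be replaced with the files you actually want to follow.
+
+.. code-block:: yaml
+
+ server:
+   http_listen_port: 9080
+   grpc_listen_port: 0
+ # Where Promtail remembers how far it has read each file
+ positions:
+   filename: /tmp/positions.yaml
+ # Loki endpoint that receives the log lines
+ clients:
+   - url: http://localhost:3100/loki/api/v1/push
+ scrape_configs:
+   - job_name: system
+     static_configs:
+       - targets: [localhost]
+         labels:
+           job: varlogs             # placeholder label
+           __path__: /var/log/*log  # placeholder path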
+
+Grafana Alerting
+----------------
+
+#. To use Grafana alerting rules, you first need to configure a working SMTP service on your server.
+#. Once you have made the necessary changes to the Grafana configuration file, restart or reload the "grafana-server" service with the systemctl command as usual.
+#. Then go to Alerting -> Contact points in the Grafana admin panel and, for the email address you are using for this purpose, check that SMTP is working by pressing the "test" button.
+#. If that works, you will receive an email with the Grafana logo in your mailbox, confirming that the server can send email messages.
+
+
+Uptime Kuma
+===========
+
+- URL: http://139.162.254.179:3001/dashboard
+- Users: One single administration account with full privileges.
+- Installation: With Docker
+
+.. image:: images/kuma.png
+
+.. note::
+
+ 1) To guarantee that Kuma does its work, it needs to be installed completely outside the servers you want to monitor. (Server Kuma 1)
+ 2) It is also important to monitor the Kuma server itself, so you don't end up without a monitoring system. (Server Kuma 2)
+
+In our case, we do both. We have two Uptime Kuma servers completely outside our server infrastructure: one monitors the other, and
+the latter monitors our own Taler servers.
+
+Kuma monitor types
+-------------------
+
+Kuma offers quite a few monitor types, such as HTTPS, TCP port, or ping. In our case, we mainly use HTTPS requests
+and pings to make sure, as a first check, that our servers are responsive.
+
+Another handy Kuma feature that we use is "Certificate Expiry Notification", which warns us about upcoming certificate
+expiration dates.
+
+In brief, on our main Kuma server we use these three monitor types (ping, HTTPS, certificate expiration) for each website that we monitor.
+
+As an exception, and specifically because of the importance of the Taler Operations server,
+we additionally use SMS notifications (clicksend provider). This way, if Kuma detects that Taler Operations is unavailable,
+an SMS message is sent to at least two people from the deployment and operations department.
+
+How to edit notifications:
+
+.. image:: images/uptime-kuma-edit.png
+