..
  This file is part of GNU TALER.

  Copyright (C) 2014-2023 Taler Systems SA

  TALER is free software; you can redistribute it and/or modify it under the
  terms of the GNU Affero General Public License as published by the Free Software
  Foundation; either version 2.1, or (at your option) any later version.

  TALER is distributed in the hope that it will be useful, but WITHOUT ANY
  WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
  A PARTICULAR PURPOSE.  See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with
  TALER; see the file COPYING.  If not, see <http://www.gnu.org/licenses/>

  @author Javier Sepulveda

.. _taler-merchant-monitoring:

GNU Taler monitoring 
####################

.. image:: images/taler-monitoring-infrastructure.png

To check the availability of our server infrastructure, we use the Grafana and Uptime Kuma monitoring programs.

Grafana shows the servers' resource consumption *graphically* and can alert us to specific situations.
Uptime Kuma is a more basic tool (it mostly performs ping and HTTPS checks)
that gives us the very first status information, as a first countermeasure.

Grafana
=======

- Our Grafana instance can be reached at https://grafana.taler.net
- Our Grafana instance is installed on the (TUE) server

User accounts
-------------

We have only two main user accounts: 

- One "admin" account for server administrators.
- One general "read-only" account, for the rest of the team. 

How to install Grafana
----------------------

Please refer to the official Grafana website for installation instructions for your specific operating system. For
the GNU/Linux distribution Debian 12 (bookworm), you can use the following instructions.

.. code-block:: console
   
   # apt-get install -y apt-transport-https
   # apt-get install -y software-properties-common wget
   # wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
   # echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | tee -a /etc/apt/sources.list.d/grafana.list
   # apt-get update
   # apt-get install grafana
   # systemctl daemon-reload
   # systemctl enable --now grafana-server

.. note::
  
   If you want to deploy Grafana automatically and have access to the private Git repository "migration-exercise-stable.git",
   clone it and run the grafana.sh file from the Grafana subfolder. This script installs Grafana and leaves it running on port 3000 of your server.
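
To confirm that Grafana is reachable after installation, a check along the following lines can be used.
It is a minimal sketch that assumes ``curl`` is installed and that Grafana listens on ``localhost:3000``;
the function names are hypothetical helpers, not part of Grafana. The check queries Grafana's ``/api/health`` endpoint.

.. code-block:: shell

   # Print the health-check URL for a Grafana base URL.
   # Assumption: without an argument, the local instance on port 3000 is used.
   grafana_health_url() {
       printf '%s/api/health\n' "${1:-http://localhost:3000}"
   }

   # Succeed only if Grafana answers and reports its database as "ok".
   grafana_is_healthy() {
       curl -fsS --max-time 5 "$(grafana_health_url "${1:-}")" \
           | grep -q '"database": *"ok"'
   }

For example, ``grafana_is_healthy && echo up`` checks the local instance, and
``grafana_is_healthy https://grafana.taler.net`` checks a remote one.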

Grafana Dashboards
------------------

Creating tailored Grafana dashboards is very time-consuming and requires considerable proficiency. We therefore use
the pre-built `Grafana dashboards <https://grafana.com/grafana/dashboards/>`_, which we can tweak a little to fit our needs.

Node Exporter
++++++++++++++

- More information can be found on the `Node Exporter <https://grafana.com/grafana/dashboards/1860-node-exporter-full/>`_ website. 
- Dashboard ID: 1860

.. note::

   If you want to deploy Node Exporter automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
   and run the script taler.net/grafana/node-exporter.sh. This script installs Node Exporter and leaves it running on port 9100.
   It also creates and starts a new service, and enables it on reboot.
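
Node Exporter serves its metrics as plain text under ``/metrics``. The following sketch, which assumes that ``curl``
and ``awk`` are available and that the exporter listens on ``localhost:9100``, extracts a single unlabeled metric.
``metric_value`` and ``node_load1`` are hypothetical helper names.

.. code-block:: shell

   # Read Prometheus text-format metrics on stdin and print the value of the
   # unlabeled metric named $1 (comment lines starting with "#" never match).
   metric_value() {
       awk -v name="$1" '$1 == name { print $2 }'
   }

   # Print the 1-minute load average reported by the local Node Exporter.
   node_load1() {
       curl -fsS --max-time 5 http://localhost:9100/metrics | metric_value node_load1
   }

If ``node_load1`` prints a number, the exporter is up and serving data.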

Postgres Exporter
+++++++++++++++++

- More information can be found on the `PostgreSQL exporter <https://grafana.com/grafana/dashboards/12485-postgresql-exporter/>`_ website.
- Dashboard ID: 12485

.. image:: images/grafana-postgres-exporter.png

.. note::

   If you want to deploy Postgres Exporter automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
   and run the script taler.net/grafana/postgres-exporter.sh. This script installs Postgres Exporter and leaves it running on port 9187.
   
Uptime Kuma from Grafana
++++++++++++++++++++++++

This is an easy way to integrate all websites monitored by Uptime Kuma into Grafana. Thus,
from the same place (Grafana), you can also check the status of each website and the expiration date of its
certificates.

- More information can be found on the `Uptime Kuma for Grafana <https://grafana.com/grafana/dashboards/18278-uptime-kuma/>`_ website.
- Dashboard ID: 18278

.. image:: images/uptime-kuma-from-grafana.png
   
Grafana Data Sources
---------------------
As a data source connector we use Prometheus.

Prometheus
++++++++++
More information can be found on the `Grafana and Prometheus <https://grafana.com/docs/grafana/latest/getting-started/get-started-grafana-prometheus/>`_ website.

.. note::

   If you want to deploy Prometheus automatically and have access to the private Git repository "migration-exercise-stable.git", clone it
   and run the script taler.net/grafana/prometheus.sh. This script installs Prometheus and leaves it running on port 9090.
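
Once Prometheus is running, its HTTP API can show whether the exporters are being scraped. The sketch below assumes
Prometheus on ``localhost:9090`` and that the exporters have already been added as scrape targets.
``count_up_targets`` is a hypothetical helper that simply counts ``"health":"up"`` entries in the JSON returned by ``/api/v1/targets``.

.. code-block:: shell

   # Count the targets that a /api/v1/targets JSON response (read from stdin)
   # reports as healthy. This is a plain text match, not a JSON parser.
   count_up_targets() {
       { grep -o '"health":"up"' || true; } | wc -l | tr -d ' '
   }

   # Print how many scrape targets the local Prometheus currently sees as up.
   prometheus_up_targets() {
       curl -fsS --max-time 5 http://localhost:9090/api/v1/targets | count_up_targets
   }

A result lower than the number of configured exporters suggests that at least one of them is unreachable.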

Managing logs
-------------

In order to manage logs, we use Loki + Promtail (Debian packages), which are very easy to integrate with Grafana and Prometheus. 

.. code-block:: console
   
   # Install
   # apt-get install loki promtail
   # Start services
   # systemctl start loki promtail
   # Enable services on reboot
   # systemctl enable loki
   # systemctl enable promtail

Loki and Promtail services in Grafana
----------------------------------------------

1) Make sure Prometheus is running on port 9090.
2) Make sure Loki is running on port 3100.

.. code-block:: console

   systemctl status prometheus loki
   
   
.. note::

   Loki and Promtail are not yet installed in production (taler.net), nor are they
   configured to track any specific log files.
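
Besides ``systemctl status``, both services expose an HTTP readiness endpoint. The sketch below assumes the default
ports named above and that ``curl`` is installed; ``ready_url`` and ``check_ready`` are hypothetical helper names.

.. code-block:: shell

   # Map a service name to its readiness URL on the local host.
   ready_url() {
       case "$1" in
           prometheus) echo "http://localhost:9090/-/ready" ;;
           loki)       echo "http://localhost:3100/ready" ;;
           *)          echo "unknown service: $1" >&2; return 1 ;;
       esac
   }

   # Report for each given service whether its readiness endpoint answers.
   check_ready() {
       for service in "$@"; do
           if url=$(ready_url "$service") \
               && curl -fsS -o /dev/null --max-time 5 "$url"; then
               echo "$service: ready"
           else
               echo "$service: NOT ready"
           fi
       done
   }

Running ``check_ready prometheus loki`` prints one status line per service.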

Grafana Alerting
----------------

#. To use Grafana alerting rules, you first need to configure a working SMTP service on your server.
#. Once you have made the necessary changes to the Grafana configuration file, restart or reload the "grafana-server" service with the systemctl command as usual.
#. Then go to Alerting -> Contact points in the Grafana admin panel and, for the email address you are using for this purpose, check that SMTP is working by pressing the "Test" button.
#. If that works, you will receive an email with the Grafana logo in your mailbox, confirming that the server can send email messages.
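
The configuration change mentioned in the first two steps is made in the ``[smtp]`` section of the Grafana configuration
file (``/etc/grafana/grafana.ini`` when Grafana is installed from the Debian package). A minimal sketch follows; the host name,
credentials, and sender address are placeholders, not the values used on our servers.

.. code-block:: ini

   [smtp]
   enabled = true
   ; Placeholder relay: replace with your own SMTP host and port.
   host = smtp.example.org:587
   ; Placeholder credentials for the relay.
   user = grafana@example.org
   password = CHANGE_ME
   ; Sender shown in the alert emails.
   from_address = grafana@example.org
   from_name = Grafana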
 

Uptime Kuma
===========

- URL: https://uptimekuma.anastasis.lu (main)
- Users: One single administration account with full privileges.
- Installation: Without Docker. All within the user home folder /home/uptime-kuma
- Monitors almost all our servers, websites, and certificate expiration dates.

- URL: https://uptimekuma.taler.net
- Users: One single administration account with full privileges.
- Installation: Without Docker. All within the user home folder /home/uptime-kuma
- Monitors the "main" Uptime Kuma installation, to make sure it is up and running and doing the monitoring properly.

.. image:: images/kuma.png

.. note::
   
   1) The main Uptime Kuma installation runs on the server anastasis.lu.
   2) The second Uptime Kuma installation, which monitors the main one, runs on gv.taler.net.

Kuma monitor types
-------------------

Kuma offers quite a few monitor types, such as HTTPS, TCP port, or ping. In our case, we mainly use HTTPS requests
and pings as a first check that our servers are responsive.

Another handy Kuma feature that we use is the "Certificate Expiry Notification", which warns us about upcoming certificate
expiration dates.

In brief, on our main Kuma server we use these three monitor types (ping, HTTPS, certificate expiration) for each website that we monitor.
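
For reference, the three checks can be reproduced by hand from a shell. This is only an illustrative sketch, not how
Uptime Kuma implements its monitors. It assumes a GNU/Linux host with ``ping``, ``curl``, ``openssl``, and GNU ``date``,
and all function names are hypothetical.

.. code-block:: shell

   # Whole days from epoch second $1 to epoch second $2 (truncated).
   days_between() {
       echo $(( ($2 - $1) / 86400 ))
   }

   # Print the expiry date (notAfter) of the TLS certificate served by host $1.
   cert_not_after() {
       echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null \
           | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2
   }

   # Print the number of days until the certificate of host $1 expires.
   cert_days_left() {
       end=$(cert_not_after "$1")
       [ -n "$end" ] || return 1
       days_between "$(date +%s)" "$(date -d "$end" +%s)"
   }

   # Run the ping, HTTPS, and certificate checks for host $1.
   check_site() {
       if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
           echo "ping: ok"
       else
           echo "ping: FAILED"
       fi
       if curl -fsS -o /dev/null --max-time 10 "https://$1/"; then
           echo "https: ok"
       else
           echo "https: FAILED"
       fi
       if days=$(cert_days_left "$1"); then
           echo "certificate: expires in $days days"
       else
           echo "certificate: could not be read"
       fi
   }

For example, ``check_site grafana.taler.net`` prints one line per check for that host.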

For high-priority notifications about essential services, and specifically because of the importance of the Taler Operations production
server, we additionally use SMS notifications (Clicksend provider). If the main Uptime Kuma installation detects that the Taler Operations server, or any other essential service such as Git, is unavailable,
an SMS message is sent to the system administrator, and possibly to other members of the deployment and operations department, for urgent action.
 

How to edit notifications:

.. image:: images/uptime-kuma-edit.png