commit 4bbb9e5ccc9ee1126382e891c6fd06190cebfe30
parent 4da6f9020a343feccf432db5c3dabfe23c105f93
Author: Antoine A <>
Date: Mon, 16 Feb 2026 14:10:55 +0100
dd85 & dd85: improvements
Diffstat:
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/design-documents/084-simple-observability.rst b/design-documents/084-simple-observability.rst
@@ -34,6 +34,9 @@ Requirements
Proposed Solution
=================
+Health endpoint
+---------------
+
All services have at least one REST API. This API should expose a health endpoint that reports the service's global status, covering not only the httpd process but also all the other components it uses. The status is either ``ok`` if everything is fine, or ``degraded`` if the service is running but not functioning properly. We are also adding a way to attach more context in a less structured way to help with remediation.
.. ts:def:: HealthStatus
@@ -91,6 +94,13 @@ For taler-exchange:
Once the API is in place, Uptime Kuma can be configured to poll this endpoint and trigger an alert when the status is ``degraded``.
The JSON body can be shared in the alert, which makes remediation easier because we have hints as to what is not working well inside the service.
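
The aggregation and alerting behaviour described in this section can be illustrated with a minimal sketch. Only the ``ok`` and ``degraded`` values come from the text; the types ``ComponentCheck`` and ``HealthReport`` and the functions ``aggregateHealth`` and ``shouldAlert`` are hypothetical names chosen for illustration and are not the ``HealthStatus`` definition itself.

```typescript
// Hypothetical shapes: only the "ok" / "degraded" values come from the
// design text; every field and function name here is an assumption.
type Status = "ok" | "degraded";

interface ComponentCheck {
  name: string;     // illustrative, e.g. "httpd" or "database"
  healthy: boolean;
  hint?: string;    // free-form context to help with remediation
}

interface HealthReport {
  status: Status;
  context: Record<string, string>;
}

// The service is "ok" only if every component it relies on is healthy.
// Unhealthy components are listed in the context with their hint.
function aggregateHealth(checks: ComponentCheck[]): HealthReport {
  const context: Record<string, string> = {};
  for (const check of checks) {
    if (!check.healthy) {
      context[check.name] = check.hint ?? "unhealthy";
    }
  }
  const status: Status =
    Object.keys(context).length === 0 ? "ok" : "degraded";
  return { status, context };
}

// What a monitor could do with the response body: alert on "degraded".
function shouldAlert(report: HealthReport): boolean {
  return report.status === "degraded";
}
```

Keeping the hints in a free-form map mirrors the idea of less structured context: the monitor only needs the status to decide whether to alert, while the rest of the body is forwarded to whoever handles the alert.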
+Logs
+----
+
+Currently, we also rely on logs to detect failures. This requires ingesting the logs into a monitoring system that analyzes them and generates alerts. We need to decide whether to keep doing this, or whether to use the logs only for remediation and therefore leave them where they are.
+
+Whenever we want to analyze logs to detect a failure condition, we should first check whether the system can expose that condition in its health endpoint.
+
Test Plan
=========
diff --git a/design-documents/085-transfer-status.rst b/design-documents/085-transfer-status.rst
@@ -29,8 +29,8 @@ For each transfer we would store a list of all the statuses it went through.
TODO
-API
----
+Wire Gateway API
+----------------
.. http:get:: /transfers-status
@@ -92,6 +92,13 @@ API
timestamp: Timestamp;
}
+Exchange aggregation
+--------------------
+
+I think all this information should be stored and aggregated within the exchange. Wire adapters should only contain code that abstracts the underlying system's internal workings; the remediation logic should live at another level.
+
+The exchange should read the status anyway to refund in case of failure, but if we want to add more complex rules and manual remediation, we could create another component that I could maintain if it makes life easier for Christian.
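
As a rough sketch of what the simplest exchange-side rule could look like, the function below maps a transfer's status history to an action. The state names (``pending``, ``success``, ``failed``), the ``StatusEntry`` shape, the numeric timestamp and the ``decideAction`` function are all assumptions made for illustration; the design does not fix any of them.

```typescript
// Hypothetical status values and names: the design document does not
// define them, they only illustrate the "refund in case of failure" rule.
type TransferState = "pending" | "success" | "failed";

interface StatusEntry {
  state: TransferState;
  timestamp: number; // seconds since epoch, standing in for Timestamp
}

type Action = "wait" | "done" | "refund";

// Decide from the most recent entry only. More complex rules or manual
// remediation would belong in a separate component.
function decideAction(history: StatusEntry[]): Action {
  if (history.length === 0) {
    return "wait"; // nothing reported yet
  }
  // Pick the entry with the latest timestamp, regardless of list order.
  const latest = history.reduce((a, b) =>
    b.timestamp >= a.timestamp ? b : a
  );
  switch (latest.state) {
    case "failed":
      return "refund";
    case "success":
      return "done";
    default:
      return "wait";
  }
}
```

In this sketch the wire adapter would only supply the history, and the decision stays in the exchange, which matches the separation argued for above.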
+
Test Plan
=========