commit 3d8f0de592a4649ef2b107a5e2c9030637864038
parent f92ef42d8e11f2a3d64681110d30d26778e7dad7
Author: Antoine A <>
Date: Mon, 16 Feb 2026 03:51:58 +0100
init dd84 & dd85
Diffstat:
3 files changed, 238 insertions(+), 0 deletions(-)
diff --git a/design-documents/084-simple-observability.rst b/design-documents/084-simple-observability.rst
@@ -0,0 +1,128 @@
+DD 84: Simple observability
+###########################
+
+Summary
+=======
+
+We want a simple way to check whether our various services are working properly.
+
+Motivation
+==========
+
+We want to have observability, and the obvious maximalist solution is Prometheus and Grafana.
+The problem is that this system is so complex and cumbersome to configure that we never find the time to set it up properly.
+By trying to have a perfect solution, we end up with none at all.
+
+I propose a simple solution based on health endpoints that should give us most of what we need more quickly.
+
+Requirements
+============
+
+* Easy for service maintainers to implement
+* Easy to configure
+* Easy to maintain
+* Effective at detecting downtime or degraded states
+
+Proposed Solution
+=================
+
+Each service should have a health endpoint that gives its current health status:
+
+.. ts:def:: HealthStatus
+
+ interface HealthStatus {
+ // Whether the service is running fine or in a degraded way
+ status: "ok" | "degraded";
+ // Additional information about the service components
+ components: { [key: string]: string };
+ }
+
+For libeufin-bank:
+
+.. code-block:: json
+
+ {
+ "status": "degraded",
+ "components": {
+ "database": "ok",
+ "tan-sms": "ok",
+ "tan-email": "failure"
+ }
+ }
+
+For libeufin-nexus:
+
+.. code-block:: json
+
+ {
+ "status": "degraded",
+ "components": {
+ "database": "ok",
+ "ebics-submit": "ok",
+ "ebics-fetch": "failure"
+ }
+ }
+
+For taler-exchange:
+
+.. code-block:: json
+
+ {
+ "status": "degraded",
+ "components": {
+ "database": "ok",
+ "wirewatch": "failure"
+ }
+ }
+
+Next, Uptime Kuma can be configured to poll this endpoint and trigger an alert when the status is degraded, even if the API is up.
+The JSON body can be included in the alert, which makes debugging easier because it gives us a clue as to what is failing.
+
+Test Plan
+=========
+
+- Add health endpoint to libeufin-bank and libeufin-nexus
+- Deploy on demo
+- Test status update and alerts
+
+Alternatives
+============
+
+Prometheus format with Uptime Kuma
+----------------------------------
+
+Instead of JSON, we could use the Prometheus text exposition format. This would make upgrading to a better observability system easier.
+
+.. code-block:: text
+
+ # HELP app_status 1=ok, 0.5=degraded, 0=down
+ # TYPE app_status gauge
+ app_status 0.5
+
+ # HELP component_database_status 1=ok, 0=failure
+ # TYPE component_database_status gauge
+ component_database_status 1.0
+
+ # HELP component_tan_sms_status 1=ok, 0=failure
+ # TYPE component_tan_sms_status gauge
+ component_tan_sms_status 1.0
+
+ # HELP component_tan_email_status 1=ok, 0=failure
+ # TYPE component_tan_email_status gauge
+ component_tan_email_status 0.0
+
+We could also use the existing Taler Observability API.
+
+Prometheus & Grafana alternative
+--------------------------------
+
+We could try other alternatives such as VictoriaMetrics, which are more performant though not simpler.
+
+Drawbacks
+=========
+
+This does not address the monitoring of system resources and services.
+It is therefore not a complete observability system, but it is easier to implement for now.
+
+Discussion / Q&A
+================
diff --git a/design-documents/085-transfer-status.rst b/design-documents/085-transfer-status.rst
@@ -0,0 +1,108 @@
+DD 85: Transfer status
+######################
+
+Summary
+=======
+
+We need a way to handle failures of incoming and outgoing wire gateway transfers,
+automatically when possible, with a fallback to manual resolution.
+
+Motivation
+==========
+
+Right now, when we make a deposit, the wallet shows a confirmation before the transfer is actually made.
+In case of failure, nothing is done automatically and the user has no way to see it from their wallet.
+
+We already have a transfer status API in place, but it is not suitable for automation: it exposes a paginated list of transfers, whereas we need a paginated list of transfer status changes.
+
+I think the Wire Gateway API should expose a transfer status history endpoint and the logic for failure resolution should be done at a higher level in the exchange.
+
+We also need an API for incoming transactions that are malformed but cannot be bounced.
+
+Proposed Solution
+=================
+
+Database
+--------
+
+For each transfer, we would store the list of all statuses it went through.
+
+TODO
+
+API
+---
+
+.. http:get:: /transfers-status
+
+ Return a list of transfer status changes.
+
+ **Request:**
+
+ :query limit: *Optional.*
+ At most return the given number of results. Negative for descending by
+ ``row_id``, positive for ascending by ``row_id``. Defaults to ``-20``.
+ :query offset: *Optional.*
+ Starting ``row_id`` for :ref:`pagination <row-id-pagination>`.
+ :query status: *Optional.*
+ Filters by status.
+ :query transfer_id: *Optional.*
+ Only list statuses for a specific transfer.
+
+ **Response:**
+
+ :http:statuscode:`200 OK`:
+ JSON object of type `TransferStatusList`.
+ :http:statuscode:`204 No content`:
+ There are no transfer statuses to report (under the given filter).
+ :http:statuscode:`400 Bad request`:
+ Request malformed.
+ :http:statuscode:`401 Unauthorized`:
+ Authentication failed, likely the credentials are wrong.
+ :http:statuscode:`404 Not found`:
+ The endpoint is wrong or the user name is unknown.
+
+ **Details:**
+
+ .. ts:def:: TransferStatusList
+
+ interface TransferStatusList {
+ // Array of transfer statuses
+ statuses: TransferStatus[];
+ }
+
+ .. ts:def:: TransferStatus
+
+ interface TransferStatus {
+ // Opaque ID of the status change.
+ // It is different from the /transfers endpoint's row_id.
+ row_id: SafeUint64;
+
+ // Opaque ID of the wire transfer initiation performed by the bank.
+ // It is different from the /history endpoints' row_id.
+ transfer_id: SafeUint64;
+
+ // Status of the transfer at this time
+ // pending: the transfer is in progress
+ // transient_failure: the transfer has failed but may succeed later
+ // permanent_failure: the transfer has failed permanently and will never appear in the outgoing history
+ // success: the transfer has succeeded and appears in the outgoing history
+ status: "pending" | "transient_failure" | "permanent_failure" | "success";
+
+ // Timestamp that indicates when this status was reached.
+ timestamp: Timestamp;
+ }
+
+Test Plan
+=========
+
+
+Alternatives
+============
+
+
+Drawbacks
+=========
+
+
+Discussion / Q&A
+================
diff --git a/design-documents/index.rst b/design-documents/index.rst
@@ -95,4 +95,6 @@ Design documents that start with "XX" are considered deprecated.
081-shop-discovery
082-wallet-diagnostics
083-wallet-initiated-withdrawal
+ 084-simple-observability
+ 085-transfer-status
999-template