taler-docs

Documentation for GNU Taler components, APIs and protocols

commit 3d8f0de592a4649ef2b107a5e2c9030637864038
parent f92ef42d8e11f2a3d64681110d30d26778e7dad7
Author: Antoine A <>
Date:   Mon, 16 Feb 2026 03:51:58 +0100

init dd84 & dd85

Diffstat:
A design-documents/084-simple-observability.rst | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A design-documents/085-transfer-status.rst | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M design-documents/index.rst | 2 ++
3 files changed, 238 insertions(+), 0 deletions(-)

diff --git a/design-documents/084-simple-observability.rst b/design-documents/084-simple-observability.rst
@@ -0,0 +1,128 @@
+DD 84: Simple observability
+###########################
+
+Summary
+=======
+
+We want a simple way to check whether our various services are working properly.
+
+Motivation
+==========
+
+We want to have observability, and the obvious maximalist solution is Prometheus and Grafana.
+The problem is that this system is so complex and cumbersome to configure that we never have the time to configure it properly.
+By trying to have a perfect solution, we end up with none at all.
+
+I propose a simple solution based on health endpoints that should give us most of what we need more quickly.
+
+Requirements
+============
+
+* Easy for service maintainers to implement
+* Easy to configure
+* Easy to maintain
+* Effective at detecting downtime or degraded states
+
+Proposed Solution
+=================
+
+Each service should have a health endpoint that gives its current health status:
+
+.. ts:def:: HealthStatus
+
+  interface HealthStatus {
+    // Whether the service is running fine or in a degraded way
+    status: "ok" | "degraded";
+    // Additional information about the service components
+    components: { [key: string]: string };
+  }
+
+For libeufin-bank:
+
+.. code-block:: json
+
+  {
+    "status": "degraded",
+    "components": {
+      "database": "ok",
+      "tan-sms": "ok",
+      "tan-email": "failure"
+    }
+  }
+
+For libeufin-nexus:
+
+.. code-block:: json
+
+  {
+    "status": "degraded",
+    "components": {
+      "database": "ok",
+      "ebics-submit": "ok",
+      "ebics-fetch": "failure"
+    }
+  }
+
+For taler-exchange:
+
+.. code-block:: json
+
+  {
+    "status": "degraded",
+    "components": {
+      "database": "ok",
+      "wirewatch": "failure"
+    }
+  }
+
+Next, Uptime Kuma can be configured to retrieve this endpoint and trigger an alert when the status is degraded, even if the API is up.
+The JSON body can be shared within the alert, which makes debugging easier because we have a clue as to what is failing.
+
+Test Plan
+=========
+
+- Add health endpoint to libeufin-bank and libeufin-nexus
+- Deploy on demo
+- Test status update and alerts
+
+Alternatives
+============
+
+Prometheus format with Uptime Kuma
+----------------------------------
+
+Instead of using JSON, we could use the Prometheus textual metrics format. This would make upgrading to a better observability system easier.
+
+.. code-block:: text
+
+  # HELP app_status 1=ok, 0.5=degraded, 0=down
+  # TYPE app_status gauge
+  app_status 0.5
+
+  # HELP component_database_status 1=ok, 0=failure
+  # TYPE component_database_status gauge
+  component_database_status 1.0
+
+  # HELP component_tan_sms_status 1=ok, 0=failure
+  # TYPE component_tan_sms_status gauge
+  component_tan_sms_status 1.0
+
+  # HELP component_tan_email_status 1=ok, 0=failure
+  # TYPE component_tan_email_status gauge
+  component_tan_email_status 0.0
+
+We could also use the existing Taler Observability API.
+
+Prometheus & Grafana alternative
+--------------------------------
+
+We could try other alternatives that are more performant, though not simpler, such as VictoriaMetrics.
+
+Drawbacks
+=========
+
+This does not address the monitoring of system resources and services.
+It is therefore not sufficient for a complete observability system. However, it is easier to implement for now.
+
+Discussion / Q&A
+================
diff --git a/design-documents/085-transfer-status.rst b/design-documents/085-transfer-status.rst
@@ -0,0 +1,108 @@
+DD 85: Transfer status
+######################
+
+Summary
+=======
+
+We need a way to handle failures of incoming and outgoing wire gateway transfers,
+automatically when possible, with a fallback to manual resolution.
+
+Motivation
+==========
+
+Right now, when we make a deposit, the wallet shows a confirmation before the transfer is actually made.
+In case of failure, nothing is done automatically and the user has no way to see it from their wallet.
+
+We already have a transfer status API in place, but it is not suitable for automation because it exposes a paginated list of transfers, whereas we need a paginated list of transfer status changes.
+
+I think the Wire Gateway API should expose a transfer status history endpoint, and the logic for failure resolution should live at a higher level, in the exchange.
+
+We also need an API for incoming transactions that are malformed but cannot be bounced.
+
+Proposed Solution
+=================
+
+Database
+--------
+
+For each transfer, we would store a list of all the statuses it went through.
+
+TODO
+
+API
+---
+
+.. http:get:: /transfers-status
+
+  Return a list of transfer status changes.
+
+  **Request:**
+
+  :query limit: *Optional.*
+    At most return the given number of results. Negative for descending by
+    ``row_id``, positive for ascending by ``row_id``. Defaults to ``-20``.
+  :query offset: *Optional.*
+    Starting ``row_id`` for :ref:`pagination <row-id-pagination>`.
+  :query status: *Optional.*
+    Filters by status.
+  :query transfer_id: *Optional.*
+    Only list statuses for a specific transfer.
+
+  **Response:**
+
+  :http:statuscode:`200 OK`:
+    JSON object of type `TransferStatusList`.
+  :http:statuscode:`204 No content`:
+    There are no transfer statuses to report (under the given filter).
+  :http:statuscode:`400 Bad request`:
+    Request malformed.
+  :http:statuscode:`401 Unauthorized`:
+    Authentication failed, likely the credentials are wrong.
+  :http:statuscode:`404 Not found`:
+    The endpoint is wrong or the user name is unknown.
+
+  **Details:**
+
+  .. ts:def:: TransferStatusList
+
+    interface TransferStatusList {
+      // Array of transfer statuses
+      statuses: TransferStatus[];
+    }
+
+  .. ts:def:: TransferStatus
+
+    interface TransferStatus {
+      // Opaque ID of the status change.
+      // It is different from the /transfers
+      row_id: SafeUint64;
+
+      // Opaque ID of the wire transfer initiation performed by the bank.
+      // It is different from the /history endpoints row_id.
+      transfer_id: SafeUint64;
+
+      // Status of the transfer at this time
+      // pending: the transfer is in progress
+      // transient_failure: the transfer has failed but may succeed later
+      // permanent_failure: the transfer has failed permanently and will never appear in the outgoing history
+      // success: the transfer has succeeded and appears in the outgoing history
+      status: "pending" | "transient_failure" | "permanent_failure" | "success";
+
+      // Timestamp that indicates when this status was reached.
+      timestamp: Timestamp;
+    }
+
+Test Plan
+=========
+
+
+Alternatives
+============
+
+
+Drawbacks
+=========
+
+
+Discussion / Q&A
+================
diff --git a/design-documents/index.rst b/design-documents/index.rst
@@ -95,4 +95,6 @@ Design documents that start with "XX" are considered deprecated.
   081-shop-discovery
   082-wallet-diagnostics
   083-wallet-initiated-withdrawal
+  084-simple-observability
+  085-transfer-status
   999-template
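
Note on DD 84: the JSON examples in that document suggest that a service reports "degraded" as soon as one of its components is not "ok". DD 84 does not spell this rule out, so the following TypeScript is only a minimal sketch under that assumption. The function name `aggregateHealth` and the comparison against the literal "ok" are hypothetical and not part of the design document; only `HealthStatus` and its two fields come from it.

```typescript
// Shape of the health endpoint response, as defined in DD 84.
interface HealthStatus {
  status: "ok" | "degraded";
  components: { [key: string]: string };
}

// Hypothetical helper (not part of DD 84).
// Assumption: a component is healthy only when its state is exactly "ok";
// any other string (for example "failure") degrades the whole service.
function aggregateHealth(components: { [key: string]: string }): HealthStatus {
  const allOk = Object.values(components).every((state) => state === "ok");
  return { status: allOk ? "ok" : "degraded", components };
}

// Component states taken from the libeufin-bank example in DD 84.
const bank = aggregateHealth({
  database: "ok",
  "tan-sms": "ok",
  "tan-email": "failure",
});
console.log(JSON.stringify(bank, null, 2));
```

With a rule like this, a monitor such as Uptime Kuma would only need to inspect the top-level `status` field, while the `components` map stays available in the alert body for debugging.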