taler-docs

Documentation for GNU Taler components, APIs and protocols

commit 79950811a669e6375ad0365ee88e258b2ec36595
parent 0112d5bd294c9255ff05e5ccb9d6b5b04a5b7958
Author: Christian Grothoff <grothoff@gnunet.org>
Date:   Wed, 26 Mar 2025 15:54:29 +0900

starting text for DD59

Diffstat:
M design-documents/059-statistics.rst | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 64 insertions(+), 3 deletions(-)

diff --git a/design-documents/059-statistics.rst b/design-documents/059-statistics.rst
@@ -4,28 +4,89 @@ DD 59: Statistics
 Summary
 =======
 
+This design document elaborates how we track various statistics
+in the exchange and merchant, typically for tax reporting or
+to detect anomalies to be investigated by anti-money laundering
+officers. The key idea is to use SQL triggers to keep the
+statistics always up-to-date and a bit of garbage collection
+to expire ancient statistics. Finally, deployment-specific
+statistics can easily be added this way by simply injecting
+the correct SQL code into the backend, without having to modify
+the core exchange or merchant logic.
+
+
 Motivation
 ==========
 
+Exchange operators are required to monitor for suspicious
+transactions as part of their AML efforts. Merchants need to
+collect certain data for their business, especially for tax
+purposes but also conceivably to analyze sales. The specific
+data to be tracked varies by operator (and legislation), so
+we need to be quite flexible in terms of which statistics
+should be kept, especially to minimize the performance impact.
+
+
 Requirements
 ============
 
+- statistics should always be up-to-date (in real-time) and
+  not only be updated in batches
+- some statistics are amounts, others are simple numerical
+  (integer) values
+- some statistics need to be kept over a sliding interval that
+  moves over time, while others need to be mapped to fixed
+  buckets such as a day, month, quarter or year.
+- which statistics are being tracked may depend on the
+  operational context, especially for the exchange; it must
+  thus be easy to add (or remove) statistics at any time;
+- while tracking statistics inherently costs performance the
+  runtime (CPU and storage) overhead should be minimized;
+  in particular for "sliding intervals", the events that
+  "slide out" of the interval may be coarsened and do not
+  necessarily require accuracy down to the second;
+- when adding new statistics, it may be desirable to compute
+  them retroactively over historic data (if available);
+- the SPAs displaying statistics should not have to make an
+  excessive number of REST API calls, so generally multiple
+  values should be returned from a single endpoint;
+- the merchant is multi-currency capable, and thus amount-valued
+  statistics in the merchant backend should be kept per currency
+
+
 Proposed Solution
 =================
 
+
+
 Definition of Done
 ==================
 
-(Only applicable to design documents that describe a new feature. While the
-DoD is not satisfied yet, a user-facing feature **must** be behind a feature
-flag or dev-mode flag.)
+- key statistics for merchant and TOPS-deployment implemented
+- REST API for merchant specified and implemented
+- REST API for AML officer specified and implemented
+- SPAs visualize key statistics
 
 Alternatives
 ============
 
+- batch processing to compute statistics, REST API only
+  returns the value computed by the last batch (computational
+  cost only paid if statistic is desired, but then may be high,
+  also potential for outdated data being shown);
+- computing of statistics on the C side; may have more data
+  easily available, but has the major disadvantage of making
+  it harder to add/remove statistics and makes the transaction
+  logic more complex; also easier to miss triggering events;
+
+
 Drawbacks
 =========
 
+- ongoing baseline cost for statistics even if nobody looks
+  at them (but only if the respective statistic is enabled)
+
+
 Discussion / Q&A
 ================
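
Note on the approach: the "Proposed Solution" section is still empty in this commit, so the following is only a minimal sketch of the idea stated in the Summary (SQL triggers keep statistics current, garbage collection expires old entries), not the design the authors settled on. It uses Python's built-in sqlite3 module purely so the example is runnable; the table names, column names, the integer-cents amount representation and the helper functions are all hypothetical and do not come from the exchange or merchant schema.

```python
import sqlite3

DAY = 86400  # bucket width in seconds (one day)

# Hypothetical schema: all names are illustrative, not Taler's.
SCHEMA = """
CREATE TABLE payments (
  id INTEGER PRIMARY KEY,
  currency TEXT NOT NULL,
  amount_cents INTEGER NOT NULL,    -- simplification: amounts as integer cents
  paid_at INTEGER NOT NULL          -- UNIX timestamp in seconds
);

CREATE TABLE daily_revenue (
  bucket_start INTEGER NOT NULL,    -- start of the day bucket, in seconds
  currency TEXT NOT NULL,           -- amount statistics are kept per currency
  total_cents INTEGER NOT NULL,
  PRIMARY KEY (bucket_start, currency)
);

-- The trigger runs inside the inserting transaction, so the statistic
-- is up to date as soon as the payment row is visible.
CREATE TRIGGER payments_stats AFTER INSERT ON payments
BEGIN
  INSERT OR IGNORE INTO daily_revenue (bucket_start, currency, total_cents)
    VALUES (NEW.paid_at - NEW.paid_at % 86400, NEW.currency, 0);
  UPDATE daily_revenue
     SET total_cents = total_cents + NEW.amount_cents
   WHERE bucket_start = NEW.paid_at - NEW.paid_at % 86400
     AND currency = NEW.currency;
END;
"""


def revenue_by_day(conn, currency):
    """Return all (bucket_start, total_cents) pairs for one currency in one query."""
    rows = conn.execute(
        "SELECT bucket_start, total_cents FROM daily_revenue"
        " WHERE currency = ? ORDER BY bucket_start",
        (currency,),
    )
    return rows.fetchall()


def expire_buckets(conn, now, max_age_seconds):
    """Garbage-collect buckets that started more than max_age_seconds before now."""
    conn.execute(
        "DELETE FROM daily_revenue WHERE bucket_start < ?",
        (now - max_age_seconds,),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO payments (currency, amount_cents, paid_at) VALUES (?, ?, ?)",
    [("EUR", 500, 10), ("EUR", 250, 20), ("CHF", 100, 30), ("EUR", 700, DAY + 5)],
)
print(revenue_by_day(conn, "EUR"))
```

Because the application code only inserts into the payments table, a deployment could add or drop such a trigger and its statistics table without touching the insertion logic, which is the flexibility the Requirements ask for. Calling expire_buckets(conn, now, max_age_seconds) periodically would play the role of the garbage collection mentioned in the Summary. Sliding intervals are not covered by this sketch.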