commit 79950811a669e6375ad0365ee88e258b2ec36595
parent 0112d5bd294c9255ff05e5ccb9d6b5b04a5b7958
Author: Christian Grothoff <grothoff@gnunet.org>
Date: Wed, 26 Mar 2025 15:54:29 +0900
starting text for DD59
Diffstat:
1 file changed, 64 insertions(+), 3 deletions(-)
diff --git a/design-documents/059-statistics.rst b/design-documents/059-statistics.rst
@@ -4,28 +4,89 @@ DD 59: Statistics
Summary
=======
+This design document describes how we track various statistics
+in the exchange and merchant backends, typically for tax reporting
+or to detect anomalies to be investigated by anti-money laundering
+(AML) officers. The key idea is to use SQL triggers to keep the
+statistics up-to-date at all times, combined with some garbage
+collection to expire old statistics. This approach also makes it
+easy to add deployment-specific statistics: one simply injects the
+corresponding SQL code into the backend database, without having to
+modify the core exchange or merchant logic.
+
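To illustrate the trigger-based approach, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the backend database. The table names (`deposits`, `stats_daily`), the statistic name, the daily bucket size and the retention window are all hypothetical, and amounts are simplified to integer cents in a single currency; the actual schema and trigger syntax in the exchange and merchant databases will differ.

```python
import sqlite3

DAY = 86400  # bucket width in seconds (hypothetical: one UTC day)

# Hypothetical schema; sqlite3 only stands in for the real backend database.
SCHEMA = f"""
CREATE TABLE deposits (
    deposit_id   INTEGER PRIMARY KEY,
    amount_cents INTEGER NOT NULL,  -- simplified: single currency, integer cents
    deposit_time INTEGER NOT NULL   -- seconds since the UNIX epoch
);
CREATE TABLE stats_daily (
    stat_name    TEXT    NOT NULL,
    bucket_start INTEGER NOT NULL,  -- start of the day bucket, in seconds
    total_cents  INTEGER NOT NULL,
    event_count  INTEGER NOT NULL,
    PRIMARY KEY (stat_name, bucket_start)
);
-- The trigger updates the statistic in the same transaction as the insert,
-- so it is always current and the application logic never touches it.
CREATE TRIGGER deposits_stats AFTER INSERT ON deposits
BEGIN
    INSERT OR IGNORE INTO stats_daily VALUES
        ('deposits', NEW.deposit_time - NEW.deposit_time % {DAY}, 0, 0);
    UPDATE stats_daily
       SET total_cents = total_cents + NEW.amount_cents,
           event_count = event_count + 1
     WHERE stat_name = 'deposits'
       AND bucket_start = NEW.deposit_time - NEW.deposit_time % {DAY};
END;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_deposit(conn: sqlite3.Connection, amount_cents: int, deposit_time: int) -> None:
    # Only the base table is written; the trigger maintains the statistic.
    conn.execute(
        "INSERT INTO deposits (amount_cents, deposit_time) VALUES (?, ?)",
        (amount_cents, deposit_time),
    )


def daily_totals(conn: sqlite3.Connection) -> list:
    return conn.execute(
        "SELECT bucket_start, total_cents, event_count"
        " FROM stats_daily WHERE stat_name = 'deposits'"
        " ORDER BY bucket_start"
    ).fetchall()


def expire_old_buckets(conn: sqlite3.Connection, now: int, keep_seconds: int) -> None:
    # Garbage collection: drop buckets that ended before the retention horizon.
    conn.execute(
        "DELETE FROM stats_daily WHERE bucket_start + ? <= ?",
        (DAY, now - keep_seconds),
    )


if __name__ == "__main__":
    conn = open_db()
    record_deposit(conn, 500, 10)
    record_deposit(conn, 250, 3600)
    record_deposit(conn, 100, DAY + 5)
    print(daily_totals(conn))  # [(0, 750, 2), (86400, 100, 1)]
    expire_old_buckets(conn, now=2 * DAY, keep_seconds=DAY)
    print(daily_totals(conn))  # [(86400, 100, 1)]
```

In this sketch the application only ever writes to `deposits`; adding or removing the statistic amounts to creating or dropping the trigger (and its table), which is what keeps deployment-specific statistics cheap to introduce.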
+
Motivation
==========
+Exchange operators are required to monitor for suspicious
+transactions as part of their AML efforts. Merchants need to
+collect certain data about their business, primarily for tax
+purposes, but possibly also to analyze sales. The specific
+data to be tracked varies by operator (and legislation), so
+we need to be flexible about which statistics are kept,
+in particular so that operators only pay the performance cost
+of the statistics they actually need.
+
+
Requirements
============
+- statistics should always be up-to-date (updated in real time)
+ and not only be updated in batches;
+- some statistics are amounts, while others are plain numerical
+ (integer) values;
+- some statistics need to be kept over a sliding interval that
+ moves over time, while others need to be mapped to fixed
+ buckets such as a day, month, quarter or year;
+- which statistics are tracked may depend on the operational
+ context, especially for the exchange; it must thus be easy
+ to add (or remove) statistics at any time;
+- while tracking statistics inherently costs performance, the
+ runtime (CPU and storage) overhead should be minimized;
+ in particular, for "sliding intervals", events that "slide
+ out" of the interval may be coarsened and do not necessarily
+ require accuracy down to the second;
+- when adding new statistics, it may be desirable to compute
+ them retroactively over historical data (if available);
+- the SPAs displaying statistics should not have to make an
+ excessive number of REST API calls, so a single endpoint
+ should generally return multiple values;
+- the merchant backend supports multiple currencies, so its
+ amount-valued statistics should be kept per currency.
+
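The two kinds of intervals from the requirements above can be sketched as follows. This is a hypothetical in-memory illustration: the names `bucket_start` and `SlidingCounter`, the use of UTC for bucket boundaries and the slot size are assumptions, and in the proposed design the equivalent logic would presumably be expressed in SQL rather than in application code. `SlidingCounter` also assumes that events arrive in non-decreasing time order.

```python
from collections import deque
from datetime import datetime, timezone


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Map an aware timestamp to the start of its fixed (UTC) bucket."""
    ts = ts.astimezone(timezone.utc)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "day":
        return midnight
    if granularity == "month":
        return midnight.replace(day=1)
    if granularity == "quarter":
        # Quarters start in January, April, July and October.
        return midnight.replace(month=3 * ((ts.month - 1) // 3) + 1, day=1)
    if granularity == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown granularity: {granularity!r}")


class SlidingCounter:
    """Sum of values over the last `window` seconds, stored in coarse slots.

    Events are aggregated into slots of `slot` seconds, so an event leaves
    the window at slot granularity rather than to the second. Assumes that
    events are added in non-decreasing time order.
    """

    def __init__(self, window: int, slot: int) -> None:
        self.window = window
        self.slot = slot
        self.slots = deque()  # (slot_start, total) pairs, oldest first

    def add(self, t: int, value: int) -> None:
        start = t - t % self.slot
        if self.slots and self.slots[-1][0] == start:
            self.slots[-1] = (start, self.slots[-1][1] + value)
        else:
            self.slots.append((start, value))

    def total(self, now: int) -> int:
        # Garbage-collect slots that lie entirely before the window.
        horizon = now - self.window
        while self.slots and self.slots[0][0] + self.slot <= horizon:
            self.slots.popleft()
        return sum(total for _, total in self.slots)


if __name__ == "__main__":
    ts = datetime(2025, 3, 26, 15, 54, 29, tzinfo=timezone.utc)
    print(bucket_start(ts, "quarter"))  # 2025-01-01 00:00:00+00:00
    counter = SlidingCounter(window=3600, slot=600)
    counter.add(0, 5)
    counter.add(100, 7)
    counter.add(700, 1)
    print(counter.total(3700))  # 13: the [0, 600) slot has not fully left the window
    print(counter.total(4200))  # 1: the [0, 600) slot was garbage-collected
```

With a one-hour window and ten-minute slots, the event at `t=0` is still counted at `now=3700` although it is 3700 seconds old, because its slot only expires once the whole slot has left the window. This is the kind of coarsening the requirements allow in exchange for storing one entry per slot instead of one per event. For the merchant, such a counter would be kept per (statistic, currency) pair.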
+
Proposed Solution
=================
+
+
Definition of Done
==================
-(Only applicable to design documents that describe a new feature. While the
-DoD is not satisfied yet, a user-facing feature **must** be behind a feature
-flag or dev-mode flag.)
+- key statistics for the merchant and the TOPS deployment implemented
+- REST API for merchant specified and implemented
+- REST API for AML officer specified and implemented
+- SPAs visualize key statistics
Alternatives
============
+- batch processing to compute statistics, with the REST API only
+ returning the values computed by the last batch run (the
+ computational cost is only paid if a statistic is actually
+ requested, but it may then be high; there is also the risk
+ of showing outdated data);
+- computing statistics on the C side: more data may be readily
+ available there, but this has the major disadvantage of making
+ it harder to add or remove statistics and of making the
+ transaction logic more complex; it is also easier to miss
+ triggering events.
+
+
Drawbacks
=========
+- ongoing baseline cost for maintaining statistics, even if nobody
+ looks at them (but only for statistics that are enabled)
+
+
Discussion / Q&A
================