starting text for DD59 - taler-docs - Documentation for GNU Taler components, APIs and protocols

commit b0687b27250fd57014c7146b7944c9d1ab69f1ec
parent 70156edcae310206bd812528b1373ab6c7328aea
Author: Christian Grothoff <grothoff@gnunet.org>
Date:   Wed, 26 Mar 2025 21:10:12 +0900

starting text for DD59

Diffstat:
M design-documents/059-statistics.rst  | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-

1 file changed, 144 insertions(+), 1 deletion(-)
diff --git a/design-documents/059-statistics.rst b/design-documents/059-statistics.rst
@@ -57,6 +57,149 @@ Requirements
 Proposed Solution
 =================
 
+At a high-level, we use SQL TRIGGERs to update statistics on-the-fly whenever
+an INSERT or UPDATE changes values in the database that would affect the
+statistics we are tracking.  The statistics themselves are also stored in the
+database. Additionally, we also store meta-data in the database that
+determines the granularity of the statistics to keep. This way, the SQL
+TRIGGERs can be generic.
+
+
+Tables
+------
+
+The schema consists of several tables:
+
+- **statistic_bucket_meta**:
+  meta data for statistics we track in buckets, including
+  a slug used as the main key to identify the statistic,
+  the data type (amount or number), and a pair of
+  range of the bucket (year, quarter, month, etc.)
+  and how many ages (generations) of that bucket to keep.
+  Basically, now minus age times range tells us how far
+  into the past statistics are kept.
+- **statistic_interval_meta**:
+  meta data used for statistics we keep by time interval,
+  with again a slug and data type, but then pairs
+  consisting of a number of seconds for the range
+  and a precision.  A range of 60s implies that we
+  count the events of the last 60s in the interval.
+  Usually multiple ranges are given, like 60s, 120s
+  and 180s to keep events for the last one, two and
+  three minutes. Note that in the database the data
+  kept for the 120s range excludes the data from the
+  first 60s, and the correct amount for any interval
+  is computed on-the-fly by adding up the amounts for
+  the smaller intervals. A precision of 5s means that
+  event timestamps are normalized (and rounded down)
+  to multiples of 5s and thus events at 64s may still
+  be counted for the 60s interval.  Note that both
+  intervals and precisions must be monotonically
+  increasing in their respective arrays.
+- **statistic_bucket_counter**: contains the current
+  numeric (non-amount) value for a particular bucket
+  (given by starting point and range) and object;
+  objects in the exchange are (normalized) payto
+  hashes identifying bank accounts, while objects for
+  the merchant backend are the instances of the merchant.
+- **statistic_bucket_amount**: same as the counter
+  bucket, except for an amount; in the case of the
+  merchant the *currency* is an additional dimension
+  as amounts are tracked per currency.
+- **statistic_counter_event**: represents the sum of
+  all (counter) events at a particular time-*slot* in
+  time, again given per object; the *slot* is the
+  time of the event rounded to the *presision* of the
+  interval the event falls into.
+- **statistic_amount_event**: same as the counter
+  event, except for amounts.
+- **statistic_interval_counter**: cumulative value
+  of the counter for the given range; additionally
+  includes the unique ID of the oldest event that
+  is included in the counter (as the range is given
+  as a relative time, we need to track which events
+  are included).
+- **statistic_interval_amount**: same as the
+  interval counter, except for amounts.
+
+Stored procedures
+-----------------
+
+We additionally provide a few stored procedures to
+update the statistics.  These are:
+
+* **bump_number_bucket_stat** (slug,object,timestamp,delta):
+  increases the *slug* bucket counter of *object* at *timestamp*
+  by *delta*.  Does nothing if *slug* is not in the
+  bucket meta table or if *timestamp* is past the range
+  of buckets we are currently tracking;
+* **bump_amount_bucket_stat** (slug,object,timestamp,amount):
+  similar to the above, just for statistics of type amount;
+* **bump_number_interval_stat** (slug,object,timestamp,delta):
+  increases the *slug* interval counter of *object* at
+  *timestamp* by *delta*;
+* **bump_amount_interval_stat** (slug,object,timestamp,amount):
+  similar to the above, just for statistics of type amount;
+* **bump_number_stat** (slug, object, timestamp, delta):
+  increases both bucket and interval counters for *slug*
+  of *object* at *timestamp* by *delta*;
+* **bump_amount_stat** (slug, object, timestamp, delta):
+  similar to the above, just for statistics of type amount;
+
+We furthermore provide a few stored procedures to
+access the interval statistics (while making sure
+they are current).  These are:
+
+* **statistic_interval_number_get** (slug, object):
+  returns all non-zero counters and time intervals that
+  we are tracking for the given *slug* and *object*;
+  the intervals are updated and events are possibly
+  discarded or coarsened (if numeric value of the
+  precision for larger intervals is larger and thus
+  allows for more rounding); 
+* **statistic_interval_amount_get** (slug, object):
+  returns all non-zero amounts and time intervals that
+  we are tracking for the given *slug* and *object*;
+  the intervals are updated and events are possibly
+  discarded or coarsened (if numeric value of the
+  precision for larger intervals is larger and thus
+  allows for more rounding); 
+
+Finally, there are some helpers for cleaning up:
+
+* **statistic_bucket_gc** ():
+  Removes buckets and events past the range for which
+  we track statistics; also coarsens events to the
+  precision of the range into which they are falling.
+* **exchange_drop_customization** (schema):
+  special function for the exchange which allows dropping
+  customizations by *schema*.
+
+Schema
+------
+
+For the exchange, all triggers (and associated stored
+procedures) should be stored in one or more payment
+service provider specific schema.  Furthermore, the name
+of the schema MUST be provided in the *schema* column of
+the meta data table entries.  This way, all triggers,
+stored procedures and statistics can be removed simply
+by DROPing the SCHEMA and removing the associated
+entries from the meta data tables (the foreign key
+constraints then CASCADE and clean up the statistics
+themselves).
+
+The SCHEMA name must also be used as the prefix
+for the SQL files that inject the triggers. The
+usual database versioning should also be used,
+except this time using the SCHEMA name instead of
+"exchange-". "exchange" is not allowed as a name
+for customization SCHEMA.
+
+**taler-exchange-dbinit** is extended with command line arguments to load the
+latest version of a customization schema or to call
+**exchange_drop_customization** to remove one.
+
 
 
 Definition of Done
@@ -85,7 +228,7 @@ Drawbacks
 
 - ongoing baseline cost for statistics even if nobody looks
   at them (but only if the respective statistic is enabled)
-  
+
 
 Discussion / Q&A
 ================

	taler-docs Documentation for GNU Taler components, APIs and protocols
	Log \| Files \| Refs \| README \| LICENSE