commit b0687b27250fd57014c7146b7944c9d1ab69f1ec
parent 70156edcae310206bd812528b1373ab6c7328aea
Author: Christian Grothoff <grothoff@gnunet.org>
Date: Wed, 26 Mar 2025 21:10:12 +0900
starting text for DD59
Diffstat:
1 file changed, 144 insertions(+), 1 deletion(-)
diff --git a/design-documents/059-statistics.rst b/design-documents/059-statistics.rst
@@ -57,6 +57,149 @@ Requirements
Proposed Solution
=================
+At a high-level, we use SQL TRIGGERs to update statistics on-the-fly whenever
+an INSERT or UPDATE changes values in the database that would affect the
+statistics we are tracking. The statistics themselves are also stored in the
+database. Additionally, we also store meta-data in the database that
+determines the granularity of the statistics to keep. This way, the SQL
+TRIGGERs can be generic.
+
+
+Tables
+------
+
+The schema consists of several tables:
+
+- **statistic_bucket_meta**:
+ meta data for statistics we track in buckets, including
+ a slug used as the main key to identify the statistic,
+ the data type (amount or number), and a pair of
+ range of the bucket (year, quarter, month, etc.)
+ and how many ages (generations) of that bucket to keep.
+ Basically, now minus age times range tells us how far
+ into the past statistics are kept.
+- **statistic_interval_meta**:
+ meta data used for statistics we keep by time interval,
+ with again a slug and data type, but then pairs
+ consisting of a number of seconds for the range
+ and a precision. A range of 60s implies that we
+ count the events of the last 60s in the interval.
+ Usually multiple ranges are given, like 60s, 120s
+ and 180s to keep events for the last one, two and
+ three minutes. Note that in the database the data
+ kept for the 120s range excludes the data from the
+ first 60s, and the correct amount for any interval
+ is computed on-the-fly by adding up the amounts for
+ the smaller intervals. A precision of 5s means that
+ event timestamps are normalized (and rounded down)
+ to multiples of 5s and thus events at 64s may still
+ be counted for the 60s interval. Note that both
+ intervals and precisions must be monotonically
+ increasing in their respective arrays.
+- **statistic_bucket_counter**: contains the current
+ numeric (non-amount) value for a particular bucket
+ (given by starting point and range) and object;
+ objects in the exchange are (normalized) payto
+ hashes identifying bank accounts, while objects for
+ the merchant backend are the instances of the merchant.
+- **statistic_bucket_amount**: same as the counter
+ bucket, except for an amount; in the case of the
+ merchant the *currency* is an additional dimension
+ as amounts are tracked per currency.
+- **statistic_counter_event**: represents the sum of
+ all (counter) events at a particular time-*slot* in
+ time, again given per object; the *slot* is the
+ time of the event rounded to the *presision* of the
+ interval the event falls into.
+- **statistic_amount_event**: same as the counter
+ event, except for amounts.
+- **statistic_interval_counter**: cumulative value
+ of the counter for the given range; additionally
+ includes the unique ID of the oldest event that
+ is included in the counter (as the range is given
+ as a relative time, we need to track which events
+ are included).
+- **statistic_interval_amount**: same as the
+ interval counter, except for amounts.
+
+Stored procedures
+-----------------
+
+We additionally provide a few stored procedures to
+update the statistics. These are:
+
+* **bump_number_bucket_stat** (slug,object,timestamp,delta):
+ increases the *slug* bucket counter of *object* at *timestamp*
+ by *delta*. Does nothing if *slug* is not in the
+ bucket meta table or if *timestamp* is past the range
+ of buckets we are currently tracking;
+* **bump_amount_bucket_stat** (slug,object,timestamp,amount):
+ similar to the above, just for statistics of type amount;
+* **bump_number_interval_stat** (slug,object,timestamp,delta):
+ increases the *slug* interval counter of *object* at
+ *timestamp* by *delta*;
+* **bump_amount_interval_stat** (slug,object,timestamp,amount):
+ similar to the above, just for statistics of type amount;
+* **bump_number_stat** (slug, object, timestamp, delta):
+ increases both bucket and interval counters for *slug*
+ of *object* at *timestamp* by *delta*;
+* **bump_amount_stat** (slug, object, timestamp, delta):
+ similar to the above, just for statistics of type amount;
+
+We furthermore provide a few stored procedures to
+access the interval statistics (while making sure
+they are current). These are:
+
+* **statistic_interval_number_get** (slug, object):
+ returns all non-zero counters and time intervals that
+ we are tracking for the given *slug* and *object*;
+ the intervals are updated and events are possibly
+ discarded or coarsened (if numeric value of the
+ precision for larger intervals is larger and thus
+ allows for more rounding);
+* **statistic_interval_amount_get** (slug, object):
+ returns all non-zero amounts and time intervals that
+ we are tracking for the given *slug* and *object*;
+ the intervals are updated and events are possibly
+ discarded or coarsened (if numeric value of the
+ precision for larger intervals is larger and thus
+ allows for more rounding);
+
+Finally, there are some helpers for cleaning up:
+
+* **statistic_bucket_gc** ():
+ Removes buckets and events past the range for which
+ we track statistics; also coarsens events to the
+ precision of the range into which they are falling.
+* **exchange_drop_customization** (schema):
+ special function for the exchange which allows dropping
+ customizations by *schema*.
+
+Schema
+------
+
+For the exchange, all triggers (and associated stored
+procedures) should be stored in one or more payment
+service provider specific schema. Furthermore, the name
+of the schema MUST be provided in the *schema* column of
+the meta data table entries. This way, all triggers,
+stored procedures and statistics can be removed simply
+by DROPing the SCHEMA and removing the associated
+entries from the meta data tables (the foreign key
+constraints then CASCADE and clean up the statistics
+themselves).
+
+The SCHEMA name must also be used as the prefix
+for the SQL files that inject the triggers. The
+usual database versioning should also be used,
+except this time using the SCHEMA name instead of
+"exchange-". "exchange" is not allowed as a name
+for customization SCHEMA.
+
+**taler-exchange-dbinit** is extended with command line arguments to load the
+latest version of a customization schema or to call
+**exchange_drop_customization** to remove one.
+
Definition of Done
@@ -85,7 +228,7 @@ Drawbacks
- ongoing baseline cost for statistics even if nobody looks
at them (but only if the respective statistic is enabled)
-
+
Discussion / Q&A
================