taler-docs

Documentation for GNU Taler components, APIs and protocols
Log | Files | Refs | README | LICENSE

commit b0687b27250fd57014c7146b7944c9d1ab69f1ec
parent 70156edcae310206bd812528b1373ab6c7328aea
Author: Christian Grothoff <grothoff@gnunet.org>
Date:   Wed, 26 Mar 2025 21:10:12 +0900

starting text for DD59

Diffstat:
Mdesign-documents/059-statistics.rst | 145++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 144 insertions(+), 1 deletion(-)

diff --git a/design-documents/059-statistics.rst b/design-documents/059-statistics.rst @@ -57,6 +57,149 @@ Requirements Proposed Solution ================= +At a high-level, we use SQL TRIGGERs to update statistics on-the-fly whenever +an INSERT or UPDATE changes values in the database that would affect the +statistics we are tracking. The statistics themselves are also stored in the +database. Additionally, we also store meta-data in the database that +determines the granularity of the statistics to keep. This way, the SQL +TRIGGERs can be generic. + + +Tables +------ + +The schema consists of several tables: + +- **statistic_bucket_meta**: + meta data for statistics we track in buckets, including + a slug used as the main key to identify the statistic, + the data type (amount or number), and a pair of + range of the bucket (year, quarter, month, etc.) + and how many ages (generations) of that bucket to keep. + Basically, now minus age times range tells us how far + into the past statistics are kept. +- **statistic_interval_meta**: + meta data used for statistics we keep by time interval, + with again a slug and data type, but then pairs + consisting of a number of seconds for the range + and a precision. A range of 60s implies that we + count the events of the last 60s in the interval. + Usually multiple ranges are given, like 60s, 120s + and 180s to keep events for the last one, two and + three minutes. Note that in the database the data + kept for the 120s range excludes the data from the + first 60s, and the correct amount for any interval + is computed on-the-fly by adding up the amounts for + the smaller intervals. A precision of 5s means that + event timestamps are normalized (and rounded down) + to multiples of 5s and thus events at 64s may still + be counted for the 60s interval. Note that both + intervals and precisions must be monotonically + increasing in their respective arrays. +- **statistic_bucket_counter**: contains the current + numeric (non-amount) value for a particular bucket + (given by starting point and range) and object; + objects in the exchange are (normalized) payto + hashes identifying bank accounts, while objects for + the merchant backend are the instances of the merchant. +- **statistic_bucket_amount**: same as the counter + bucket, except for an amount; in the case of the + merchant the *currency* is an additional dimension + as amounts are tracked per currency. +- **statistic_counter_event**: represents the sum of + all (counter) events at a particular time-*slot* in + time, again given per object; the *slot* is the + time of the event rounded to the *presision* of the + interval the event falls into. +- **statistic_amount_event**: same as the counter + event, except for amounts. +- **statistic_interval_counter**: cumulative value + of the counter for the given range; additionally + includes the unique ID of the oldest event that + is included in the counter (as the range is given + as a relative time, we need to track which events + are included). +- **statistic_interval_amount**: same as the + interval counter, except for amounts. + +Stored procedures +----------------- + +We additionally provide a few stored procedures to +update the statistics. These are: + +* **bump_number_bucket_stat** (slug,object,timestamp,delta): + increases the *slug* bucket counter of *object* at *timestamp* + by *delta*. Does nothing if *slug* is not in the + bucket meta table or if *timestamp* is past the range + of buckets we are currently tracking; +* **bump_amount_bucket_stat** (slug,object,timestamp,amount): + similar to the above, just for statistics of type amount; +* **bump_number_interval_stat** (slug,object,timestamp,delta): + increases the *slug* interval counter of *object* at + *timestamp* by *delta*; +* **bump_amount_interval_stat** (slug,object,timestamp,amount): + similar to the above, just for statistics of type amount; +* **bump_number_stat** (slug, object, timestamp, delta): + increases both bucket and interval counters for *slug* + of *object* at *timestamp* by *delta*; +* **bump_amount_stat** (slug, object, timestamp, delta): + similar to the above, just for statistics of type amount; + +We furthermore provide a few stored procedures to +access the interval statistics (while making sure +they are current). These are: + +* **statistic_interval_number_get** (slug, object): + returns all non-zero counters and time intervals that + we are tracking for the given *slug* and *object*; + the intervals are updated and events are possibly + discarded or coarsened (if numeric value of the + precision for larger intervals is larger and thus + allows for more rounding); +* **statistic_interval_amount_get** (slug, object): + returns all non-zero amounts and time intervals that + we are tracking for the given *slug* and *object*; + the intervals are updated and events are possibly + discarded or coarsened (if numeric value of the + precision for larger intervals is larger and thus + allows for more rounding); + +Finally, there are some helpers for cleaning up: + +* **statistic_bucket_gc** (): + Removes buckets and events past the range for which + we track statistics; also coarsens events to the + precision of the range into which they are falling. +* **exchange_drop_customization** (schema): + special function for the exchange which allows dropping + customizations by *schema*. + +Schema +------ + +For the exchange, all triggers (and associated stored +procedures) should be stored in one or more payment +service provider specific schema. Furthermore, the name +of the schema MUST be provided in the *schema* column of +the meta data table entries. This way, all triggers, +stored procedures and statistics can be removed simply +by DROPing the SCHEMA and removing the associated +entries from the meta data tables (the foreign key +constraints then CASCADE and clean up the statistics +themselves). + +The SCHEMA name must also be used as the prefix +for the SQL files that inject the triggers. The +usual database versioning should also be used, +except this time using the SCHEMA name instead of +"exchange-". "exchange" is not allowed as a name +for customization SCHEMA. + +**taler-exchange-dbinit** is extended with command line arguments to load the +latest version of a customization schema or to call +**exchange_drop_customization** to remove one. + Definition of Done @@ -85,7 +228,7 @@ Drawbacks - ongoing baseline cost for statistics even if nobody looks at them (but only if the respective statistic is enabled) - + Discussion / Q&A ================