taler-docs

Documentation for GNU Taler components, APIs and protocols
Log | Files | Refs | README | LICENSE

011-auditor-db-sync.rst (9237B)


      1 DD 11: Auditor-Exchange Database Synchronization
      2 ################################################
      3 
      4 Summary
      5 =======
      6 
      7 Ways for the auditor to obtain a current copy of the exchange database (to
      8 verify that the exchange is operating correctly) are discussed.
      9 
     10 
     11 Motivation
     12 ==========
     13 
     14 The Taler auditor is expected to check that the exchange is operating
     15 correctly.  For this, it needs (read-only) access to the exchange database and
     16 to the bank account of the exchange.  Bank account access is a matter of
     17 setting up an additional user with limited rights with the bank and is out of
     18 the scope of this document.  For database access, the auditor should not trust
     19 the exchange. In particular, the auditor must assume that the exchange may
     20 violate basic database constraints (like foreign keys) or delete/alter records
     21 maliciously. However, we also do not want to complicate the auditor logic to
     22 cope with with violations of well-formedness constraints (like foreign keys or
     23 non-NULL values or size constraints on fields).  Finally, the mechanism by
     24 which the auditor obtains the database must provide a reasonably current
     25 database and the process must perform reasonably well.
     26 
     27 
     28 Requirements
     29 ============
     30 
     31 * The solution must allow data to be copied incrementally.
     32 * The solution must tolerate network outages and recover after connectivity
     33   between exchange and auditor is restored.
     34 * The solution must enable the auditor database to serve as a full backup
     35   of the exchange's database (even if possibly slightly outdated due to
     36   asynchronous replication or network outages).
     37 * The solution must scale, in particular if the exchange shards the database,
     38   the auditor must be also able to use the same kind of sharding and the
     39   synchronization should be possible per shard.
     40 * The synchronization mechanism must not allow an attacker controlling the
     41   exchange database to delete or modify arbitrary data from the auditor's copy
     42   via the synchronization mechanism (in other words, some tables are
     43   append-only and unalterable).
     44 * The solution must support database schema updates. Those may require some
     45   downtime and closely coordinated work between exchange and auditor.
     46 * The solution must enable eventual garbage collection at the exchange to
     47   be permitted and replicated at the auditor (e.g. DELETE on usually append-only
     48   tables due to a CASCADE from expired denomination keys).
     49 * The synchronization mechanism should raise an alert if the exchange violates basic
     50   constraints (unexpected schema changes, deletion/motification on append-only
     51   tables) and then NOT replicate those changes. The auditor's internal asynchronous
     52   helper may then soft-fail (log and exit) until the exchange has rectified the
     53   problem (by manual, human intervention resulting in an exchange master database
     54   that again maintains the required invariants).
     55   After the corrected master database has been again synchronized with the
     56   primary copy of the auditor, the auditor's helper is resumed and can continue to
     57   copy the (now valid) database records into the auditor's internal version.
     58 * A good solution would work independently of the specific database used.
     59 
     60 
     61 Proposed Solution
     62 =================
     63 
     64 * Use "common" incremental database replication (whichever is
     65   appropriate for the exchange database setup, synchronous
     66   or asynchronous) to make a 1:1 copy of the exchange database
     67   at the auditor.  This should work for any full-featured
     68   modern database. This "ingress" copy cannot be trusted, as constraint
     69   violations or deletions would also be replicated.
     70 * Use helper process to SELECT against the local "ingress" copy (by
     71   SERIAL ID => make sure all append-only tables have one!)
     72   to copy append-only tables to a second, "trusted" and fully
     73   auditor-controlled copy of the database.
     74   Order (or transactionally group) SELECT
     75   statements to ensure foreign key constraints are maintained.
     76   For mutable tables (basically, only current reserve balance)
     77   do not make another copy, but do have logic to recompute mutable
     78   tables from other data *if* we need to recover from backup.
     79 * On schema migration, halt exchange, once auditor DB has
     80   synchronized, update all DB schema (the "ingress" DB schema
     81   may be updated automatically when the exchange DB schema is
     82   migrated, but the "trusted" DB of the auditor must most likely
     83   be manually migrated), then finally resume "ingress" to "trusted"
     84   helper-based DB synchronization and restart the exchange.
     85 * For GC, simply run GC logic also on auditor's "trusted" copy.
     86   (The synchronization mechanism will take care of the primary copy,
     87   and the helper to copy should not be disturbed by the DELETE operations
     88   anyway.)
     89 * The auditor's "ingress" database should be well isolated from
     90   the rest of the auditor's system and database
     91   (different user accounts). The reason is that we should not
     92   assume that the PostgreSQL replication code is battle-tested with
     93   malicious parties in mind.
     94 * The canonical PostgreSQL synchronization between exchange and the
     95   auditor's "ingress" database must use transport security.
     96 
     97 The above solution does not gracefully handle mutable tables on which
     98 the exchange performs UPDATE statements, as such updates will not bump
     99 the BIGSERIAL and thus would not be replicated by the helper.  Thus, we
    100 need to consider all tables that the exchange ever performs UPDATE on.
    101 Those are:
    102 
    103 * /reserves/ --- the exchange updates the remaining reserve balance;
    104   here the auditor currently performs a sanity check against
    105   its own reserve balance calculation.  The proposed way to address
    106   this is to make this sanity check optional and to be only used if
    107   the auditor auditor runs against the "primary" exchange database
    108   (like an internal audit).  This is acceptable, as an inaccurate
    109   reserve balance is mostly used to raise an early warning and not
    110   indicative of any actualized financial gains or losses from the
    111   exchange.
    112 * /deposits/ --- the exchange updates the /tiny/ and /done/ bit
    113   fields. /tiny/ can be trivially established by the auditor, and
    114   we can simply avoid the auditor considering that bit. /done/
    115   was so far only used to enrich the reporting.  The proposed way
    116   to address the uses of both fields is thus to only use them in
    117   internal audits (against the primary exchange database). Both
    118   can be safely ignored by the external audit.
    119 * /prewire/ --- the exchange updates the /finished/ and /failed/
    120   bits. The entire table is not used by the auditor and its
    121   main values cannot be validated by the auditor anyway.
    122 * /auditors/ --- the exchange updates the /is_active/ and /last_change/
    123   fields.  The entire table is of no concern to the auditor.
    124 
    125 A good order for replicating the tables should be:
    126 
    127 * exchange_sign_keys
    128 * signkey_revocations
    129 * auditors
    130 * denominations
    131 * denomination_revocations
    132 * auditor_denom_sigs
    133 * reserves
    134 * reserves_out
    135 * reserves_in
    136 * reserves_close
    137 * known_coins
    138 * deposits
    139 * refunds
    140 * refresh_commitments
    141 * refresh_transfer_keys
    142 * refresh_revealed_coins
    143 * recoup_refresh
    144 * recoup
    145 
    146 
    147 
    148 Alternatives
    149 ============
    150 
    151 * Copy the PostgreSQL WAL, filter it for "illegal" operations
    152   and then apply it at the auditor end.  Disadvantages: WAL
    153   filtering is not a common operation (format documented?),
    154   this would be highly PostgreSQL-specific, and would require
    155   complex work to write the filter.  Also unsure how one
    156   could later recover gracefully from transient errors
    157   (say where the exchange recified a bogus DELETE).
    158 * Directly SELECT against the (remote) exchange DB and then
    159   INSERT/UPDATE at the auditor's local copy. Disadvantages:
    160   remote SELECT likely very expensive due to high latency.
    161   Diagnostics more difficult. May expose exchange to additional
    162   risks from auditor, such as attacks exhausting DB resources
    163   by running expensive SELECTs.
    164 
    165 
    166 
    167 
    168 Drawbacks
    169 =========
    170 
    171 * SERIAL IDs required in all tables that are "append-only" / immutable.
    172 * Additional custom logic required to recompute mutable tables
    173   on-demand.
    174 * Limited ability to cope with mutable tables, imposes restrictions
    175   on future exchange database evolution.
    176 * Helper logic to SELECT data in batches that will certainly
    177   maintain invariants may be a bit tricky, but in principle
    178   the foreign key constraints should form a DAG, simply dictating
    179   the order in which new entries are to be copied.  It may also
    180   be that simply running "big" transactions across all tables
    181   is the answer, to be investigated what performs better.
    182 * A malicious exchange could theoretically send expensive transactions
    183   to the auditor via the replication mechanism (possibly ones that
    184   it did not even execute locally itself) to DoS the "ingress"
    185   database. This would be noticed primarily by load
    186   monitoring or even the auditor lagging unusually far behind the
    187   exchange's transaction history. We believe this is acceptable,
    188   as it would imply highly visible malicious exchange behavior for
    189   virtually no significant gain.
    190 * The proposed solution does not create a transactional, synchronous
    191   write-only log as suggested by CodeBlau (see audit report, Section 9.4).
    192   We believe doing so would be overly costly, both in terms of
    193   complexity and performance, for limited gains.
    194 
    195 
    196 Discussion / Q&A
    197 ================