Design Doc 011: Auditor-Exchange Database Synchronization
Ways for the auditor to obtain a current copy of the exchange database (to
verify that the exchange is operating correctly) are discussed.
The Taler auditor is expected to check that the exchange is operating
correctly. For this, it needs (read-only) access to the exchange database and
to the bank account of the exchange. Bank account access is a matter of
setting up an additional user with limited rights with the bank and is out of
the scope of this document. For database access, the auditor should not trust
the exchange. In particular, the auditor must assume that the exchange may
violate basic database constraints (like foreign keys) or delete/alter records
maliciously. However, we also do not want to complicate the auditor logic to
cope with with violations of well-formedness constraints (like foreign keys or
non-NULL values or size constraints on fields). Finally, the mechanism by
which the auditor obtains the database must provide a reasonably current
database and the process must perform reasonably well.
* The solution must allow data to be copied incrementally.
* The solution must tolerate network outages and recover after connectivity
between exchange and auditor is restored.
* The solution must enable the auditor database to serve as a full backup
of the exchange's database (even if possibly slightly outdated due to
asynchronous replication or network outages).
* The solution must scale, in particular if the exchange shards the database,
the auditor must be also able to use the same kind of sharding and the
synchronization should be possible per shard.
* The synchronization mechanism must not allow an attacker controlling the
exchange database to delete or modify arbitrary data from the auditor's copy
via the synchronization mechanism (in other words, some tables are
append-only and unalterable).
* The solution must support database schema updates. Those may require some
downtime and closely coordinated work between exchange and auditor.
* The solution must enable eventual garbage collection at the exchange to
be permitted and replicated at the auditor (e.g. DELETE on usually append-only
tables due to a CASCADE from expired denomination keys).
* The synchronization mechanism should raise an alert if the exchange violates basic
constraints (unexpected schema changes, deletion/motification on append-only
tables) and then NOT replicate those changes. It may then soft-fail until the
exchange has rectified the problem.
* A good solution would work independently of the specific database used.
* Use "common" incremental database replication (whichever is
approproate for the exchange database setup, synchronous
or asynchronous) to make a 1:1 copy of the exchange database
at the auditor. This should work for any full-featured
modern database. The copy cannot be trusted, as constraint
violations or deletions would also be replicated.
* Use helper process to SELECT against the local copy (by
SERIAL ID => make sure all append-only tables have one!)
to copy append-only tables to 2nd auditor-controlled copy
of the database. Order (or transactionally group) SELECT
statements to ensure foreign key constraints are maintained.
For mutable tables (basically, only current reserve balance)
do not make another copy, but do have logic to recompute mutable
tables from other data *if* we need to recover from backup.
* On schema migration, halt exchange, once auditor DB has
synchronized, update all DB schema, then resume DB synchronization
and then restart exchange.
* For GC, simply run GC logic also on auditor's "secure" copy.
(The synchronization mechanism will take care of the primary copy,
and the helper to copy should not be disturbed by the DELETE operations
* Copy the Postgres WAL, filter it for "illegal" operations
and then apply it at the auditor end. Disadvantages: WAL
filtering is not a common operation (format documented?),
this would be highly Postgres-specific, and would require
complex work to write the filter. Also unsure how one
could later recover gracefully from transient errors
(say where the exchange recified a bogus DELETE).
* Directly SELECT against the (remote) exchange DB and then
INSERT/UPDATE at the auditor's local copy. Disadvantages:
remote SELECT likely very expensive due to high latency.
Diagnostics more difficult. May expose exchange to additional
risks from auditor, such as attacks exhausting DB resources
by running expensive SELECTs.
* SERIAL IDs required in all tables that are "append-only" / immutable.
* Additional custom logic required to recompute mutable tables
* Limited ability to cope with mutable tables, imposes restrictions
on future exchange database evolution.
* Helper logic to SELECT data in batches that will certainly
maintain invariants may be a bit tricky, but in principle
the foreign key constraints should form a DAG, simply dictating
the order in which new entries are to be copied. It may also
be that simply running "big" transactions across all tables
is the answer, to be investigated what performs better.
Discussion / Q&A