011-auditor-db-sync.rst (9237B)
1 DD 11: Auditor-Exchange Database Synchronization 2 ################################################ 3 4 Summary 5 ======= 6 7 Ways for the auditor to obtain a current copy of the exchange database (to 8 verify that the exchange is operating correctly) are discussed. 9 10 11 Motivation 12 ========== 13 14 The Taler auditor is expected to check that the exchange is operating 15 correctly. For this, it needs (read-only) access to the exchange database and 16 to the bank account of the exchange. Bank account access is a matter of 17 setting up an additional user with limited rights with the bank and is out of 18 the scope of this document. For database access, the auditor should not trust 19 the exchange. In particular, the auditor must assume that the exchange may 20 violate basic database constraints (like foreign keys) or delete/alter records 21 maliciously. However, we also do not want to complicate the auditor logic to 22 cope with with violations of well-formedness constraints (like foreign keys or 23 non-NULL values or size constraints on fields). Finally, the mechanism by 24 which the auditor obtains the database must provide a reasonably current 25 database and the process must perform reasonably well. 26 27 28 Requirements 29 ============ 30 31 * The solution must allow data to be copied incrementally. 32 * The solution must tolerate network outages and recover after connectivity 33 between exchange and auditor is restored. 34 * The solution must enable the auditor database to serve as a full backup 35 of the exchange's database (even if possibly slightly outdated due to 36 asynchronous replication or network outages). 37 * The solution must scale, in particular if the exchange shards the database, 38 the auditor must be also able to use the same kind of sharding and the 39 synchronization should be possible per shard. 40 * The synchronization mechanism must not allow an attacker controlling the 41 exchange database to delete or modify arbitrary data from the auditor's copy 42 via the synchronization mechanism (in other words, some tables are 43 append-only and unalterable). 44 * The solution must support database schema updates. Those may require some 45 downtime and closely coordinated work between exchange and auditor. 46 * The solution must enable eventual garbage collection at the exchange to 47 be permitted and replicated at the auditor (e.g. DELETE on usually append-only 48 tables due to a CASCADE from expired denomination keys). 49 * The synchronization mechanism should raise an alert if the exchange violates basic 50 constraints (unexpected schema changes, deletion/motification on append-only 51 tables) and then NOT replicate those changes. The auditor's internal asynchronous 52 helper may then soft-fail (log and exit) until the exchange has rectified the 53 problem (by manual, human intervention resulting in an exchange master database 54 that again maintains the required invariants). 55 After the corrected master database has been again synchronized with the 56 primary copy of the auditor, the auditor's helper is resumed and can continue to 57 copy the (now valid) database records into the auditor's internal version. 58 * A good solution would work independently of the specific database used. 59 60 61 Proposed Solution 62 ================= 63 64 * Use "common" incremental database replication (whichever is 65 appropriate for the exchange database setup, synchronous 66 or asynchronous) to make a 1:1 copy of the exchange database 67 at the auditor. This should work for any full-featured 68 modern database. This "ingress" copy cannot be trusted, as constraint 69 violations or deletions would also be replicated. 70 * Use helper process to SELECT against the local "ingress" copy (by 71 SERIAL ID => make sure all append-only tables have one!) 72 to copy append-only tables to a second, "trusted" and fully 73 auditor-controlled copy of the database. 74 Order (or transactionally group) SELECT 75 statements to ensure foreign key constraints are maintained. 76 For mutable tables (basically, only current reserve balance) 77 do not make another copy, but do have logic to recompute mutable 78 tables from other data *if* we need to recover from backup. 79 * On schema migration, halt exchange, once auditor DB has 80 synchronized, update all DB schema (the "ingress" DB schema 81 may be updated automatically when the exchange DB schema is 82 migrated, but the "trusted" DB of the auditor must most likely 83 be manually migrated), then finally resume "ingress" to "trusted" 84 helper-based DB synchronization and restart the exchange. 85 * For GC, simply run GC logic also on auditor's "trusted" copy. 86 (The synchronization mechanism will take care of the primary copy, 87 and the helper to copy should not be disturbed by the DELETE operations 88 anyway.) 89 * The auditor's "ingress" database should be well isolated from 90 the rest of the auditor's system and database 91 (different user accounts). The reason is that we should not 92 assume that the PostgreSQL replication code is battle-tested with 93 malicious parties in mind. 94 * The canonical PostgreSQL synchronization between exchange and the 95 auditor's "ingress" database must use transport security. 96 97 The above solution does not gracefully handle mutable tables on which 98 the exchange performs UPDATE statements, as such updates will not bump 99 the BIGSERIAL and thus would not be replicated by the helper. Thus, we 100 need to consider all tables that the exchange ever performs UPDATE on. 101 Those are: 102 103 * /reserves/ --- the exchange updates the remaining reserve balance; 104 here the auditor currently performs a sanity check against 105 its own reserve balance calculation. The proposed way to address 106 this is to make this sanity check optional and to be only used if 107 the auditor auditor runs against the "primary" exchange database 108 (like an internal audit). This is acceptable, as an inaccurate 109 reserve balance is mostly used to raise an early warning and not 110 indicative of any actualized financial gains or losses from the 111 exchange. 112 * /deposits/ --- the exchange updates the /tiny/ and /done/ bit 113 fields. /tiny/ can be trivially established by the auditor, and 114 we can simply avoid the auditor considering that bit. /done/ 115 was so far only used to enrich the reporting. The proposed way 116 to address the uses of both fields is thus to only use them in 117 internal audits (against the primary exchange database). Both 118 can be safely ignored by the external audit. 119 * /prewire/ --- the exchange updates the /finished/ and /failed/ 120 bits. The entire table is not used by the auditor and its 121 main values cannot be validated by the auditor anyway. 122 * /auditors/ --- the exchange updates the /is_active/ and /last_change/ 123 fields. The entire table is of no concern to the auditor. 124 125 A good order for replicating the tables should be: 126 127 * exchange_sign_keys 128 * signkey_revocations 129 * auditors 130 * denominations 131 * denomination_revocations 132 * auditor_denom_sigs 133 * reserves 134 * reserves_out 135 * reserves_in 136 * reserves_close 137 * known_coins 138 * deposits 139 * refunds 140 * refresh_commitments 141 * refresh_transfer_keys 142 * refresh_revealed_coins 143 * recoup_refresh 144 * recoup 145 146 147 148 Alternatives 149 ============ 150 151 * Copy the PostgreSQL WAL, filter it for "illegal" operations 152 and then apply it at the auditor end. Disadvantages: WAL 153 filtering is not a common operation (format documented?), 154 this would be highly PostgreSQL-specific, and would require 155 complex work to write the filter. Also unsure how one 156 could later recover gracefully from transient errors 157 (say where the exchange recified a bogus DELETE). 158 * Directly SELECT against the (remote) exchange DB and then 159 INSERT/UPDATE at the auditor's local copy. Disadvantages: 160 remote SELECT likely very expensive due to high latency. 161 Diagnostics more difficult. May expose exchange to additional 162 risks from auditor, such as attacks exhausting DB resources 163 by running expensive SELECTs. 164 165 166 167 168 Drawbacks 169 ========= 170 171 * SERIAL IDs required in all tables that are "append-only" / immutable. 172 * Additional custom logic required to recompute mutable tables 173 on-demand. 174 * Limited ability to cope with mutable tables, imposes restrictions 175 on future exchange database evolution. 176 * Helper logic to SELECT data in batches that will certainly 177 maintain invariants may be a bit tricky, but in principle 178 the foreign key constraints should form a DAG, simply dictating 179 the order in which new entries are to be copied. It may also 180 be that simply running "big" transactions across all tables 181 is the answer, to be investigated what performs better. 182 * A malicious exchange could theoretically send expensive transactions 183 to the auditor via the replication mechanism (possibly ones that 184 it did not even execute locally itself) to DoS the "ingress" 185 database. This would be noticed primarily by load 186 monitoring or even the auditor lagging unusually far behind the 187 exchange's transaction history. We believe this is acceptable, 188 as it would imply highly visible malicious exchange behavior for 189 virtually no significant gain. 190 * The proposed solution does not create a transactional, synchronous 191 write-only log as suggested by CodeBlau (see audit report, Section 9.4). 192 We believe doing so would be overly costly, both in terms of 193 complexity and performance, for limited gains. 194 195 196 Discussion / Q&A 197 ================