033-database.rst (6417B)
1 DD 33: Database Schema and Versioning 2 ##################################### 3 4 Summary 5 ======= 6 7 This document describes how we version database schema in GNU Taler 8 and enable migrations. 9 10 11 Motivation 12 ========== 13 14 As Taler evolves, it will be necessary to add new tables, change existing 15 tables, modify indices and make other changes to the database schema. As 16 production systems exist, existing data will need to be migrated to the new 17 schema, and we need to do this in a systematic way. While certain major 18 changes may require radical or manual interventions, we should have a 19 systematic way of dealing with minor or modest modifications to the schema. 20 21 22 Requirements 23 ============ 24 25 * The approach should be language-independent 26 * Schema migration should be reasonably performant 27 * Schema migration must be usable, and in particular safe to use for 28 operators without significant risk 29 * We need to support key database features we might need to use, 30 such as partitioning or sharding 31 32 33 Proposed Solution 34 ================= 35 36 We use the "versioning.sql" system to store the current set of patches that 37 have been applied to the database so far in a "_v" schema. This allows us to 38 quickly test which version of the database we are on and which migrations may 39 still need to be applied. 40 41 For each component, all tables are placed into a SCHEMA named after the 42 component. 43 44 We then use a set of numbered SQL files that create, alter or drop tables and 45 indices (like exchange-0001.sql, exchange-0002.sql, ...) to setup the 46 database. However, some setups need additional arguments, such as the number 47 of partitions. Those setups are then NOT performed explicitly, but by creating 48 stored procedures and registering those stored procedures in a general global 49 "master" table to be called from the main setup logic with arguments in a 50 particular order under certain conditions. 51 52 When setting up a database, there is no point in incrementally defining 53 ordinary stored procedures that are used at runtime (not the ones to setup the 54 tables we talked about above). Thus, all of the stored procedures used by the 55 runtime system are placed in a file "procedures.sql" which is loaded 56 last. This makes changes to stored procedures particularly easy, as one simply 57 edits "procedures.sql". Loading "procedures.sql" also does not change "_v". 58 59 A "drop.sql" file is created that DROPs the main SCHEMA of the component and 60 additionally unregisters all patches from the "_v" schema. The drop script 61 is run during tests to fully reset the database. 62 63 Exchange details 64 ^^^^^^^^^^^^^^^^ 65 66 The exchange uses "exchange_tables" to create the master 67 table mentioned above. In "exchange_tables", entries are 68 executed in the order of the "table_serial_id". Each 69 entry has a "name", which is the name of the affected table 70 (or at least the prefix in the case of partitioned or sharded 71 tables). The "version" field stores which "exchange-XXXX.sql" 72 file setup the respective table entry, but is for now mostly 73 for internal documentation. The "action" defines both the 74 condition under which to run a function. Specifically, 75 actions can be: 76 77 * create --- run on the master table and each shard; used to create or alter the main table 78 * constrain --- run only on the partitions/shards, or on master if there are no partitions; used to setup constraints like uniqueness that only apply to the lowest levels of the table 79 * master -- run only on the master table; used to setup triggers and other constraints that only apply to the master table 80 * foreign -- run only on the master table and only if there are no partition; used to setup foreign key constraints that are not supported on partitioned or sharded tables 81 82 The "partitioned" field indicates that this table is partitioned and instructs the functions to create partitions (or shards) 83 for this table. 84 85 The "by_range" field indicates if the table is partitioned by 86 range, which prevents automatic generation of partitions as 87 is done if partitioned by hash. 88 89 The "finished" field is initially false, but set to TRUE once the respective 90 function has been executed. 91 92 The main "do_create_tables" function triggers the unfinished actions 93 registered in the "exchange_tables" table. It is given arguments to control 94 the number of partitions, the use of partitions and (in the future) the use of 95 sharding. 96 97 The individual actions use helper functions ("create_partitioned_table", 98 "comment_partitioned_table" and "comment_partitioned_column") to facilitate 99 the creation of tables and associated comments. These functions are used so 100 that we can only define the schema or comment once, and have it applied to 101 tables with names and creation syntax that changes slightly if we use shards 102 or partitions. 103 104 Some additional logic will be needed to deal nicely with 105 sharding (which is currently not supported), as with 106 sharing we will need to detach shards, migrate shards, and 107 re-attach shards. So this will require additional stored 108 procedures to support operations on shards. 109 110 111 Merchant details 112 ^^^^^^^^^^^^^^^^ 113 114 The merchant does not (yet) need any type of master table, as we do not 115 (yet) use any kind of sharding or partitioning. There are also no 116 stored procedures being used by the backend. Hence, it is simply the 117 "versioning.sql"-controlled table creation/alteration sequence 118 (merchant-0001.sql, etc.) and the "drop.sql" to reset everything. 119 120 121 Alternatives 122 ============ 123 124 * We might want to consider storing more meta-data 125 in the database, such as the use of sharding, the 126 names of the shard servers, or even just the number 127 of partitions. 128 129 * We could require dumping out the old database and 130 loading it in via some explicit importer during each 131 migration; having migration logic in C would enable more 132 powerful migrations, but dumping and reloading the entire 133 database would also be more expensive. It would have the 134 advantage of basically having the old database around in 135 case of migration trouble, so the cost disadvantage might 136 not be so bad (as admins are likely to make a backup anyway). 137 OTOH, doing the migration with the equivalent of a 138 taler-auditor-sync would require quite a bit more code 139 than the simple ALTER/CREATE statements in an SQL file. 140 141 142 Drawbacks 143 ========= 144 145 * not exactly trival logic 146 * some complexity to implement 147 148 149 Discussion / Q&A 150 ================ 151 152 (This should be filled in with results from discussions on mailing lists / personal communication.)