Diffstat (limited to 'design-documents/033-database.rst')

 design-documents/033-database.rst | 152
 1 file changed, 152 insertions(+), 0 deletions(-)
diff --git a/design-documents/033-database.rst b/design-documents/033-database.rst
new file mode 100644
index 00000000..ee37cc32
--- /dev/null
+++ b/design-documents/033-database.rst

DD 33: Database Schema and Versioning
#####################################

Summary
=======

This document describes how we version the database schema in GNU Taler
and enable migrations.


Motivation
==========

As Taler evolves, it will be necessary to add new tables, change existing
tables, modify indices and make other changes to the database schema. As
production systems exist, existing data will need to be migrated to the new
schema, and we need to do this in a systematic way. While certain major
changes may require radical or manual interventions, we should have a
systematic way of dealing with minor or modest modifications to the schema.


Requirements
============

* The approach should be language-independent
* Schema migration should be reasonably performant
* Schema migration must be usable, and in particular safe to use for
  operators without significant risk
* We need to support key database features we might need to use,
  such as partitioning or sharding


Proposed Solution
=================

We use the "versioning.sql" system to store the current set of patches that
have been applied to the database so far in a "_v" schema. This allows us to
quickly test which version of the database we are on and which migrations may
still need to be applied.

For each component, all tables are placed into a SCHEMA named after the
component.

We then use a set of numbered SQL files that create, alter or drop tables and
indices (like exchange-0001.sql, exchange-0002.sql, ...) to set up the
database. However, some setups need additional arguments, such as the number
of partitions.
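As an illustration, a numbered patch file might begin roughly as follows.
This is a sketch, not the actual exchange code: the
"_v.register_patch(name, requires, conflicts)" call follows the conventions
of the depesz-style "versioning.sql", and the schema comment is a
hypothetical example.

.. code-block:: sql

   -- Hypothetical excerpt of an "exchange-0001.sql" patch file.
   BEGIN;

   -- Record this patch in the "_v" schema; aborts the transaction
   -- if the patch was already applied, so loading twice is safe.
   SELECT _v.register_patch ('exchange-0001', NULL, NULL);

   -- All tables of the component live in a SCHEMA named after it.
   CREATE SCHEMA exchange;
   COMMENT ON SCHEMA exchange
     IS 'taler-exchange data';

   SET search_path TO exchange;

   -- CREATE TABLE / CREATE INDEX statements (or, for partitioned
   -- tables, registrations in the master table) would follow here.

   COMMIT;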
Such setups are then NOT performed explicitly, but by creating stored
procedures and registering those stored procedures in a general global
"master" table; the main setup logic then calls them with arguments in a
particular order under certain conditions.

When setting up a database, there is no point in incrementally defining the
ordinary stored procedures that are used at runtime (as opposed to the ones
used to set up the tables discussed above). Thus, all of the stored
procedures used by the runtime system are placed in a file "procedures.sql"
which is loaded last. This makes changes to stored procedures particularly
easy, as one simply edits "procedures.sql". Loading "procedures.sql" also
does not change "_v".

A "drop.sql" file is created that DROPs the main SCHEMA of the component and
additionally unregisters all patches from the "_v" schema. The drop script
is run during tests to fully reset the database.

Exchange details
^^^^^^^^^^^^^^^^

The exchange uses "exchange_tables" to create the master table mentioned
above. In "exchange_tables", entries are executed in the order of the
"table_serial_id". Each entry has a "name", which is the name of the
affected table (or at least the prefix in the case of partitioned or sharded
tables). The "version" field stores which "exchange-XXXX.sql" file set up
the respective table entry, but is for now mostly for internal
documentation. The "action" field defines both the condition under which to
run a function and what that function is used for.
Specifically, actions can be:

* create --- run on the master table and on each shard; used to create or
  alter the main table
* constrain --- run only on the partitions/shards, or on the master table if
  there are no partitions; used to set up constraints like uniqueness that
  only apply to the lowest levels of the table
* master --- run only on the master table; used to set up triggers and other
  constraints that only apply to the master table
* foreign --- run only on the master table and only if there are no
  partitions; used to set up foreign key constraints, which are not
  supported on partitioned or sharded tables

The "partitioned" field indicates that this table is partitioned and
instructs the functions to create partitions (or shards) for this table.

The "by_range" field indicates whether the table is partitioned by range,
which prevents the automatic generation of partitions that is done for
tables partitioned by hash.

The "finished" field is initially FALSE, but set to TRUE once the respective
function has been executed.

The main "do_create_tables" function triggers the unfinished actions
registered in the "exchange_tables" table. It is given arguments to control
the number of partitions, the use of partitions and (in the future) the use
of sharding.

The individual actions use helper functions ("create_partitioned_table",
"comment_partitioned_table" and "comment_partitioned_column") to facilitate
the creation of tables and associated comments. These functions are used so
that we need to define each schema element or comment only once, and have it
applied to tables whose names and creation syntax change slightly if we use
shards or partitions.

Some additional logic will be needed to deal nicely with sharding (which is
currently not supported), as with sharding we will need to detach shards,
migrate shards, and re-attach shards. So this will require additional stored
procedures to support operations on shards.
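The master table and driver function described above can be sketched as
follows. This is a hedged illustration, not the actual exchange schema:
the column names come from the text, but the column types, the
"<action>_table_<name>" naming convention for the registered stored
procedures, and the exact arguments passed to them are assumptions.

.. code-block:: sql

   -- Hypothetical sketch of the "exchange_tables" master table.
   CREATE TABLE exchange_tables
     (table_serial_id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
     ,name TEXT NOT NULL        -- affected table (or table name prefix)
     ,version TEXT NOT NULL     -- "exchange-XXXX.sql" file that set it up
     ,action TEXT NOT NULL      -- create / constrain / master / foreign
     ,partitioned BOOL NOT NULL -- create partitions (or shards)?
     ,by_range BOOL NOT NULL    -- partitioned by range (no auto-partitions)?
     ,finished BOOL NOT NULL DEFAULT FALSE);

   -- Hypothetical driver loop: run each unfinished action in
   -- "table_serial_id" order, assuming the registered stored
   -- procedures follow a "<action>_table_<name>" naming convention.
   CREATE OR REPLACE FUNCTION do_create_tables (num_partitions INTEGER)
   RETURNS VOID
   LANGUAGE plpgsql
   AS $$
   DECLARE
     rec RECORD;
   BEGIN
     FOR rec IN
       SELECT table_serial_id
             ,name
             ,action
         FROM exchange_tables
        WHERE NOT finished
        ORDER BY table_serial_id
     LOOP
       -- Call the stored procedure registered for this entry;
       -- it decides (per its action) whether to act on the master
       -- table, the partitions/shards, or both.
       EXECUTE FORMAT ('SELECT %I (%s)'
                      ,rec.action || '_table_' || rec.name
                      ,num_partitions);
       UPDATE exchange_tables
          SET finished = TRUE
        WHERE table_serial_id = rec.table_serial_id;
     END LOOP;
   END
   $$;

Marking each entry "finished" after execution is what makes the setup
idempotent: re-running "do_create_tables" only triggers actions added by
patches applied since the last run.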

Merchant details
^^^^^^^^^^^^^^^^

The merchant does not (yet) need any type of master table, as we do not
(yet) use any kind of sharding or partitioning. There are also no stored
procedures being used by the backend. Hence, it is simply the
"versioning.sql"-controlled table creation/alteration sequence
(merchant-0001.sql, etc.) and the "drop.sql" to reset everything.


Alternatives
============

* We might want to consider storing more metadata in the database, such as
  the use of sharding, the names of the shard servers, or even just the
  number of partitions.

* We could require dumping out the old database and loading it via some
  explicit importer during each migration; having migration logic in C
  would enable more powerful migrations, but dumping and reloading the
  entire database would also be more expensive. It would have the advantage
  of keeping the old database around in case of migration trouble, so the
  cost disadvantage might not be so bad (as admins are likely to make a
  backup anyway). OTOH, doing the migration with the equivalent of a
  taler-auditor-sync would require quite a bit more code than the simple
  ALTER/CREATE statements in an SQL file.


Drawbacks
=========

* not exactly trivial logic
* some complexity to implement


Discussion / Q&A
================

(This should be filled in with results from discussions on mailing lists /
personal communication.)