taler-docs

Documentation for GNU Taler components, APIs and protocols
Log | Files | Refs | README | LICENSE

033-database.rst (6417B)


      1 DD 33: Database Schema and Versioning
      2 #####################################
      3 
      4 Summary
      5 =======
      6 
      7 This document describes how we version database schema in GNU Taler
      8 and enable migrations.
      9 
     10 
     11 Motivation
     12 ==========
     13 
     14 As Taler evolves, it will be necessary to add new tables, change existing
     15 tables, modify indices and make other changes to the database schema. As
     16 production systems exist, existing data will need to be migrated to the new
     17 schema, and we need to do this in a systematic way.  While certain major
     18 changes may require radical or manual interventions, we should have a
     19 systematic way of dealing with minor or modest modifications to the schema.
     20 
     21 
     22 Requirements
     23 ============
     24 
     25 * The approach should be language-independent
     26 * Schema migration should be reasonably performant
     27 * Schema migration must be usable, and in particular safe to use for
     28   operators without significant risk
     29 * We need to support key database features we might need to use,
     30   such as partitioning or sharding
     31 
     32 
     33 Proposed Solution
     34 =================
     35 
     36 We use the "versioning.sql" system to store the current set of patches that
     37 have been applied to the database so far in a "_v" schema. This allows us to
     38 quickly test which version of the database we are on and which migrations may
     39 still need to be applied.
     40 
     41 For each component, all tables are placed into a SCHEMA named after the
     42 component.
     43 
     44 We then use a set of numbered SQL files that create, alter or drop tables and
     45 indices (like exchange-0001.sql, exchange-0002.sql, ...) to setup the
     46 database. However, some setups need additional arguments, such as the number
     47 of partitions. Those setups are then NOT performed explicitly, but by creating
     48 stored procedures and registering those stored procedures in a general global
     49 "master" table to be called from the main setup logic with arguments in a
     50 particular order under certain conditions.
     51 
     52 When setting up a database, there is no point in incrementally defining
     53 ordinary stored procedures that are used at runtime (not the ones to setup the
     54 tables we talked about above). Thus, all of the stored procedures used by the
     55 runtime system are placed in a file "procedures.sql" which is loaded
     56 last. This makes changes to stored procedures particularly easy, as one simply
     57 edits "procedures.sql".  Loading "procedures.sql" also does not change "_v".
     58 
     59 A "drop.sql" file is created that DROPs the main SCHEMA of the component and
     60 additionally unregisters all patches from the "_v" schema. The drop script
     61 is run during tests to fully reset the database.
     62 
     63 Exchange details
     64 ^^^^^^^^^^^^^^^^
     65 
     66 The exchange uses "exchange_tables" to create the master
     67 table mentioned above. In "exchange_tables", entries are
     68 executed in the order of the "table_serial_id". Each
     69 entry has a "name", which is the name of the affected table
     70 (or at least the prefix in the case of partitioned or sharded
     71 tables).  The "version" field stores which "exchange-XXXX.sql"
     72 file setup the respective table entry, but is for now mostly
     73 for internal documentation.  The "action" defines both the
     74 condition under which to run a function.  Specifically,
     75 actions can be:
     76 
     77 * create --- run on the master table and each shard; used to create or alter the main table
     78 * constrain --- run only on the partitions/shards, or on master if there are no partitions; used to setup constraints like uniqueness that only apply to the lowest levels of the table
     79 * master -- run only on the master table; used to setup triggers and other constraints that only apply to the master table
     80 * foreign -- run only on the master table and only if there are no partition; used to setup foreign key constraints that are not supported on partitioned or sharded tables
     81 
     82 The "partitioned" field indicates that this table is partitioned and instructs the functions to create partitions (or shards)
     83 for this table.
     84     
     85 The "by_range" field indicates if the table is partitioned by
     86 range, which prevents automatic generation of partitions as
     87 is done if partitioned by hash.
     88 
     89 The "finished" field is initially false, but set to TRUE once the respective
     90 function has been executed.
     91 
     92 The main "do_create_tables" function triggers the unfinished actions
     93 registered in the "exchange_tables" table.  It is given arguments to control
     94 the number of partitions, the use of partitions and (in the future) the use of
     95 sharding.
     96 
     97 The individual actions use helper functions ("create_partitioned_table",
     98 "comment_partitioned_table" and "comment_partitioned_column") to facilitate
     99 the creation of tables and associated comments.  These functions are used so
    100 that we can only define the schema or comment once, and have it applied to
    101 tables with names and creation syntax that changes slightly if we use shards
    102 or partitions.
    103 
    104 Some additional logic will be needed to deal nicely with
    105 sharding (which is currently not supported), as with
    106 sharing we will need to detach shards, migrate shards, and
    107 re-attach shards. So this will require additional stored
    108 procedures to support operations on shards.
    109 
    110 
    111 Merchant details
    112 ^^^^^^^^^^^^^^^^
    113 
    114 The merchant does not (yet) need any type of master table, as we do not
    115 (yet) use any kind of sharding or partitioning.  There are also no
    116 stored procedures being used by the backend.  Hence, it is simply the
    117 "versioning.sql"-controlled table creation/alteration sequence
    118 (merchant-0001.sql, etc.) and the "drop.sql" to reset everything.
    119 
    120 
    121 Alternatives
    122 ============
    123 
    124 * We might want to consider storing more meta-data
    125   in the database, such as the use of sharding, the
    126   names of the shard servers, or even just the number
    127   of partitions.
    128 
    129 * We could require dumping out the old database and
    130   loading it in via some explicit importer during each
    131   migration; having migration logic in C would enable more
    132   powerful migrations, but dumping and reloading the entire
    133   database would also be more expensive. It would have the
    134   advantage of basically having the old database around in
    135   case of migration trouble, so the cost disadvantage might
    136   not be so bad (as admins are likely to make a backup anyway).
    137   OTOH, doing the migration with the equivalent of a
    138   taler-auditor-sync would require quite a bit more code
    139   than the simple ALTER/CREATE statements in an SQL file.
    140   
    141 
    142 Drawbacks
    143 =========
    144 
    145 * not exactly trival logic
    146 * some complexity to implement
    147 
    148 
    149 Discussion / Q&A
    150 ================
    151 
    152 (This should be filled in with results from discussions on mailing lists / personal communication.)