DD 18: Forgettable Data in JSON Contract Terms
##############################################

Summary
=======

This document defines concepts and algorithms for handling the JSON format of
contract terms with forgettable data in Taler payments.

Motivation
==========

The contract terms JSON format used in Taler describes various aspects of a
payment request, such as the amount to be paid, accepted payment service
providers, a human-readable summary, a list of products and shipping
information.

To support data minimization, it is desirable that some pieces of information
stored in the contract terms (either in the storage of the merchant or in the
customer's wallet) can be deleted as soon as they are no longer strictly
required.

However, the cryptographic hash of the contract terms is used throughout the
Taler protocol as an opaque handle for the payment and associated processes.
In an audit, a merchant might be asked to reveal the plain-text contract terms
for a particular hash.

Thus the hashing of the contract terms needs to take into account the
forgettable parts of the contract terms.  The contract terms hash must be the
same before and after forgetting a forgettable part of the contract terms.

Proposed Solution
=================

Members of objects can be marked as forgettable by adding metadata to the
contract terms JSON.  Before hashing the contract terms JSON, it is first
scrubbed and canonicalized.  Scrubbing replaces forgettable members with a
salted hash of their (recursively scrubbed and canonicalized) value.  To
prevent attempts at guessing the value of forgotten members, a salt is
generated and stored in the contract terms for each forgettable member.

Constraints on Contract Terms JSON
----------------------------------

In order to make it easy to get a canonical representation for JSON contract
terms, the following restrictions apply:

* Member names are restricted:  Only strings matching the regular expression
  ``^[0-9A-Z_a-z]+$`` or the literal names ``$forgettable`` or ``$forgotten`` are
  allowed.  This makes the sorting of object members easier: RFC 8785
  requires sorting by UTF-16 code units, and for these ASCII-only names that
  order is the same as plain byte order.
* Floating point numbers are forbidden.  Numbers must be integers in the range
  ``-(2**53 - 1)`` to ``(2**52) - 1``.
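
Both constraints can be checked mechanically.  The following is a minimal
sketch for Node.js, not part of any Taler API; the function name
``checkContractTermsJson`` is hypothetical, and the numeric bounds are copied
verbatim from the list above.

.. code-block:: javascript

   // Hypothetical validator for the constraints listed above.
   // The numeric bounds are copied verbatim from that list.
   const MEMBER_NAME = /^[0-9A-Z_a-z]+$/;
   const SPECIAL_NAMES = new Set(["$forgettable", "$forgotten"]);
   const MIN_INT = -(2 ** 53 - 1);
   const MAX_INT = 2 ** 52 - 1;

   // Throws an Error describing the first violation found; returns nothing otherwise.
   function checkContractTermsJson(value, path = "$") {
     if (typeof value === "number") {
       if (!Number.isInteger(value) || value < MIN_INT || value > MAX_INT) {
         throw new Error(`${path}: ${value} is not an allowed integer`);
       }
     } else if (Array.isArray(value)) {
       value.forEach((item, i) => checkContractTermsJson(item, `${path}[${i}]`));
     } else if (value !== null && typeof value === "object") {
       for (const [name, member] of Object.entries(value)) {
         if (!MEMBER_NAME.test(name) && !SPECIAL_NAMES.has(name)) {
           throw new Error(`${path}: member name ${JSON.stringify(name)} is not allowed`);
         }
         checkContractTermsJson(member, `${path}.${name}`);
       }
     }
     // Strings, booleans and null are always acceptable.
   }

Since ``JSON.parse`` already turns a literal such as ``1.0`` into the number
``1``, this sketch cannot detect floating point literals with an integral
value; a stricter validator would have to inspect the raw JSON text.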

Marking Members as Forgettable
------------------------------

A property is marked as forgettable by including the property
name as a key in the special ``$forgettable`` field of the property's
parent object.  The value stored under that key is the member's salt:

.. code-block:: json

   {
     "delivery_address": "...",
     "$forgettable": {
       "delivery_address": "<salt>"
     }
   }

Clients that write contract terms might not be able to easily generate the salt value.
Thus, the merchant backend must also allow the following syntax in the order creation request:

.. code-block:: json

   {
     "$forgettable": {
       "delivery_address": true
     }
   }

However, a JSON object with such a forgettable specification (``true`` instead
of a salt) must be considered an invalid contract terms object.
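
A merchant backend could turn such an order into valid contract terms by
replacing every ``true`` with a fresh random salt.  The sketch below assumes
Node.js; the helper names ``fillForgettableSalts`` and
``hasUnsaltedForgettable`` are hypothetical, and encoding a salt as 64 hex
digits (32 random bytes) is an assumption, since this document does not fix a
salt format.

.. code-block:: javascript

   const crypto = require("crypto");

   // Assumed salt format: 32 random bytes as 64 hex digits (not fixed by this document).
   function newSalt() {
     return crypto.randomBytes(32).toString("hex");
   }

   // Returns a copy in which every `true` inside a "$forgettable" object
   // is replaced by a fresh salt; the input is not modified.
   function fillForgettableSalts(value) {
     if (Array.isArray(value)) {
       return value.map(fillForgettableSalts);
     }
     if (value === null || typeof value !== "object") {
       return value;
     }
     const result = {};
     for (const [name, member] of Object.entries(value)) {
       if (name === "$forgettable") {
         result[name] = {};
         for (const [field, salt] of Object.entries(member)) {
           result[name][field] = salt === true ? newSalt() : salt;
         }
       } else {
         result[name] = fillForgettableSalts(member);
       }
     }
     return result;
   }

   // True if any "$forgettable" entry is not a salt string, i.e. the
   // object is not yet valid as contract terms.
   function hasUnsaltedForgettable(value) {
     if (Array.isArray(value)) {
       return value.some(hasUnsaltedForgettable);
     }
     if (value === null || typeof value !== "object") {
       return false;
     }
     return Object.entries(value).some(([name, member]) =>
       name === "$forgettable"
         ? Object.values(member).some((salt) => typeof salt !== "string")
         : hasUnsaltedForgettable(member)
     );
   }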

Forgetting a Forgettable Member
-------------------------------

To forget a forgettable member, it is removed from
the parent object, and the salted hash of the member's
scrubbed and canonicalized value is put into the special ``$forgotten``
member of the parent object.  The ``$forgettable`` entry holding the salt
remains in place.

.. code-block:: javascript

   // "props" stands for the parent object's other members.

   // Before forgetting "delivery_address":
   const before = {
     ...props,
     "delivery_address": "...",
     "$forgettable": {
       "delivery_address": "<memb_salt>",
     },
   };

   // After forgetting "delivery_address":
   const after = {
     ...props,
     "$forgotten": {
       "delivery_address": "<memb_salted_hash>",
     },
     "$forgettable": {
       "delivery_address": "<memb_salt>",
     },
   };
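
The transformation can be sketched as an in-place update of the parent
object.  ``forgetMember`` is a hypothetical helper, and ``saltedHash`` is a
caller-supplied function assumed to compute ``memb_salted_hash`` as defined in
this section, including scrubbing and canonicalizing the value first.

.. code-block:: javascript

   // Hypothetical helper: forget member `name` of `parent`, modifying `parent`.
   // `saltedHash(value, salt)` must return memb_salted_hash for `value`
   // (scrubbing and canonicalizing it first); it is supplied by the caller.
   function forgetMember(parent, name, saltedHash) {
     const salts = parent["$forgettable"];
     if (salts === undefined || typeof salts[name] !== "string") {
       throw new Error(`member ${name} is not forgettable`);
     }
     if (!Object.prototype.hasOwnProperty.call(parent, name)) {
       return parent; // nothing left to forget
     }
     if (parent["$forgotten"] === undefined) {
       parent["$forgotten"] = {};
     }
     parent["$forgotten"][name] = saltedHash(parent[name], salts[name]);
     delete parent[name];
     // The "$forgettable" entry (with the salt) is kept, as in the example above.
     return parent;
   }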

The hash of a member value ``memb_val`` with salt ``memb_salt`` is computed as follows:

.. code-block:: javascript

   memb_val_canon = canonicalized_json(scrub(memb_val));

   memb_salted_hash = hkdf_sha512({
     output_length: 64,
     input_key_material: memb_val_canon,
     salt: memb_salt,
   });

When encoding ``memb_salted_hash`` with base32-crockford, the resulting output
must be upper-case.
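
A runnable sketch of ``memb_salted_hash`` for Node.js follows, under these
assumptions: the value passed in is already scrubbed; canonicalization covers
only the restricted JSON subset above (integers, sorted member names), for
which RFC 8785 output coincides with ``JSON.stringify`` applied to objects with
sorted members; the salt string is used as its UTF-8 bytes; the HKDF ``info``
input is empty; and, following the formula literally, no 0-byte terminator is
appended to the KDF inputs.  The reference implementation may differ in any of
these details, so this sketch is not guaranteed to reproduce the test vector
below.  The function names are hypothetical.

.. code-block:: javascript

   const crypto = require("crypto");

   // RFC 8785 form for the restricted subset: members sorted by UTF-16 code
   // units (the default JS sort order), no whitespace, primitives via JSON.stringify.
   function canonicalize(value) {
     if (Array.isArray(value)) {
       return "[" + value.map(canonicalize).join(",") + "]";
     }
     if (value !== null && typeof value === "object") {
       const members = Object.keys(value)
         .sort()
         .map((name) => JSON.stringify(name) + ":" + canonicalize(value[name]));
       return "{" + members.join(",") + "}";
     }
     return JSON.stringify(value);
   }

   const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

   // Crockford base32 without padding; the alphabet is upper-case only.
   function base32Crockford(bytes) {
     let out = "";
     let buffer = 0;
     let bits = 0;
     for (const byte of bytes) {
       buffer = ((buffer << 8) | byte) & 0xffff;
       bits += 8;
       while (bits >= 5) {
         out += CROCKFORD[(buffer >>> (bits - 5)) & 31];
         bits -= 5;
       }
     }
     if (bits > 0) {
       out += CROCKFORD[(buffer << (5 - bits)) & 31]; // zero-pad the last group
     }
     return out;
   }

   // memb_salted_hash for a member value that has already been scrubbed.
   function memberSaltedHash(scrubbedValue, salt) {
     const canon = Buffer.from(canonicalize(scrubbedValue), "utf8");
     const okm = crypto.hkdfSync("sha512", canon, Buffer.from(salt, "utf8"), Buffer.alloc(0), 64);
     return base32Crockford(new Uint8Array(okm));
   }

The result is always 103 characters long: 64 bytes are 512 bits, and 512 / 5
rounded up is 103.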
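

Scrubbing
---------

A JSON object is scrubbed by recursively identifying and forgetting all
forgettable fields.  Because a member's salted hash is computed over its
scrubbed value, nested forgettable members are forgotten before their
enclosing member is hashed.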
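
Scrubbing therefore works bottom-up.  A minimal sketch for Node.js follows;
``scrub`` is a hypothetical helper, and ``saltedHash`` is a caller-supplied
function assumed to implement ``memb_salted_hash`` for an already scrubbed
value.

.. code-block:: javascript

   // Hypothetical helper: returns a scrubbed copy and leaves the input untouched.
   // `saltedHash(scrubbedValue, salt)` stands for memb_salted_hash.
   function scrub(value, saltedHash) {
     if (Array.isArray(value)) {
       return value.map((item) => scrub(item, saltedHash));
     }
     if (value === null || typeof value !== "object") {
       return value;
     }
     // Scrub all children first: a member is hashed in its scrubbed form.
     const result = {};
     for (const [name, member] of Object.entries(value)) {
       result[name] = scrub(member, saltedHash);
     }
     // Then forget every forgettable member that is still present.
     const salts = result["$forgettable"] ?? {};
     for (const [name, salt] of Object.entries(salts)) {
       if (Object.prototype.hasOwnProperty.call(result, name)) {
         if (result["$forgotten"] === undefined) {
           result["$forgotten"] = {};
         }
         result["$forgotten"][name] = saltedHash(result[name], salt);
         delete result[name];
       }
     }
     return result;
   }

Returning a copy rather than mutating the input lets the caller keep the full
contract terms while computing their hash.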


Canonicalized Hashing
---------------------

A JSON object is canonicalized by converting it to a UTF-8 byte array with the
algorithm specified in `RFC 8785 <https://tools.ietf.org/html/rfc8785>`__.  The
resulting bytes are terminated with a single 0-byte and then hashed with
SHA-512.
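
Put together, the hash of already scrubbed contract terms can be sketched as
follows for Node.js.  ``contractTermsHash`` is a hypothetical name;
canonicalization again covers only the restricted subset above (integers,
sorted member names), and the Crockford base32 encoding of the digest, as used
in the test vector below, is done by a local helper rather than a library call.

.. code-block:: javascript

   const crypto = require("crypto");

   // RFC 8785 form for the restricted subset: members sorted by UTF-16 code
   // units (the default JS sort order), no whitespace, primitives via JSON.stringify.
   function canonicalize(value) {
     if (Array.isArray(value)) {
       return "[" + value.map(canonicalize).join(",") + "]";
     }
     if (value !== null && typeof value === "object") {
       const members = Object.keys(value)
         .sort()
         .map((name) => JSON.stringify(name) + ":" + canonicalize(value[name]));
       return "{" + members.join(",") + "}";
     }
     return JSON.stringify(value);
   }

   const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

   // Crockford base32 without padding, upper-case output.
   function base32Crockford(bytes) {
     let out = "";
     let buffer = 0;
     let bits = 0;
     for (const byte of bytes) {
       buffer = ((buffer << 8) | byte) & 0xffff;
       bits += 8;
       while (bits >= 5) {
         out += CROCKFORD[(buffer >>> (bits - 5)) & 31];
         bits -= 5;
       }
     }
     if (bits > 0) {
       out += CROCKFORD[(buffer << (5 - bits)) & 31];
     }
     return out;
   }

   // Canonical UTF-8 bytes, one terminating 0-byte, then SHA-512.
   function contractTermsHash(scrubbedTerms) {
     const canon = Buffer.from(canonicalize(scrubbedTerms), "utf8");
     const digest = crypto
       .createHash("sha512")
       .update(Buffer.concat([canon, Buffer.from([0])]))
       .digest();
     return base32Crockford(digest);
   }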


Test vector
-----------

The following input contains top-level and nested forgettable
fields, non-forgettable fields, and booleans, integers, strings
and objects.  It is thus suitable as a minimal interoperability
test:

.. code-block:: json

  {
    "k1": 1,
    "_forgettable": {
      "k1": "SALT"
    },
    "k2": {
      "n1": true,
      "_forgettable": {
        "n1": "salt"
      }
    },
    "k3": {
      "n1": "string"
    }
  }

Hashing the above contract results in the following Crockford base32 encoded
hash:
``287VXK8T6PXKD05W8Y94QJNEFCMRXBC9S7KNKTWGH2G2J2D7RYKPSHNH1HG9NT1K2HRHGC67W6QM6GEC4BSN1DPNEBCS0AVDT2DBP5G``.

Salt values must normally be chosen at random; static salt values are used
here only to make the test reproducible.



Discussion / Q&A
================

* It is not completely clear which parts of the contract terms
  should be forgettable.  This should be decided individually
  by the merchant based on applicable legislation.

* Is it really necessary to have one salt per forgettable member?
  We could also have a "contract terms global" salt, and then
  use the global salt **and** the path of the forgettable field
  as the salt for hashing.

* Why do we require the 0-termination in the hash / KDF?  It does not seem to
  match what, e.g., ``shasum`` does.

* Why do we not supply any "info" string (= context chunks in the
  ``GNUNET_CRYPTO_kdf`` terminology) to the HKDF?  Does it matter?

* We could also delete the corresponding ``$forgettable`` entry after
  forgetting a member.  This would save storage.  But to prove that a certain
  piece of forgettable information matches the contract terms, the prover
  would then also need to store and provide the salt.

* What validations should the wallet do?  Should the wallet ever accept
  contract terms where fields are already forgotten?