DD 18: Forgettable Data in JSON Contract Terms
##############################################

Summary
=======

This document defines concepts and algorithms for handling the JSON format of
contract terms with forgettable data in Taler payments.

Motivation
==========

The contract terms JSON format used in Taler describes various aspects of a
payment request, such as the amount to be paid, accepted payment service
providers, a human-readable summary, a list of products and shipping
information.

To support data minimization, it should be possible to delete some pieces of
information stored in the contract terms (either in the storage of the
merchant or in the customer's wallet) as soon as they are no longer strictly
required.

However, the cryptographic hash of the contract terms is used throughout the
Taler protocol as an opaque handle for the payment and associated processes.
In an audit, a merchant might be asked to reveal the plain-text contract terms
for a particular hash.

Thus the hashing of the contract terms needs to take the forgettable parts of
the contract terms into account: the contract terms hash must be the same
before and after forgetting a forgettable part of the contract terms.

Proposed Solution
=================

Members of objects can be marked as forgettable by adding metadata to the
contract terms JSON. Before hashing the contract terms JSON, it is first
scrubbed and canonicalized. Scrubbing replaces forgettable members with a
salted hash of their (recursively scrubbed and canonicalized) value. To
prevent attempts at guessing the value of forgotten members, a salt is
generated and stored in the contract terms for each forgettable member.


Constraints on Contract Terms JSON
----------------------------------

In order to make it easy to get a canonical representation for JSON contract
terms, the following restrictions apply:

* Member names are restricted: Only strings matching the regular expression
  ``^[0-9A-Z_a-z]+$`` or the literal names ``$forgettable`` or ``$forgotten`` are
  allowed. This makes the sorting of object members easier, as RFC 8785
  requires sorting member names by their UTF-16 code units.
* Floating point numbers are forbidden. Numbers must be integers in the range
  ``-(2**53 - 1)`` to ``(2**53) - 1``.


Marking Members as Forgettable
------------------------------

A property is marked as forgettable by including the property
name as a key in the special ``$forgettable`` field of the property's
parent object.

.. code-block:: json

  {
    "delivery_address": "...",
    "$forgettable": {
      "delivery_address": "<salt>"
    }
  }

Clients that write contract terms might not be able to easily generate the salt value.
Thus, the merchant backend must also allow the following syntax in the order creation request:

.. code-block:: json

  {
    "$forgettable": {
      "delivery_address": true
    }
  }

However, a JSON object with such a forgettable specification must be considered an
invalid contract terms object.

Forgetting a Forgettable Member
-------------------------------

To forget a forgettable member, it is removed from
the parent object, and the salted hash of the member's
scrubbed and canonicalized value is put into the special ``$forgotten``
member of the parent object.


.. code-block:: javascript

  {
    ...props,
    "delivery_address": "...",
    "$forgettable": {
      "delivery_address": "<memb_salt>"
    },
  }

  =>

  {
    ...props,
    "$forgotten": {
      "delivery_address": "<memb_salted_hash>"
    },
    "$forgettable": {
      "delivery_address": "<memb_salt>"
    },
  }

The hash of a member value ``memb_val`` with salt ``memb_salt`` is computed as follows:

.. code-block:: javascript

  memb_val_canon = canonicalized_json(scrub(memb_val));

  memb_salted_hash = hkdf_sha512({
    output_length: 64,
    input_key_material: memb_val_canon,
    salt: memb_salt,
  });

When encoding ``memb_salted_hash`` with base32-crockford, the resulting output
must be upper-case.


Scrubbing
---------

A JSON object is scrubbed by recursively identifying and forgetting all
forgettable fields.


Canonicalized Hashing
---------------------

A JSON object is canonicalized by converting it to a UTF-8 byte array with the
algorithm specified in `RFC 8785 <https://tools.ietf.org/html/rfc8785>`__. The
resulting bytes are terminated with a single 0-byte and then hashed with
SHA-512.


Test vector
-----------

The following input contains top-level and nested forgettable
fields, non-forgettable fields, and values of type boolean, integer,
string and object. It is thus suitable as
a minimal interoperability test:

.. code-block:: json

  {
    "k1": 1,
    "_forgettable": {
      "k1": "SALT"
    },
    "k2": {
      "n1": true,
      "_forgettable": {
        "n1": "salt"
      }
    },
    "k3": {
      "n1": "string"
    }
  }

Hashing the above contract results in the following Crockford base32 encoded
hash:
``287VXK8T6PXKD05W8Y94QJNEFCMRXBC9S7KNKTWGH2G2J2D7RYKPSHNH1HG9NT1K2HRHGC67W6QM6GEC4BSN1DPNEBCS0AVDT2DBP5G``.

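
The scrubbing and hashing steps described in the previous sections can be
sketched in Python as follows. This is a minimal illustration, not a reference
implementation: all function names are hypothetical, ``canonical_json`` only
approximates RFC 8785 for documents that satisfy the constraints above, and
the byte-level details (plain RFC 5869 HKDF with an empty info string, the
salt used as its UTF-8 bytes, zero-padded Crockford base32) are assumptions
that this document does not specify. The sketch is therefore not claimed to
reproduce the test vector hash.

.. code-block:: python

  import hashlib
  import hmac
  import json

  CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


  def crockford_b32(data: bytes) -> str:
      # Upper-case Crockford base32, zero-padded to a multiple of 5 bits (assumption).
      nbits = len(data) * 8
      pad = (-nbits) % 5
      bits = int.from_bytes(data, "big") << pad
      return "".join(CROCKFORD[(bits >> s) & 31] for s in range(nbits + pad - 5, -1, -5))


  def canonical_json(value) -> bytes:
      # Approximation of RFC 8785: sorted members, no whitespace, UTF-8 output.
      text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
      return text.encode("utf-8")


  def hkdf_sha512(ikm: bytes, salt: bytes, length: int = 64) -> bytes:
      # Plain RFC 5869 extract-and-expand with HMAC-SHA512, empty info (assumption).
      prk = hmac.new(salt, ikm, hashlib.sha512).digest()
      okm, block, counter = b"", b"", 1
      while len(okm) < length:
          block = hmac.new(prk, block + bytes([counter]), hashlib.sha512).digest()
          okm += block
          counter += 1
      return okm[:length]


  def salted_hash(member, salt: str) -> str:
      # Hash of the scrubbed and canonicalized member value, keyed by its salt.
      if not isinstance(salt, str):
          raise ValueError("salt placeholder found; not valid contract terms")
      return crockford_b32(hkdf_sha512(canonical_json(scrub(member)), salt.encode("utf-8")))


  def scrub(value):
      # Recursively forget every member that is listed in "$forgettable".
      if isinstance(value, list):
          return [scrub(item) for item in value]
      if not isinstance(value, dict):
          return value
      salts = value.get("$forgettable", {})
      forgotten = dict(value.get("$forgotten", {}))
      out = {}
      for name, member in value.items():
          if name in ("$forgettable", "$forgotten"):
              continue
          if name in salts:
              forgotten[name] = salted_hash(member, salts[name])
          else:
              out[name] = scrub(member)
      if salts:
          out["$forgettable"] = salts
      if forgotten:
          out["$forgotten"] = forgotten
      return out


  def contract_hash(terms) -> str:
      # Scrub, canonicalize, append a single 0-byte, then hash with SHA-512.
      data = canonical_json(scrub(terms)) + b"\0"
      return crockford_b32(hashlib.sha512(data).digest())


  terms = {
      "summary": "book",
      "delivery_address": {"street": "Main St 1"},
      "$forgettable": {"delivery_address": "SALT"},
  }
  after_forgetting = scrub(terms)
  # The hash is the same before and after forgetting the member.
  assert contract_hash(terms) == contract_hash(after_forgetting)

Because ``scrub`` leaves already forgotten members untouched and keeps the
``$forgettable`` entry, scrubbing is idempotent, which is what makes the hash
stable across forgetting.
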

Note that salt values must normally be chosen at random; static salt values
are used here only for the purpose of this test.



Discussion / Q&A
================

* It is not completely clear which parts of the contract terms
  should be forgettable. This should be decided individually
  by the merchant based on applicable legislation.

* Is it really necessary to have one salt per forgettable member?
  We could also have a "contract terms global" salt, and then
  use the global salt **and** the path of the forgettable field
  as the salt for hashing.

* Why do we require the 0-termination in the hash / kdf? It does not seem to
  match what e.g. ``shasum`` does.

* Why do we not supply any "info" string (= context chunks in the GNUNET_CRYPTO_kdf terminology)
  to the hkdf? Does it matter?

* We could also delete the corresponding ``$forgettable`` entry after
  forgetting a member. This would save storage. But to prove that a certain
  piece of forgettable information matches the contract terms, the prover
  would then need to store and provide the salt separately.

* What validations should the wallet do? Should the wallet ever accept
  contract terms where fields are already forgotten?
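
Regarding the last question, one validation that follows directly from the
section on constraints is a structural check of member names and numbers. The
sketch below is only an illustration of that check, assuming the integer range
``-(2**53 - 1)`` to ``(2**53) - 1``; the function name and the message format
are hypothetical, and the sketch does not settle which validations a wallet
should actually perform.

.. code-block:: python

  import re

  MEMBER_NAME = re.compile(r"[0-9A-Z_a-z]+")
  SPECIAL_NAMES = {"$forgettable", "$forgotten"}
  MAX_INT = 2**53 - 1


  def check_contract_json(value, path="terms"):
      """Return a list of constraint violations found in a parsed JSON value."""
      problems = []
      # bool must be tested before int, because bool is a subclass of int.
      if value is None or isinstance(value, (bool, str)):
          return problems
      if isinstance(value, float):
          problems.append(f"{path}: floating point numbers are forbidden")
      elif isinstance(value, int):
          if not -MAX_INT <= value <= MAX_INT:
              problems.append(f"{path}: integer out of range")
      elif isinstance(value, list):
          for index, item in enumerate(value):
              problems += check_contract_json(item, f"{path}[{index}]")
      elif isinstance(value, dict):
          for name, member in value.items():
              # fullmatch avoids accepting a name with a trailing newline.
              if name not in SPECIAL_NAMES and not MEMBER_NAME.fullmatch(name):
                  problems.append(f"{path}: invalid member name {name!r}")
              problems += check_contract_json(member, f"{path}.{name}")
          salts = value.get("$forgettable", {})
          if isinstance(salts, dict):
              for name, salt in salts.items():
                  # "true" is only allowed in order creation requests.
                  if not isinstance(salt, str):
                      problems.append(f"{path}: missing salt for {name!r}")
      return problems


  print(check_contract_json({"amount": 5, "bad-name": 1.5}))

For the example input, the check reports the invalid member name and the
floating point value, while the ``amount`` member passes.
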