Design Doc 018: Forgettable Data in JSON Contract Terms
#######################################################

Summary
=======

This document defines concepts and algorithms for handling the JSON format of
contract terms with forgettable data in Taler payments.

Motivation
==========

The contract terms JSON format used in Taler describes various aspects of a
payment request, such as the amount to be paid, accepted payment service
providers, a human-readable summary, a list of products and shipping
information.

To support data minimization, it is desirable that some pieces of information
stored in the contract terms (either in the storage of the merchant or in the
customer's wallet) can be deleted as soon as they are no longer strictly
required.

However, the cryptographic hash of the contract terms is used throughout the
Taler protocol as an opaque handle for the payment and associated processes.
In an audit, a merchant might be asked to reveal the plain-text contract terms
for a particular hash.

Thus the hashing of the contract terms needs to take the forgettable parts of
the contract terms into account: the contract terms hash must be the same
before and after forgetting a forgettable part of the contract terms.

Proposed Solution
=================

Members of objects can be marked as forgettable by adding metadata to the
contract terms JSON. Before the contract terms JSON is hashed, it is first
scrubbed and canonicalized. Scrubbing replaces forgettable members with a
salted hash of their (recursively scrubbed and canonicalized) value.

To prevent attempts at guessing the value of forgotten members, a salt is
generated and stored in the contract terms for each forgettable member.
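The scheme can be sketched end to end as follows. This is a minimal
illustration under stated assumptions, not the reference implementation. It
uses the hashing details specified in the sections below (HKDF-SHA512 with 64
bytes of output for members, SHA-512 over the 0-terminated canonical form for
the whole document). The function names, the hex encoding of hashes, the use
of the salt string's UTF-8 bytes as the HKDF salt, and the simplified
canonicalizer (adequate only for ASCII member names and integer numbers) are
assumptions made for this example.

.. code-block:: javascript

  const crypto = require("node:crypto");

  // Canonical JSON for the restricted subset used here (ASCII member names,
  // integers only): members sorted by name, no insignificant whitespace.
  function canonicalize(value) {
    if (Array.isArray(value)) {
      return "[" + value.map(canonicalize).join(",") + "]";
    }
    if (value !== null && typeof value === "object") {
      const members = Object.keys(value)
        .sort()
        .map((k) => JSON.stringify(k) + ":" + canonicalize(value[k]));
      return "{" + members.join(",") + "}";
    }
    return JSON.stringify(value);
  }

  // Salted hash of an already scrubbed member value.
  function saltedHash(scrubbedValue, salt) {
    const out = crypto.hkdfSync(
      "sha512",
      Buffer.from(canonicalize(scrubbedValue), "utf8"), // input key material
      Buffer.from(salt, "utf8"), // assumption: salt string used as raw bytes
      Buffer.alloc(0), // no "info" string
      64
    );
    return Buffer.from(out).toString("hex"); // assumption: hex encoding
  }

  // Recursively forget every member listed in "$forgettable".
  // Assumes every salt is a string (no "true" placeholders left).
  function scrub(value) {
    if (Array.isArray(value)) return value.map(scrub);
    if (value === null || typeof value !== "object") return value;
    const salts = value["$forgettable"] || {};
    const forgotten = { ...(value["$forgotten"] || {}) };
    const result = {};
    for (const [name, member] of Object.entries(value)) {
      if (name === "$forgettable" || name === "$forgotten") continue;
      if (Object.prototype.hasOwnProperty.call(salts, name)) {
        forgotten[name] = saltedHash(scrub(member), salts[name]);
      } else {
        result[name] = scrub(member);
      }
    }
    if ("$forgettable" in value) result["$forgettable"] = value["$forgettable"];
    if ("$forgotten" in value || Object.keys(forgotten).length > 0) {
      result["$forgotten"] = forgotten;
    }
    return result;
  }

  // SHA-512 over the 0-terminated canonical form of the scrubbed terms.
  function contractTermsHash(terms) {
    const canon = Buffer.from(canonicalize(scrub(terms)), "utf8");
    return crypto
      .createHash("sha512")
      .update(Buffer.concat([canon, Buffer.from([0])]))
      .digest("hex");
  }

  // Usage: forgetting the address does not change the contract terms hash.
  const original = {
    summary: "One book",
    delivery_address: { street: "Example Street 1", town: "Exampletown" },
    $forgettable: { delivery_address: "example-salt" },
  };
  const afterForgetting = scrub(original);
  console.log(contractTermsHash(original) === contractTermsHash(afterForgetting));

In this sketch, forgetting every forgettable member is the same operation as
scrubbing, which is why the hash of the scrubbed form stays stable: scrubbing
an object whose members are already forgotten leaves it unchanged.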
Constraints on Contract Terms JSON
----------------------------------

In order to make it easy to get a canonical representation for JSON contract
terms, the following restrictions apply:

* Member names are restricted: Only strings matching the regular expression
  ``^[0-9A-Z_a-z]+$`` or the literal names ``$forgettable`` or ``$forgotten``
  are allowed. This makes the sorting of object members easier, as RFC 8785
  requires sorting by UTF-16 code units.
* Floating point numbers are forbidden. Numbers must be integers in the range
  ``-(2**53 - 1)`` to ``(2**53) - 1``.

Marking Members as Forgettable
------------------------------

A member is marked as forgettable by including its name as a key in the
special ``$forgettable`` member of its parent object. The value stored under
that key is the member's salt.

.. code-block:: json

  {
    "delivery_address": ...,
    "$forgettable": {
      "delivery_address": "<memb_salt>"
    }
  }

Clients that write contract terms might not be able to easily generate the
salt value. Thus, the merchant backend must also allow the following syntax in
the order creation request:

.. code-block:: json

  {
    "$forgettable": {
      "delivery_address": true
    }
  }

However, this shorthand is only acceptable in the order creation request. A
JSON object that still contains such a ``true`` marker must be considered an
invalid contract terms object.

Forgetting a Forgettable Member
-------------------------------

To forget a forgettable member, it is removed from the parent object, and the
salted hash of the member's scrubbed and canonicalized value is put into the
special ``$forgotten`` member of the parent object.

.. code-block:: json

  {
    ...props,
    "delivery_address": ...,
    "$forgettable": {
      "delivery_address": "<memb_salt>"
    }
  }

  =>

  {
    ...props,
    "$forgotten": {
      "delivery_address": "<memb_salted_hash>"
    },
    "$forgettable": {
      "delivery_address": "<memb_salt>"
    }
  }

The hash of a member value ``memb_val`` with salt ``memb_salt`` is computed as
follows:

.. code-block:: javascript

  // scrub() and canonicalized_json() are the operations defined in the
  // "Scrubbing" and "Canonicalized Hashing" sections of this document.
  const memb_val_canon = canonicalized_json(scrub(memb_val));
  // HKDF with SHA-512, producing 64 bytes of output and using no "info" string.
  const memb_salted_hash = hkdf_sha512({
    output_length: 64,
    input_key_material: memb_val_canon,
    salt: memb_salt,
  });

Scrubbing
---------

A JSON object is scrubbed by recursively identifying and forgetting all
forgettable members.

Canonicalized Hashing
---------------------

A JSON object is canonicalized by converting it to an ASCII byte array with
the algorithm specified in :rfc:`8785`. The resulting bytes are terminated
with a single 0-byte and then hashed with SHA-512. The contract terms hash is
computed in this way over the scrubbed contract terms.

Discussion / Q&A
================

* It is not completely clear which parts of the contract terms should be
  forgettable. This should be decided individually by the merchant based on
  applicable legislation.

* Is it really necessary to have one salt per forgettable member? We could
  also have a "contract terms global" salt, and then use the global salt
  **and** the path of the forgettable member as the salt for hashing.

* Why do we require the 0-termination?

* Why do we not supply any "info" string (= context chunks in the
  ``GNUNET_CRYPTO_kdf`` terminology) to the HKDF? Does it matter?

* We could also delete the corresponding ``$forgettable`` entry after
  forgetting a member. This would save storage. But to prove that a certain
  forgotten value matches the contract terms, the prover would then need to
  store and provide the salt separately.
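To make the question about a "contract terms global" salt more concrete, the
following is a minimal sketch of how such a derivation might look. Nothing in
it is specified by this document: the helper name, the encoding of the path
as member names joined with ``/``, the concatenation of global salt and path,
and the hex output encoding are all assumptions chosen for illustration.

.. code-block:: javascript

  const crypto = require("node:crypto");

  // Hypothetical helper: salted hash of a member's canonical value, using one
  // contract-wide salt plus the member's path instead of a per-member salt.
  function saltedHashWithGlobalSalt(canonicalValue, globalSalt, path) {
    // Assumption: the path is the list of member names from the root. "/"
    // cannot occur in a member name under the constraints on contract terms
    // JSON, so distinct paths give distinct strings.
    const pathBytes = Buffer.from("/" + path.join("/"), "utf8");
    const salt = Buffer.concat([Buffer.from(globalSalt, "utf8"), pathBytes]);
    const out = crypto.hkdfSync(
      "sha512",
      Buffer.from(canonicalValue, "utf8"), // input key material
      salt,
      Buffer.alloc(0), // no "info" string, as in the per-member scheme
      64
    );
    return Buffer.from(out).toString("hex"); // assumption: hex encoding
  }

  // Usage: the same value at two different paths yields two different hashes.
  const billing = saltedHashWithGlobalSalt('"Exampletown"', "global-salt", ["billing", "town"]);
  const delivery = saltedHashWithGlobalSalt('"Exampletown"', "global-salt", ["delivery", "town"]);
  console.log(billing !== delivery);

With such a scheme only one salt would have to be stored per contract, at the
cost of making the path encoding part of the hashing specification.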