Design Doc 018: Forgettable Data in JSON Contract Terms
#######################################################

Summary
=======

This document defines concepts and algorithms for handling the JSON format of
contract terms with forgettable data in Taler payments.

Motivation
==========

The contract terms JSON format used in Taler describes various aspects of a
payment request, such as the amount to be paid, accepted payment service
providers, a human-readable summary, a list of products and shipping
information.

To support data minimization, it would be nice if some pieces of information
stored in the contract terms (either in the storage of the merchant or the
customer's wallet) could be deleted as soon as they are not strictly required
anymore.

However, the cryptographic hash of the contract terms is used throughout the
Taler protocol as an opaque handle for the payment and associated processes.
In an audit, a merchant might be asked to reveal the plain-text contract terms for a
particular hash.

Thus the hashing of the contract terms must account for their forgettable
parts: the contract terms hash must be the same before and after a forgettable
part of the contract terms is forgotten.

Proposed Solution
=================

Members of objects can be marked as forgettable by adding metadata to the
contract terms JSON.  Before hashing the contract terms JSON, it is first
scrubbed and canonicalized.  Scrubbing replaces forgettable members with a
salted hash of their (recursively scrubbed and canonicalized) value.  To
prevent attempts at guessing the value of forgotten members, a salt is
generated and stored in the contract terms for each forgettable member.

Constraints on Contract Terms JSON
----------------------------------

In order to make it easy to get a canonical representation for JSON contract
terms, the following restrictions apply:

* Member names are restricted:  Only strings matching the regular expression
  ``^[0-9A-Z_a-z]+$`` or the literal names ``$forgettable`` or ``$forgotten`` are
  allowed.  This makes the sorting of object members easier, as RFC 8785
  requires sorting by UTF-16 code units.
* Floating point numbers are forbidden.  Numbers must be integers in the range
  ``-(2**53 - 1)`` to ``(2**53) - 1``.
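
Both restrictions can be checked with a short recursive walk over the parsed
JSON.  The following is a minimal sketch, not the validation code of any Taler
component: the function name ``checkContractTermsJson`` is hypothetical, and the
integer check assumes that the allowed range is JavaScript's safe-integer range
(absolute value at most ``2**53 - 1``), which is what ``Number.isSafeInteger``
tests.

.. code-block:: javascript

   // Member names allowed by the constraints above.
   const MEMBER_NAME = /^[0-9A-Z_a-z]+$/;
   const SPECIAL_NAMES = new Set(["$forgettable", "$forgotten"]);

   // Hypothetical helper: returns a list of problems found in `value`.
   // An empty list means that both constraints are satisfied.
   function checkContractTermsJson(value, path = "root") {
     if (typeof value === "number") {
       // Assumption: the allowed range equals the safe-integer range.
       return Number.isSafeInteger(value)
         ? []
         : [`${path}: not an integer in the allowed range`];
     }
     if (value === null || typeof value !== "object") {
       return []; // strings, booleans and null are unrestricted
     }
     if (Array.isArray(value)) {
       return value.flatMap((item, i) =>
         checkContractTermsJson(item, `${path}[${i}]`));
     }
     return Object.entries(value).flatMap(([name, member]) => {
       const problems = checkContractTermsJson(member, `${path}.${name}`);
       if (!MEMBER_NAME.test(name) && !SPECIAL_NAMES.has(name)) {
         problems.unshift(
           `${path}: member name ${JSON.stringify(name)} is not allowed`);
       }
       return problems;
     });
   }

Returning a list of problems instead of throwing on the first one makes it
easier to report all violations of a rejected order at once.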


Marking Members as Forgettable
------------------------------

A property is marked as forgettable by including the property
name as a key in the special ``$forgettable`` field of the property's
parent object.

.. code-block:: json

   {
     "delivery_address": "...",
     "$forgettable": {
       "delivery_address": "<salt>"
     }
   }

Clients that write contract terms might not be able to easily generate the salt value.
Thus, the merchant backend must also allow the following syntax in the order creation request:

.. code-block:: json

   {
     "$forgettable": {
       "delivery_address": true
     }
   }

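However, this shorthand is only acceptable in an order creation request.  A JSON
object that still contains ``true`` in place of a salt must be considered an
invalid contract terms object, because the contract terms store a salt for each
forgettable member.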
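
One way a backend could turn the shorthand into a valid contract terms object
is to walk the order and replace each ``true`` inside a ``$forgettable`` member
with a fresh random salt.  The sketch below illustrates this under stated
assumptions: ``fillForgettableSalts`` and ``randomSalt`` are hypothetical
names, and the salt length (32 random bytes) and its hex encoding are
placeholders, since this document does not specify a salt format.

.. code-block:: javascript

   const { randomBytes } = require("node:crypto");

   // Assumption: 32 random bytes, hex-encoded.  The salt format is not
   // fixed by this document.
   function randomSalt() {
     return randomBytes(32).toString("hex");
   }

   // Returns a deep copy of `value` in which every `true` entry of a
   // "$forgettable" object is replaced by a freshly generated salt.
   // Salts that are already present are kept unchanged.
   function fillForgettableSalts(value, makeSalt = randomSalt) {
     if (Array.isArray(value)) {
       return value.map((item) => fillForgettableSalts(item, makeSalt));
     }
     if (value === null || typeof value !== "object") {
       return value;
     }
     const result = {};
     for (const [name, member] of Object.entries(value)) {
       if (name === "$forgettable") {
         result[name] = {};
         for (const [field, salt] of Object.entries(member)) {
           result[name][field] = salt === true ? makeSalt() : salt;
         }
       } else {
         result[name] = fillForgettableSalts(member, makeSalt);
       }
     }
     return result;
   }

The salt generator is passed as a parameter so that tests can substitute a
deterministic one.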

Forgetting a Forgettable Member
-------------------------------

To forget a forgettable member, it is removed from
the parent object, and the salted hash of the member's
scrubbed and canonicalized value is put into the special ``$forgotten``
member of the parent object.


.. code-block:: json

   {
    ...props,
    "delivery_address": "...",
    "$forgettable": {
      "delivery_address": "<memb_salt>"
    },
   }

   =>

   {
    ...props,
    "$forgotten": {
      "delivery_address": "<memb_salted_hash>"
    },
    "$forgettable": {
      "delivery_address": "<memb_salt>"
    },
   }

The hash of a member value ``memb_val`` with salt ``memb_salt`` is computed as follows:

.. code-block:: javascript

   memb_val_canon = canonicalized_json(scrub(memb_val));

   memb_salted_hash = hkdf_sha512({
     output_length: 64,
     input_key_material: memb_val_canon,
     salt: memb_salt,
   });

When encoding ``memb_salted_hash`` with base32-crockford, the resulting output
must be upper-case.
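
For illustration, the computation can be written with Node.js's built-in
``crypto.hkdfSync``.  This is a sketch under assumptions that this document
leaves open: the salt string and the canonicalized value are fed to the HKDF as
their UTF-8 bytes, no "info" string is supplied, and no 0-terminator is
appended.  The names ``saltedMemberHash`` and ``encodeCrockford`` are
hypothetical, and the sketch is not claimed to reproduce the test vector in
this document.

.. code-block:: javascript

   const { hkdfSync } = require("node:crypto");

   // Crockford base32 alphabet (upper-case, without I, L, O and U).
   const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

   // Encodes bytes in 5-bit groups, most significant bits first.  The last
   // group is padded with zero bits (an assumption about the padding rule).
   function encodeCrockford(bytes) {
     let out = "";
     let buffer = 0; // pending bits
     let bits = 0;   // number of pending bits
     for (const byte of bytes) {
       buffer = (buffer << 8) | byte;
       bits += 8;
       while (bits >= 5) {
         out += CROCKFORD[(buffer >>> (bits - 5)) & 31];
         bits -= 5;
       }
       buffer &= (1 << bits) - 1; // drop the bits that were already emitted
     }
     if (bits > 0) {
       out += CROCKFORD[(buffer << (5 - bits)) & 31];
     }
     return out;
   }

   // membValCanon: canonicalized JSON text of the scrubbed member value.
   // membSalt: the salt stored in "$forgettable" for this member.
   function saltedMemberHash(membValCanon, membSalt) {
     const okm = hkdfSync(
       "sha512",
       Buffer.from(membValCanon, "utf8"), // input key material
       Buffer.from(membSalt, "utf8"),     // salt (assumed: raw string bytes)
       Buffer.alloc(0),                   // no "info" string
       64                                 // output length in bytes
     );
     return encodeCrockford(new Uint8Array(okm));
   }

With a 64-byte output, the encoded hash is 103 characters long, since 512 bits
do not divide evenly into 5-bit groups.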


Scrubbing
---------

A JSON object is scrubbed by recursively identifying and forgetting all
forgettable fields.
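
A recursive scrub can be sketched as follows.  The function ``scrub`` and its
``hashMember`` parameter are hypothetical names: ``hashMember(scrubbedValue,
salt)`` stands for the salted hash of the canonicalized value and is passed in
so that the sketch stays independent of the hashing details.  Two further
assumptions are made: existing ``$forgotten`` entries are kept as they are, and
no ``$forgotten`` member is emitted for an object in which nothing is
forgotten.

.. code-block:: javascript

   // Returns a scrubbed deep copy of `value`; the input is not modified.
   function scrub(value, hashMember) {
     if (Array.isArray(value)) {
       return value.map((item) => scrub(item, hashMember));
     }
     if (value === null || typeof value !== "object") {
       return value;
     }
     const salts = value["$forgettable"] ?? {};
     // Members forgotten earlier keep their stored hash.
     const forgotten = { ...(value["$forgotten"] ?? {}) };
     const result = {};
     for (const [name, member] of Object.entries(value)) {
       if (name === "$forgettable" || name === "$forgotten") {
         continue; // metadata, copied separately below
       }
       // Scrub the member first, so that nested forgettable fields are
       // already replaced when the member itself is hashed.
       const scrubbed = scrub(member, hashMember);
       if (Object.prototype.hasOwnProperty.call(salts, name)) {
         forgotten[name] = hashMember(scrubbed, salts[name]);
       } else {
         result[name] = scrubbed;
       }
     }
     if ("$forgettable" in value) {
       result["$forgettable"] = value["$forgettable"];
     }
     if (Object.keys(forgotten).length > 0) {
       result["$forgotten"] = forgotten;
     }
     return result;
   }

Because already forgotten members keep their stored hash, scrubbing the same
contract terms before and after forgetting a member yields the same object,
which is the property the contract terms hash relies on.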


Canonicalized Hashing
---------------------

A JSON object is canonicalized by converting it to an ASCII byte array with the
algorithm specified in `RFC 8785 <https://tools.ietf.org/html/rfc8785>`__.  The
resulting bytes are terminated with a single 0-byte and then hashed with
SHA512.
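
Under the constraints of this document (integers only, restricted member
names), canonicalization reduces to sorting member names and serializing
primitives the way ``JSON.stringify`` does, which is the serialization RFC 8785
builds on.  The following sketch relies on that simplification and is not a
general RFC 8785 implementation; ``canonicalize`` and ``hashCanonical`` are
hypothetical names, and the input is expected to be scrubbed already.

.. code-block:: javascript

   const { createHash } = require("node:crypto");

   // Canonical JSON text for values that satisfy the constraints of this
   // document.  Not a general RFC 8785 implementation.
   function canonicalize(value) {
     if (value === null || typeof value !== "object") {
       return JSON.stringify(value);
     }
     if (Array.isArray(value)) {
       return "[" + value.map(canonicalize).join(",") + "]";
     }
     // The default sort() compares strings by UTF-16 code units.
     const names = Object.keys(value).sort();
     const members = names.map(
       (name) => JSON.stringify(name) + ":" + canonicalize(value[name]));
     return "{" + members.join(",") + "}";
   }

   // SHA-512 over the canonical bytes followed by a single 0-byte.
   function hashCanonical(value) {
     return createHash("sha512")
       .update(Buffer.from(canonicalize(value), "utf8"))
       .update(Buffer.from([0]))
       .digest();
   }

Since ``$`` sorts before digits and letters, the ``$forgettable`` and
``$forgotten`` members always come first in the canonical form.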


Test vector
-----------

The following input contains top-level and nested forgettable
fields, non-forgettable fields, and values of type boolean,
integer, string and object.  It is thus suitable as
a minimal interoperability test:

.. code-block:: json

  {
    "k1": 1,
    "_forgettable": {
      "k1": "SALT"
    },
    "k2": {
      "n1": true,
      "_forgettable": {
        "n1": "salt"
      }
    },
    "k3": {
      "n1": "string"
    }
  }

Hashing the above contract results in the following Crockford base32 encoded
hash
``287VXK8T6PXKD05W8Y94QJNEFCMRXBC9S7KNKTWGH2G2J2D7RYKPSHNH1HG9NT1K2HRHGC67W6QM6GEC4BSN1DPNEBCS0AVDT2DBP5G``.

Note that salt values must normally be chosen at random.  Static salt values are used here only for the purpose of this test.



Discussion / Q&A
================

* It is not completely clear which parts of the contract terms
  should be forgettable.  This should be individually decided
  by the merchant based on applicable legislation.

* Is it really necessary that there is one salt per forgettable member?
  We could also have a "contract terms global" salt, and then
  use the global salt **and** the path of the forgettable field
  as the salt for hashing.

* Why do we require the 0-termination in the hash / kdf?  Doesn't seem to match what
  e.g. ``shasum`` does.

* Why do we not supply any "info" string (= context chunks in the GNUNET_CRYPTO_kdf terminology)
  to the hkdf?  Does it matter?

* We could also delete the corresponding ``$forgettable`` entry after
  forgetting a member.  This would save storage.  But to prove that a certain
  forgettable info matches the contract terms, the prover would need to
  also store/provide the salt.

* What validations should the wallet do?  Should the wallet ever accept
  contract terms where fields are already forgotten?