DD 23: Taler KYC
################

Summary
=======

This document discusses the Know-your-customer (KYC) and Anti-Money Laundering
(AML) processes supported by Taler.


Motivation
==========

To legally operate, Taler has to comply with KYC/AML regulation that requires
banks to identify parties involved in transactions at certain points.


Requirements
============

Taler needs to take *measures* based on the following primary *triggers*:

* Customer withdraws money over a monthly threshold

  * exchange triggers KYC
  * key: IBAN (encoded as payto:// URI)

* Wallet receives (via refunds) money resulting in a balance over a threshold

  * this is a client-side restriction
  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Wallet receives money via P2P payments

  * there are two sub-cases: PUSH and PULL payments
  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Merchant receives money (Q: any money, or above a monthly threshold?)

  * key: IBAN (encoded as payto:// URI)

* Reserve is "opened" for invoicing.

  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Import of new sanctions lists and triggering of measures against matches of existing
  customer records against the list


Process requirements
^^^^^^^^^^^^^^^^^^^^

The key consideration here is *plausibilization*: staff needs to
check that the client-provided information is plausible. As this
is highly case-dependent, this cannot be automated.

For the different *measures*, various KYC/AML *checks* are possible:

* In-person validation by AML staff
* Various forms to be filled by AML staff
* Validation involving local authorities and post-office
* Online validation, sometimes with multiple options (like KYC for multiple people):

  * Forms to be supplied by user (different types of ID)
  * Interactive video
  * Documents to be supplied (business register)
  * Address validation (e-mail or phone or postal)

Additionally, the process is dynamic and conditional upon various decisions:

* Individual vs. business
* PEP or non-PEP
* Hit on sanctions list
* Type of business (trust, foundation, listed on stock market, etc.)
* Need for plausibilization (via documents by user or staff research)
* Periodic updates (of customer data, of sanction lists) and re-assessment

There are also various *outcomes*:

* normal operation (with expiration date)
* normal operation but with AML staff investigating (new measure)
* held, requesting customer documentation (new measure)
* held, AML staff reviewing evidence for plausibilization (new measure)
* automatically frozen until certain day (due to sanctions)
* institutionally frozen until certain day (due to order by state authority)

The outcome of a *check* can trigger further *measures* (including
expiration of the outcome state).

As a result, we end up with a large state machine in which the AML staff has
considerable flexibility, while the user needs guidance as to the possible next
moves and/or the current state of their account (where some information must
not be disclosed).


Documentation requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^

For each account we must:

* define risk-profile (902.4, 905.1)
* document the specific setup, likely not just the INI file
* record key AMLA file attributes, such as:

  * File opened, file closed (keep data for X years afterwards!)
  * low-risk or high-risk business relationship
  * PEP status
  * business domain
  * authority notification dates (possibly multiple) with
    voluntary or mandatory notification classification

Finally, we need to produce statistics:

* There must be a page with an overview of AMLA files with opening
  and closing dates and an easy way to determine for any day the
  number of open AMLA files
* Technically, we also need a list of at-risk transactions and of
  frozen transactions, but given that we can really only freeze
  on an account-basis, I think there is nothing to do here
* number of incidents reported (voluntarily, required)
* number of business relationships at any point in time
* number of risky business relationships (PEP, etc.)
* number of frozen transactions (authority vs. sanction) with start-date and end-date
* start-date and end-date of relationships (data retained for X years after end of relationship)

For this high-level monitoring, we need certain designated critical events to
be tracked in the system statistics:

* account opened
* set to high risk
* set to low risk
* suspicious activity report filed with authority
* account frozen
* account unfrozen
* account closed


Further considerations
^^^^^^^^^^^^^^^^^^^^^^

On top of all of this, we need to plan some *diagnostics* to determine when
components fail (such as scripts or external services providing malformed
results).

Optionally, in the future, the solution should support fees to be paid by the
user for *voluntary* KYC processes related to attestation (#7365).


Proposed Solution
=================

We allow certain *conditions* to *trigger* a single specific *measure*.
For the different *measures*, we define:

* Who has to do something (AML staff, user, nobody)
* Contextual input data to be provided (with templating, e.g. amount set dynamically based on the *trigger*)
* A *check* to be performed (user-interactive or staff-interactive)
* Another *measure* to take on failure of a user-interactive check
* A *program* that uses data from the *check* as well as *context* data
  to determine an *outcome* which is the specific operational state
  (normal, held on staff, held on user, frozen, etc.) the account is to transition to
* What information about the state to show to the user (normal, information required, pending, etc.)

For the user-interactive checks we need a SPA (for KYC) that is given:

* instructions to render (with either a form to fill or links to external checks);
  here the context could provide an array of choices!
* possibly an external check that was set up (if any); for cost-reasons, we should only do one at a time,
  and probably should then always redirect the browser to that check.

For the staff-interactive checks we need a SPA (for AML):

* to file forms and upload documentation (without state transition)
* to decide on next measure (providing context); here, the exchange needs
  to expose the list of available *measures* and required *context* for each

For non-interactive measures (normal operation, account frozen) we need:

* Expiration time (in context)
* Measure to trigger upon expiration, again with context
  (renew documents, resume normal operation, etc.)

We need some customer-driven interactivity in KYB/KYC process, for example the
user may need to be given choices (address vs. phone, individual vs. business,
order in which to provide KYC data of beneficiaries). As a result, the
exchange needs to serve some SPA for measures where the user is shown the next
step(s) or choices (which person to collect KYC data on, whether to run
Challenger on phone number or physical address, etc.).  The SPA should also
potentially contain a form to allow the customer to directly upload documents
to us (like business registration) instead of to some KYC provider. This is
because KYC providers may not be flexible enough.

Similarly, the AML staff will need to be able to trigger rather complex
KYB/KYC processes, like "need KYC on X and Y and Z" or "phone number or
mailing address" or "please upload form A/T/S".  Here in particular it
should be possible to request not only filled forms, but arbitrary
documents.


Terminology
^^^^^^^^^^^

* **Check**: A check establishes a particular attribute of a user, such as their name based on an ID document and liveness, mailing address, phone number, taxpayer identity, etc.  Checks may be given *context* (such as whether a customer is an individual or a business) to run correctly. Checks can also be AML staff inserting information for plausibilization.  Checks result in an *outcome* being decided by an external AML *program*.

* **Condition**: A condition specifies when KYC is required. Conditions include the *type of operation*, a threshold amount (e.g. above EUR:1000) and possibly a time period (e.g. over the last month).

* **Configuration**: The configuration determines the *legitimization rules*, and specifies which providers offer which *checks* at what *cost*.

* **Context**: Context is information provided as input into a *check* and *program* to customize their execution. The context is initially set by the *trigger*, but may evolve as the *account* undergoes *measures*. For each *check* and *program*, the required *context* data must be specified.

* **Cost**: How much would a client have to pay for a KYC process (if they voluntarily choose to do so for attestation).

* **Expiration**: KYC legitimizations may be outdated. Expiration rules determine when *checks* have to be performed again.

* **Legitimization rules**: The legitimization rules determine under which *conditions* which *checks* must be performed and the *expiration* time period for the *checks*.

* **Logic**: Logic refers to a specific bit of code (realized as an exchange plugin) that enables the interaction with a specific *provider*.  Logic typically requires *configuration* for access control (such as an authorization token) and possibly the endpoint of the specific *provider* implementing the respective API.

* **Measure**: Describes the possible outgoing edges from one state in the state machine (including how to show the current state). Each edge is given some *context* and a *check* to be performed as well as a *program* to decide the *outcome* and the next *measure*.

* **Outcome**: Describes the account state that an account ends up in due to the result of a *check*. Outcomes can be that an account is frozen (no transactions possible until freeze expires), held (no transactions possible until another *measure* has been taken), or operating normally.

* **Provider**: A provider performs a specific set of *checks* at a certain *cost*. Interaction with a provider is performed by provider-specific *logic*.

* **Program**: An AML helper *program* is given *context* about the current state of an account and the data from a *check* to compute the *outcome*.  For example, a *program* may look at the "PEP" field of a KYC check and decide if the outcome is to put the account into ``normal`` or ``held-for-manual-review`` state.  A *program* operating on an AML form filed by AML staff will likely be trivial and directly apply the explicit decision taken by the staff member.

* **Type of operation**: The operation type determines which Taler-specific operation has triggered the KYC requirement. We support four types of operation: withdraw (by customer), deposit (by merchant), P2P receive (by wallet) and (high) wallet balance.


New Endpoints
^^^^^^^^^^^^^

The new ``/kyc-check/`` endpoint is based on the legitimization requirements
serial number and receives the business vs. individual status from the client.
Access is ``authenticated`` by also passing the hash of the payto://-URI.
(Weak authentication is acceptable, as the KYC status or the ability to
initiate a KYC process are not very sensitive.)  Given this triplet, the
``/kyc-check/`` endpoint returns either the (positive) KYC status or redirects
the client (202) to the next required stage of the KYC process.  The
redirection must be for an HTTP(S) endpoint to be triggered via a simple HTTP
GET.  It must always be the same endpoint for the same client, as the
wallet/merchant backend are not required to check for changes to this
endpoint.

A new set of ``/kyc-spa/$HASH`` GET endpoints is created per client ``$HASH`` that
serves the KYC SPA.  This is where the ``/kyc-check/`` endpoint will redirect
clients unless all KYC/AML requirements are satisfied.  The KYC SPA will
use the ``$HASH`` of its URL to initialize itself via the ``/kyc-info/$HASH``
endpoint family.

A new set of ``/kyc-info/$HASH`` GET endpoints is created per client ``$HASH``
to return information about the state of the KYC or AML process to the client.
The SPA uses this information to show the user an appropriate dialog. The SPA
should also long-poll this endpoint for changes to the AML/KYC state. Note
that this is a client-facing endpoint, so it will only provide a restricted
amount of information to the customer (as some laws may forbid us to inform
particular customers about their true status).  The endpoint will typically
inform the SPA about possible choices to proceed, such as directly uploading
files, contacting AML staff, or proceeding with a particular KYC process at
an external provider (such as Challenger).  If the user chooses to initiate a
KYC process at an external provider, the SPA must request the respective
process to be set-up by the exchange via the ``/kyc-start/`` endpoint.

The new ``/kyc-upload/$ID`` POST endpoint allows the SPA to upload
client-provided evidence.  The ``$ID`` will be provided as part of the
``/kyc-info`` body.  This is for checks of type ``FORM``.

The new ``/kyc-start/$ID`` POST endpoint allows the SPA to set up a new
external KYC process. It will return the (GET) URL that the client must open
to begin the KYC process. The SPA should probably open this URL in a new
window or tab.  The ``$ID`` will be provided as part of the ``/kyc-info``
body.  As this endpoint is involved in every KYC check at the beginning, this
is also the place where we could integrate the payment process for the KYC fee
in the future.

Upon completion of the process at the external KYC provider, the provider must
trigger a GET request to a new ``/kyc-proof/$H_PAYTO/$PROVIDER_SECTION``
endpoint.  This may be done by redirecting the browser of the user to
that endpoint.  Once this endpoint is triggered, the exchange will pass the
received arguments to the respective logic plugin.  The logic plugin will then
(asynchronously) update the KYC status of the user.  The logic plugin should
return a human-readable HTML page with the KYC result to the user.

Alternatively, the KYC confirmation may be triggered by a ``/kyc-webhook``
request. As KYC providers do not necessarily support passing detailed
information in the URL arguments, the ``/kyc-webhook`` only needs to specify
either the ``PROVIDER_SECTION`` *or* the ``LOGIC`` (the name of the plugin
implementing the KYC API).  The API-specific webhook logic must then figure
out what exactly the webhook is about on its own.  The ``/kyc-webhook/``
endpoint works for GET or POST, again as details depend on the KYC provider.
In contrast to ``kyc-proof``, the response does NOT go to the end-users'
browser and should thus only indicate success or failure.

The new ``/wallet-kyc`` POST endpoint allows a wallet to notify an exchange
if it will cross a balance threshold.  Here, the ``balance`` specified should
be the threshold (from the ``wallet_balance_limit_without_kyc`` array) that
the wallet would cross, and *not* the *exact* balance of the wallet.  The
exchange will respond with a wire target UUID. The wallet can then use this
UUID to begin the KYC process at ``/kyc-check/``. The wallet must only proceed
to obtain funds exceeding the threshold after the KYC process has
concluded. While wallets could be "hacked" to bypass this measure (we cannot
cryptographically enforce this), such modifications are a terms of service
violation which may have legal consequences for the user.
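
A minimal sketch of how a wallet might pick the ``balance`` value to report,
assuming a single currency and plain decimal amounts.  The function name
``threshold_to_report`` and the choice to report the highest threshold when
several are crossed at once are assumptions made for illustration; neither is
specified above.

.. code-block:: python

  from decimal import Decimal
  from typing import Optional, Sequence


  def threshold_to_report(
      current: Decimal, incoming: Decimal, limits: Sequence[Decimal]
  ) -> Optional[Decimal]:
      """Return the threshold the wallet would newly cross, or None.

      ``limits`` models the ``wallet_balance_limit_without_kyc`` array.
      A limit counts as crossed if the balance did not exceed it before
      the operation but would exceed it afterwards.
      """
      new_balance = current + incoming
      crossed = [limit for limit in limits if current <= limit < new_balance]
      # Assumption: if several limits are crossed, report the highest one.
      return max(crossed) if crossed else None

The wallet reports the returned threshold, not ``new_balance``, so the
exchange does not learn the exact wallet balance.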


To enable the AML staff SPA to give AML staff a choice of possible measures, a
new endpoint ``/aml/$OFFICER_PUB/measures`` is added that allows the AML SPA
to dynamically GET the list of available measures.
It returns a list of known KYC checks (by name)
with their descriptions and a list of AML programs
with information about the required context.


Modifications to existing endpoints
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When withdrawing, the exchange checks if the KYC status is acceptable.  If no
KYC was done and if either the amount withdrawn over a particular timeframe
exceeds the threshold or the reserve received a P2P transfer, then a
``451 Unavailable for Legal Reasons`` is returned which redirects the consumer
to the new ``/kyc-check/`` handler.

When depositing, the exchange aggregator (!) checks the KYC status and if
negative, returns an additional information field via the
``aggregation_transient`` table which is returned via GET ``/deposits/`` to the
merchant.  This way, the merchant learns the ``requirement_row`` needed to
begin the KYC process (this is independent of the amount) at the new
``/kyc-check/`` handler.

When merging into a reserve, the KYC status is checked and again the
merge fails with ``451 Unavailable for Legal Reasons`` to trigger the
KYC process.

To allow the wallet to do the KYC check if it is about to exceed a set balance
threshold, we modify the ``/keys`` response to add an optional array
``wallet_balance_limit_without_kyc`` of threshold amounts.
Whenever the wallet crosses one of these thresholds for the first time, it
should trigger the KYC process.  If this field is absent, there is no limit.
If the field is provided, a correct wallet must create a long-term
account-reserve key pair. This should be the same key that is also used to
receive wallet-to-wallet payments. Then, *before* a wallet performs an
operation that would cause it to exceed the balance threshold in terms of
funds held from a particular exchange, it *should* first request the user to
complete the KYC process.  For that, the wallet should POST to the new
``/wallet-kyc`` endpoint, providing its long-term reserve-account public key
and a signature requesting permission to exceed the account limit.


Configuration of external KYC providers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each KYC provider that could contribute to checks the configuration
specifies a ``$PROVIDER_SECTION`` for each authentication procedure.  For each
(enabled) provider, the exchange has a logic plugin which (asynchronously)
determines the redirect URL for a given wire target. See below for a
description of the high-level process for different providers.

  .. code-block:: ini

    [kyc-provider-$PROVIDER_ID]

    # Which plugin is responsible for this provider?
    LOGIC = PLUGIN_NAME

    # Optional cost, useful if clients want to voluntarily
    # trigger authentication procedures for attestation.
    COST = EUR:5

    # Plus additional logic-specific options, e.g.:
    AUTHORIZATION_TOKEN = superdupersecret

    # Other logic-specific internal options (example):
    FORM_ID = business_legi_form

    # Description of the outputs provided by the check.
    # Basically, the check's output is expected to
    # provide the following fields as inputs into
    # a subsequent AML program.
    ATTRIBUTES = business_name street city country registration



Configuration of possible KYC/AML checks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The configuration specifies a set of possible KYC checks offered by external
providers, one per configuration section:

  .. code-block:: ini

    [kyc-check-$CHECK_NAME]

    # Which type of check is this? Also determines
    # the SPA form to show to the user for this check.
    #
    # INFO: wait for staff or contact staff out-of-band
    #          (only information shown, no SPA action)
    # FORM: SPA should show an inline (HTML) form
    # LINK: SPA may start external KYC process or upload
    #
    TYPE = INFO|LINK|FORM

    # Optional. Set to YES to allow this check to be
    # done voluntarily by a client (they may then
    # still have to pay for it). Used to offer the
    # SPA to display checks even if they are
    # not required. Default is NO.
    VOLUNTARY = YES/NO

    # Provider name, if type is LINK
    PROVIDER_NAME = name

    # Form name, if type is FORM
    FORM_NAME = name

    # Descriptions to use in the SPA to display the check.
    DESCRIPTION = "Upload your passport picture"
    DESCRIPTION_I18N = "{"en":"Upload scan of your passport"}"

    # ';'-separated list of fields that the CONTEXT must
    # provide as inputs to this check. For example,
    # for a FORM of type CHOICE, this might state
    # ``choices: String[];``. The type after the ":"
    # is for now purely for documentation and is
    # not checked. However, it may be shown to AML staff
    # when they configure measures.
    REQUIRES = requirement;

    # **original** measure to take if the check fails
    # (for any reason, e.g. provider or form fail to
    # satisfy constraints or provider signals user error)
    # Usually should point to a measure that requests
    # AML staff to investigate.  The fallback measure
    # context always includes the reasons for the
    # failure.
    FALLBACK = MEASURE_NAME

The list of possible FORM names is fixed in the SPA
for a particular exchange release.

The outcome of *any* check should always be uploaded encrypted into the
``kyc_attributes`` table.  It MUST include an ``expiration_time``.
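
The ``REQUIRES`` syntax described in the configuration comments above can be
parsed with a few lines of code.  This is a sketch under the assumption that
field names contain neither ``;`` nor ``:``; the function name
``parse_requires`` is hypothetical.  As stated above, the type after the
``:`` is documentation only, so it is kept as an unchecked string.

.. code-block:: python

  from typing import Dict


  def parse_requires(clause: str) -> Dict[str, str]:
      """Split a ';'-separated REQUIRES clause into {field: type hint}."""
      fields: Dict[str, str] = {}
      for entry in clause.split(";"):
          entry = entry.strip()
          if not entry:
              continue  # tolerate the trailing ';'
          name, _, type_hint = entry.partition(":")
          fields[name.strip()] = type_hint.strip()
      return fields

For example, ``parse_requires("choices: String[];")`` yields a single field
``choices`` with the type hint ``String[]``.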


Configuration of legitimization requirement triggers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The configuration also specifies a set of legitimization rules including the
condition and the measure the condition triggers, one condition per
configuration section:

  .. code-block:: ini

    [kyc-rule-$RULE_NAME]

    # Operation that triggers this legitimization.
    # Must be one of WITHDRAW, DEPOSIT, P2P-RECEIVE
    # or WALLET-BALANCE.
    OPERATION_TYPE = WITHDRAW

    # Next measures to be performed. The SPA should
    # display *all* of these measures to the user.
    # (they have a choice of either which ones, or in
    # which order they are to be performed).
    NEXT_MEASURES = SWISSNESS KYB

    # Context for each of the above measures, optional.
    MEASURE_CONTEXT_$NAME = CONTEXT

    # AND if all NEXT_MEASURES will eventually need
    # to be satisfied, OR if the user has a choice between
    # them. Not actually enforced by the exchange, but
    # primarily used to inform the user whether this is
    # an "and" or "or".
    COMBINATOR = AND|OR

    # Threshold amount above which the legitimization is
    # triggered.  The total must be exceeded in the given
    # timeframe. Can be 'forever'.
    THRESHOLD = AMOUNT

    # Timeframe over which the amount to be compared to
    # the THRESHOLD is calculated.
    # Ignored for WALLET-BALANCE.
    TIMEFRAME = DURATION

    # Enabled (default is NO)
    ENABLED = NO
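
The threshold and timeframe semantics of a rule can be sketched as follows.
This is an illustrative model only: the ``Rule`` class, the ``rule_triggered``
function and the representation of past operations as ``(timestamp, amount)``
pairs are hypothetical, amounts are plain decimals in one currency, and a
``timeframe`` of ``None`` stands in for 'forever'.

.. code-block:: python

  from dataclasses import dataclass
  from datetime import datetime, timedelta
  from decimal import Decimal
  from typing import Iterable, Optional, Tuple


  @dataclass
  class Rule:
      operation_type: str             # WITHDRAW, DEPOSIT, P2P-RECEIVE, WALLET-BALANCE
      threshold: Decimal
      timeframe: Optional[timedelta]  # None models 'forever'
      enabled: bool = False           # default is NO


  def rule_triggered(
      rule: Rule,
      operation_type: str,
      events: Iterable[Tuple[datetime, Decimal]],
      now: datetime,
  ) -> bool:
      """Return True if the total within the timeframe exceeds the threshold."""
      if not rule.enabled or rule.operation_type != operation_type:
          return False
      # TIMEFRAME is ignored for WALLET-BALANCE rules.
      ignore_window = (
          rule.timeframe is None or rule.operation_type == "WALLET-BALANCE"
      )
      total = Decimal(0)
      for timestamp, amount in events:
          if ignore_window or timestamp >= now - rule.timeframe:
              total += amount
      return total > rule.threshold

For a ``WALLET-BALANCE`` rule, the caller would pass the current balance as a
single event.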


AML programs
^^^^^^^^^^^^

AML programs are helper programs that can:

* Generate a list of *required* context field names
  for the helper (introspection!) using the "--required-context"
  command-line switch. The output should use the same
  syntax as the REQUIRES clause of ``[kyc-check-]``
  configuration sections, except that new lines
  MUST be used to separate fields instead of ";".
* Generate a list of *required* attribute names
  for the helper (introspection!) using the "--required-attributes"
  command-line switch. The output should use the same
  list of names as the ATTRIBUTES in the
  ``[kyc-provider-]`` configuration section
  (but may also include FORM field names).
* Process an input JSON object with context and
  attributes into an *outcome*.  This is the
  default behavior if no command-line switches
  are provided.

AML programs are listed in the configuration file,
one program per section:

  .. code-block:: ini

    [aml-program-$PROG_NAME]

    # Program to run.
    COMMAND = taler-helper-aml-pep

    # Enabled (default is NO)
    ENABLED = NO

    # **original** measure to take if COMMAND fails
    # Usually points to a measure that asks AML staff
    # to contact the systems administrator. The fallback measure
    # context always includes the reasons for the
    # failure.
    FALLBACK = MEASURE_NAME

The JSON input of an AML program consists of three parts:

* "context": JSON object that was provided as
  part of the *measure*.  This JSON object is
  provided under "context" in the main JSON object
  input to the AML program.  This "context" should
  satisfy both the REQUIRES clause of the respective
  check and the output of the AML program's
  "--required-context" command-line option.
* "attributes": JSON object that captures the
  output of a ``[kyc-provider-]`` or (HTML) FORM.
  The keys in the JSON object will be the attribute
  names and the values must be strings representing
  the data. In the case of file uploads, the data
  MUST be base64-encoded.
* "history": JSON array with the results of historic
  data collected about the user.

The output of an AML program must be a JSON object which must state:

* outcome: what to do with the client's account
* expiration_date: when does the decision expire (zero to take the next measure immediately)
* combinator: "AND" if all of the 'next_measures' will eventually need to be satisfied, "OR" if those are choices and the user only has to satisfy one of them.
* next_measures: measures to trigger upon expiration of the current outcome; array entries must be either strings with the name of an **original** context-free measure, or JSON objects with the same information that is usually in ``[kyc-measure-]`` configuration sections (with a check name, context, and AML program name).

If the AML program fails (exits with a failure code or
does not provide well-formed JSON output) the AML/KYC
process continues with the FALLBACK measure. This should
usually be one that asks AML staff to contact the
systems administrator.
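
To make the interface concrete, here is a hedged sketch of what a small AML
program along the lines of the "PEP" example could look like.  Several details
are assumptions not fixed by this document: that the input JSON arrives on
standard input and the result goes to standard output, that
``expiration_date`` is expressed in seconds since the epoch, and the names
``validity_seconds``, ``aml-staff-review`` and ``renew-documents``.

.. code-block:: python

  import json
  import sys
  import time

  # Hypothetical requirements of this example helper.
  REQUIRED_CONTEXT = ["validity_seconds: Number"]
  REQUIRED_ATTRIBUTES = ["PEP"]


  def decide(data, now):
      """Map the "context" and "attributes" of the input to an outcome."""
      validity = int(data["context"]["validity_seconds"])
      if data["attributes"]["PEP"] == "true":  # attribute values are strings
          return {
              "outcome": "held-for-manual-review",
              "expiration_date": 0,  # zero: take the next measure immediately
              "combinator": "AND",
              "next_measures": ["aml-staff-review"],  # hypothetical measure
          }
      return {
          "outcome": "normal",
          "expiration_date": now + validity,
          "combinator": "AND",
          "next_measures": ["renew-documents"],  # hypothetical measure
      }


  def main(argv, stdin=sys.stdin, stdout=sys.stdout):
      """Handle the introspection switches or process one JSON input."""
      if "--required-context" in argv:
          stdout.write("\n".join(REQUIRED_CONTEXT) + "\n")
          return 0
      if "--required-attributes" in argv:
          stdout.write("\n".join(REQUIRED_ATTRIBUTES) + "\n")
          return 0
      try:
          result = decide(json.load(stdin), int(time.time()))
      except (KeyError, TypeError, ValueError):
          return 1  # failure exit code: the exchange would use FALLBACK
      json.dump(result, stdout)
      return 0

A real helper would end with ``sys.exit(main(sys.argv[1:]))`` so that
malformed input results in a failure exit code and thus in the FALLBACK
measure.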


Configuration of measures
^^^^^^^^^^^^^^^^^^^^^^^^^

Finally, the configuration specifies a set of
**original** *measures* one per configuration section:

  .. code-block:: ini

    [kyc-measure-$MEASURE_NAME]

    # Possible check for this measure.  Optional.
    # If not given, PROGRAM should be run immediately
    # (on an empty set of attributes).
    CHECK_NAME = IB_FORM

    # Context for the check.
    # The context can be
    # just an empty JSON object if there is none.
    CONTEXT = {"choices":["individual","business"]}

    # Program to run on the context and check data to
    # determine the outcome and next measure.
    PROGRAM = taler-aml-program

If no ``CHECK_NAME`` is provided at all, the
AML ``PROGRAM`` is to be run immediately.  This is
useful if no client-interaction is required to
arrive at a decision.

.. note::

   The list of *measures* is not complete: AML staff may freely define new
   measures dynamically, usually by selecting checks, an AML program, and
   providing context.


Sanity checking
^^^^^^^^^^^^^^^

On start-up, ``taler-exchange-httpd`` should sanity-check its
configuration. Specifically, it should validate that for all AML programs the
input requirements (attributes and context) are claimed to be satisfied by the
respective checks that may trigger those programs, and similarly that for all
checks the original measures satisfy the context requirements for their KYC
checks.

As a result, any component (AML program, form or external check) is guaranteed
to always be called with the declared required inputs. Furthermore, we can
detect if a component fails to produce the required output and the
configuration contains (presumably safe) FALLBACKs to address this case.  The
exchange *MUST* detect circular failures, like when a FALLBACK triggers a
measure that itself immediately triggers again the same FALLBACK.
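
One way to detect circular failures is a depth-first search over the FALLBACK
relation.  The sketch below assumes the configuration has already been reduced
to a mapping from each measure name to the fallback measures reachable from it
(via the FALLBACK of its check and of its AML program); that mapping and the
function name ``find_fallback_cycle`` are hypothetical.  It reports any cycle,
which is stricter than the "immediately triggers again" case required above.

.. code-block:: python

  from typing import Dict, List, Optional


  def find_fallback_cycle(fallbacks: Dict[str, List[str]]) -> Optional[List[str]]:
      """Return one cycle of measure names, or None if there is none."""
      UNSEEN, ACTIVE, DONE = 0, 1, 2
      state = {measure: UNSEEN for measure in fallbacks}
      path: List[str] = []

      def visit(measure: str) -> Optional[List[str]]:
          state[measure] = ACTIVE
          path.append(measure)
          for nxt in fallbacks.get(measure, []):
              if state.get(nxt, UNSEEN) == ACTIVE:
                  # nxt is on the current path: we found a cycle.
                  return path[path.index(nxt):] + [nxt]
              if state.get(nxt, UNSEEN) == UNSEEN:
                  cycle = visit(nxt)
                  if cycle:
                      return cycle
          path.pop()
          state[measure] = DONE
          return None

      for measure in list(fallbacks):
          if state[measure] == UNSEEN:
              cycle = visit(measure)
              if cycle:
                  return cycle
      return None

A start-up check could refuse to launch if this function returns a cycle and
log the returned path to help the operator fix the configuration.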


Exchange database schema
^^^^^^^^^^^^^^^^^^^^^^^^

We introduce a new ``wire_targets`` table into the exchange database. This
table is referenced as the source or destination of payments (regular deposits
and also P2P payments).  A positive side-effect is that we reduce duplication
in the ``reserves_in``, ``wire_out`` and ``deposits`` tables as they can
reference this table.

We introduce a new ``legitimization_processes`` table that tracks the status
of a legitimization process at a provider, including the configuration section
name, the user/account name at the provider, and some legitimization
identifier for the process at the provider.  In this table, we additionally
store information related to the KYC status of the underlying payto://-URI, in
particular when the KYC expires (0 if it was never done).

Finally, we introduce a new ``legitimization_requirements`` table that
contains a list of checks required for a particular wire target.  When KYC is
triggered (say when some endpoint returns an HTTP status code of 451) a
new requirement is first put into the requirements table. Then, when the
client identifies as business or individual the specific legitimization
process is started.  When ``taler-exchange-aggregator`` triggers a KYC check,
the merchant can observe this when a 202 (Accepted) status code is returned
on GET ``/deposits/`` with the respective legitimization requirement row.


.. sourcecode:: sql

  CREATE TABLE wire_targets
    (wire_target_serial_id BIGSERIAL UNIQUE
    ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=32)
    ,payto_uri TEXT NOT NULL
    ,PRIMARY KEY (h_payto)
    )
    PARTITION BY HASH (h_payto);

  COMMENT ON TABLE wire_targets
    IS 'All recipients of money via the exchange';
  COMMENT ON COLUMN wire_targets.payto_uri
    IS 'Can be a regular bank account, or also be a URI identifying a reserve-account (for P2P payments)';
  COMMENT ON COLUMN wire_targets.h_payto
    IS 'Unsalted hash of payto_uri';

  CREATE TABLE IF NOT EXISTS legitimization_measures
    (legitimization_measure_serial_id BIGINT GENERATED BY DEFAULT AS IDENTITY
    ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=32)
     REFERENCES wire_targets (h_payto)
    ,start_time INT8 NOT NULL
    ,jmeasures VARCHAR[] NOT NULL
    ,is_and_combinator BOOL NOT NULL DEFAULT(FALSE)
    ,is_finished BOOL NOT NULL DEFAULT(FALSE)
    )
    PARTITION BY HASH (h_payto);

  COMMENT ON COLUMN legitimization_measures.start_time
    IS 'Time when the measure was triggered (by decision or rule)';
  COMMENT ON COLUMN legitimization_measures.is_finished
    IS 'Set to TRUE if this set of measures was processed; used to avoid indexing measures that are done';
  COMMENT ON COLUMN legitimization_measures.is_and_combinator
    IS 'Set to TRUE if each of the measures will ultimately need to be satisfied; FALSE if the user has the choice to satisfy one of them';
  COMMENT ON COLUMN legitimization_measures.jmeasures
    IS 'array with KYC/AML measures for the account encoded in JSON';

  CREATE INDEX ON legitimization_measures (h_payto)
    WHERE NOT is_finished;

  CREATE TABLE legitimization_outcomes
    (outcome_serial_id BIGINT GENERATED BY DEFAULT AS IDENTITY
    ,h_payto BYTEA CHECK (LENGTH(h_payto)=32)
     REFERENCES wire_targets (h_payto)
    ,decision_time INT8 NOT NULL DEFAULT(0)
    ,expiration_time INT8 NOT NULL DEFAULT(0)
    ,jproperties VARCHAR
    ,to_investigate BOOL NOT NULL
    ,is_frozen BOOL NOT NULL
    ,is_reported BOOL NOT NULL
    ,is_active BOOL NOT NULL DEFAULT(TRUE)
    ,new_rules TEXT NOT NULL
    )
    PARTITION BY HASH (h_payto);

  COMMENT ON TABLE legitimization_outcomes
    IS 'Outcomes can come from AML programs';
  COMMENT ON COLUMN legitimization_outcomes.h_payto
    IS 'hash of the payto://-URI this outcome is about';
  COMMENT ON COLUMN legitimization_outcomes.decision_time
    IS 'when was this outcome decided';
  COMMENT ON COLUMN legitimization_outcomes.expiration_time
    IS 'time when the decision expires and the expiration new_rules should be applied';
  COMMENT ON COLUMN legitimization_outcomes.jproperties
    IS 'JSON with account properties, such as PEP status, business domain, risk assessment, etc.';
  COMMENT ON COLUMN legitimization_outcomes.to_investigate
    IS 'AML staff should investigate the activity of this account';
  COMMENT ON COLUMN legitimization_outcomes.is_frozen
    IS 'Transactions with this account should be held (until expiration date or AML staff action)';
  COMMENT ON COLUMN legitimization_outcomes.is_reported
    IS 'Set to TRUE if the activity of the account was reported to authorities';
  COMMENT ON COLUMN legitimization_outcomes.is_active
    IS 'TRUE if this is the current authoritative legitimization outcome';
  COMMENT ON COLUMN legitimization_outcomes.new_rules
    IS 'JSON-encoding of KYC-rules to apply to the various operation types for this account; KYC check should first check if active new rules for a given account exist (and apply specified measures); if not, it should check the default rules to decide if a measure is required';

  CREATE INDEX legitimization_outcomes_active
    ON legitimization_outcomes(h_payto)
    WHERE is_active;

  CREATE TABLE kyc_setups
    (kyc_setup_serial_id BIGSERIAL UNIQUE
    ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=32)
     REFERENCES wire_targets (h_payto)
    ,start_time INT8 NOT NULL
    ,expiration_time INT8 NOT NULL DEFAULT (0)
    ,legitimization_measure_serial_id BIGINT
     REFERENCES legitimization_measures (legitimization_measure_serial_id)
    ,measure_index INT8
    ,provider_section VARCHAR NOT NULL
    ,provider_user_id VARCHAR DEFAULT NULL
    ,provider_legitimization_id VARCHAR DEFAULT NULL
    ,redirect_url TEXT DEFAULT NULL
    ,finished BOOLEAN DEFAULT (FALSE)
    )
    PARTITION BY HASH (h_payto);

  COMMENT ON TABLE kyc_setups
    IS 'here we track KYC processes we initiated with external providers; the main reason is so that we do not initiate a second process when an equivalent one is still active; note that h_payto, provider_section, jcontext must match and the process must not be finished or expired for an existing redirect_url to be re-used; given that clients may voluntarily initiate KYC processes, there may not always be a legitimization_measure that triggered the setup';
  COMMENT ON COLUMN kyc_setups.h_payto
    IS 'foreign key linking the entry to the wire_targets table, NOT a primary key (multiple KYC setups are possible per wire target)';
  COMMENT ON COLUMN kyc_setups.start_time
    IS 'when was the legitimization process initiated';
  COMMENT ON COLUMN kyc_setups.expiration_time
    IS 'when does the process expire (and needs to be manually set up again)';
  COMMENT ON COLUMN kyc_setups.measure_index
    IS 'index of the measure in legitimization_measures that was selected for this KYC setup; NULL if legitimization_measure_serial_id is NULL; enables determination of the context data provided to the external process';
  COMMENT ON COLUMN kyc_setups.provider_section
    IS 'Configuration file section with details about this provider';
  COMMENT ON COLUMN kyc_setups.provider_user_id
    IS 'Identifier for the user at the provider that was used for the legitimization. NULL if provider is unaware.';
  COMMENT ON COLUMN kyc_setups.provider_legitimization_id
    IS 'Identifier for the specific legitimization process at the provider. NULL if legitimization was not started.';
  COMMENT ON COLUMN kyc_setups.legitimization_measure_serial_id
    IS 'measure that enabled this setup, NULL if client voluntarily initiated the process';
  COMMENT ON COLUMN kyc_setups.redirect_url
    IS 'Where the user should be redirected for this external KYC process';
  COMMENT ON COLUMN kyc_setups.finished
    IS 'set to TRUE when the specific legitimization process is finished';

  CREATE TABLE kyc_attributes
    (kyc_attributes_serial_id BIGINT GENERATED BY DEFAULT AS IDENTITY
    ,h_payto BYTEA PRIMARY KEY CHECK (LENGTH(h_payto)=32)
     REFERENCES wire_targets (h_payto)
    ,kyc_prox BYTEA NOT NULL CHECK (LENGTH(kyc_prox)=32)
    ,kyc_setup_serial_id INT8 NOT NULL
     REFERENCES kyc_setups (kyc_setup_serial_id)
    ,collection_time INT8 NOT NULL
    ,expiration_time INT8 NOT NULL
    ,trigger_outcome_serial INT8 NOT NULL
     REFERENCES legitimization_outcomes(outcome_serial_id)
    ,encrypted_attributes BYTEA NOT NULL
    ) PARTITION BY HASH (h_payto);

  COMMENT ON COLUMN kyc_attributes.h_payto
    IS 'identifies the account this is about';
  COMMENT ON COLUMN kyc_attributes.kyc_prox
    IS 'for proximity search on encrypted data';
  COMMENT ON COLUMN kyc_attributes.kyc_setup_serial_id
    IS 'serial ID of the KYC setup that resulted in these attributes';
  COMMENT ON COLUMN kyc_attributes.collection_time
    IS 'when were these attributes collected';
  COMMENT ON COLUMN kyc_attributes.expiration_time
    IS 'when are these attributes expected to expire';
  COMMENT ON COLUMN kyc_attributes.trigger_outcome_serial
    IS 'ID of the outcome that was returned by the AML program based on the KYC data collected';
  COMMENT ON COLUMN kyc_attributes.encrypted_attributes
    IS 'encrypted JSON object with the attribute data the check provided';

  CREATE TABLE aml_history
    (aml_history_serial_id BIGINT GENERATED BY DEFAULT AS IDENTITY
    ,h_payto BYTEA CHECK (LENGTH(h_payto)=32)
     REFERENCES wire_targets (h_payto)
    ,legitimization_outcome INT8 NOT NULL
     REFERENCES legitimization_outcomes (outcome_serial_id)
    ,justification TEXT NOT NULL
    ,decider_pub BYTEA CHECK (LENGTH(decider_pub)=32)
    ,decider_sig BYTEA CHECK (LENGTH(decider_sig)=64));

  COMMENT ON TABLE aml_history
    IS 'Records decisions by AML staff with the respective signature and free-form justification.';
  COMMENT ON COLUMN aml_history.legitimization_outcome
    IS 'Actual outcome for the account (included in what decider_sig signs over)';
  COMMENT ON COLUMN aml_history.decider_sig
    IS 'Signature of the staff member affirming the AML decision; of type AML_DECISION';

  CREATE TABLE kyc_events
    (event_serial_id BIGINT GENERATED BY DEFAULT AS IDENTITY
    ,event_timestamp INT8 NOT NULL
    ,event_type TEXT NOT NULL);

  COMMENT ON TABLE kyc_events
    IS 'Records of key events for statistics. Populated via triggers.';
  COMMENT ON COLUMN kyc_events.event_type
    IS 'Name of the event, such as account-open or sar-filed';

  CREATE INDEX kyc_event_index
    ON kyc_events(event_type,event_timestamp);

KYC forms
^^^^^^^^^

The KYC SPA run by clients needs to support three TYPEs of checks. INFO is
only about displaying the provided information, LINK is about setting up an
external KYC check and redirecting there, and FORM is about displaying a
particular (HTML) form to the user and POSTing the entered information directly
to the exchange.  Here we describe the forms that must be supported:

* CHOICE: Asks the client a multiple-choice question.
  The context must include "choices: String[]" with
  a list of choices to show.  Used, for example, to
  ask a client if they are an individual or a business.
  The resulting HTML FORM field name must be
  "choice" and it must be mapped to strings from the
  choices list.
* UPLOAD: Asks the client to upload a single file.
  The context may include "extensions: String[]" with
  a list of allowed file extensions the client's file
  must end with (e.g. "png", "pdf", "gif").  In the
  absence of this context, any file may be uploaded.
  The context may also include "size_limit: Integer" with
  the maximum file size in bytes that can be uploaded.
  The resulting HTML FORM must have two fields,
  "filename" and "filedata".  "filename" must be
  set to the basename of the original file (to the
  extent that it is available), and "filedata"
  to the base64-encoding of the uploaded data.

As with other SPA checks, the KYC form should also show
the description of the check.
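
The UPLOAD rules above could be enforced on the client side roughly as
follows.  The SPA itself is not written in Python; this is only a sketch of
the logic, and the helper name ``build_upload_fields`` is hypothetical.  It
assumes that the extensions in the context are given without a leading dot, as
in the examples above.

.. code-block:: python

   import base64
   import os


   def build_upload_fields(path, data, context):
       """Return the two FORM fields for an UPLOAD check.

       Raises ValueError if the file violates the (optional) context limits.
       """
       filename = os.path.basename(path)
       extensions = context.get("extensions")
       if extensions is not None:
           # The file name must end with one of the allowed extensions.
           if not any(filename.endswith("." + ext) for ext in extensions):
               raise ValueError("file extension not allowed")
       size_limit = context.get("size_limit")
       if size_limit is not None and len(data) > size_limit:
           raise ValueError("file exceeds size_limit")
       return {
           "filename": filename,
           "filedata": base64.b64encode(data).decode("ascii"),
       }

With an empty context (``{}``), any file is accepted, which matches the
behaviour described for a missing ``extensions`` entry.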


Merchant modifications
^^^^^^^^^^^^^^^^^^^^^^

A new setting is required where the merchant backend
can be configured for a business (default) or individual.

We introduce new ``kyc_ok``, ``aml_decision``, ``kyc_timestamp`` and
``exchange_kyc_serial`` fields into a new table ``merchant_kyc`` with primary
key (``exchange_url``, ``account_serial``).  This status is updated whenever
a deposit is created or tracked, or whenever the merchant backend receives a
``/kyc-check/`` response from the exchange.  Initially,
``exchange_kyc_serial`` is zero, indicating that the merchant has not yet made
any deposits and thus does not have an account at the exchange.

A new private endpoint ``/kyc`` is introduced which allows frontends to
request the KYC status of any configured account (including with long
polling).  If the KYC status is negative or the ``kyc_timestamp`` is not recent
(say older than one month), the merchant backend will re-check the KYC status
at the exchange (and update its cached status).  The endpoint then returns
either that the KYC is OK, or information (same as from the exchange endpoint)
to begin the KYC process.
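
The re-check condition described above amounts to a small predicate.  The
following sketch assumes a threshold of 30 days for "one month" and uses
hypothetical names; the actual merchant backend is written in C and may choose
a different threshold.

.. code-block:: python

   from datetime import datetime, timedelta, timezone

   # Assumed value for "say older than one month".
   MAX_KYC_AGE = timedelta(days=30)


   def needs_recheck(kyc_ok, kyc_timestamp, now):
       """Decide whether the cached KYC status must be refreshed.

       kyc_ok:        cached status from the merchant_kyc table
       kyc_timestamp: time the status was last obtained from the exchange
       now:           current time
       """
       return (not kyc_ok) or (now - kyc_timestamp > MAX_KYC_AGE)


   now = datetime(2024, 3, 1, tzinfo=timezone.utc)
   print(needs_recheck(True, now - timedelta(days=45), now))  # stale status

A negative cached status always triggers a re-check, independent of its age.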

The merchant backend uses the new field to remember that a KYC is pending
(after detection in ``taler-merchant-depositcheck``) and the SPA then shows a
notification whenever the staff is logged in to the system.  The notification
can be hidden for the current day (remembered in local storage).

The notification links to a (new) KYC status page. When opened, the KYC SPA
first re-checks the KYC status with the exchange.  If the KYC is still
unfinished, that SPA will show forms, links or contact information to begin
the KYC process (for example, redirecting to the OAuth 2.0 login page of the
legitimization resource server), otherwise it shows that the KYC process is
done. If the KYC is unfinished, the merchant SPA should use long-polling on
the KYC status on this page to ensure it is always up-to-date, and change to
``KYC satisfied`` should the long-poller return with positive news.

.. note::

   Semi-related: ``TMH_setup_wire_account()`` is changed to use
   128-bit salt values (to keep the ``deposits`` table small), and checks for
   the salt to be well-formed should be added "everywhere".



Bank requirements
^^^^^^^^^^^^^^^^^

The exchange primarily requires a KYC provider to be operated by the
bank that offers an endpoint with an API implemented by one of
the logic plugins (and the respective legitimization configuration).


Logic plugins
^^^^^^^^^^^^^

The ``$PROVIDER_SECTION`` is based on the name of the configuration section,
not on the name of the logic plugin (that we call ``$LOGIC``).  Using the
configuration section, the exchange then determines the logic plugin to use.

This section describes the general API for all of the supported KYC providers,
as well as some details of how this general API could be implemented by the
logic for different APIs.


General KYC Logic Plugin API
----------------------------

This section provides a sketch of the proposed API for the KYC logic plugins.

* initiation of KYC check (``kyc-check``):

  - inputs:
    + provider_section (for additional configuration)
    + h_payto
  - outputs:
    + success/provider-failure
    + redirect URL (or NULL)
    + provider_user_id (or NULL)
    + provider_legitimization_id (or NULL)

* KYC status check (``kyc-proof``):

  - inputs:
    + provider_section (for additional configuration)
    + h_payto
    + provider_user_id (or NULL)
    + provider_legitimization_id (or NULL)
  - outputs:
    + success/pending/user-aborted/user-failure/provider-failure status code
    + HTML response for end-user

* Webhook notification handler (``kyc-webhook``):

  - inputs:
    + HTTP method (GET/POST)
    + rest of URL (after provider_section)
    + HTTP body (if applicable!)
  - outputs:
    + success/pending/user-aborted/user-failure/provider-failure status code
    + h_payto (for DB status update)
    + HTTP response to be returned to KYC provider

The plugins do not directly interact with the database; the caller sets the
expiration on ``success`` and also updates ``provider_user_id`` and
``provider_legitimization_id`` in the tables as required.


For the webhook, we need a way to look up ``h_payto`` by other data, so the
KYC logic plugin API should be provided with a lookup method with:

  - inputs:
    + ``provider_section``
    + ``provider_legitimization_id``
  - outputs:
    + ``h_payto``
    + ``legitimization_process_row``
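
To make the shape of this API more concrete, the following Python sketch
models the three operations and the lookup callback as typed interfaces.  The
real plugins are C shared libraries; all class, method and field names here
are hypothetical, and ``AlwaysPendingPlugin`` (including its example URL) is a
toy implementation that exists only to exercise the interface.

.. code-block:: python

   from dataclasses import dataclass
   from enum import Enum
   from typing import Callable, Optional, Protocol, Tuple


   class KycStatus(Enum):
       SUCCESS = "success"
       PENDING = "pending"
       USER_ABORTED = "user-aborted"
       USER_FAILURE = "user-failure"
       PROVIDER_FAILURE = "provider-failure"


   @dataclass
   class InitiateResult:
       success: bool  # False means provider-failure
       redirect_url: Optional[str] = None
       provider_user_id: Optional[str] = None
       provider_legitimization_id: Optional[str] = None


   @dataclass
   class ProofResult:
       status: KycStatus
       html: str  # response shown to the end-user


   @dataclass
   class WebhookResult:
       status: KycStatus
       h_payto: Optional[bytes]  # lets the caller update the DB status
       http_status: int
       http_body: str  # returned to the KYC provider


   # lookup(provider_section, provider_legitimization_id)
   #   -> (h_payto, legitimization_process_row), or None if unknown
   Lookup = Callable[[str, str], Optional[Tuple[bytes, int]]]


   class KycLogicPlugin(Protocol):
       def initiate(self, provider_section: str,
                    h_payto: bytes) -> InitiateResult: ...

       def proof(self, provider_section: str, h_payto: bytes,
                 provider_user_id: Optional[str],
                 provider_legitimization_id: Optional[str]) -> ProofResult: ...

       def webhook(self, method: str, url_rest: str, body: Optional[bytes],
                   lookup: Lookup) -> WebhookResult: ...


   class AlwaysPendingPlugin:
       """Toy plugin: never touches a database, always reports 'pending'."""

       def __init__(self, section):
           self.section = section

       def initiate(self, provider_section, h_payto):
           return InitiateResult(True, redirect_url="https://kyc.example/start")

       def proof(self, provider_section, h_payto, provider_user_id,
                 provider_legitimization_id):
           return ProofResult(KycStatus.PENDING, "<p>KYC still pending</p>")

       def webhook(self, method, url_rest, body, lookup):
           # Treat the rest of the URL as the provider's legitimization ID.
           found = lookup(self.section, url_rest)
           if found is None:
               return WebhookResult(KycStatus.PROVIDER_FAILURE, None, 404, "unknown")
           h_payto, _row = found
           return WebhookResult(KycStatus.PENDING, h_payto, 200, "ok")

Passing the lookup as a callback keeps the plugin free of database access,
in line with the requirement that only the caller updates the tables.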


OAuth 2.0 specifics
-------------------

In terms of configuration, the OAuth 2.0 logic requires the respective client
credentials to be configured a priori to enable access to the legitimization
service.

For the ``/kyc-check/`` endpoint, the OAuth 2.0 logic may need to create and
store a nonce to be used during ``/kyc-proof/``, depending on the OAuth
variant used.  This may require another exchange table.  The OAuth 2.0 process
must then be set up to end at the new ``/kyc-proof/$PROVIDER_ID/`` endpoint.

This ``/kyc-proof/oauth2/`` endpoint must query the OAuth 2.0 server using the
``code`` argument provided as a query parameter. Based on the result, it then
updates the KYC table of the exchange with the legitimization status and
returns a human-readable KYC status page.

The ``/kyc-webhook/`` is not applicable.


Persona specifics
-----------------

We would use the hosted flow. Endpoints return a ``request-id``, which we should
log for diagnosis.

For ``/kyc-check/``:

* Post to ``/api/v1/accounts`` using ``reference-id`` set to our ``h_payto``.
  Returns ``id`` (account_id).

* Create ``/verify`` endpoint using ``template-id`` (from configuration),
  and ``account_id`` (from previous step) and a ``reference-id`` (use
  the ``legitimization_serial_id`` for the new process). Set
  ``redirect-uri`` to ``/kyc-proof/$PROVIDER_ID/``.  However, we cannot
  rely on the user clicking this, so we must also configure a webhook.
  The request returns a ``verification-id``, which we store under
  the ``provider_legitimization_id`` in the database.

For ``/kyc-proof/``:

* Use the ``/api/v1/verifications`` endpoint to get the verification
  status. Requires the ``verification-id`` from the previous step.
  Results include: created/pending/completed/expired (aborted)/failed.

For ``/kyc-webhook/``:

* The webhook is authenticated using a shared secret, which should
  be in the configuration.  So all we should have to do is parse
  the POSTed body to find the status and the ``verification-id`` to
  look up ``h_payto`` and return the result.


KYC AID specifics
-----------------

For ``/kyc-check/``:

* Post to ``/applicants`` with a type (person or company) to
  obtain ``applicant_id``. Store that under ``provider_user_id``.
  ISSUE: *we* need to get the company_name, business_activity_id
  and registration_country before this somehow!

* Create a form URL via ``/forms/$FORM_ID/urls``,
  providing our ``h_payto`` as the ``external_applicant_id``,
  using the ``applicant_id`` from above,
  and the ``/kyc-proof/$PROVIDER_ID`` for the ``redirect_url``.

* Redirect the customer to the ``form_url`` and
  store the ``verification_id`` under ``provider_legitimization_id``
  in the database.

For ``/kyc-proof/``:

* Not needed, just return an error.

For ``/kyc-webhook/``:

* For security, we should probably simply trigger the GET on
  ``/verifications/{verification_id}`` to not trust an unsigned POST
  to tell us anything for sure.  The result is then returned.



Alternatives
============

We could also store the access token (returned by OAuth 2.0), but that seems
slightly more dangerous and given the close business relationship is
unnecessary. Furthermore, not all APIs offer this.

We could extend the KYC logic API to return key attributes about the user
(such as legal name, phone number, address, etc.) which we could then sign and
return to the user.  This would be useful in P2P payments to identify the
origin of an invoice.  However, we might want to be careful to not disclose
the key attributes via the API by accident.  This could likely be done by
limiting access to the respective endpoint to messages with a signature by the
reserve private key (which is the only case where we care to certify things
anyway).


Drawbacks
=========


Discussion / Q&A
================

(This should be filled in with results from discussions on mailing lists / personal communication.)