DD 023: Taler KYC
#################

Summary
=======

This document discusses the Know-your-customer (KYC) processes supported by Taler.


Motivation
==========

To legally operate, Taler has to comply with KYC regulation that requires
banks to identify parties involved in transactions at certain points.


Requirements
============

Taler needs to run KYC checks in the following circumstances:

* Customer withdraws money over a monthly threshold

  * exchange triggers KYC
  * key: IBAN (encoded as payto:// URI)

* Wallet receives (via refunds) money resulting in a balance over a threshold

  * this is a client-side restriction
  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Wallet receives money via P2P payments

  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Merchant receives money (Q: any money, or above a monthly threshold?)

  * key: IBAN (encoded as payto:// URI)


Proposed Solution
=================

Terminology
^^^^^^^^^^^

* **Check**: A check establishes a particular attribute of a user, such as their name based on an ID document and liveness, mailing address, phone number, taxpayer identity, etc.

* **Condition**: A condition specifies when KYC is required. Conditions include the *type of operation*, a threshold amount (e.g. above EUR:1000) and possibly a time period (e.g. over the last month).

* **Configuration**: The configuration determines the *legitimization rules*, and specifies which providers offer which *checks* at what *cost*.

* **Cost**: Metric for the business expense for a KYC check at a certain *provider*. Costs are not denominated in any currency; they are simply relative, non-negative values. Costs are considered when multiple choices are allowed by the *configuration*.

* **Expiration**: KYC legitimizations may be outdated. Expiration rules determine when *checks* have to be performed again.

* **Legitimization rules**: The legitimization rules determine under which *conditions* which *checks* must be performed and the *expiration* time period for the *checks*.

* **Logic**: Logic refers to a specific bit of code (realized as an exchange plugin) that enables the interaction with a specific *provider*.  Logic typically requires *configuration* for access control (such as an authorization token) and possibly the endpoint of the specific *provider* implementing the respective API.

* **Provider**: A provider performs a specific set of *checks* at a certain *cost*. Interaction with a provider is performed by provider-specific *logic*.

* **Type of operation**: The operation type determines which Taler-specific operation has triggered the KYC requirement. We support four types of operation: withdraw (by customer), deposit (by merchant), P2P receive (by wallet) and (high) wallet balance.


New Endpoints
^^^^^^^^^^^^^

We introduce a new ``wire_targets`` table into the exchange database. This
table is referenced as the source or destination of payments (regular deposits
and also P2P payments).  A positive side-effect is that we reduce duplication
in the ``reserves_in``, ``wire_out`` and ``deposits`` tables as they can
reference this table.  In this table, we additionally store information
related to the KYC status of the underlying payto://-URI.

The new ``/kyc-check/`` endpoint is based on the ``wire_targets`` serial
number.  Access is ``authenticated`` by also passing the hash of the
payto://-URI.  (Weak authentication is acceptable, as the KYC status or the
ability to initiate a KYC process are not very sensitive).  Additionally, a
``type`` argument determines the type of the operation for which the KYC
status is to be checked.  Finally, the client must specify whether the KYC
check is for an individual or a business.  Given this quadruplet, the
``/kyc-check/`` endpoint returns either the (positive) KYC status or redirects
the client (202) to the next required stage of the KYC process.  The
redirection must be for an HTTP(S) endpoint to be triggered via a simple HTTP
GET.
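
The request described above could be assembled and its reply interpreted
roughly as follows.  This is only a sketch: the query parameter names
(``h_payto``, ``type``, ``user_type``), the ``kyc_url`` response field, the
use of ``200`` for a positive status, and both helper names are assumptions
made for illustration and are not fixed by this document.

.. sourcecode:: python

  from urllib.parse import urlencode

  OPERATION_TYPES = {"withdraw", "deposit", "p2p-receive", "wallet-balance"}
  USER_TYPES = {"individual", "business"}


  def kyc_check_url(base_url, wire_target_serial, h_payto, op_type, user_type):
      """Build a /kyc-check/ URL from the quadruplet (parameter names are hypothetical)."""
      if op_type not in OPERATION_TYPES:
          raise ValueError(f"unknown operation type: {op_type}")
      if user_type not in USER_TYPES:
          raise ValueError(f"unknown user type: {user_type}")
      query = urlencode({"h_payto": h_payto, "type": op_type, "user_type": user_type})
      return f"{base_url.rstrip('/')}/kyc-check/{wire_target_serial}?{query}"


  def interpret_kyc_check(http_status, body):
      """Map a reply to ("ok", None) or ("redirect", url)."""
      if http_status == 200:
          # Assumption: a positive KYC status is signalled with 200 OK.
          return ("ok", None)
      if http_status == 202:
          # Hypothetical field carrying the next stage of the KYC process.
          return ("redirect", body["kyc_url"])
      raise RuntimeError(f"unexpected status {http_status}")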

.. Note::

   operation type and individual vs. business are new here, API change!

The specific KYC provider to be executed depends on the configuration (see
below) which specifies a ``$PROVIDER_ID`` for each authentication procedure.
For each (enabled) provider, the exchange has a logic plugin which
(asynchronously) determines the redirect URL for a given wire target. See
below for a description of the high-level process for different providers.

Upon completion of the process at the KYC provider, the provider must trigger
a GET request to a new ``/kyc-proof/$PROVIDER_ID/$H_PAYTO`` endpoint.  This
may be done either by redirecting the browser of the user to that endpoint, or
by using a webhook (which of the two is used depends on the provider).  Once this
endpoint is triggered, the exchange will pass the received arguments to the
respective logic plugin.  The logic plugin will then (asynchronously) update
the KYC status of the user.  The logic plugin should return a human-readable
HTML page with the KYC result to the user (which will be ignored in case of
a webhook).

.. Note::

   provider ID is new here, API change!


Additionally, a new ``/kyc-webhook/$PROVIDER_ID`` POST endpoint is
required, as some KYC providers send us the result per POST, and here the
response does NOT go to the end-user's browser.  Here, too, the exchange
should trigger the plugin-specific logic.
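
The two callback endpoints could be dispatched to the logic plugins along
these lines.  The function name and the ``plugins`` mapping (from
``$PROVIDER_ID`` to a plugin object) are hypothetical; the sketch only
illustrates that the provider ID is taken from the URL and that
``/kyc-proof/`` carries one additional ``$H_PAYTO`` component.

.. sourcecode:: python

  def route_kyc_callback(path, plugins):
      """Return (kind, plugin, remaining_path_components) for a callback path."""
      parts = [p for p in path.split("/") if p]
      if len(parts) < 2 or parts[0] not in ("kyc-proof", "kyc-webhook"):
          raise LookupError("not a KYC callback path")
      kind, provider_id, rest = parts[0], parts[1], parts[2:]
      if provider_id not in plugins:
          raise LookupError(f"unknown provider: {provider_id}")
      if kind == "kyc-proof" and len(rest) != 1:
          # /kyc-proof/$PROVIDER_ID/$H_PAYTO needs exactly one more component.
          raise LookupError("/kyc-proof/ requires $H_PAYTO")
      return kind, plugins[provider_id], rest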

.. Note::

   ``/kyc-webhook/`` is new here, new endpoint!


Legitimization Hooks
^^^^^^^^^^^^^^^^^^^^

When withdrawing, the exchange checks if the KYC status is acceptable.  If no
KYC was done and if either the amount withdrawn over a particular timeframe
exceeds the threshold or the reserve received a P2P transfer, then a
``202 Accepted`` is returned which redirects the consumer to the new
``/kyc-check/`` handler.
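
The withdraw-side decision can be sketched as follows.  Amounts are modelled
as integers in the smallest currency unit and times as seconds, purely for
illustration; the names ``Withdrawal`` and ``withdraw_requires_kyc`` are
hypothetical, and counting the withdrawal currently being attempted towards
the total is an assumption.

.. sourcecode:: python

  from dataclasses import dataclass


  @dataclass
  class Withdrawal:
      timestamp: int  # seconds
      amount: int     # smallest currency unit


  def withdraw_requires_kyc(kyc_ok, history, new_amount, now,
                            threshold, timeframe, got_p2p):
      """Return True if the exchange should answer with 202 Accepted."""
      if kyc_ok:
          return False
      if got_p2p:
          # A reserve that received a P2P transfer always needs KYC.
          return True
      recent = sum(w.amount for w in history if now - w.timestamp <= timeframe)
      return recent + new_amount > threshold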

When depositing, the exchange checks the KYC status and if negative, returns
an additional information field that tells the merchant the
``wire_target_serial`` number needed to begin the KYC process (this is
independent of the amount) at the new ``/kyc-check/`` handler.  When tracking
deposits, the exchange also adds the ``wire_target_serial`` to the reply if
the KYC status is negative.  Furthermore, the aggregator is modified to only
SELECT deposits where the ``wire_target`` has the KYC status set to positive
(unless KYC is disabled in the exchange configuration).

FIXME: describe KYC on P2P transfer here.

To allow the wallet to do the KYC check if it is about to exceed a set balance
threshold, we modify the ``/keys`` response to add an optional field
``wallet_balance_limit_without_kyc``, which gives the maximum amount the wallet
is allowed to hold in coins from this exchange without KYC.  If this field is
absent, there is no limit.
If the field is provided, a correct wallet must create a long-term
account-reserve key pair. This should be the same key that is also used to
receive wallet-to-wallet payments. Then, before a wallet performs an operation
that would cause it to exceed the balance threshold in terms of funds held
from a particular exchange, it should first request the user to complete the
KYC process.  For that, the wallet should POST to the new ``/wallet-kyc``
endpoint, providing its long-term reserve-account public key and a signature
requesting permission to exceed the account limit.  The exchange will respond
with a wire target UUID. The wallet can then use this UUID to begin the KYC
process at ``/kyc-check/``. The wallet must only proceed to obtain funds
exceeding the threshold after the KYC process has concluded. While wallets
could be "hacked" to bypass this measure (we cannot cryptographically enforce
this), such modifications are a terms of service violation which may have
legal consequences for the user.
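
The wallet-side rule amounts to a simple predicate, sketched here under the
assumption that amounts are integers in the smallest currency unit.  The
function name is hypothetical; ``limit`` stands for the value of
``wallet_balance_limit_without_kyc`` (or ``None`` if the field is absent).

.. sourcecode:: python

  def wallet_must_start_kyc(balance, incoming, limit, kyc_done):
      """Return True if the wallet should run KYC before accepting `incoming`."""
      if limit is None:
          # Field absent in /keys: there is no limit.
          return False
      if kyc_done:
          return False
      return balance + incoming > limit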


.. note::

    Unrelated: We may want to consider directly deleting prewire records
    instead of setting them to ``finished`` in ``taler-exchange-transfer``.


Configuration Options
^^^^^^^^^^^^^^^^^^^^^

The configuration specifies a set of providers, one
per configuration section:

.. sourcecode:: ini

  [kyc-provider-$PROVIDER_ID]
  # How expensive is it to use this provider?
  # Used to pick the cheapest provider possible.
  COST = NUMBER
  # Which plugin is responsible for this provider?
  LOGIC = PLUGIN_NAME
  # Which type of user does this provider handle?
  # Either INDIVIDUAL or BUSINESS.
  USER_TYPE = INDIVIDUAL
  # Which checks does this provider provide?
  # List of strings, no specific semantics.
  PROVIDED_CHECKS = SMS GOVID PHOTO
  # Plus additional logic-specific options, e.g.:
  AUTHORIZATION_TOKEN = superdupersecret
  FORM_ID = business_legi_form
  # How long is the check considered valid?
  EXPIRATION = DURATION

The configuration also specifies a set of legitimization
requirements, one per configuration section:

.. sourcecode:: ini

  [kyc-legitimization-$RULE_NAME]
  # Operation that triggers this legitimization.
  # Must be one of WITHDRAW, DEPOSIT, P2P-RECEIVE
  # or WALLET-BALANCE.
  OPERATION_TYPE = WITHDRAW
  # Required checks to be performed.
  # List of strings, must individually match the
  # strings in one or more provider's PROVIDED_CHECKS.
  REQUIRED_CHECKS = SMS GOVID
  # Threshold amount above which the legitimization is
  # triggered.  The total must be exceeded in the given
  # timeframe. Can be 'forever'.
  THRESHOLD = AMOUNT
  # Timeframe over which the amount to be compared to
  # the  THRESHOLD is calculated.
  # Ignored for WALLET-BALANCE.
  TIMEFRAME = DURATION
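
One possible way to use ``COST`` when several providers could satisfy the
``REQUIRED_CHECKS`` of a rule is sketched below.  It assumes that checks from
several providers may be combined and that the total cost is the sum of the
individual costs; neither is settled by this document.  The function name and
the dictionary layout are hypothetical.

.. sourcecode:: python

  from itertools import combinations


  def cheapest_providers(providers, required_checks, user_type):
      """Return (total_cost, provider_names) covering all checks, or None.

      `providers` maps a section name to a dict with the keys "cost",
      "user_type" and "checks" (mirroring COST, USER_TYPE, PROVIDED_CHECKS).
      """
      required = set(required_checks)
      if not required:
          return (0, ())
      eligible = {name: p for name, p in providers.items()
                  if p["user_type"] == user_type}
      best = None
      # Brute force over all subsets; fine for a handful of providers.
      for size in range(1, len(eligible) + 1):
          for combo in combinations(sorted(eligible), size):
              covered = set().union(*(eligible[n]["checks"] for n in combo))
              if not required <= covered:
                  continue
              cost = sum(eligible[n]["cost"] for n in combo)
              if best is None or cost < best[0]:
                  best = (cost, combo)
      return best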

.. note::

   The required checks / forms generally depend on whether the
   user is an individual person or a business. Right now, we
   cannot tell which one it is! For deposit we may be able to
   presume it is a business and for the rest we could presume
   it is individuals, but this is far from assured (e.g. an
   individual may raise donations for themselves, or a business
   may have a wallet or receive p2p payments).  Thus, we need
   a way to be told the type of entity up-front!


Exchange Database Schema
^^^^^^^^^^^^^^^^^^^^^^^^

.. sourcecode:: sql

  CREATE TABLE IF NOT EXISTS wire_targets
  (wire_target_serial_id BIGSERIAL UNIQUE
  ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=64)
  ,payto_uri VARCHAR NOT NULL
  ,PRIMARY KEY (h_payto)
  ) SHARD BY (h_payto);
  COMMENT ON TABLE wire_targets
    IS 'All recipients of money via the exchange';
  COMMENT ON COLUMN wire_targets.payto_uri
    IS 'Can be a regular bank account, or also be a URI identifying a reserve-account (for P2P payments)';
  COMMENT ON COLUMN wire_targets.h_payto
    IS 'Unsalted hash of payto_uri';

  CREATE TABLE IF NOT EXISTS legitimizations
  (legitimization_serial_id BIGSERIAL UNIQUE
  ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=64)
  ,expiration_time INT8 NOT NULL DEFAULT (0)
  ,provider_section VARCHAR NOT NULL
  ,provider_user_id VARCHAR DEFAULT NULL
  ,provider_legitimization_id VARCHAR DEFAULT NULL
  ) SHARD BY (h_payto);

  COMMENT ON COLUMN legitimizations.legitimization_serial_id
    IS 'unique ID for this legitimization process at the exchange';
  COMMENT ON COLUMN legitimizations.h_payto
    IS 'foreign key linking the entry to the wire_targets table, NOT a primary key (multiple legitimizations are possible per wire target)';
  COMMENT ON COLUMN legitimizations.expiration_time
    IS 'in the future if the respective KYC check was passed successfully';
  COMMENT ON COLUMN legitimizations.provider_section
    IS 'Configuration file section with details about this provider';
  COMMENT ON COLUMN legitimizations.provider_user_id
    IS 'Identifier for the user at the provider that was used for the legitimization. NULL if provider is unaware.';
  COMMENT ON COLUMN legitimizations.provider_legitimization_id
    IS 'Identifier for the specific legitimization process at the provider. NULL if legitimization was not started.';


Database API
------------

This section describes the new DB plugin functions.

* insert_legi (INSERT h_payto, provider_section),
  returns legitimization_serial_id

* get_legi (SELECT by legitimization_serial_id),
  returns provider_section, status.

* start_legi (UPDATE based on h_payto, provider_section,
  SETs provider_user_id, provider_legitimization_id)

* confirm_legi (UPDATE based on h_payto, provider_section,
  SETs expiration_time)

* get_legitimizations (SELECT by h_payto,
  WHERE NOT expired), returns provider_section list.
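
The intended state transitions of these functions can be illustrated with a
toy in-memory model.  This is not the DB plugin itself (which would issue SQL
statements); the class name, the explicit ``now`` arguments, the reading of
``status`` as "not expired", and the choice to update the most recent matching
row are assumptions made for this sketch.

.. sourcecode:: python

  class LegitimizationStore:
      def __init__(self):
          self._rows = []  # list index + 1 == legitimization_serial_id

      def insert_legi(self, h_payto, provider_section):
          self._rows.append({"h_payto": h_payto,
                             "provider_section": provider_section,
                             "expiration_time": 0,
                             "provider_user_id": None,
                             "provider_legitimization_id": None})
          return len(self._rows)

      def get_legi(self, serial, now):
          row = self._rows[serial - 1]
          return (row["provider_section"], row["expiration_time"] > now)

      def _latest(self, h_payto, provider_section):
          for row in reversed(self._rows):
              if (row["h_payto"], row["provider_section"]) == (h_payto, provider_section):
                  return row
          raise KeyError((h_payto, provider_section))

      def start_legi(self, h_payto, provider_section, user_id, legi_id):
          row = self._latest(h_payto, provider_section)
          row["provider_user_id"] = user_id
          row["provider_legitimization_id"] = legi_id

      def confirm_legi(self, h_payto, provider_section, expiration_time):
          self._latest(h_payto, provider_section)["expiration_time"] = expiration_time

      def get_legitimizations(self, h_payto, now):
          return [r["provider_section"] for r in self._rows
                  if r["h_payto"] == h_payto and r["expiration_time"] > now]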

Additionally, we have to make:

* changes to the existing wire_targets API

* changes to existing KYC checks in stored procedures


Merchant modifications
^^^^^^^^^^^^^^^^^^^^^^

A new setting is required where the merchant backend
can be configured for a business (default) or individual.

.. note::

   This still needs to be done!

We introduce new ``kyc_status``, ``kyc_timestamp`` and ``kyc_serial`` fields
into a new table with primary keys ``exchange_url`` and ``account``.  This
status is updated whenever a deposit is created or tracked, or whenever the
merchant backend receives a ``/kyc-check/`` response from the exchange.  Initially,
``kyc_serial`` is zero, indicating that the merchant has not yet made any
deposits and thus does not have an account at the exchange.

A new private endpoint ``/kyc`` is introduced which allows frontends to
request the KYC status of any configured account (including with long
polling).  If the KYC status is negative or the ``kyc_timestamp`` not recent
(say older than one month), the merchant backend will re-check the KYC status
at the exchange (and update its cached status).  The endpoint then returns
either that the KYC is OK, or information (same as from the exchange endpoint)
to begin the KYC process.
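
The caching rule of the merchant backend can be sketched as a small decision
function.  The names and the 30-day reading of "one month" are assumptions for
illustration only.

.. sourcecode:: python

  ONE_MONTH = 30 * 24 * 60 * 60  # "say older than one month", in seconds


  def merchant_kyc_action(kyc_ok, kyc_timestamp, now, max_age=ONE_MONTH):
      """Return "recheck" if the exchange must be queried again, else "cached-ok"."""
      if not kyc_ok:
          return "recheck"
      if now - kyc_timestamp > max_age:
          return "recheck"
      return "cached-ok"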

The merchant backend uses the new field to remember that a KYC is pending
(after ``/deposit``, or tracking deposits) and the SPA then shows a
notification whenever the staff is logged in to the system.  The notification
can be hidden for the current day (remembered in local storage).

The notification links to a (new) KYC status page. When opened, the KYC status
page first re-checks the KYC status with the exchange.  If the KYC is still
unfinished, that page contains another link to begin the KYC process
(redirecting to the OAuth 2.0 login page of the legitimization resource
server), otherwise it shows that the KYC process is done. If the KYC is
unfinished, the SPA should use long-polling on the KYC status on this page to
ensure it is always up-to-date, and change to ``KYC satisfied`` should the
long-poller return with positive news.

.. note::

    Semi-related: The TMH_setup_wire_account() is changed to use
    128-bit salt values (to keep ``deposits`` table small) and checks for salt
    to be well-formed should be added "everywhere".


Bank requirements
^^^^^^^^^^^^^^^^^

The exchange primarily requires the bank to operate a KYC provider that offers
an endpoint with an API implemented by one of the logic plugins (together with
the respective legitimization configuration).


Logic plugins
^^^^^^^^^^^^^

The ``$PROVIDER_ID`` is based on the name of the configuration section,
not on the name of the logic plugin.  Using the configuration section,
the exchange then determines the logic plugin to use.

This section describes the general API for all of the supported KYC providers,
as well as some details of how this general API could be implemented by the logic for
different APIs.


General KYC Logic Plugin API
----------------------------

This section provides a sketch of the proposed API for the KYC logic plugins.

* initiation of KYC check (``kyc-check``):

  - inputs:

    + provider_section (for additional configuration)
    + h_payto

  - outputs:

    + success/provider-failure
    + redirect URL (or NULL)
    + provider_user_id (or NULL)
    + provider_legitimization_id (or NULL)

* KYC status check (``kyc-proof``):

  - inputs:

    + provider_section (for additional configuration)
    + h_payto
    + provider_user_id (or NULL)
    + provider_legitimization_id (or NULL)

  - outputs:

    + success/pending/user-aborted/user-failure/provider-failure status code
    + HTML response for end-user

* Webhook notification handler (``kyc-webhook``):

  - inputs:

    + HTTP method (GET/POST)
    + rest of URL (after provider_section)
    + HTTP body (if applicable!)

  - outputs:

    + success/pending/user-aborted/user-failure/provider-failure status code
    + h_payto (for DB status update)
    + HTTP response to be returned to KYC provider

The plugins do not directly interact with the database; the caller sets the
expiration on ``success`` and also updates ``provider_user_id`` and
``provider_legitimization_id`` in the tables as required.


For the webhook, we need a way to look up ``h_payto`` by other data, so the
KYC logic plugin API should be given a lookup method with:

  - inputs:

    + ``provider_section``
    + ``provider_legitimization_id``

  - outputs:

    + ``h_payto``
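
The plugin API sketched above might be expressed as the following interface.
The actual plugins are exchange plugins and need not look like this; all type
and method names here are hypothetical, and only the inputs, outputs and
status codes are taken from the lists above.

.. sourcecode:: python

  from abc import ABC, abstractmethod
  from dataclasses import dataclass
  from enum import Enum
  from typing import Callable, Optional


  class KycStatus(Enum):
      SUCCESS = "success"
      PENDING = "pending"
      USER_ABORTED = "user-aborted"
      USER_FAILURE = "user-failure"
      PROVIDER_FAILURE = "provider-failure"


  @dataclass
  class InitiateResult:
      ok: bool  # success / provider-failure
      redirect_url: Optional[str] = None
      provider_user_id: Optional[str] = None
      provider_legitimization_id: Optional[str] = None


  @dataclass
  class ProofResult:
      status: KycStatus
      html: str  # page shown to the end-user


  @dataclass
  class WebhookResult:
      status: KycStatus
      h_payto: Optional[str]  # for the DB status update by the caller
      http_status: int
      http_body: str


  # lookup(provider_section, provider_legitimization_id) -> h_payto or None
  Lookup = Callable[[str, str], Optional[str]]


  class KycLogic(ABC):
      @abstractmethod
      def initiate(self, provider_section: str, h_payto: str) -> InitiateResult: ...

      @abstractmethod
      def proof(self, provider_section: str, h_payto: str,
                provider_user_id: Optional[str],
                provider_legitimization_id: Optional[str]) -> ProofResult: ...

      @abstractmethod
      def webhook(self, method: str, url_rest: str, body: bytes,
                  lookup: Lookup) -> WebhookResult: ...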


OAuth 2.0 specifics
-------------------

In terms of configuration, the OAuth 2.0 logic requires the respective client
credentials to be configured a priori to enable access to the legitimization
service.

For the ``/kyc-check/`` endpoint, the OAuth 2.0 logic may need to create and
store a nonce to be used during ``/kyc-proof/``, depending on the OAuth
variant used.  This may require another exchange table.  The OAuth 2.0 process
must then be set up to end at the new ``/kyc-proof/$PROVIDER_ID/`` endpoint.

This ``/kyc-proof/$PROVIDER_ID/`` endpoint must query the OAuth 2.0 server using the
``code`` argument provided as a query parameter. Based on the result, it then
updates the KYC table of the exchange with the legitimization status and
returns a human-readable KYC status page.
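
If a nonce is needed, it could be handled roughly as follows.  The sketch
keeps nonces in memory and keys them by ``h_payto``, whereas the text above
suggests a dedicated exchange table; the class and method names are
hypothetical.

.. sourcecode:: python

  import hmac
  import secrets


  class NonceStore:
      def __init__(self):
          self._by_h_payto = {}

      def create(self, h_payto):
          """Create and remember a fresh nonce during /kyc-check/."""
          nonce = secrets.token_urlsafe(32)
          self._by_h_payto[h_payto] = nonce
          return nonce

      def consume(self, h_payto, presented):
          """Check the nonce presented at /kyc-proof/; each nonce is single-use."""
          expected = self._by_h_payto.pop(h_payto, None)
          if expected is None:
              return False
          # Constant-time comparison on bytes.
          return hmac.compare_digest(expected.encode(), presented.encode())

Note that in this sketch a failed attempt also invalidates the stored nonce,
so the user would have to restart at ``/kyc-check/``.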

The ``/kyc-webhook/`` endpoint is not applicable.


Persona specifics
-----------------

We would use the hosted flow. Endpoints return a ``request-id``, which we should
log for diagnosis.

For ``/kyc-check/``:

* POST to ``/api/v1/accounts`` using ``reference-id`` set to our ``h_payto``.
  Returns ``id`` (account_id).

* Create ``/verify`` endpoint using ``template-id`` (from configuration),
  and ``account_id`` (from previous step) and a ``reference-id`` (use
  the ``legitimization_serial_id`` for the new process). Set
  ``redirect-uri`` to ``/kyc-proof/$PROVIDER_ID/``.  However, we cannot
  rely on the user clicking this, so we must also configure a webhook.
  The request returns a ``verification-id``, which we store under
  the ``provider_legitimization_id`` in the database.

For ``/kyc-proof/``:

* Use the ``/api/v1/verifications`` endpoint to get the verification
  status. Requires the ``verification-id`` from the previous step.
  Results include: created/pending/completed/expired (aborted)/failed.

For ``/kyc-webhook/``:

* The webhook is authenticated using a shared secret, which should
  be in the configuration.  So all we should have to do is parse
  the POSTed body to find the status and the ``verification-id`` to
  look up ``h_payto`` and return the result.
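
Both ``/kyc-proof/`` and ``/kyc-webhook/`` need to translate the verification
status into the status codes of the general plugin API.  One plausible mapping
is sketched below; the mapping itself and the names are assumptions, and any
unknown value is treated as a provider failure.

.. sourcecode:: python

  PERSONA_STATUS_MAP = {
      "created": "pending",
      "pending": "pending",
      "completed": "success",
      "expired": "user-aborted",  # listed above as "expired (aborted)"
      "failed": "user-failure",
  }


  def map_persona_status(verification_status):
      """Translate a verification status into a plugin API status code."""
      return PERSONA_STATUS_MAP.get(verification_status, "provider-failure")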


KYC AID specifics
-----------------

For ``/kyc-check/``:

* POST to ``/applicants`` with a type (person or company) to
  obtain ``applicant_id``. Store that under ``provider_user_id``.
  ISSUE: *we* need to get the company_name, business_activity_id
  and registration_country before this somehow!

* Create a form URL via ``/forms/$FORM_ID/urls``,
  providing our ``h_payto`` as the ``external_applicant_id``,
  using the ``applicant_id`` from above,
  and the ``/kyc-proof/$PROVIDER_ID`` for the ``redirect_url``.

* Redirect the customer to the ``form_url`` and
  store the ``verification_id`` under ``provider_legitimization_id``
  in the database.

For ``/kyc-proof/``:

* Perform GET ``/verifications/{verification_id}`` to determine
  and return status.

For ``/kyc-webhook/``:

* For security, we should probably simply trigger the GET on
  ``/verifications/{verification_id}`` to not trust an unsigned POST
  to tell us anything for sure.  The result is then returned.
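
The "do not trust the POST" approach could look like this.  The sketch assumes
the webhook body is JSON containing a ``verification_id`` field, and it takes
the authoritative GET and the ``h_payto`` lookup as injected callables; these
names and the body layout are hypothetical.

.. sourcecode:: python

  import json


  def handle_kycaid_webhook(body, fetch_verification, lookup_h_payto,
                            provider_section):
      """Return (status, h_payto); the POSTed status itself is never used."""
      try:
          verification_id = json.loads(body)["verification_id"]
      except (ValueError, KeyError, TypeError):
          return ("provider-failure", None)
      if not isinstance(verification_id, str):
          return ("provider-failure", None)
      h_payto = lookup_h_payto(provider_section, verification_id)
      if h_payto is None:
          # Unknown verification: nothing to update.
          return ("provider-failure", None)
      # Authoritative status comes from GET /verifications/{verification_id}.
      status = fetch_verification(verification_id)
      return (status, h_payto)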


Alternatives
============

We could also store the access token (returned by OAuth 2.0), but that seems
slightly more dangerous and given the close business relationship is
unnecessary. Furthermore, not all APIs offer this.


Drawbacks
=========


Discussion / Q&A
================

(This should be filled in with results from discussions on mailing lists / personal communication.)