DD 23: Taler KYC
################

Summary
=======

This document discusses the Know-your-customer (KYC) processes supported by Taler.

Motivation
==========

To legally operate, Taler has to comply with KYC regulation that requires
banks to identify parties involved in transactions at certain points.

Requirements
============

The solution should support fees to be paid by the user for the KYC process (#7365).

Taler needs to take *measures* based on the following primary *triggers*:

* Customer withdraws money over a monthly threshold

  * exchange triggers KYC
  * key: IBAN (encoded as payto:// URI)

* Wallet receives (via refunds) money resulting in a balance over a threshold

  * this is a client-side restriction
  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Wallet receives money via P2P payments

  * there are two sub-cases: PUSH and PULL payments
  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Merchant receives money (Q: any money, or above a monthly threshold?)

  * key: IBAN (encoded as payto:// URI)

* Reserve is "opened" for invoicing.

  * key: reserve (=KYC account) long term public key per wallet (encoded as payto:// URI)

* Import of new sanctions lists and triggering of measures against matches
  of existing customer records against the list

For the different *measures*, there are various possible KYC/AML *checks*
that could happen:

* In-person validation by AML staff
* Various forms to be filled by AML staff
* Validation involving local authorities and post-office
* Online validation, sometimes with multiple options (like KYC for multiple people):

  * Forms to be supplied by user (different types of ID)
  * Interactive video
  * Documents to be supplied (business register)
  * Address validation (e-mail or phone or postal)

Additionally, the process is dynamic and conditional upon various decisions:

* Individual vs. business
* PEP or non-PEP
* Hit on sanctions list
* Type of business (trust, foundation, listed on stock market, etc.)
* Need for plausibilization (via documents by user or staff research)
* Periodic updates (of customer data, of sanction lists) and re-assessment

There are also various *outcomes*:

* normal operation (with expiration date)
* normal operation but with AML staff investigating (new measure)
* held, requesting customer documentation (new measure)
* held, AML staff reviewing evidence for plausibilization (new measure)
* automatically frozen until certain day (due to sanctions)
* institutionally frozen until certain day (due to order by state authority)

The outcome of a *check* can trigger further *measures* (including
expiration of the outcome state).

Finally, we need to produce statistics:

* number of incidents reported (voluntarily, required)
* number of business relationships at any point in time
* number of risky business relationships (PEP, etc.)
* number of frozen transactions (authority vs. sanction) with start-date and end-date
* start-date and end-date of relationships (data retained for X years after end of relationship)

For this high-level monitoring, we need certain designated critical events to
be tracked in the system statistics:

* account opened
* set to high risk
* set to low risk
* suspicious activity report filed with authority
* account frozen
* account unfrozen
* account closed

As a result, we largely end up with a large state machine where the AML staff
has serious flexibility while the user needs guidance as to the possible next
moves and/or to the current state of their account (where some information
must not be disclosed).

Proposed Solution
=================

We allow certain *conditions* to *trigger* a single specific *measure*.
For the different *measures*, we define:

* Who has to do something (AML staff, user, nobody)
* Contextual input data to be provided (with templating, e.g. amount set
  dynamically based on the *trigger*)
* A *check* to be performed
* A *program* that uses data from the *check* as well as *context* data
  to determine an *outcome* which is the specific operational state
  (normal, held on staff, held on user, frozen, etc.) the account is to
  transition to
* What information about the state to show to the user (normal, information
  required, pending, etc.)
* For user-interactive checks:

  * Web page template with instructions to render (with either a form to
    fill or links to external checks); here the context could provide an
    array of choices!
  * Possibly an external check to set up (if any); for cost-reasons, we
    should only do one at a time, and probably should then always redirect
    the browser to that check.
  * A *measure* to take on failure of the external check

* For (AML) staff-interactive checks:

  * UI to file forms and upload documentation (without state transition)
  * UI to decide on next measure (providing context); here, the exchange
    needs to expose the list of available *measures* and required *context*
    for each

* Non-interactive measures (normal operation, account frozen) need:

  * Expiration time (in context)
  * Measure to trigger upon expiration, again with context
    (renew documents, resume normal operation, etc.)

We need some customer-driven interactivity in the KYB/KYC process, for example
the user may need to be given choices (address vs. phone, individual vs.
business, order in which to provide KYC data of beneficiaries).  As a result,
the exchange needs to serve some "master" page for measures where the user is
shown the next step(s) or choices (which person to collect KYC data on,
whether to run challenger on phone number or physical address, etc.).
That page should also potentially contain a form to allow the customer to
directly upload documents to us (like business registration) instead of to
some KYC provider.  This is because KYC providers may not be flexible enough.
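
The per-measure definition listed above can be pictured as a small data
structure plus a *program* callback.  The following is only a sketch under
stated assumptions: the names ``Outcome``, ``Measure``, ``pep_program`` and
``apply_measure``, the context keys and the measure names are all
hypothetical and not part of the design; the state names ``normal`` and
``held-for-manual-review`` are the ones used in the terminology for
*programs*.

.. sourcecode:: python

  from dataclasses import dataclass, field
  from typing import Callable, Optional


  @dataclass
  class Outcome:
      """Result of running a program (hypothetical structure)."""
      state: str                          # e.g. "normal", "held-for-manual-review"
      user_info: str                      # what may be disclosed to the user
      expiration: Optional[int] = None    # seconds since epoch, None if not set
      next_measure: Optional[str] = None  # follow-up measure, if any


  @dataclass
  class Measure:
      """One outgoing edge of the state machine (hypothetical structure)."""
      name: str
      actor: str                                # "aml-staff", "user" or "nobody"
      check: Optional[str]                      # None for non-interactive measures
      program: Callable[[dict, dict], Outcome]  # (context, check data) -> outcome
      context: dict = field(default_factory=dict)


  def pep_program(context: dict, attributes: dict) -> Outcome:
      """Hold politically exposed persons for review, otherwise operate normally."""
      if attributes.get("pep", False):
          return Outcome(state="held-for-manual-review",
                         user_info="pending",
                         next_measure="aml-staff-review")
      return Outcome(state="normal",
                     user_info="normal",
                     expiration=context["now"] + context["validity"],
                     next_measure=context.get("on_expiration"))


  def apply_measure(measure: Measure, attributes: dict) -> Outcome:
      """Run the measure's program on its context and the data from its check."""
      return measure.program(measure.context, attributes)


  # Example values are made up for illustration only.
  measure = Measure(name="basic-kyc", actor="user", check="id-document",
                    program=pep_program,
                    context={"now": 1_700_000_000,
                             "validity": 365 * 24 * 3600,
                             "on_expiration": "renew-documents"})
  ok = apply_measure(measure, {"full_name": "Alice", "pep": False})
  held = apply_measure(measure, {"full_name": "Bob", "pep": True})

Keeping the expiration and the follow-up measure inside the outcome mirrors
the requirement that non-interactive states carry an expiration time and a
measure to trigger once that time is reached.
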

Similarly, the AML staff will need to be able to trigger rather complex
KYB/KYC processes, like "need KYC on X and Y and Z" or "phone number or
mailing address" or "please upload form A/T/S".  Here in particular it
should be possible to request not only filled forms, but arbitrary
documents.

Documentation
^^^^^^^^^^^^^

* We must define risk-profile (902.4, 905.1)
* We must document the specific setup, likely not just the INI file
* We probably should have some key AMLA file attributes, such as:

  * File opened, file closed (keep data for X years afterwards!)
  * low-risk or high-risk business relationship
  * PEP status
  * business domain
  * authority notification dates (possibly multiple) with
    voluntary or mandatory notification classification

* There must be a page with an overview of AMLA files with opening
  and closing dates and an easy way to determine for any day the
  number of open AMLA files
* Technically, we also need a list of at-risk transactions and of frozen
  transactions, but given that we can really only freeze on an account-basis,
  I think there is nothing to do here

Terminology
^^^^^^^^^^^

* **Check**: A check establishes a particular attribute of a user, such as
  their name based on an ID document and liveness, mailing address, phone
  number, taxpayer identity, etc.  Checks may be given *context* (such as
  whether a customer is an individual or a business) to run correctly.
  Checks can also be AML staff inserting information for plausibilization.
  Checks result in an *outcome* being decided by an external AML *program*.

* **Condition**: A condition specifies when KYC is required. Conditions
  include the *type of operation*, a threshold amount (e.g. above EUR:1000)
  and possibly a time period (e.g. over the last month).

* **Configuration**: The configuration determines the *legitimization rules*,
  and specifies which providers offer which *checks* at what *cost*.
* **Context**: Context is information provided as input into a *check* and
  *program* to customize their execution.  The context is initially set by
  the *trigger*, but may evolve as the *account* undergoes *measures*.  For
  each *check* and *program*, the required *context* data must be specified.

* **Cost**: Metric for the business expense for a KYC check at a certain
  *provider*. Not in any currency, costs are simply relative and non-negative
  values. Costs are considered when multiple choices are allowed by the
  *configuration*.

* **Expiration**: KYC legitimizations may be outdated. Expiration rules
  determine when *checks* have to be performed again.

* **Legitimization rules**: The legitimization rules determine under which
  *conditions* which *checks* must be performed and the *expiration* time
  period for the *checks*.

* **Logic**: Logic refers to a specific bit of code (realized as an exchange
  plugin) that enables the interaction with a specific *provider*.  Logic
  typically requires *configuration* for access control (such as an
  authorization token) and possibly the endpoint of the specific *provider*
  implementing the respective API.

* **Measure**: Describes the possible outgoing edges from one state in the
  state machine (including how to show the current state).  Each edge is
  given some *context* and a *check* to be performed as well as a *program*
  to decide the *outcome* and the next *measure*.

* **Outcome**: Describes the account state that an account ends up in due to
  the result of a *check*.  Outcomes can be that an account is frozen (no
  transactions possible until freeze expires), held (no transactions possible
  until another *measure* has been taken), or operating normally.

* **Provider**: A provider performs a specific set of *checks* at a certain
  *cost*. Interaction with a provider is performed by provider-specific
  *logic*.
* **Program**: An AML helper *program* is given *context* about the current
  state of an account and the data from a *check* to compute the *outcome*.
  For example, a *program* may look at the "PEP" field of a KYC check and
  decide if the outcome is to put the account into ``normal`` or
  ``held-for-manual-review`` state.  A *program* operating on an AML form
  filed by AML staff will likely be trivial and directly apply the explicit
  decision taken by the staff member.

* **Type of operation**: The operation type determines which Taler-specific
  operation has triggered the KYC requirement. We support four types of
  operation: withdraw (by customer), deposit (by merchant), P2P receive (by
  wallet) and (high) wallet balance.

New Endpoints
^^^^^^^^^^^^^

We introduce a new ``wire_targets`` table into the exchange database. This
table is referenced as the source or destination of payments (regular
deposits and also P2P payments).  A positive side-effect is that we reduce
duplication in the ``reserves_in``, ``wire_out`` and ``deposits`` tables as
they can reference this table.

We introduce a new ``legitimization_processes`` table that tracks the status
of a legitimization process at a provider, including the configuration
section name, the user/account name at the provider, and some legitimization
identifier for the process at the provider.  In this table, we additionally
store information related to the KYC status of the underlying payto://-URI,
in particular when the KYC expires (0 if it was never done).

Finally, we introduce a new ``legitimization_requirements`` table that
contains a list of checks required for a particular wire target.  When KYC is
triggered (say when some endpoint returns an HTTP status code of 451) a new
requirement is first put into the requirements table. Then, when the client
identifies as business or individual the specific legitimization process is
started.
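
Before a requirement row is created, the exchange has to decide whether a
*condition* is met at all.  The sketch below illustrates one possible
reading of the terminology: a rule matches on the *type of operation* and
fires when the total over its time period exceeds the threshold.  The
``Rule`` structure, the ``triggered_measures`` function, the use of plain
decimal amounts in a single currency and the treatment of a missing
timeframe as "forever" are assumptions made for illustration, not part of
the design.

.. sourcecode:: python

  from dataclasses import dataclass
  from decimal import Decimal
  from typing import Optional


  @dataclass(frozen=True)
  class Rule:
      """A legitimization rule (hypothetical structure)."""
      name: str
      operation_type: str        # WITHDRAW, DEPOSIT, P2P-RECEIVE or WALLET-BALANCE
      threshold: Decimal         # amount in a single currency, e.g. EUR
      timeframe: Optional[int]   # seconds; None is assumed to mean "forever"
      required_measure: str


  def triggered_measures(rules, operation_type, history, now):
      """Return the measures required for an account.

      ``history`` is a list of (timestamp, amount) pairs for operations of
      the given type, including the operation currently being attempted.
      """
      required = set()
      for rule in rules:
          if rule.operation_type != operation_type:
              continue
          if rule.timeframe is None:
              relevant = [amount for _, amount in history]
          else:
              start = now - rule.timeframe
              relevant = [amount for ts, amount in history if ts >= start]
          # The rule fires only if the total strictly exceeds the threshold.
          if sum(relevant, Decimal(0)) > rule.threshold:
              required.add(rule.required_measure)
      return sorted(required)


  # Example values are made up for illustration only.
  DAY = 24 * 3600
  now = 1_700_000_000
  rules = [Rule("withdraw-monthly", "WITHDRAW", Decimal("1000"),
                30 * DAY, "SWISSNESS")]
  history = [(now - 40 * DAY, Decimal("900")),   # outside the 30-day window
             (now - DAY, Decimal("600")),
             (now, Decimal("500"))]
  over = triggered_measures(rules, "WITHDRAW", history, now)
  under = triggered_measures(rules, "WITHDRAW", history[:2], now)
  other = triggered_measures(rules, "DEPOSIT", history, now)

In this example only the last two withdrawals fall into the window, so the
rule fires once their sum exceeds the threshold; the older withdrawal is
ignored.
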
When the taler-exchange-aggregator triggers a KYC check the merchant can
observe this when a 202 (Accepted) status code is returned on GET
``/deposits/`` with the respective legitimization requirement row.

The new ``/kyc-check/`` endpoint is based on the legitimization requirements
serial number and receives the business vs. individual status from the
client.  Access is "authenticated" by also passing the hash of the
payto://-URI.  (Weak authentication is acceptable, as the KYC status or the
ability to initiate a KYC process are not very sensitive.)  Given this
triplet, the ``/kyc-check/`` endpoint returns either the (positive) KYC
status or redirects the client (202) to the next required stage of the KYC
process.  The redirection must be for an HTTP(S) endpoint to be triggered via
a simple HTTP GET.  As this endpoint is involved in every KYC check at the
beginning, this is also the place where we can integrate the payment process
for the KYC fee.

The specific KYC provider to be executed depends on the configuration (see
below) which specifies a ``$PROVIDER_SECTION`` for each authentication
procedure.  For each (enabled) provider, the exchange has a logic plugin
which (asynchronously) determines the redirect URL for a given wire target.
See below for a description of the high-level process for different
providers.

Upon completion of the process at the KYC provider, the provider must trigger
a GET request to a new ``/kyc-proof/$H_PAYTO/$PROVIDER_SECTION`` endpoint.
This may be done by redirecting the browser of the user to that endpoint.
Once this endpoint is triggered, the exchange will pass the received
arguments to the respective logic plugin. The logic plugin will then
(asynchronously) update the KYC status of the user.  The logic plugin should
return a human-readable HTML page with the KYC result to the user.

Alternatively, the KYC confirmation may be triggered by a ``/kyc-webhook``
request.
As KYC providers do not necessarily support passing detailed information in
the URL arguments, the ``/kyc-webhook`` only needs to specify either the
``PROVIDER_SECTION`` *or* the ``LOGIC`` (the name of the plugin implementing
the KYC API).  The API-specific webhook logic must then figure out what
exactly the webhook is about on its own.  The ``/kyc-webhook/`` endpoint
works for GET or POST, again as details depend on the KYC provider.  In
contrast to ``kyc-proof``, the response does NOT go to the end-user's browser
and should thus only indicate success or failure.

Legitimization Hooks
^^^^^^^^^^^^^^^^^^^^

When withdrawing, the exchange checks if the KYC status is acceptable.  If no
KYC was done and if either the amount withdrawn over a particular timeframe
exceeds the threshold or the reserve received a P2P transfer, then a ``451
Unavailable for Legal Reasons`` is returned which redirects the consumer to
the new ``/kyc-check/`` handler.

When depositing, the exchange aggregator (!) checks the KYC status and if
negative, returns an additional information field via the
``aggregation_transient`` table which is returned via GET ``/deposits/`` to
the merchant.  This way, the merchant learns the ``requirement_row`` needed
to begin the KYC process (this is independent of the amount) at the new
``/kyc-check/`` handler.

When merging into a reserve, the KYC status is checked and again the merge
fails with ``451 Unavailable for Legal Reasons`` to trigger the KYC process.

To allow the wallet to do the KYC check if it is about to exceed a set
balance threshold, we modify the ``/keys`` response to add an optional array
``wallet_balance_limit_without_kyc`` of threshold amounts.  Whenever the
wallet crosses one of these thresholds for the first time, it should trigger
the KYC process.  If this field is absent, there is no limit.  If the field
is provided, a correct wallet must create a long-term account-reserve key
pair.
This should be the same key that is also used to receive wallet-to-wallet
payments.  Then, *before* a wallet performs an operation that would cause it
to exceed the balance threshold in terms of funds held from a particular
exchange, it *should* first request the user to complete the KYC process.

For that, the wallet should POST to the new ``/wallet-kyc`` endpoint,
providing its long-term reserve-account public key and a signature requesting
permission to exceed the account limit.  Here, the ``balance`` specified
should be the threshold (from the ``wallet_balance_limit_without_kyc`` array)
that the wallet would cross, and *not* the *exact* balance of the wallet.
The exchange will respond with a wire target UUID.  The wallet can then use
this UUID to begin the KYC process at ``/kyc-check/``.  The wallet must only
proceed to obtain funds exceeding the threshold after the KYC process has
concluded.  While wallets could be "hacked" to bypass this measure (we cannot
cryptographically enforce this), such modifications are a terms of service
violation which may have legal consequences for the user.

Configuration Options
^^^^^^^^^^^^^^^^^^^^^

The configuration specifies a set of providers, one per configuration section:

.. sourcecode:: ini

  [kyc-provider-$PROVIDER_ID]
  # Which plugin is responsible for this provider?
  LOGIC = PLUGIN_NAME

  # Which check does this provider provide?
  PROVIDED_CHECK = SMS

  # Plus additional logic-specific options, e.g.:
  AUTHORIZATION_TOKEN = superdupersecret
  FORM_ID = business_legi_form

  # How long is the data from this check considered valid?
  EXPIRATION = DURATION

The configuration also specifies a set of legitimization requirements, one
per configuration section:

.. sourcecode:: ini

  [kyc-legitimization-$RULE_NAME]
  # Operation that triggers this legitimization.
  # Must be one of WITHDRAW, DEPOSIT, P2P-RECEIVE
  # or WALLET-BALANCE.
  OPERATION_TYPE = WITHDRAW

  # Required measure to be performed.
  REQUIRED_MEASURE = SWISSNESS

  # Threshold amount above which the legitimization is
  # triggered.  The total must be exceeded in the given
  # timeframe. Can be 'forever'.
  THRESHOLD = AMOUNT

  # Timeframe over which the amount to be compared to
  # the THRESHOLD is calculated.
  # Ignored for WALLET-BALANCE.
  TIMEFRAME = DURATION

Finally, the configuration specifies a set of measures, one per configuration
section:

.. sourcecode:: ini

  [aml-measure-$MEASURE_NAME]
  # Program to run on the context and check data to
  # determine the outcome and next measure.
  PROGRAM = /bin/true

  # Check to run as part of this measure. Optional.
  CHECK = CHECK_NAME

  # Form to show to the user as part of this measure.
  FORM = FORM_NAME

For each FORM_NAME, there then must be:

* An HTML template (Mustache) that is instantiated with the JSON form context
  to produce a page to be shown to the user
* A helper program (named after the form) that can:

  * Generate a list of required context attributes for the helper (!)
  * Validate and convert an input JSON with context attributes into
    the JSON form context

Exchange Database Schema
^^^^^^^^^^^^^^^^^^^^^^^^

.. sourcecode:: sql

  CREATE TABLE IF NOT EXISTS wire_targets
  (wire_target_serial_id BIGSERIAL UNIQUE
  ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=64)
  ,payto_uri VARCHAR NOT NULL
  ,PRIMARY KEY (h_payto)
  ) PARTITION BY HASH (h_payto);
  COMMENT ON TABLE wire_targets
    IS 'All recipients of money via the exchange';
  COMMENT ON COLUMN wire_targets.payto_uri
    IS 'Can be a regular bank account, or also be a URI identifying a reserve-account (for P2P payments)';
  COMMENT ON COLUMN wire_targets.h_payto
    IS 'Unsalted hash of payto_uri';

  CREATE TABLE IF NOT EXISTS legitimization_requirements
  (legitimization_requirement_serial_id BIGINT GENERATED BY DEFAULT AS IDENTITY
  ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=32)
  ,required_checks VARCHAR NOT NULL
  ,UNIQUE (h_payto, required_checks)
  ) PARTITION BY HASH (h_payto);

  CREATE TABLE IF NOT EXISTS legitimization_processes
  (legitimization_serial_id BIGSERIAL UNIQUE
  ,h_payto BYTEA NOT NULL CHECK (LENGTH(h_payto)=64)
  ,expiration_time INT8 NOT NULL DEFAULT (0)
  ,provider_section VARCHAR NOT NULL
  ,provider_user_id VARCHAR DEFAULT NULL
  ,provider_legitimization_id VARCHAR DEFAULT NULL
  ) PARTITION BY HASH (h_payto);

  COMMENT ON COLUMN legitimization_processes.legitimization_serial_id
    IS 'unique ID for this legitimization process at the exchange';
  COMMENT ON COLUMN legitimization_processes.h_payto
    IS 'foreign key linking the entry to the wire_targets table, NOT a primary key (multiple legitimizations are possible per wire target)';
  COMMENT ON COLUMN legitimization_processes.expiration_time
    IS 'in the future if the respective KYC check was passed successfully';
  COMMENT ON COLUMN legitimization_processes.provider_section
    IS 'Configuration file section with details about this provider';
  COMMENT ON COLUMN legitimization_processes.provider_user_id
    IS 'Identifier for the user at the provider that was used for the legitimization. NULL if provider is unaware.';
  COMMENT ON COLUMN legitimization_processes.provider_legitimization_id
    IS 'Identifier for the specific legitimization process at the provider. NULL if legitimization was not started.';

Merchant modifications
^^^^^^^^^^^^^^^^^^^^^^

A new setting is required where the merchant backend can be configured for a
business (default) or individual.

.. note::

  This still needs to be done!

We introduce new ``kyc_status``, ``kyc_timestamp`` and ``kyc_serial`` fields
into a new table with primary keys ``exchange_url`` and ``account``.  This
status is updated whenever a deposit is created or tracked, or whenever the
merchant backend receives a ``/kyc-check/`` response from the exchange.
Initially, ``kyc_serial`` is zero, indicating that the merchant has not yet
made any deposits and thus does not have an account at the exchange.

A new private endpoint ``/kyc`` is introduced which allows frontends to
request the KYC status of any configured account (including with long
polling).  If the KYC status is negative or the ``kyc_timestamp`` not recent
(say older than one month), the merchant backend will re-check the KYC status
at the exchange (and update its cached status).  The endpoint then returns
either that the KYC is OK, or information (same as from the exchange
endpoint) to begin the KYC process.

The merchant backend uses the new field to remember that a KYC is pending
(after ``/deposit``, or tracking deposits) and the SPA then shows a
notification whenever the staff is logged in to the system.  The notification
can be hidden for the current day (remembered in local storage).

The notification links to a (new) KYC status page. When opened, the KYC
status page first re-checks the KYC status with the exchange.  If the KYC is
still unfinished, that page contains another link to begin the KYC process
(redirecting to the OAuth 2.0 login page of the legitimization resource
server), otherwise it shows that the KYC process is done.
If the KYC is unfinished, the SPA should use long-polling on the KYC status
on this page to ensure it is always up-to-date, and change to ``KYC
satisfied`` should the long-poller return with positive news.

.. note::

  Semi-related: The TMH_setup_wire_account() is changed to use 128-bit salt
  values (to keep ``deposits`` table small) and checks for salt to be
  well-formed should be added "everywhere".

An additional complication will arise once the exchange can trigger a KYC fee
(402) on ``/kyc-check/``.  In this case, the merchant SPA must show the QR
code to the merchant to allow the merchant to pay the KYC fee with a wallet.

Bank requirements
^^^^^^^^^^^^^^^^^

The exchange primarily requires a KYC provider to be operated by the bank
that offers an endpoint with an API implemented by one of the logic plugins
(and the respective legitimization configuration).

Logic plugins
^^^^^^^^^^^^^

The ``$PROVIDER_SECTION`` is based on the name of the configuration section,
not on the name of the logic plugin (that we call ``$LOGIC``).  Using the
configuration section, the exchange then determines the logic plugin to use.

This section describes the general API for all of the supported KYC
providers, as well as some details of how this general API could be
implemented by the logic for different APIs.

General KYC Logic Plugin API
----------------------------

This section provides a sketch of the proposed API for the KYC logic plugins.

* initiation of KYC check (``kyc-check``):

  - inputs:

    + provider_section (for additional configuration)
    + individual or business user
    + h_payto

  - outputs:

    + success/provider-failure
    + redirect URL (or NULL)
    + provider_user_id (or NULL)
    + provider_legitimization_id (or NULL)

* KYC status check (``kyc-proof``):

  - inputs:

    + provider_section (for additional configuration)
    + h_payto
    + provider_user_id (or NULL)
    + provider_legitimization_id (or NULL)

  - outputs:

    + success/pending/user-aborted/user-failure/provider-failure status code
    + HTML response for end-user

* Webhook notification handler (``kyc-webhook``):

  - inputs:

    + HTTP method (GET/POST)
    + rest of URL (after provider_section)
    + HTTP body (if applicable!)

  - outputs:

    + success/pending/user-aborted/user-failure/provider-failure status code
    + h_payto (for DB status update)
    + HTTP response to be returned to KYC provider

The plugins do not directly interact with the database; the caller sets the
expiration on ``success`` and also updates ``provider_user_id`` and
``provider_legitimization_id`` in the tables as required.

For the webhook, we need a way to look up ``h_payto`` by other data, so the
KYC logic plugin API should be provided a method lookup with:

- inputs:

  + ``provider_section``
  + ``provider_legitimization_id``

- outputs:

  + ``h_payto``
  + ``legitimization_process_row``

OAuth 2.0 specifics
-------------------

In terms of configuration, the OAuth 2.0 logic requires the respective client
credentials to be configured a priori to enable access to the legitimization
service.

For the ``/kyc-check/`` endpoint, the OAuth 2.0 logic may need to create and
store a nonce to be used during ``/kyc-proof/``, depending on the OAuth
variant used. This may require another exchange table.  The OAuth 2.0 process
must then be set up to end at the new ``/kyc-proof/$PROVIDER_ID/`` endpoint.

This ``/kyc-proof/oauth2/`` endpoint must query the OAuth 2.0 server using
the ``code`` argument provided as a query parameter.
Based on the result, it then updates the KYC table of the exchange with the
legitimization status and returns a human-readable KYC status page.

The ``/kyc-webhook/`` is not applicable.

Persona specifics
-----------------

We would use the hosted flow. Endpoints return a ``request-id``, which we
should log for diagnosis.

For ``/kyc-check/``:

* Post to ``/api/v1/accounts`` using ``reference-id`` set to our ``h_payto``.
  Returns ``id`` (account_id).

* Create ``/verify`` endpoint using ``template-id`` (from configuration),
  and ``account_id`` (from previous step) and a ``reference-id`` (use
  the ``legitimization_serial_id`` for the new process).  Set
  ``redirect-uri`` to ``/kyc-proof/$PROVIDER_ID/``.  However, we cannot rely
  on the user clicking this, so we must also configure a webhook.  The
  request returns a ``verification-id``, which we store under the
  ``provider_legitimization_id`` in the database.

For ``/kyc-proof/``:

* Use the ``/api/v1/verifications`` endpoint to get the verification status.
  Requires the ``verification-id`` from the previous step.  Results include:
  created/pending/completed/expired (aborted)/failed.

For ``/kyc-webhook/``:

* The webhook is authenticated using a shared secret, which should be in the
  configuration.  So all we should have to do is parse the POSTed body to
  find the status and the ``verification-id`` to look up ``h_payto`` and
  return the result.

KYC AID specifics
-----------------

For ``/kyc-check/``:

* Post to ``/applicants`` with a type (person or company) to obtain
  ``applicant_id``.  Store that under ``provider_user_id``.
  ISSUE: *we* need to get the company_name, business_activity_id and
  registration_country before this somehow!

* start with create form URL ``/forms/$FORM_ID/urls`` providing our
  ``h_payto`` as the ``external_applicant_id``, using the ``applicant_id``
  from above, and the ``/kyc-proof/$PROVIDER_ID`` for the ``redirect_url``.
* redirect customer to the ``form_url``, store the ``verification_id`` under
  ``provider_legitimization_id`` in the database.

For ``/kyc-proof/``:

* Not needed, just return an error.

For ``/kyc-webhook/``:

* For security, we should probably simply trigger the GET on
  ``/verifications/{verification_id}`` to not trust an unsigned POST to tell
  us anything for sure.  The result is then returned.

Alternatives
============

We could also store the access token (returned by OAuth 2.0), but that seems
slightly more dangerous and given the close business relationship is
unnecessary.  Furthermore, not all APIs offer this.

We could extend the KYC logic API to return key attributes about the user
(such as legal name, phone number, address, etc.) which we could then sign
and return to the user.  This would be useful in P2P payments to identify the
origin of an invoice.  However, we might want to be careful to not disclose
the key attributes via the API by accident.  This could likely be done by
limiting access to the respective endpoint to messages with a signature by
the reserve private key (which is the only case where we care to certify
things anyway).

Drawbacks
=========

Discussion / Q&A
================

(This should be filled in with results from discussions on mailing lists /
personal communication.)