010-exchange-helpers.rst - taler-docs - Documentation for GNU Taler components, APIs and protocols

DD 10: Exchange crypto helper design
####################################

Summary
=======

This document describes a way to minimize the attack surface for extracting
the private online signing keys (RSA and EdDSA) from the exchange.


Motivation
==========

The exchange is network-facing: it includes an HTTP server, PostgreSQL
interaction, a JSON parser and quite a bit of other logic, all of which may
in theory be vulnerable to remote exploitation.  From a security perspective,
we therefore want an additional layer of protection for the private online
signing keys used by the exchange.


Requirements
============

* The solution should not result in a dramatic loss of performance.
* An attacker who achieves arbitrary code execution on the exchange
  must not be able to extract the private keys.
* Ideally, we should be able to determine the number of signatures
  obtained illicitly by the attacker.
* Key management for operators should be simplified to improve usability.
* Both RSA and EdDSA online signing keys need to be protected.
* We should have a way to verify that the keys signed with the offline
  master private key are those originating from the isolated
  (software/hardware) security module.


Proposed Solution
=================

The private keys are to be created, used and deleted by two helper processes
running under a different user ID (UID), creating in effect a software
security module.  The exchange's HTTP process will be required to interact
with those helpers via a UNIX domain socket.

Socket permission details:

* The socket will be chmod 0620 (u+rw, g+w) regardless of umask.
* The operator must still ensure that the group is the same as the
  group of the crypto helpers.
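
The permission rule above can be sketched as follows. This is a minimal illustration in Python, not the helpers' actual code; the function names and the socket path are assumptions made for the example. It binds a UNIX domain datagram socket and then sets the mode explicitly, so the result does not depend on the umask of the calling process.

```python
import os
import socket
import stat


def bind_helper_socket(path: str) -> socket.socket:
    """Bind a UNIX datagram socket at `path` and force its mode to 0620."""
    # Remove a stale socket file left behind by a previous run.
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # bind() creates the file with a mode that is filtered by the umask ...
    sock.bind(path)
    # ... so the exact mode (u+rw, g+w) is set explicitly afterwards.
    os.chmod(path, 0o620)
    return sock


def socket_mode(path: str) -> int:
    """Return only the permission bits of the file at `path`."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Setting the group ownership of the socket is deliberately left out, since the text assigns that responsibility to the operator.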

General design details:

* The helpers will process requests from the exchange to create signatures
  and to revoke keys.
* The helpers will create and destroy the private keys. They will no longer be
  created on the air-gapped machine with the (offline) master private key.
  The helpers will tell the exchange when keys are created or deleted/expired.
* Each helper will sign freshly generated keys with a security module-specific
  private key. The security module's key will be verified by the offline
  signing process, either manually against log output from the security
  module's start-up routine, or via trust on first use (TOFU).  TOFU is
  considered sufficient, as an adversary breaking into the exchange process
  during the initial setup, when the exchange is not yet operational because
  no keys have ever been provisioned, is considered highly unlikely.
  Depending on how the exchange is initialized, access to security module
  logs may or may not be feasible, so TOFU is a good and usable alternative
  strategy.
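
The TOFU strategy described above can be sketched as a small pinning routine. This is an illustrative Python sketch under stated assumptions: the pin file, the function name and the return values are hypothetical, and the check of the security module's signature over each fresh key is not shown.

```python
import hmac


def check_security_module_key(pin_file: str, presented_pub: bytes) -> str:
    """Pin the first key seen; afterwards accept only that same key.

    Returns "pinned" on first use and "match" if the presented key equals
    the pinned one. Raises ValueError if the key differs from the pin.
    """
    try:
        with open(pin_file, "rb") as f:
            pinned = f.read()
    except FileNotFoundError:
        # First use: trust the presented key and remember it.
        with open(pin_file, "wb") as f:
            f.write(presented_pub)
        return "pinned"
    # Constant-time comparison of the stored and the presented key.
    if hmac.compare_digest(pinned, presented_pub):
        return "match"
    raise ValueError("security module key changed since first use")
```

With manual verification, the operator would instead compare the presented key against the value printed in the security module's start-up log before the pin is created.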

Helper design details:

* ``SOCK_DGRAM`` will be used to avoid needing to parse a data stream.
* The helpers will only know about (private) key lifetime. They will not know
  about details like currency, fee structure, master or auditor signatures.
  Those will be managed by the HTTP process to keep the helpers minimal.
* The helpers will use a single-threaded, GNUnet-scheduler-driven event loop
  to process incoming requests from the UNIX domain sockets. However, the
  actual signing will be done by a thread pool of workers that only process
  signing requests from a work queue. Reference counting is used to avoid
  releasing private keys while workers are actively using them to sign requests.
* The work queue is managed via a pthread-style semaphore.
* The master thread is informed about completed work via an ``eventfd()``.
* The master thread is responsible for handling revocations, creating future
  private keys and expiring old keys.  Revocations will also be triggered
  via a new ``/keys`` endpoint. The HTTP server will verify that the revocation
  is properly signed with the master private key before passing it on to the
  respective helper.
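
The worker design above (a semaphore-managed work queue plus reference-counted keys) can be sketched roughly as follows. This is a Python illustration only: the class names are hypothetical, an HMAC stands in for the RSA/EdDSA signature, and a ``queue.Queue`` stands in for the ``eventfd()`` notification of the master thread.

```python
import hashlib
import hmac
import queue
import threading
from collections import deque


class KeyHandle:
    """Private key with a reference count (the key bytes are a stand-in)."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.refs = 1  # reference held by the master thread
        self.lock = threading.Lock()

    def acquire(self) -> None:
        with self.lock:
            self.refs += 1

    def release(self) -> None:
        with self.lock:
            self.refs -= 1
            if self.refs == 0:
                self.secret = None  # last user gone: drop the key material


class SignerPool:
    """Workers take signing jobs from a queue guarded by a semaphore."""

    def __init__(self, workers: int):
        self.jobs = deque()
        self.jobs_lock = threading.Lock()
        self.available = threading.Semaphore(0)  # counts queued jobs
        self.done = queue.Queue()  # completed (job_id, signature) pairs
        self.threads = [threading.Thread(target=self._work, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def submit(self, job_id: int, key: KeyHandle, message: bytes) -> None:
        key.acquire()  # keep the key alive until the worker has finished
        with self.jobs_lock:
            self.jobs.append((job_id, key, message))
        self.available.release()

    def shutdown(self) -> None:
        # One sentinel per worker, queued behind all pending jobs.
        for _ in self.threads:
            with self.jobs_lock:
                self.jobs.append(None)
            self.available.release()
        for t in self.threads:
            t.join()

    def _work(self) -> None:
        while True:
            self.available.acquire()  # sleep until a job is queued
            with self.jobs_lock:
                job = self.jobs.popleft()
            if job is None:
                return
            job_id, key, message = job
            # Stand-in for the real signature: an HMAC over the message.
            sig = hmac.new(key.secret, message, hashlib.sha256).digest()
            key.release()
            self.done.put((job_id, sig))
```

Because every queued job holds its own reference, the master thread can drop its reference when a key expires or is revoked, and the key material is only cleared after the last in-flight job has finished.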

Exchange design considerations:

* The helpers are started by the system, say via systemd, not by the
  exchange. This simplifies the exchange, and we already needed the
  exchange operator to start four processes to operate an exchange.
  So this number simply increases to six (not even counting the
  PostgreSQL database and a reverse HTTP proxy for TLS termination).
* Each exchange thread will create its own connection to the helpers, and will
  block while waiting on the helper to create a signature.  This keeps the
  exchange logic simple and similar to the existing in-line signing calls.
  Suspending and resuming would be difficult as we currently do not have a
  way to wait for a UNIX domain socket to resume the MHD logic.
  If a signal is received while waiting for the helper, the signature operation
  fails. Signature operations can also fail if the helper is not running or
  responds with incorrect data. However, signature operations do NOT have a
  timeout.
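
The blocking request/response exchange over a datagram socket can be sketched like this. The wire format (a 2-byte size and a 2-byte type in front of the payload), the type codes and the function names are assumptions made for this Python illustration; the toy helper merely reverses the payload instead of signing it. Note that Python transparently retries system calls interrupted by signals, so the "fail on signal" behaviour described above is not modelled here.

```python
import socket
import struct

# Hypothetical wire format: 2-byte total size, 2-byte message type, payload.
HEADER = struct.Struct("!HH")
MSG_SIGN_REQUEST = 1   # made-up type codes for this sketch
MSG_SIGN_RESPONSE = 2


def pack(msg_type: int, payload: bytes) -> bytes:
    """Build one datagram: header followed by the payload."""
    return HEADER.pack(HEADER.size + len(payload), msg_type) + payload


def unpack(datagram: bytes):
    """Split one datagram into (type, payload), rejecting bad sizes."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram too short")
    size, msg_type = HEADER.unpack_from(datagram)
    if size != len(datagram):
        raise ValueError("size field does not match datagram length")
    return msg_type, datagram[HEADER.size:]


def helper_serve_one(sock: socket.socket) -> None:
    """Toy helper: answer one signing request by reversing the payload."""
    msg_type, payload = unpack(sock.recv(65536))
    if msg_type != MSG_SIGN_REQUEST:
        raise ValueError("unexpected request type")
    sock.send(pack(MSG_SIGN_RESPONSE, payload[::-1]))


def request_signature(sock: socket.socket, payload: bytes) -> bytes:
    """Send one request and block, without timeout, until the reply arrives."""
    sock.send(pack(MSG_SIGN_REQUEST, payload))
    # With SOCK_DGRAM one recv() returns exactly one message, so no
    # stream reassembly is needed.
    msg_type, body = unpack(sock.recv(65536))
    if msg_type != MSG_SIGN_RESPONSE:
        raise ValueError("unexpected reply from helper")
    return body
```

A reply with a wrong size field or an unexpected type is treated as a failed signature operation, which corresponds to the "responds with incorrect data" case.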

New exchange endpoints:

* The exchange will expose the corresponding public keys to the offline
  signing process via a GET request to the ``/keys/future`` endpoint.  For
  offline signing, tooling will be provided to first download the keys to a
  file, then sign based on that file, and then upload the resulting
  signatures back to the exchange. For this, master signatures will be
  POSTed to the exchange's ``/keys`` endpoint.
  The exchange will keep those signatures in the PostgreSQL database.
* A new endpoint (``/auditors``) will also allow adding/removing auditors
  (POST, DELETE) using requests signed with the offline master private key.
  Once an auditor has been added, the respective auditor signatures on exchange
  keys can also be POSTed to the REST API at
  ``/auditors/$AUDITOR_PUB/{denomination,signing}``.

Overall, the result is that, except for software updates and the fundamental
configuration, ``taler-exchange-http`` will be updated only via HTTP(S)
and not via a signal and new files appearing in the directory hierarchy.
All of the more volatile state of the HTTP process will be in the database.
Only the helpers continue to keep files on disk.


Alternatives
============

* The helpers could have been given the information to validate the signing
  request. However, without database access, validating the reserve key
  signature (and others) is pretty useless. Thus, this direction would only
  complicate the helper (which we want to keep minimal to minimize attack
  surface) without real benefits. Even validating revocation requests (checking
  signatures by auditor or master public key) makes no sense: if an attacker
  triggers a revocation, we should probably be thankful, as that is a white-hat
  demonstrating that they got control in the least harmful way.
* Instead of two helpers, we could have just one helper. But there is limited
  overlap between the (RSA) denomination key logic and the (EdDSA) signing
  key logic. Separation may improve security.
* We could have proposed a helper per denomination. But as the code of all of
  these helpers would be identical, this would have no security advantages.
* We could have implemented our own event loop and configuration parser,
  instead of relying on libgnunetutil. But this part of GNUnet is very
  robust.
* We could have had a thread pool reading requests from the exchange clients,
  instead of a master thread doling out the work. But this would become really
  complicated with key revocations, and as only the cryptography should
  be the bottleneck, performance advantages should be minimal. If IPC ever
  becomes the issue, then the entire idea of moving signatures to another
  process would be flawed.
* More portable mechanisms (like a ``pipe()``) could be used for signaling
  instead of ``eventfd()``. But this can always be implemented if we ever
  have an exchange operator needing support for such a platform.
* We could have left the helper single-threaded, to avoid the complications
  arising from the use of threads. However, given that signing is expected to
  be a bottleneck of the exchange, this would have had serious performance
  implications for the entire system.
* The helpers could have been started by the exchange. This would have
  required the helpers to use SUID. Allowing the system administrator to start
  them as they see fit is more flexible with respect to the privilege
  configuration. Also, this avoids forcing the exchange to manage
  restarting on crashes and/or crash reporting.
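
The ``eventfd()`` versus ``pipe()`` alternative mentioned above can be illustrated with a small wake-up primitive that uses ``eventfd`` where it is available and falls back to a pipe elsewhere. This is a Python sketch with hypothetical names (``Wakeup``, ``force_pipe``), not the helpers' implementation; ``os.eventfd`` requires Linux and Python 3.10 or later.

```python
import os
import select


class Wakeup:
    """Let worker threads wake a select()-based master loop."""

    def __init__(self, force_pipe: bool = False):
        if hasattr(os, "eventfd") and not force_pipe:
            # Linux-only: a single descriptor is used for reading and writing.
            fd = os.eventfd(0, os.EFD_NONBLOCK | os.EFD_CLOEXEC)
            self.read_fd = self.write_fd = fd
            self.kind = "eventfd"
        else:
            # Portable fallback: the classic self-pipe.
            self.read_fd, self.write_fd = os.pipe()
            os.set_blocking(self.read_fd, False)
            self.kind = "pipe"

    def signal(self) -> None:
        """Called by a worker after it has completed a job."""
        if self.kind == "eventfd":
            os.eventfd_write(self.write_fd, 1)
        else:
            os.write(self.write_fd, b"\x01")

    def pending(self, timeout: float = 0.0) -> bool:
        """Return True if at least one wake-up is waiting to be handled."""
        readable, _, _ = select.select([self.read_fd], [], [], timeout)
        return bool(readable)

    def drain(self) -> None:
        """Clear all pending wake-ups so the descriptor stops being readable."""
        try:
            if self.kind == "eventfd":
                os.eventfd_read(self.read_fd)  # resets the counter to zero
            else:
                while os.read(self.read_fd, 4096):
                    pass
        except BlockingIOError:
            pass  # nothing (more) to read

    def close(self) -> None:
        os.close(self.read_fd)
        if self.write_fd != self.read_fd:
            os.close(self.write_fd)
```

Both variants expose a readable file descriptor to the event loop, which is why swapping one for the other would not affect the rest of the design.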


Drawbacks
=========

* Additional work to properly set up an exchange and to run
  our automated tests.
* Slight (?) performance impact.
* UNIX only. Likely Linux-only for now (but fixable).
* If the exchange receives ANY (not ignored) signal during a signing
  operation, a discrepancy in the number of signatures created
  between the exchange (DB) and the helper will arise.  Thus,
  auditors have to allow for small discrepancies (increasing
  over time).  Note that we only expect the exchange to receive
  signals if the software is updated or the process is terminated.
* If a helper is stopped (SIGSTOP), the exchange HTTP process will itself
  block (no timeout!). Timeout-based mitigation would additionally increase
  discrepancies in the count of the number of signatures created.
* The system administrator must not forget to start the helpers, otherwise
  the exchange will not work (this is not a new problem: the same applies
  to taler-exchange-transfer and other exchange processes).
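
The allowance for small discrepancies mentioned above amounts to simple arithmetic, sketched here in Python. The function name, the idea of a per-key counter on each side and the allowance value are assumptions for illustration; the document does not specify how auditors choose the allowance.

```python
def unaccounted_signatures(helper_count: int,
                           exchange_count: int,
                           allowance: int) -> int:
    """Signatures made by the helper but not recorded by the exchange,
    beyond the tolerated allowance (0 means within tolerance)."""
    gap = helper_count - exchange_count
    return max(0, gap - allowance)
```

For example, with 1005 signatures counted by the helper, 1000 recorded in the exchange database and an allowance of 10, the result is 0; with 1100 counted by the helper, the result is 90 signatures that would need an explanation.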



Discussion / Q&A
================

(This should be filled in with results from discussions on mailing lists / personal communication.)