related_work.tex - anastasis - Credential backup and recovery protocol and service

related_work.tex (23192B)
      1 \section{Related work}
      2 
      3 This chapter explains some important cryptographic functions and which
      4 are used in Anastasis or related to our work. We also describe issues
      5 with existing solutions in this domain.
      6 
      7 \subsection{Cryptographic primitives}
      8 
      9 \subsubsection{Pseudo-randomness}
     10 
     11 A pseudo random generator (PRG) is an algorithm producing an infinite
     12 sequence of bits for which there is no efficient algorithm to
     13 distinguish it from a truly random sequence~\cite{vadhan2012}. The
     14 algorithm ``takes as input a short, perfectly random
     15 seed''~\cite{vadhan2012} which determines the output value.\\
     16 
     17 A pseudo random function (PRF) is a deterministic function which
     18 output is finite and indistinguishable from a true random
     19 function.~\cite{nielsen2002} PRFs can be constructed using
     20 PRGs.~\cite{GGM1986}
     21 
     22 \subsubsection{Hash function}
     23 
     24 Hash functions "compress a string of arbitrary length to a string of
     25 fixed length [...]"~\cite{Preneel1999}. The output of a hash function
     26 often is called a "hash".  Hash functions in general should be very
     27 fast to compute. Cryptographic hash functions need to fulfil
     28 additional security requirements which are
     29 
     30 \begin{itemize}
     31  \item (first) pre-image resistance,
     32  \item second pre-image resistance,
     33  \item collision resistance,
     34  \item pseudo randomness, and the
     35  \item avalanche effect.
     36 \end{itemize}
     37 
     38 Pre-image resistance, also called the ``one way property'', means that
     39 for a given hash function $H$ and a hash value $H(x)$, it is
     40 computationally infeasible to find $x$.~\cite{SG2012} For example,
     41 since in Anastasis we derive the key to encrypt the personal details
     42 required for user authentication (e.g. the mobile phone number for
     43 authentication via SMS) using functions based on hash functions (see
     44 HKDF), it is very important that you cannot derive the corresponding
     45 input values from the key.
     46 
     47 The second pre-image resistance is described by following: For a given
     48 hash function $H$ and a hash value $H(x)$, it is computationally
     49 infeasible to find $x$ and $x'$ such that $H(x) = H(x')$ and $x \not=
     50 x'$.~\cite{SG2012} In Anastasis hash functions also are involved in
     51 signing our so called recovery document. Hence an attacker should not
     52 be able to create a malicious recovery document with the same hash
     53 value as the original one.
     54 
     55 The definition of collision resistance slightly differs from the
     56 second pre-image resistance: For a given hash function $H$, it is
     57 computationally infeasible to find a pair $x, y$ such that $H(x) =
     58 H(y)$ \cite{SG2012}.
     59 %CG: the text below does NOT related to collision resistance!
     60 %As we are using HKDFs for deriving keys in
     61 %Anastasis, an attacker should not be able to find some other input
     62 %values also leading to the same keys we use.
     63 Anastasis does not rely upon collision resistance in its use of hash
     64 functions. % CG: at least no case comes to mind for me right now...
     65 
     66 A cryptographic hash function should also behave as a pseudo random
     67 function. This means that although a hash function is purely
     68 deterministic, the output must not be predictable.
     69 
     70 The avalanche effect describes the property of an algorithm that
     71 causes a significant change of the output value, usually a bit
     72 flipping of more than half the output is desired, if the input is
     73 changed slightly (for example, flipping a single bit).~\cite{RK2011}
     74 The more bits are flipping in the output value the higher the entropy
     75 of the randomness of the hash function.
     76 
     77 There are many applications for cryptographic hash functions. For
     78 example, you can store the hash value of a passphrase instead of the
     79 passphrase itself in a computer to protect the passphrase. Another
     80 important application is verification of message integrity: Before and
     81 after transmission of a message one can calculate the hash values of
     82 it and compare hashes later to determine if the message changed during
     83 transmission.
     84 
     85 In Anastasis we use SHA-512~\cite{GJW2011} for fast hash functions.
     86 
     87 \subsubsection{HMAC}
     88 
     89 When it comes to integrity of messages during communication of two
     90 parties over an insecure channel Keyed-Hash Message Authentication
     91 Codes (HMAC) are used as check values. An HMAC function is based on a
     92 hash function and takes two arguments, a key $K$ and a message $M$:
     93 
     94 $HMAC_{K}(M) = H(K \oplus opad,H(K \oplus ipad, M))$ with "ipad" and
     95 "opad" being constants which fill up the key $K$ to the blocksize of
     96 the hash function~\cite{BCK1996}. The blocksize of a modern hash
     97 function like SHA-512 is 64 bytes.
     98 
     99 \subsubsection{HKDF}
    100 
    101 A HKDF is a key derivation function (KDF) based on HMAC. A KDF ``is a
    102 basic and essential component of cryptographic systems: Its goal is
    103 to take a source of initial keying material, usually containing some
    104 good amount of randomness, but not distributed uniformly or for which
    105 an attacker has some partial knowledge, and derive from it one or more
    106 cryptographically strong secret keys''~\cite{krawczyk2010}.
    107 
    108 Anastasis uses HKDFs based on SHA-512 to derive symmetric keys for
    109 encryption.
    110 
    111 \subsubsection{Argon2}
    112 
    113 Hash functions like SHA-512 are designed to be very fast. Therefore
    114 passwords being stored using this kind of hash are vulnerable to
    115 dictionary attacks with new hardware architectures like
    116 FPGAs~\cite{trimberger2012} and dedicated ASIC~\cite{madurawe2006}
    117 modules. But those architectures ``experience difficulties when
    118 operating on large amount of memory''~\cite{BDK2016}.
    119 
    120 In contrast to standard hash functions there are functions designed to
    121 be memory-hard. Argon2 is such a memory-hard function that won the
    122 Password Hashing Competition in 2015. It minimizes time-memory
    123 tradeoff~\cite{stamp2003} and thus maximizes the costs to implement an
    124 ASIC for given CPU computing time~\cite{BDK2016}. Aside from the fact
    125 that Argon2 makes dictionary attacks much harder, Argon2 can be used
    126 for another feature too: Memory-hard schemes like Argon2 are very
    127 useful for key derivation from low-entropy sources~\cite{BDK2016}.
    128 
    129 Argon2 is used in Anastasis to derive an identifier for the user from
    130 the user's attributes, which serve as low-entropy inputs.
    131 
    132 
    133 \subsection{Secret sharing}
    134 
    135 Secret splitting, also known as secret sharing, is a technique for
    136 distributing a secret amongst multiple recipients. This is achieved by
    137 assigning a share of the secret to each recipient. By combining a
    138 sufficient number of those shares, it is possible to reconstruct the
    139 secret.  In a secret sharing theme the recipients of a share often are
    140 called \textit{players}. The figure who gives a share of the secret to
    141 the players is called \textit{dealer}.
    142 
    143 In Anastasis the user is the trusted dealer who splits the secret and
    144 also reconstructs it.
    145 
    146 \subsubsection{Shamir's secret sharing} \label{sec:rel:shamir}
    147 
    148 The algorithm ``Shamir's secret sharing'' is probably the most well
    149 known secret sharing scheme. It ``divide[s] data D into n pieces in
    150 such a way that D is easily reconstructible from any k pieces, but
    151 even complete knowledge of $k - 1$ pieces reveals absolutely no
    152 information about D''~\cite{shamir_sharing}.
    153 
    154 Shamir’s simple secret sharing scheme has two key limitations. First,
    155 it requires a trusted dealer who initially generates the secret to be
    156 distributed, and second the shares are not verifiable during
    157 reconstruction. Therefore, malicious shareholders could submit corrupt
    158 shares to prevent the system from reconstructing the secret --- without
    159 these corrupt shareholders being detectable as malicious. Furthermore,
    160 the dealer distributing the shares could be corrupt and distribute
    161 some inconsistent shares to the others. Also, in some scenarios the
    162 dealer cannot be trusted with the knowledge of the original core
    163 secret.
    164 
    165 Additionally, Shamir's secret sharing is inflexible because it is a
    166 simple $k$-out-of-$n$ threshold scheme.  While this makes the scheme
    167 reasonably efficient even for big values of $n$, efficiency with
    168 respect to a large number of escrow providers and authorization
    169 procedures is not important for Anastasis: it is already difficult to
    170 conceive users providing more than a handful of authentication methods
    171 (Section~\ref{sec:rel:authentication} describes common choices.)
    172 
    173 For Anastasis, we thus decided to opt for more flexible approach that
    174 allows complex policies for recovery authorization, instead of only
    175 $k$-out-of-$n$. Each user of Anastasis is also able to decide which
    176 combinations of \textit{players}, which in case of Anastasis are the 
    177 escrow providers, shall be permitted.
    178 
    179 \subsubsection{Verifiable secret sharing}
    180 
    181 Verifiability can be achieved by using so called commitment schemes
    182 like the Pederson commitment. It allows ``to distribute a secret to n
    183 persons such that each person can verify that he has received correct
    184 information about the secret without talking with other
    185 persons''~\cite{pedersen_sharing_0}. In his paper ``A Practical Scheme
    186 for Non-interactive Verifiable Secret
    187 Sharing''~\cite{feldman_sharing}, Paul Feldman combines the two
    188 schemes Shamir Secret Sharing and Pederson commitment. His algorithm
    189 for verifiable secret sharing (VSS), allows each recipient to
    190 verify the correctness of their share. But like in the Shamir Secret
    191 Sharing scheme, the dealer in the VSS scheme
    192 also can't be trusted with the knowledge of the original core secret.
    193 
    194 Because in Anastasis each user can act as their own trusted dealer,
    195 the shares must not be verified and therefore Anastasis do not need
    196 any form of VSS.
    197 
    198 \subsubsection{Distributed key generation}
    199 
    200 Distributed key generation (DKG) algorithms solve the problem of
    201 needing a trustworthy dealer by instead relying on a threshold of
    202 honest persons for key generation. Contrary to the above-mentioned
    203 schemes, in distributed key generation algorithms every participant is
    204 involved in key generation.  The Pederson DKG is such ``a secret
    205 sharing scheme without a mutually trusted
    206 authority''~\cite{pedersen_sharing_5.2}. Basically, this DKG works as
    207 follows: First, each involved party generates a pre-secret and
    208 distributes it to all parties using the verifiable secret sharing
    209 scheme of Feldman.  Afterwards, each party recombines the received
    210 shares, including its own pre-secret, to a share of the main
    211 secret. The main secret can be reconstructed by summing up each
    212 recombination of the shared pre-secrets.
    213 
    214 Because in Anastasis each user can act as their own trusted dealer, we
    215 also do not worry about the dealer learning the user's key and hence
    216 Anastasis do not need any form of DKG.
    217 
    218 \subsection{Authentication} \label{sec:rel:authentication}
    219 
    220 To build a secure authentication procedure, today multi-factor
    221 authentication is the standard~\cite{multifactor_authentication}. A
    222 single authentication method by itself is usually vulnerable.
    223 Multi-factor authentication combines multiple authentication
    224 procedures to enhance the security of the system.
    225 
    226 During procedure of some authentication methods a so called token is 
    227 sent to the user. The user than has to provide the token to authorize.\\
    228 The token should be a randomly generated passphrase which has at 
    229 least 128 bits of entropy. It is best practice for a token to have an 
    230 expiration time, although this is not relevant for security of Anastasis.\\
    231 
    232 Anastasis is designed to use a wide range of authentication methods to
    233 authenticate its users. Even though the user in Anastasis is free to
    234 specify only one authentication method, we strongly recommend the use
    235 of multi-factor authentication, typically using different
    236 authentication methods at different providers.
    237 
    238 A short overview of common authentication methods and issues with
    239 each of them is presented here.
    240 
    241 \subsubsection{Password authentication}
    242 
    243 Password authentication is probably the most widely used
    244 authentication procedure. But as studies show the procedure has its
    245 drawbacks~\cite{authentication_methods_review}. For example the
    246 handling of the passwords, like storage or transmission, often is done
    247 poorly. Another problem is that the user must remember his
    248 password. Therefore the password is limited to the capabilities of the
    249 user to remember it. Thus people tend to use passwords with low
    250 entropy. Those passwords are vulnerable to brute force attacks or
    251 dictionary attacks. Another problem using passwords is the possibility
    252 of replay attacks: A password can be stolen by an eavesdropper during
    253 online transmission and used by the attacker.
    254 
    255 Because passwords can be forgotten, we do not recommend using this
    256 method for provider-authentication in Anastasis. Users could easily
    257 add a passwords into their set of ``invariant'' attributes used to
    258 derive the identity key, and then would automatically obtain all of
    259 the possible benefits (and drawbacks) from using a password.
    260 Specifically, they must make sure that the password cannot be
    261 forgotten, even if it means that the password has low entropy.
    262 
    263 \subsubsection{Secure question}
    264 
    265 Similar to password authentication the use of an authentication method
    266 based on a secure question requires the user to remember the correct
    267 answer to a specific question. The difference here is that the
    268 question provides a context that helps the user to remember the answer
    269 and the user does not necessarily need to memorize something
    270 new~\cite{just2004}.
    271 
    272 There are several variations to implement authentication using a
    273 secure question:
    274 
    275 \begin{itemize}
    276  \item The questions and answers are predefined.
    277  \item Just the questions are predefined.
    278  \item The user is free to create custom questions and answers.
    279 \end{itemize}
    280 
    281 The first option is the easiest one. But predefining the answers has
    282 the disadvantage being impersonal and inflexible. The questions must
    283 inevitably be general, which may allow an attacker to obtain answers
    284 by collecting public information about the victim, or even simply
    285 solving the challenge by brute-forcing trying all possible choices.
    286 Therefore the first option is not ideal.
    287 
    288 The second option is more applicable but has some drawbacks, too. For
    289 example there may be questions whose answers have multiple syntactic
    290 representations (for example, ``St.'' versus
    291 ``Street'')~\cite{just2004}. Another problem could be a question whose
    292 answer may change over time. Asking for the favourite actor for
    293 example could be problematic. In addition, there is a challenge to
    294 define questions for all kind of people. Some people for example could
    295 not answer to the question, what the name of their pet is, because
    296 they do not have one.
    297 
    298 In case of the third option, we have all of the issues of the second
    299 one, but additionally there is the difficulty for the user to ask
    300 creative questions. A good question should only be answerable by the
    301 user. Also, it would be perfect to have the attacker on the wrong
    302 track by using ambiguous questions with word plays the adversary
    303 cannot easily comprehend.
    304 
    305 Authentication using a secure question requires checking the validity
    306 of an answer that may include private personal information.
    307 Consequently, Anastasis does not store the answers of secure questions
    308 in cleartext. Instead, Anastasis only stores the hash value of a
    309 (salted) answer.  Thus the user only has to provide the hash value of
    310 the answer and not disclose the answer itself.
    311 
    312 \subsubsection{SMS authentication}
    313 
    314 Another way to authenticate users that have a mobile phone is to use
    315 SMS authentication. The most popular use case is the so called Mobile
    316 TAN used to authorize online banking transactions. A Mobile TAN is an
    317 SMS based One-Time Password (OTP), short SMS OTP. SMS OTPs ``were
    318 introduced to counter phishing and other attacks against
    319 authentication and authorization of Internet
    320 services''~\cite{MBSS2013}.
    321 
    322 However, SMS authentication is not very secure, as it relies on the
    323 security of the mobile network, which has various
    324 vulnerabilities~\cite{rieck_detection}. There are also specialized
    325 mobile trojans which are used to eavesdrop on these messages directly
    326 on the user's mobile device.
    327 
    328 While likely not as sensitive as answers to security questions, we
    329 still consider user's phone numbers as private information that
    330 deserves protection.  Naturally, a service authenticating the user
    331 needs the phone number to send a message to the user during SMS
    332 authentication.
    333 
    334 Hence, Anastasis providers have to learn the phone number during SMS
    335 authentication.  However, we can use cryptography to ensure that the
    336 provider only gets the keys to decrypt the phone number when the
    337 authentication process is started by the user as part of a recovery
    338 operation. Thus, a compromise of the provider's database would not
    339 directly reveal the phone numbers to the attacker.
    340 
    341 
    342 \subsubsection{E-mail authentication}
    343 
    344 Authentication by email is similar to SMS authentication. Here,
    345 the user receives a token by email and has to provide it during the
    346 authentication process.
    347 
    348 It is important that the email should not already contain the
    349 requested information, so in the case of Anastasis the keyshare.  This
    350 is because the SMTP protocol used for email offers no hard security
    351 assurances. In particular, the email is likely to be stored for a
    352 indefinite period in the user's mailbox, which could be easily
    353 compromised and read by a mailbox provider.~\cite{emailauthowasp}
    354 
    355 Like with SMS authentication, Anastasis also encrypts the email
    356 addresses when they are stored at the provider.  The user has to
    357 provide the corresponding decryption key to the server during
    358 the authentication process.
    359 
    360 
    361 \subsubsection{VideoIdent}
    362 
    363 VideoIdent uses a video chat to verify the identity of a user. The
    364 user needs to show their face using a camera to an employee of the
    365 VideoIdent service. The service then verifies the identity of the user
    366 by comparing the video stream to a picture of the
    367 user~\cite{pohlmann2017}.
    368 
    369 Prerequisites for error-free identification are a video camera with
    370 good video quality and a high-resolution image of the user on which
    371 the face can be clearly seen. The user should also not change their
    372 outward appearance too much over time. For example, growing or
    373 trimming a beard could lead to the VideoIdent-service employee not
    374 being able to recognise a user with sufficient confidence.
    375 
    376 For an attacker who looks similar to the user, there is a chance that
    377 the employee incorrectly confirms the identification.
    378 
    379 %CG: that's IMO then e-mail based verification, should not mix
    380 %    the two: this is basically multi-factor, so I'd leave it out here!
    381 %Therefore, some interaction of the user is needed like for example
    382 %telling the employee a short code which has been sent right before to
    383 %the user by mail.
    384 
    385 In Anastasis, pictures of users for VideoIdent authentication are
    386 considered private information stored encrypted at the providers.
    387 During the authentication process, the user has to provide the correct
    388 key for decryption to the service.
    389 
    390 \subsubsection{PostIdent}
    391 
    392 It is also possible to sent a verification code to the user by
    393 physical mail. A major drawback of this authentication method is
    394 that it has high latency, and there is also the possibility that
    395 physical mail gets intercepted or lost during transmission.
    396 
    397 Anastasis providers using PostIndent would not store the address of
    398 their users in cleartext. Instead the address is encrypted by the user
    399 and the provider would receive the key to decrypt the address only
    400 during the authentication process.
    401 
    402 \subsubsection{Biometric authentication}
    403 
    404 Another way of authenticating is the biometric
    405 approach~\cite{biometric_auth}. Biometric authentication is based on
    406 ``something you are'', like your iris or your fingerprint.
    407 
    408 Biometric authentication is highly problematic because the attributes
    409 are invariant and frequently shared involuntarily.  Unlike passphrases
    410 or phone numbers, users cannot change their genome or fingerprint in
    411 case their private biometric information is exposed.  Furthermore,
    412 there are credible threats against biometric authentication, in
    413 paritcular there are documented inexpensive attacks against
    414 fingerprint and iris scan authentication. For example, a member of the
    415 German CCC e.V. was able to generate replicas from Angela Merkel's
    416 iris and Ursula von der Leyen's fingerprint~\cite{ccc_merkel}.
    417 
    418 
    419 
    420 \subsection{Existing solutions for key recovery}
    421 
    422 This section introduces some existing solutions for key recovery and
    423 why they are problematic.
    424 
    425 
    426 \subsubsection{Coinbase}
    427 
    428 Coinbase\footnote{\url{https://www.coinbase.com/}} is a global digital
    429 asset exchange company, providing a venue to buy and sell digital
    430 currencies. Coinbase also uses wallets secured with private keys. To
    431 recover this private key the user has to provide a 12 words recovery
    432 phrase.
    433 
    434 Coinbase offers a solution to securely deposit this recovery phrase
    435 onto the users Google Drive or iCloud.~\cite{coinbase} The security
    436 here lies within the Google or iCloud account and another password
    437 used to encrypt the security phrase. The problem here is that this
    438 approach undermines confidentiality, as encrypting a strong key with a
    439 weak key simply reduces the security to that of the weaker key.
    440 
    441 \subsubsection{MIDATA}
    442 
    443 MIDATA is a project that aims to give patients back control over their
    444 medical data and to enable them to share their data only with those
    445 they trust.\footnote{\url{https://www.midata.coop/}} In case a patient
    446 lost their device with the MIDATA-application and also forgot their
    447 MIDATA password, MIDATA provides a key recovery system using the
    448 Shamir Secret Sharing scheme (as described in
    449 Section~\ref{sec:rel:shamir}).
    450 
    451 In their case, a few ``persons working at MIDATA have generated a
    452 public-private key pair (Recovery key) on their own computer. They
    453 keep the private recovery key for themselves and never share it. The
    454 public keys are made public so that the apps can also access
    455 them''~\cite{midata}. Using Shamir's Secret Sharing the MIDATA
    456 application splits the user's app private key into 5 parts which are
    457 encrypted with one of the published recovery keys. The encrypted parts
    458 are then stored into the MIDATA server. During the recovery process at
    459 least two of the 5 parts need to be decrypted by the persons owning
    460 the private key part of the recovery key. ``The decrypted parts are
    461 sent to the server and {\em the server} may now reconstruct the app
    462 private key if enough parts of the key have been
    463 decrypted''~\cite{midata}. (Emphasis ours.)
    464 
    465 The security of MIDATA as described in ``Patient empowerment in IoT
    466 for eHealth - How to deal with lost keys?''~\cite{midata} is broken in
    467 three ways:
    468 
    469 \begin{enumerate}
    470  \item The password is reconstructed at {\em the server}, not on the
    471    patients device. An administrator of the server could thus
    472    access the recovered password at that time.  It would be better
    473    to reconstruct the password only on the patients device.
    474  \item It is not clear which authentication methods the persons
    475    working for MIDATA use for their decisions and activities regarding
    476    the key recovery. The business process used here could be
    477    vulnerable, and it is not clear whether multi-factor authentication
    478    is used. As a result, we worry that it may be possible for an attacker
    479    to successfully use social engineering via email (or other means)
    480    to illegitimately trigger a recovery process.
    481  \item The MIDATA system also does not offer any trust agility~\cite{marlinspike2011}.
    482    The user is forced to accept the 2-out-of-5 rule with trustees
    483    provided by MIDATA.
    484 \end{enumerate}
	anastasis Credential backup and recovery protocol and service
	Log \| Files \| Refs \| Submodules \| README \| LICENSE