related_work.tex (23192B)
1 \section{Related work} 2 3 This chapter explains some important cryptographic functions and which 4 are used in Anastasis or related to our work. We also describe issues 5 with existing solutions in this domain. 6 7 \subsection{Cryptographic primitives} 8 9 \subsubsection{Pseudo-randomness} 10 11 A pseudo random generator (PRG) is an algorithm producing an infinite 12 sequence of bits for which there is no efficient algorithm to 13 distinguish it from a truly random sequence~\cite{vadhan2012}. The 14 algorithm ``takes as input a short, perfectly random 15 seed''~\cite{vadhan2012} which determines the output value.\\ 16 17 A pseudo random function (PRF) is a deterministic function which 18 output is finite and indistinguishable from a true random 19 function.~\cite{nielsen2002} PRFs can be constructed using 20 PRGs.~\cite{GGM1986} 21 22 \subsubsection{Hash function} 23 24 Hash functions "compress a string of arbitrary length to a string of 25 fixed length [...]"~\cite{Preneel1999}. The output of a hash function 26 often is called a "hash". Hash functions in general should be very 27 fast to compute. Cryptographic hash functions need to fulfil 28 additional security requirements which are 29 30 \begin{itemize} 31 \item (first) pre-image resistance, 32 \item second pre-image resistance, 33 \item collision resistance, 34 \item pseudo randomness, and the 35 \item avalanche effect. 36 \end{itemize} 37 38 Pre-image resistance, also called the ``one way property'', means that 39 for a given hash function $H$ and a hash value $H(x)$, it is 40 computationally infeasible to find $x$.~\cite{SG2012} For example, 41 since in Anastasis we derive the key to encrypt the personal details 42 required for user authentication (e.g. the mobile phone number for 43 authentication via SMS) using functions based on hash functions (see 44 HKDF), it is very important that you cannot derive the corresponding 45 input values from the key. 46 47 The second pre-image resistance is described by following: For a given 48 hash function $H$ and a hash value $H(x)$, it is computationally 49 infeasible to find $x$ and $x'$ such that $H(x) = H(x')$ and $x \not= 50 x'$.~\cite{SG2012} In Anastasis hash functions also are involved in 51 signing our so called recovery document. Hence an attacker should not 52 be able to create a malicious recovery document with the same hash 53 value as the original one. 54 55 The definition of collision resistance slightly differs from the 56 second pre-image resistance: For a given hash function $H$, it is 57 computationally infeasible to find a pair $x, y$ such that $H(x) = 58 H(y)$ \cite{SG2012}. 59 %CG: the text below does NOT related to collision resistance! 60 %As we are using HKDFs for deriving keys in 61 %Anastasis, an attacker should not be able to find some other input 62 %values also leading to the same keys we use. 63 Anastasis does not rely upon collision resistance in its use of hash 64 functions. % CG: at least no case comes to mind for me right now... 65 66 A cryptographic hash function should also behave as a pseudo random 67 function. This means that although a hash function is purely 68 deterministic, the output must not be predictable. 69 70 The avalanche effect describes the property of an algorithm that 71 causes a significant change of the output value, usually a bit 72 flipping of more than half the output is desired, if the input is 73 changed slightly (for example, flipping a single bit).~\cite{RK2011} 74 The more bits are flipping in the output value the higher the entropy 75 of the randomness of the hash function. 76 77 There are many applications for cryptographic hash functions. For 78 example, you can store the hash value of a passphrase instead of the 79 passphrase itself in a computer to protect the passphrase. Another 80 important application is verification of message integrity: Before and 81 after transmission of a message one can calculate the hash values of 82 it and compare hashes later to determine if the message changed during 83 transmission. 84 85 In Anastasis we use SHA-512~\cite{GJW2011} for fast hash functions. 86 87 \subsubsection{HMAC} 88 89 When it comes to integrity of messages during communication of two 90 parties over an insecure channel Keyed-Hash Message Authentication 91 Codes (HMAC) are used as check values. An HMAC function is based on a 92 hash function and takes two arguments, a key $K$ and a message $M$: 93 94 $HMAC_{K}(M) = H(K \oplus opad,H(K \oplus ipad, M))$ with "ipad" and 95 "opad" being constants which fill up the key $K$ to the blocksize of 96 the hash function~\cite{BCK1996}. The blocksize of a modern hash 97 function like SHA-512 is 64 bytes. 98 99 \subsubsection{HKDF} 100 101 A HKDF is a key derivation function (KDF) based on HMAC. A KDF ``is a 102 basic and essential component of cryptographic systems: Its goal is 103 to take a source of initial keying material, usually containing some 104 good amount of randomness, but not distributed uniformly or for which 105 an attacker has some partial knowledge, and derive from it one or more 106 cryptographically strong secret keys''~\cite{krawczyk2010}. 107 108 Anastasis uses HKDFs based on SHA-512 to derive symmetric keys for 109 encryption. 110 111 \subsubsection{Argon2} 112 113 Hash functions like SHA-512 are designed to be very fast. Therefore 114 passwords being stored using this kind of hash are vulnerable to 115 dictionary attacks with new hardware architectures like 116 FPGAs~\cite{trimberger2012} and dedicated ASIC~\cite{madurawe2006} 117 modules. But those architectures ``experience difficulties when 118 operating on large amount of memory''~\cite{BDK2016}. 119 120 In contrast to standard hash functions there are functions designed to 121 be memory-hard. Argon2 is such a memory-hard function that won the 122 Password Hashing Competition in 2015. It minimizes time-memory 123 tradeoff~\cite{stamp2003} and thus maximizes the costs to implement an 124 ASIC for given CPU computing time~\cite{BDK2016}. Aside from the fact 125 that Argon2 makes dictionary attacks much harder, Argon2 can be used 126 for another feature too: Memory-hard schemes like Argon2 are very 127 useful for key derivation from low-entropy sources~\cite{BDK2016}. 128 129 Argon2 is used in Anastasis to derive an identifier for the user from 130 the user's attributes, which serve as low-entropy inputs. 131 132 133 \subsection{Secret sharing} 134 135 Secret splitting, also known as secret sharing, is a technique for 136 distributing a secret amongst multiple recipients. This is achieved by 137 assigning a share of the secret to each recipient. By combining a 138 sufficient number of those shares, it is possible to reconstruct the 139 secret. In a secret sharing theme the recipients of a share often are 140 called \textit{players}. The figure who gives a share of the secret to 141 the players is called \textit{dealer}. 142 143 In Anastasis the user is the trusted dealer who splits the secret and 144 also reconstructs it. 145 146 \subsubsection{Shamir's secret sharing} \label{sec:rel:shamir} 147 148 The algorithm ``Shamir's secret sharing'' is probably the most well 149 known secret sharing scheme. It ``divide[s] data D into n pieces in 150 such a way that D is easily reconstructible from any k pieces, but 151 even complete knowledge of $k - 1$ pieces reveals absolutely no 152 information about D''~\cite{shamir_sharing}. 153 154 Shamir’s simple secret sharing scheme has two key limitations. First, 155 it requires a trusted dealer who initially generates the secret to be 156 distributed, and second the shares are not verifiable during 157 reconstruction. Therefore, malicious shareholders could submit corrupt 158 shares to prevent the system from reconstructing the secret --- without 159 these corrupt shareholders being detectable as malicious. Furthermore, 160 the dealer distributing the shares could be corrupt and distribute 161 some inconsistent shares to the others. Also, in some scenarios the 162 dealer cannot be trusted with the knowledge of the original core 163 secret. 164 165 Additionally, Shamir's secret sharing is inflexible because it is a 166 simple $k$-out-of-$n$ threshold scheme. While this makes the scheme 167 reasonably efficient even for big values of $n$, efficiency with 168 respect to a large number of escrow providers and authorization 169 procedures is not important for Anastasis: it is already difficult to 170 conceive users providing more than a handful of authentication methods 171 (Section~\ref{sec:rel:authentication} describes common choices.) 172 173 For Anastasis, we thus decided to opt for more flexible approach that 174 allows complex policies for recovery authorization, instead of only 175 $k$-out-of-$n$. Each user of Anastasis is also able to decide which 176 combinations of \textit{players}, which in case of Anastasis are the 177 escrow providers, shall be permitted. 178 179 \subsubsection{Verifiable secret sharing} 180 181 Verifiability can be achieved by using so called commitment schemes 182 like the Pederson commitment. It allows ``to distribute a secret to n 183 persons such that each person can verify that he has received correct 184 information about the secret without talking with other 185 persons''~\cite{pedersen_sharing_0}. In his paper ``A Practical Scheme 186 for Non-interactive Verifiable Secret 187 Sharing''~\cite{feldman_sharing}, Paul Feldman combines the two 188 schemes Shamir Secret Sharing and Pederson commitment. His algorithm 189 for verifiable secret sharing (VSS), allows each recipient to 190 verify the correctness of their share. But like in the Shamir Secret 191 Sharing scheme, the dealer in the VSS scheme 192 also can't be trusted with the knowledge of the original core secret. 193 194 Because in Anastasis each user can act as their own trusted dealer, 195 the shares must not be verified and therefore Anastasis do not need 196 any form of VSS. 197 198 \subsubsection{Distributed key generation} 199 200 Distributed key generation (DKG) algorithms solve the problem of 201 needing a trustworthy dealer by instead relying on a threshold of 202 honest persons for key generation. Contrary to the above-mentioned 203 schemes, in distributed key generation algorithms every participant is 204 involved in key generation. The Pederson DKG is such ``a secret 205 sharing scheme without a mutually trusted 206 authority''~\cite{pedersen_sharing_5.2}. Basically, this DKG works as 207 follows: First, each involved party generates a pre-secret and 208 distributes it to all parties using the verifiable secret sharing 209 scheme of Feldman. Afterwards, each party recombines the received 210 shares, including its own pre-secret, to a share of the main 211 secret. The main secret can be reconstructed by summing up each 212 recombination of the shared pre-secrets. 213 214 Because in Anastasis each user can act as their own trusted dealer, we 215 also do not worry about the dealer learning the user's key and hence 216 Anastasis do not need any form of DKG. 217 218 \subsection{Authentication} \label{sec:rel:authentication} 219 220 To build a secure authentication procedure, today multi-factor 221 authentication is the standard~\cite{multifactor_authentication}. A 222 single authentication method by itself is usually vulnerable. 223 Multi-factor authentication combines multiple authentication 224 procedures to enhance the security of the system. 225 226 During procedure of some authentication methods a so called token is 227 sent to the user. The user than has to provide the token to authorize.\\ 228 The token should be a randomly generated passphrase which has at 229 least 128 bits of entropy. It is best practice for a token to have an 230 expiration time, although this is not relevant for security of Anastasis.\\ 231 232 Anastasis is designed to use a wide range of authentication methods to 233 authenticate its users. Even though the user in Anastasis is free to 234 specify only one authentication method, we strongly recommend the use 235 of multi-factor authentication, typically using different 236 authentication methods at different providers. 237 238 A short overview of common authentication methods and issues with 239 each of them is presented here. 240 241 \subsubsection{Password authentication} 242 243 Password authentication is probably the most widely used 244 authentication procedure. But as studies show the procedure has its 245 drawbacks~\cite{authentication_methods_review}. For example the 246 handling of the passwords, like storage or transmission, often is done 247 poorly. Another problem is that the user must remember his 248 password. Therefore the password is limited to the capabilities of the 249 user to remember it. Thus people tend to use passwords with low 250 entropy. Those passwords are vulnerable to brute force attacks or 251 dictionary attacks. Another problem using passwords is the possibility 252 of replay attacks: A password can be stolen by an eavesdropper during 253 online transmission and used by the attacker. 254 255 Because passwords can be forgotten, we do not recommend using this 256 method for provider-authentication in Anastasis. Users could easily 257 add a passwords into their set of ``invariant'' attributes used to 258 derive the identity key, and then would automatically obtain all of 259 the possible benefits (and drawbacks) from using a password. 260 Specifically, they must make sure that the password cannot be 261 forgotten, even if it means that the password has low entropy. 262 263 \subsubsection{Secure question} 264 265 Similar to password authentication the use of an authentication method 266 based on a secure question requires the user to remember the correct 267 answer to a specific question. The difference here is that the 268 question provides a context that helps the user to remember the answer 269 and the user does not necessarily need to memorize something 270 new~\cite{just2004}. 271 272 There are several variations to implement authentication using a 273 secure question: 274 275 \begin{itemize} 276 \item The questions and answers are predefined. 277 \item Just the questions are predefined. 278 \item The user is free to create custom questions and answers. 279 \end{itemize} 280 281 The first option is the easiest one. But predefining the answers has 282 the disadvantage being impersonal and inflexible. The questions must 283 inevitably be general, which may allow an attacker to obtain answers 284 by collecting public information about the victim, or even simply 285 solving the challenge by brute-forcing trying all possible choices. 286 Therefore the first option is not ideal. 287 288 The second option is more applicable but has some drawbacks, too. For 289 example there may be questions whose answers have multiple syntactic 290 representations (for example, ``St.'' versus 291 ``Street'')~\cite{just2004}. Another problem could be a question whose 292 answer may change over time. Asking for the favourite actor for 293 example could be problematic. In addition, there is a challenge to 294 define questions for all kind of people. Some people for example could 295 not answer to the question, what the name of their pet is, because 296 they do not have one. 297 298 In case of the third option, we have all of the issues of the second 299 one, but additionally there is the difficulty for the user to ask 300 creative questions. A good question should only be answerable by the 301 user. Also, it would be perfect to have the attacker on the wrong 302 track by using ambiguous questions with word plays the adversary 303 cannot easily comprehend. 304 305 Authentication using a secure question requires checking the validity 306 of an answer that may include private personal information. 307 Consequently, Anastasis does not store the answers of secure questions 308 in cleartext. Instead, Anastasis only stores the hash value of a 309 (salted) answer. Thus the user only has to provide the hash value of 310 the answer and not disclose the answer itself. 311 312 \subsubsection{SMS authentication} 313 314 Another way to authenticate users that have a mobile phone is to use 315 SMS authentication. The most popular use case is the so called Mobile 316 TAN used to authorize online banking transactions. A Mobile TAN is an 317 SMS based One-Time Password (OTP), short SMS OTP. SMS OTPs ``were 318 introduced to counter phishing and other attacks against 319 authentication and authorization of Internet 320 services''~\cite{MBSS2013}. 321 322 However, SMS authentication is not very secure, as it relies on the 323 security of the mobile network, which has various 324 vulnerabilities~\cite{rieck_detection}. There are also specialized 325 mobile trojans which are used to eavesdrop on these messages directly 326 on the user's mobile device. 327 328 While likely not as sensitive as answers to security questions, we 329 still consider user's phone numbers as private information that 330 deserves protection. Naturally, a service authenticating the user 331 needs the phone number to send a message to the user during SMS 332 authentication. 333 334 Hence, Anastasis providers have to learn the phone number during SMS 335 authentication. However, we can use cryptography to ensure that the 336 provider only gets the keys to decrypt the phone number when the 337 authentication process is started by the user as part of a recovery 338 operation. Thus, a compromise of the provider's database would not 339 directly reveal the phone numbers to the attacker. 340 341 342 \subsubsection{E-mail authentication} 343 344 Authentication by email is similar to SMS authentication. Here, 345 the user receives a token by email and has to provide it during the 346 authentication process. 347 348 It is important that the email should not already contain the 349 requested information, so in the case of Anastasis the keyshare. This 350 is because the SMTP protocol used for email offers no hard security 351 assurances. In particular, the email is likely to be stored for a 352 indefinite period in the user's mailbox, which could be easily 353 compromised and read by a mailbox provider.~\cite{emailauthowasp} 354 355 Like with SMS authentication, Anastasis also encrypts the email 356 addresses when they are stored at the provider. The user has to 357 provide the corresponding decryption key to the server during 358 the authentication process. 359 360 361 \subsubsection{VideoIdent} 362 363 VideoIdent uses a video chat to verify the identity of a user. The 364 user needs to show their face using a camera to an employee of the 365 VideoIdent service. The service then verifies the identity of the user 366 by comparing the video stream to a picture of the 367 user~\cite{pohlmann2017}. 368 369 Prerequisites for error-free identification are a video camera with 370 good video quality and a high-resolution image of the user on which 371 the face can be clearly seen. The user should also not change their 372 outward appearance too much over time. For example, growing or 373 trimming a beard could lead to the VideoIdent-service employee not 374 being able to recognise a user with sufficient confidence. 375 376 For an attacker who looks similar to the user, there is a chance that 377 the employee incorrectly confirms the identification. 378 379 %CG: that's IMO then e-mail based verification, should not mix 380 % the two: this is basically multi-factor, so I'd leave it out here! 381 %Therefore, some interaction of the user is needed like for example 382 %telling the employee a short code which has been sent right before to 383 %the user by mail. 384 385 In Anastasis, pictures of users for VideoIdent authentication are 386 considered private information stored encrypted at the providers. 387 During the authentication process, the user has to provide the correct 388 key for decryption to the service. 389 390 \subsubsection{PostIdent} 391 392 It is also possible to sent a verification code to the user by 393 physical mail. A major drawback of this authentication method is 394 that it has high latency, and there is also the possibility that 395 physical mail gets intercepted or lost during transmission. 396 397 Anastasis providers using PostIndent would not store the address of 398 their users in cleartext. Instead the address is encrypted by the user 399 and the provider would receive the key to decrypt the address only 400 during the authentication process. 401 402 \subsubsection{Biometric authentication} 403 404 Another way of authenticating is the biometric 405 approach~\cite{biometric_auth}. Biometric authentication is based on 406 ``something you are'', like your iris or your fingerprint. 407 408 Biometric authentication is highly problematic because the attributes 409 are invariant and frequently shared involuntarily. Unlike passphrases 410 or phone numbers, users cannot change their genome or fingerprint in 411 case their private biometric information is exposed. Furthermore, 412 there are credible threats against biometric authentication, in 413 paritcular there are documented inexpensive attacks against 414 fingerprint and iris scan authentication. For example, a member of the 415 German CCC e.V. was able to generate replicas from Angela Merkel's 416 iris and Ursula von der Leyen's fingerprint~\cite{ccc_merkel}. 417 418 419 420 \subsection{Existing solutions for key recovery} 421 422 This section introduces some existing solutions for key recovery and 423 why they are problematic. 424 425 426 \subsubsection{Coinbase} 427 428 Coinbase\footnote{\url{https://www.coinbase.com/}} is a global digital 429 asset exchange company, providing a venue to buy and sell digital 430 currencies. Coinbase also uses wallets secured with private keys. To 431 recover this private key the user has to provide a 12 words recovery 432 phrase. 433 434 Coinbase offers a solution to securely deposit this recovery phrase 435 onto the users Google Drive or iCloud.~\cite{coinbase} The security 436 here lies within the Google or iCloud account and another password 437 used to encrypt the security phrase. The problem here is that this 438 approach undermines confidentiality, as encrypting a strong key with a 439 weak key simply reduces the security to that of the weaker key. 440 441 \subsubsection{MIDATA} 442 443 MIDATA is a project that aims to give patients back control over their 444 medical data and to enable them to share their data only with those 445 they trust.\footnote{\url{https://www.midata.coop/}} In case a patient 446 lost their device with the MIDATA-application and also forgot their 447 MIDATA password, MIDATA provides a key recovery system using the 448 Shamir Secret Sharing scheme (as described in 449 Section~\ref{sec:rel:shamir}). 450 451 In their case, a few ``persons working at MIDATA have generated a 452 public-private key pair (Recovery key) on their own computer. They 453 keep the private recovery key for themselves and never share it. The 454 public keys are made public so that the apps can also access 455 them''~\cite{midata}. Using Shamir's Secret Sharing the MIDATA 456 application splits the user's app private key into 5 parts which are 457 encrypted with one of the published recovery keys. The encrypted parts 458 are then stored into the MIDATA server. During the recovery process at 459 least two of the 5 parts need to be decrypted by the persons owning 460 the private key part of the recovery key. ``The decrypted parts are 461 sent to the server and {\em the server} may now reconstruct the app 462 private key if enough parts of the key have been 463 decrypted''~\cite{midata}. (Emphasis ours.) 464 465 The security of MIDATA as described in ``Patient empowerment in IoT 466 for eHealth - How to deal with lost keys?''~\cite{midata} is broken in 467 three ways: 468 469 \begin{enumerate} 470 \item The password is reconstructed at {\em the server}, not on the 471 patients device. An administrator of the server could thus 472 access the recovered password at that time. It would be better 473 to reconstruct the password only on the patients device. 474 \item It is not clear which authentication methods the persons 475 working for MIDATA use for their decisions and activities regarding 476 the key recovery. The business process used here could be 477 vulnerable, and it is not clear whether multi-factor authentication 478 is used. As a result, we worry that it may be possible for an attacker 479 to successfully use social engineering via email (or other means) 480 to illegitimately trigger a recovery process. 481 \item The MIDATA system also does not offer any trust agility~\cite{marlinspike2011}. 482 The user is forced to accept the 2-out-of-5 rule with trustees 483 provided by MIDATA. 484 \end{enumerate}