summaryrefslogtreecommitdiff
path: root/doc/system-documentation/related_work.tex
diff options
context:
space:
mode:
Diffstat (limited to 'doc/system-documentation/related_work.tex')
-rw-r--r--doc/system-documentation/related_work.tex484
1 files changed, 484 insertions, 0 deletions
diff --git a/doc/system-documentation/related_work.tex b/doc/system-documentation/related_work.tex
new file mode 100644
index 0000000..7f4d879
--- /dev/null
+++ b/doc/system-documentation/related_work.tex
@@ -0,0 +1,484 @@
+\section{Related work}
+
+This chapter explains some important cryptographic functions and which
+are used in Anastasis or related to our work. We also describe issues
+with existing solutions in this domain.
+
+\subsection{Cryptographic primitives}
+
+\subsubsection{Pseudo-randomness}
+
+A pseudo random generator (PRG) is an algorithm producing an infinite
+sequence of bits for which there is no efficient algorithm to
+distinguish it from a truly random sequence~\cite{vadhan2012}. The
+algorithm ``takes as input a short, perfectly random
+seed''~\cite{vadhan2012} which determines the output value.\\
+
+A pseudo random function (PRF) is a deterministic function which
+output is finite and indistinguishable from a true random
+function.~\cite{nielsen2002} PRFs can be constructed using
+PRGs.~\cite{GGM1986}
+
+\subsubsection{Hash function}
+
+Hash functions "compress a string of arbitrary length to a string of
+fixed length [...]"~\cite{Preneel1999}. The output of a hash function
+often is called a "hash". Hash functions in general should be very
+fast to compute. Cryptographic hash functions need to fulfil
+additional security requirements which are
+
+\begin{itemize}
+ \item (first) pre-image resistance,
+ \item second pre-image resistance,
+ \item collision resistance,
+ \item pseudo randomness, and the
+ \item avalanche effect.
+\end{itemize}
+
+Pre-image resistance, also called the ``one way property'', means that
+for a given hash function $H$ and a hash value $H(x)$, it is
+computationally infeasible to find $x$.~\cite{SG2012} For example,
+since in Anastasis we derive the key to encrypt the personal details
+required for user authentication (e.g. the mobile phone number for
+authentication via SMS) using functions based on hash functions (see
+HKDF), it is very important that you cannot derive the corresponding
+input values from the key.
+
+The second pre-image resistance is described by following: For a given
+hash function $H$ and a hash value $H(x)$, it is computationally
+infeasible to find $x$ and $x'$ such that $H(x) = H(x')$ and $x \not=
+x'$.~\cite{SG2012} In Anastasis hash functions also are involved in
+signing our so called recovery document. Hence an attacker should not
+be able to create a malicious recovery document with the same hash
+value as the original one.
+
+The definition of collision resistance slightly differs from the
+second pre-image resistance: For a given hash function $H$, it is
+computationally infeasible to find a pair $x, y$ such that $H(x) =
+H(y)$ \cite{SG2012}.
+%CG: the text below does NOT related to collision resistance!
+%As we are using HKDFs for deriving keys in
+%Anastasis, an attacker should not be able to find some other input
+%values also leading to the same keys we use.
+Anastasis does not rely upon collision resistance in its use of hash
+functions. % CG: at least no case comes to mind for me right now...
+
+A cryptographic hash function should also behave as a pseudo random
+function. This means that although a hash function is purely
+deterministic, the output must not be predictable.
+
+The avalanche effect describes the property of an algorithm that
+causes a significant change of the output value, usually a bit
+flipping of more than half the output is desired, if the input is
+changed slightly (for example, flipping a single bit).~\cite{RK2011}
+The more bits are flipping in the output value the higher the entropy
+of the randomness of the hash function.
+
+There are many applications for cryptographic hash functions. For
+example, you can store the hash value of a passphrase instead of the
+passphrase itself in a computer to protect the passphrase. Another
+important application is verification of message integrity: Before and
+after transmission of a message one can calculate the hash values of
+it and compare hashes later to determine if the message changed during
+transmission.
+
+In Anastasis we use SHA-512~\cite{GJW2011} for fast hash functions.
+
+\subsubsection{HMAC}
+
+When it comes to integrity of messages during communication of two
+parties over an insecure channel Keyed-Hash Message Authentication
+Codes (HMAC) are used as check values. An HMAC function is based on a
+hash function and takes two arguments, a key $K$ and a message $M$:
+
+$HMAC_{K}(M) = H(K \oplus opad,H(K \oplus ipad, M))$ with "ipad" and
+"opad" being constants which fill up the key $K$ to the blocksize of
+the hash function~\cite{BCK1996}. The blocksize of a modern hash
+function like SHA-512 is 64 bytes.
+
+\subsubsection{HKDF}
+
+A HKDF is a key derivation function (KDF) based on HMAC. A KDF ``is a
+basic and essential component of cryptographic systems: Its goal is
+to take a source of initial keying material, usually containing some
+good amount of randomness, but not distributed uniformly or for which
+an attacker has some partial knowledge, and derive from it one or more
+cryptographically strong secret keys''~\cite{krawczyk2010}.
+
+Anastasis uses HKDFs based on SHA-512 to derive symmetric keys for
+encryption.
+
+\subsubsection{Argon2}
+
+Hash functions like SHA-512 are designed to be very fast. Therefore
+passwords being stored using this kind of hash are vulnerable to
+dictionary attacks with new hardware architectures like
+FPGAs~\cite{trimberger2012} and dedicated ASIC~\cite{madurawe2006}
+modules. But those architectures ``experience difficulties when
+operating on large amount of memory''~\cite{BDK2016}.
+
+In contrast to standard hash functions there are functions designed to
+be memory-hard. Argon2 is such a memory-hard function that won the
+Password Hashing Competition in 2015. It minimizes time-memory
+tradeoff~\cite{stamp2003} and thus maximizes the costs to implement an
+ASIC for given CPU computing time~\cite{BDK2016}. Aside from the fact
+that Argon2 makes dictionary attacks much harder, Argon2 can be used
+for another feature too: Memory-hard schemes like Argon2 are very
+useful for key derivation from low-entropy sources~\cite{BDK2016}.
+
+Argon2 is used in Anastasis to derive an identifier for the user from
+the user's attributes, which serve as low-entropy inputs.
+
+
+\subsection{Secret sharing}
+
+Secret splitting, also known as secret sharing, is a technique for
+distributing a secret amongst multiple recipients. This is achieved by
+assigning a share of the secret to each recipient. By combining a
+sufficient number of those shares, it is possible to reconstruct the
+secret. In a secret sharing theme the recipients of a share often are
+called \textit{players}. The figure who gives a share of the secret to
+the players is called \textit{dealer}.
+
+In Anastasis the user is the trusted dealer who splits the secret and
+also reconstructs it.
+
+\subsubsection{Shamir's secret sharing} \label{sec:rel:shamir}
+
+The algorithm ``Shamir's secret sharing'' is probably the most well
+known secret sharing scheme. It ``divide[s] data D into n pieces in
+such a way that D is easily reconstructible from any k pieces, but
+even complete knowledge of $k - 1$ pieces reveals absolutely no
+information about D''~\cite{shamir_sharing}.
+
+Shamir’s simple secret sharing scheme has two key limitations. First,
+it requires a trusted dealer who initially generates the secret to be
+distributed, and second the shares are not verifiable during
+reconstruction. Therefore, malicious shareholders could submit corrupt
+shares to prevent the system from reconstructing the secret --- without
+these corrupt shareholders being detectable as malicious. Furthermore,
+the dealer distributing the shares could be corrupt and distribute
+some inconsistent shares to the others. Also, in some scenarios the
+dealer cannot be trusted with the knowledge of the original core
+secret.
+
+Additionally, Shamir's secret sharing is inflexible because it is a
+simple $k$-out-of-$n$ threshold scheme. While this makes the scheme
+reasonably efficient even for big values of $n$, efficiency with
+respect to a large number of escrow providers and authorization
+procedures is not important for Anastasis: it is already difficult to
+conceive users providing more than a handful of authentication methods
+(Section~\ref{sec:rel:authentication} describes common choices.)
+
+For Anastasis, we thus decided to opt for more flexible approach that
+allows complex policies for recovery authorization, instead of only
+$k$-out-of-$n$. Each user of Anastasis is also able to decide which
+combinations of \textit{players}, which in case of Anastasis are the
+escrow providers, shall be permitted.
+
+\subsubsection{Verifiable secret sharing}
+
+Verifiability can be achieved by using so called commitment schemes
+like the Pederson commitment. It allows ``to distribute a secret to n
+persons such that each person can verify that he has received correct
+information about the secret without talking with other
+persons''~\cite{pedersen_sharing_0}. In his paper ``A Practical Scheme
+for Non-interactive Verifiable Secret
+Sharing''~\cite{feldman_sharing}, Paul Feldman combines the two
+schemes Shamir Secret Sharing and Pederson commitment. His algorithm
+for verifiable secret sharing (VSS), allows each recipient to
+verify the correctness of their share. But like in the Shamir Secret
+Sharing scheme, the dealer in the VSS scheme
+also can't be trusted with the knowledge of the original core secret.
+
+Because in Anastasis each user can act as their own trusted dealer,
+the shares must not be verified and therefore Anastasis do not need
+any form of VSS.
+
+\subsubsection{Distributed key generation}
+
+Distributed key generation (DKG) algorithms solve the problem of
+needing a trustworthy dealer by instead relying on a threshold of
+honest persons for key generation. Contrary to the above-mentioned
+schemes, in distributed key generation algorithms every participant is
+involved in key generation. The Pederson DKG is such ``a secret
+sharing scheme without a mutually trusted
+authority''~\cite{pedersen_sharing_5.2}. Basically, this DKG works as
+follows: First, each involved party generates a pre-secret and
+distributes it to all parties using the verifiable secret sharing
+scheme of Feldman. Afterwards, each party recombines the received
+shares, including its own pre-secret, to a share of the main
+secret. The main secret can be reconstructed by summing up each
+recombination of the shared pre-secrets.
+
+Because in Anastasis each user can act as their own trusted dealer, we
+also do not worry about the dealer learning the user's key and hence
+Anastasis do not need any form of DKG.
+
+\subsection{Authentication} \label{sec:rel:authentication}
+
+To build a secure authentication procedure, today multi-factor
+authentication is the standard~\cite{multifactor_authentication}. A
+single authentication method by itself is usually vulnerable.
+Multi-factor authentication combines multiple authentication
+procedures to enhance the security of the system.
+
+During procedure of some authentication methods a so called token is
+sent to the user. The user than has to provide the token to authorize.\\
+The token should be a randomly generated passphrase which has at
+least 128 bits of entropy. It is best practice for a token to have an
+expiration time, although this is not relevant for security of Anastasis.\\
+
+Anastasis is designed to use a wide range of authentication methods to
+authenticate its users. Even though the user in Anastasis is free to
+specify only one authentication method, we strongly recommend the use
+of multi-factor authentication, typically using different
+authentication methods at different providers.
+
+A short overview of common authentication methods and issues with
+each of them is presented here.
+
+\subsubsection{Password authentication}
+
+Password authentication is probably the most widely used
+authentication procedure. But as studies show the procedure has its
+drawbacks~\cite{authentication_methods_review}. For example the
+handling of the passwords, like storage or transmission, often is done
+poorly. Another problem is that the user must remember his
+password. Therefore the password is limited to the capabilities of the
+user to remember it. Thus people tend to use passwords with low
+entropy. Those passwords are vulnerable to brute force attacks or
+dictionary attacks. Another problem using passwords is the possibility
+of replay attacks: A password can be stolen by an eavesdropper during
+online transmission and used by the attacker.
+
+Because passwords can be forgotten, we do not recommend using this
+method for provider-authentication in Anastasis. Users could easily
+add a passwords into their set of ``invariant'' attributes used to
+derive the identity key, and then would automatically obtain all of
+the possible benefits (and drawbacks) from using a password.
+Specifically, they must make sure that the password cannot be
+forgotten, even if it means that the password has low entropy.
+
+\subsubsection{Secure question}
+
+Similar to password authentication the use of an authentication method
+based on a secure question requires the user to remember the correct
+answer to a specific question. The difference here is that the
+question provides a context that helps the user to remember the answer
+and the user does not necessarily need to memorize something
+new~\cite{just2004}.
+
+There are several variations to implement authentication using a
+secure question:
+
+\begin{itemize}
+ \item The questions and answers are predefined.
+ \item Just the questions are predefined.
+ \item The user is free to create custom questions and answers.
+\end{itemize}
+
+The first option is the easiest one. But predefining the answers has
+the disadvantage being impersonal and inflexible. The questions must
+inevitably be general, which may allow an attacker to obtain answers
+by collecting public information about the victim, or even simply
+solving the challenge by brute-forcing trying all possible choices.
+Therefore the first option is not ideal.
+
+The second option is more applicable but has some drawbacks, too. For
+example there may be questions whose answers have multiple syntactic
+representations (for example, ``St.'' versus
+``Street'')~\cite{just2004}. Another problem could be a question whose
+answer may change over time. Asking for the favourite actor for
+example could be problematic. In addition, there is a challenge to
+define questions for all kind of people. Some people for example could
+not answer to the question, what the name of their pet is, because
+they do not have one.
+
+In case of the third option, we have all of the issues of the second
+one, but additionally there is the difficulty for the user to ask
+creative questions. A good question should only be answerable by the
+user. Also, it would be perfect to have the attacker on the wrong
+track by using ambiguous questions with word plays the adversary
+cannot easily comprehend.
+
+Authentication using a secure question requires checking the validity
+of an answer that may include private personal information.
+Consequently, Anastasis does not store the answers of secure questions
+in cleartext. Instead, Anastasis only stores the hash value of a
+(salted) answer. Thus the user only has to provide the hash value of
+the answer and not disclose the answer itself.
+
+\subsubsection{SMS authentication}
+
+Another way to authenticate users that have a mobile phone is to use
+SMS authentication. The most popular use case is the so called Mobile
+TAN used to authorize online banking transactions. A Mobile TAN is an
+SMS based One-Time Password (OTP), short SMS OTP. SMS OTPs ``were
+introduced to counter phishing and other attacks against
+authentication and authorization of Internet
+services''~\cite{MBSS2013}.
+
+However, SMS authentication is not very secure, as it relies on the
+security of the mobile network, which has various
+vulnerabilities~\cite{rieck_detection}. There are also specialized
+mobile trojans which are used to eavesdrop on these messages directly
+on the user's mobile device.
+
+While likely not as sensitive as answers to security questions, we
+still consider user's phone numbers as private information that
+deserves protection. Naturally, a service authenticating the user
+needs the phone number to send a message to the user during SMS
+authentication.
+
+Hence, Anastasis providers have to learn the phone number during SMS
+authentication. However, we can use cryptography to ensure that the
+provider only gets the keys to decrypt the phone number when the
+authentication process is started by the user as part of a recovery
+operation. Thus, a compromise of the provider's database would not
+directly reveal the phone numbers to the attacker.
+
+
+\subsubsection{E-mail authentication}
+
+Authentication by email is similar to SMS authentication. Here,
+the user receives a token by email and has to provide it during the
+authentication process.
+
+It is important that the email should not already contain the
+requested information, so in the case of Anastasis the keyshare. This
+is because the SMTP protocol used for email offers no hard security
+assurances. In particular, the email is likely to be stored for a
+indefinite period in the user's mailbox, which could be easily
+compromised and read by a mailbox provider.~\cite{emailauthowasp}
+
+Like with SMS authentication, Anastasis also encrypts the email
+addresses when they are stored at the provider. The user has to
+provide the corresponding decryption key to the server during
+the authentication process.
+
+
+\subsubsection{VideoIdent}
+
+VideoIdent uses a video chat to verify the identity of a user. The
+user needs to show their face using a camera to an employee of the
+VideoIdent service. The service then verifies the identity of the user
+by comparing the video stream to a picture of the
+user~\cite{pohlmann2017}.
+
+Prerequisites for error-free identification are a video camera with
+good video quality and a high-resolution image of the user on which
+the face can be clearly seen. The user should also not change their
+outward appearance too much over time. For example, growing or
+trimming a beard could lead to the VideoIdent-service employee not
+being able to recognise a user with sufficient confidence.
+
+For an attacker who looks similar to the user, there is a chance that
+the employee incorrectly confirms the identification.
+
+%CG: that's IMO then e-mail based verification, should not mix
+% the two: this is basically multi-factor, so I'd leave it out here!
+%Therefore, some interaction of the user is needed like for example
+%telling the employee a short code which has been sent right before to
+%the user by mail.
+
+In Anastasis, pictures of users for VideoIdent authentication are
+considered private information stored encrypted at the providers.
+During the authentication process, the user has to provide the correct
+key for decryption to the service.
+
+\subsubsection{PostIdent}
+
+It is also possible to sent a verification code to the user by
+physical mail. A major drawback of this authentication method is
+that it has high latency, and there is also the possibility that
+physical mail gets intercepted or lost during transmission.
+
+Anastasis providers using PostIndent would not store the address of
+their users in cleartext. Instead the address is encrypted by the user
+and the provider would receive the key to decrypt the address only
+during the authentication process.
+
+\subsubsection{Biometric authentication}
+
+Another way of authenticating is the biometric
+approach~\cite{biometric_auth}. Biometric authentication is based on
+``something you are'', like your iris or your fingerprint.
+
+Biometric authentication is highly problematic because the attributes
+are invariant and frequently shared involuntarily. Unlike passphrases
+or phone numbers, users cannot change their genome or fingerprint in
+case their private biometric information is exposed. Furthermore,
+there are credible threats against biometric authentication, in
+paritcular there are documented inexpensive attacks against
+fingerprint and iris scan authentication. For example, a member of the
+German CCC e.V. was able to generate replicas from Angela Merkel's
+iris and Ursula von der Leyen's fingerprint~\cite{ccc_merkel}.
+
+
+
+\subsection{Existing solutions for key recovery}
+
+This section introduces some existing solutions for key recovery and
+why they are problematic.
+
+
+\subsubsection{Coinbase}
+
+Coinbase\footnote{\url{https://www.coinbase.com/}} is a global digital
+asset exchange company, providing a venue to buy and sell digital
+currencies. Coinbase also uses wallets secured with private keys. To
+recover this private key the user has to provide a 12 words recovery
+phrase.
+
+Coinbase offers a solution to securely deposit this recovery phrase
+onto the users Google Drive or iCloud.~\cite{coinbase} The security
+here lies within the Google or iCloud account and another password
+used to encrypt the security phrase. The problem here is that this
+approach undermines confidentiality, as encrypting a strong key with a
+weak key simply reduces the security to that of the weaker key.
+
+\subsubsection{MIDATA}
+
+MIDATA is a project that aims to give patients back control over their
+medical data and to enable them to share their data only with those
+they trust.\footnote{\url{https://www.midata.coop/}} In case a patient
+lost their device with the MIDATA-application and also forgot their
+MIDATA password, MIDATA provides a key recovery system using the
+Shamir Secret Sharing scheme (as described in
+Section~\ref{sec:rel:shamir}).
+
+In their case, a few ``persons working at MIDATA have generated a
+public-private key pair (Recovery key) on their own computer. They
+keep the private recovery key for themselves and never share it. The
+public keys are made public so that the apps can also access
+them''~\cite{midata}. Using Shamir's Secret Sharing the MIDATA
+application splits the user's app private key into 5 parts which are
+encrypted with one of the published recovery keys. The encrypted parts
+are then stored into the MIDATA server. During the recovery process at
+least two of the 5 parts need to be decrypted by the persons owning
+the private key part of the recovery key. ``The decrypted parts are
+sent to the server and {\em the server} may now reconstruct the app
+private key if enough parts of the key have been
+decrypted''~\cite{midata}. (Emphasis ours.)
+
+The security of MIDATA as described in ``Patient empowerment in IoT
+for eHealth - How to deal with lost keys?''~\cite{midata} is broken in
+three ways:
+
+\begin{enumerate}
+ \item The password is reconstructed at {\em the server}, not on the
+ patients device. An administrator of the server could thus
+ access the recovered password at that time. It would be better
+ to reconstruct the password only on the patients device.
+ \item It is not clear which authentication methods the persons
+ working for MIDATA use for their decisions and activities regarding
+ the key recovery. The business process used here could be
+ vulnerable, and it is not clear whether multi-factor authentication
+ is used. As a result, we worry that it may be possible for an attacker
+ to successfully use social engineering via email (or other means)
+ to illegitimately trigger a recovery process.
+ \item The MIDATA system also does not offer any trust agility~\cite{marlinspike2011}.
+ The user is forced to accept the 2-out-of-5 rule with trustees
+ provided by MIDATA.
+\end{enumerate}