\section{Related work}

This chapter explains some important cryptographic functions and which
are used in Anastasis or related to our work. We also describe issues
with existing solutions in this domain.

\subsection{Cryptographic primitives}

\subsubsection{Pseudo-randomness}

A pseudo random generator (PRG) is an algorithm producing an infinite
sequence of bits for which there is no efficient algorithm to
distinguish it from a truly random sequence~\cite{vadhan2012}. The
algorithm ``takes as input a short, perfectly random
seed''~\cite{vadhan2012} which determines the output value.\\

A pseudo random function (PRF) is a deterministic function which
output is finite and indistinguishable from a true random
function.~\cite{nielsen2002} PRFs can be constructed using
PRGs.~\cite{GGM1986}

\subsubsection{Hash function}

Hash functions "compress a string of arbitrary length to a string of
fixed length [...]"~\cite{Preneel1999}. The output of a hash function
often is called a "hash".  Hash functions in general should be very
fast to compute. Cryptographic hash functions need to fulfil
additional security requirements which are

\begin{itemize}
 \item (first) pre-image resistance,
 \item second pre-image resistance,
 \item collision resistance,
 \item pseudo randomness, and the
 \item avalanche effect.
\end{itemize}

Pre-image resistance, also called the ``one way property'', means that
for a given hash function $H$ and a hash value $H(x)$, it is
computationally infeasible to find $x$.~\cite{SG2012} For example,
since in Anastasis we derive the key to encrypt the personal details
required for user authentication (e.g. the mobile phone number for
authentication via SMS) using functions based on hash functions (see
HKDF), it is very important that you cannot derive the corresponding
input values from the key.

The second pre-image resistance is described by following: For a given
hash function $H$ and a hash value $H(x)$, it is computationally
infeasible to find $x$ and $x'$ such that $H(x) = H(x')$ and $x \not=
x'$.~\cite{SG2012} In Anastasis hash functions also are involved in
signing our so called recovery document. Hence an attacker should not
be able to create a malicious recovery document with the same hash
value as the original one.

The definition of collision resistance slightly differs from the
second pre-image resistance: For a given hash function $H$, it is
computationally infeasible to find a pair $x, y$ such that $H(x) =
H(y)$ \cite{SG2012}.
%CG: the text below does NOT related to collision resistance!
%As we are using HKDFs for deriving keys in
%Anastasis, an attacker should not be able to find some other input
%values also leading to the same keys we use.
Anastasis does not rely upon collision resistance in its use of hash
functions. % CG: at least no case comes to mind for me right now...

A cryptographic hash function should also behave as a pseudo random
function. This means that although a hash function is purely
deterministic, the output must not be predictable.

The avalanche effect describes the property of an algorithm that
causes a significant change of the output value, usually a bit
flipping of more than half the output is desired, if the input is
changed slightly (for example, flipping a single bit).~\cite{RK2011}
The more bits are flipping in the output value the higher the entropy
of the randomness of the hash function.

There are many applications for cryptographic hash functions. For
example, you can store the hash value of a passphrase instead of the
passphrase itself in a computer to protect the passphrase. Another
important application is verification of message integrity: Before and
after transmission of a message one can calculate the hash values of
it and compare hashes later to determine if the message changed during
transmission.

In Anastasis we use SHA-512~\cite{GJW2011} for fast hash functions.

\subsubsection{HMAC}

When it comes to integrity of messages during communication of two
parties over an insecure channel Keyed-Hash Message Authentication
Codes (HMAC) are used as check values. An HMAC function is based on a
hash function and takes two arguments, a key $K$ and a message $M$:

$HMAC_{K}(M) = H(K \oplus opad,H(K \oplus ipad, M))$ with "ipad" and
"opad" being constants which fill up the key $K$ to the blocksize of
the hash function~\cite{BCK1996}. The blocksize of a modern hash
function like SHA-512 is 64 bytes.

\subsubsection{HKDF}

A HKDF is a key derivation function (KDF) based on HMAC. A KDF ``is a
basic and essential component of cryptographic systems: Its goal is
to take a source of initial keying material, usually containing some
good amount of randomness, but not distributed uniformly or for which
an attacker has some partial knowledge, and derive from it one or more
cryptographically strong secret keys''~\cite{krawczyk2010}.

Anastasis uses HKDFs based on SHA-512 to derive symmetric keys for
encryption.

\subsubsection{Argon2}

Hash functions like SHA-512 are designed to be very fast. Therefore
passwords being stored using this kind of hash are vulnerable to
dictionary attacks with new hardware architectures like
FPGAs~\cite{trimberger2012} and dedicated ASIC~\cite{madurawe2006}
modules. But those architectures ``experience difficulties when
operating on large amount of memory''~\cite{BDK2016}.

In contrast to standard hash functions there are functions designed to
be memory-hard. Argon2 is such a memory-hard function that won the
Password Hashing Competition in 2015. It minimizes time-memory
tradeoff~\cite{stamp2003} and thus maximizes the costs to implement an
ASIC for given CPU computing time~\cite{BDK2016}. Aside from the fact
that Argon2 makes dictionary attacks much harder, Argon2 can be used
for another feature too: Memory-hard schemes like Argon2 are very
useful for key derivation from low-entropy sources~\cite{BDK2016}.

Argon2 is used in Anastasis to derive an identifier for the user from
the user's attributes, which serve as low-entropy inputs.


\subsection{Secret sharing}

Secret splitting, also known as secret sharing, is a technique for
distributing a secret amongst multiple recipients. This is achieved by
assigning a share of the secret to each recipient. By combining a
sufficient number of those shares, it is possible to reconstruct the
secret.  In a secret sharing theme the recipients of a share often are
called \textit{players}. The figure who gives a share of the secret to
the players is called \textit{dealer}.

In Anastasis the user is the trusted dealer who splits the secret and
also reconstructs it.

\subsubsection{Shamir's secret sharing} \label{sec:rel:shamir}

The algorithm ``Shamir's secret sharing'' is probably the most well
known secret sharing scheme. It ``divide[s] data D into n pieces in
such a way that D is easily reconstructible from any k pieces, but
even complete knowledge of $k - 1$ pieces reveals absolutely no
information about D''~\cite{shamir_sharing}.

Shamir’s simple secret sharing scheme has two key limitations. First,
it requires a trusted dealer who initially generates the secret to be
distributed, and second the shares are not verifiable during
reconstruction. Therefore, malicious shareholders could submit corrupt
shares to prevent the system from reconstructing the secret --- without
these corrupt shareholders being detectable as malicious. Furthermore,
the dealer distributing the shares could be corrupt and distribute
some inconsistent shares to the others. Also, in some scenarios the
dealer cannot be trusted with the knowledge of the original core
secret.

Additionally, Shamir's secret sharing is inflexible because it is a
simple $k$-out-of-$n$ threshold scheme.  While this makes the scheme
reasonably efficient even for big values of $n$, efficiency with
respect to a large number of escrow providers and authorization
procedures is not important for Anastasis: it is already difficult to
conceive users providing more than a handful of authentication methods
(Section~\ref{sec:rel:authentication} describes common choices.)

For Anastasis, we thus decided to opt for more flexible approach that
allows complex policies for recovery authorization, instead of only
$k$-out-of-$n$. Each user of Anastasis is also able to decide which
combinations of \textit{players}, which in case of Anastasis are the 
escrow providers, shall be permitted.

\subsubsection{Verifiable secret sharing}

Verifiability can be achieved by using so called commitment schemes
like the Pederson commitment. It allows ``to distribute a secret to n
persons such that each person can verify that he has received correct
information about the secret without talking with other
persons''~\cite{pedersen_sharing_0}. In his paper ``A Practical Scheme
for Non-interactive Verifiable Secret
Sharing''~\cite{feldman_sharing}, Paul Feldman combines the two
schemes Shamir Secret Sharing and Pederson commitment. His algorithm
for verifiable secret sharing (VSS), allows each recipient to
verify the correctness of their share. But like in the Shamir Secret
Sharing scheme, the dealer in the VSS scheme
also can't be trusted with the knowledge of the original core secret.

Because in Anastasis each user can act as their own trusted dealer,
the shares must not be verified and therefore Anastasis do not need
any form of VSS.

\subsubsection{Distributed key generation}

Distributed key generation (DKG) algorithms solve the problem of
needing a trustworthy dealer by instead relying on a threshold of
honest persons for key generation. Contrary to the above-mentioned
schemes, in distributed key generation algorithms every participant is
involved in key generation.  The Pederson DKG is such ``a secret
sharing scheme without a mutually trusted
authority''~\cite{pedersen_sharing_5.2}. Basically, this DKG works as
follows: First, each involved party generates a pre-secret and
distributes it to all parties using the verifiable secret sharing
scheme of Feldman.  Afterwards, each party recombines the received
shares, including its own pre-secret, to a share of the main
secret. The main secret can be reconstructed by summing up each
recombination of the shared pre-secrets.

Because in Anastasis each user can act as their own trusted dealer, we
also do not worry about the dealer learning the user's key and hence
Anastasis do not need any form of DKG.

\subsection{Authentication} \label{sec:rel:authentication}

To build a secure authentication procedure, today multi-factor
authentication is the standard~\cite{multifactor_authentication}. A
single authentication method by itself is usually vulnerable.
Multi-factor authentication combines multiple authentication
procedures to enhance the security of the system.

During procedure of some authentication methods a so called token is 
sent to the user. The user than has to provide the token to authorize.\\
The token should be a randomly generated passphrase which has at 
least 128 bits of entropy. It is best practice for a token to have an 
expiration time, although this is not relevant for security of Anastasis.\\

Anastasis is designed to use a wide range of authentication methods to
authenticate its users. Even though the user in Anastasis is free to
specify only one authentication method, we strongly recommend the use
of multi-factor authentication, typically using different
authentication methods at different providers.

A short overview of common authentication methods and issues with
each of them is presented here.

\subsubsection{Password authentication}

Password authentication is probably the most widely used
authentication procedure. But as studies show the procedure has its
drawbacks~\cite{authentication_methods_review}. For example the
handling of the passwords, like storage or transmission, often is done
poorly. Another problem is that the user must remember his
password. Therefore the password is limited to the capabilities of the
user to remember it. Thus people tend to use passwords with low
entropy. Those passwords are vulnerable to brute force attacks or
dictionary attacks. Another problem using passwords is the possibility
of replay attacks: A password can be stolen by an eavesdropper during
online transmission and used by the attacker.

Because passwords can be forgotten, we do not recommend using this
method for provider-authentication in Anastasis. Users could easily
add a passwords into their set of ``invariant'' attributes used to
derive the identity key, and then would automatically obtain all of
the possible benefits (and drawbacks) from using a password.
Specifically, they must make sure that the password cannot be
forgotten, even if it means that the password has low entropy.

\subsubsection{Secure question}

Similar to password authentication the use of an authentication method
based on a secure question requires the user to remember the correct
answer to a specific question. The difference here is that the
question provides a context that helps the user to remember the answer
and the user does not necessarily need to memorize something
new~\cite{just2004}.

There are several variations to implement authentication using a
secure question:

\begin{itemize}
 \item The questions and answers are predefined.
 \item Just the questions are predefined.
 \item The user is free to create custom questions and answers.
\end{itemize}

The first option is the easiest one. But predefining the answers has
the disadvantage being impersonal and inflexible. The questions must
inevitably be general, which may allow an attacker to obtain answers
by collecting public information about the victim, or even simply
solving the challenge by brute-forcing trying all possible choices.
Therefore the first option is not ideal.

The second option is more applicable but has some drawbacks, too. For
example there may be questions whose answers have multiple syntactic
representations (for example, ``St.'' versus
``Street'')~\cite{just2004}. Another problem could be a question whose
answer may change over time. Asking for the favourite actor for
example could be problematic. In addition, there is a challenge to
define questions for all kind of people. Some people for example could
not answer to the question, what the name of their pet is, because
they do not have one.

In case of the third option, we have all of the issues of the second
one, but additionally there is the difficulty for the user to ask
creative questions. A good question should only be answerable by the
user. Also, it would be perfect to have the attacker on the wrong
track by using ambiguous questions with word plays the adversary
cannot easily comprehend.

Authentication using a secure question requires checking the validity
of an answer that may include private personal information.
Consequently, Anastasis does not store the answers of secure questions
in cleartext. Instead, Anastasis only stores the hash value of a
(salted) answer.  Thus the user only has to provide the hash value of
the answer and not disclose the answer itself.

\subsubsection{SMS authentication}

Another way to authenticate users that have a mobile phone is to use
SMS authentication. The most popular use case is the so called Mobile
TAN used to authorize online banking transactions. A Mobile TAN is an
SMS based One-Time Password (OTP), short SMS OTP. SMS OTPs ``were
introduced to counter phishing and other attacks against
authentication and authorization of Internet
services''~\cite{MBSS2013}.

However, SMS authentication is not very secure, as it relies on the
security of the mobile network, which has various
vulnerabilities~\cite{rieck_detection}. There are also specialized
mobile trojans which are used to eavesdrop on these messages directly
on the user's mobile device.

While likely not as sensitive as answers to security questions, we
still consider user's phone numbers as private information that
deserves protection.  Naturally, a service authenticating the user
needs the phone number to send a message to the user during SMS
authentication.

Hence, Anastasis providers have to learn the phone number during SMS
authentication.  However, we can use cryptography to ensure that the
provider only gets the keys to decrypt the phone number when the
authentication process is started by the user as part of a recovery
operation. Thus, a compromise of the provider's database would not
directly reveal the phone numbers to the attacker.


\subsubsection{E-mail authentication}

Authentication by email is similar to SMS authentication. Here,
the user receives a token by email and has to provide it during the
authentication process.

It is important that the email should not already contain the
requested information, so in the case of Anastasis the keyshare.  This
is because the SMTP protocol used for email offers no hard security
assurances. In particular, the email is likely to be stored for a
indefinite period in the user's mailbox, which could be easily
compromised and read by a mailbox provider.~\cite{emailauthowasp}

Like with SMS authentication, Anastasis also encrypts the email
addresses when they are stored at the provider.  The user has to
provide the corresponding decryption key to the server during
the authentication process.


\subsubsection{VideoIdent}

VideoIdent uses a video chat to verify the identity of a user. The
user needs to show their face using a camera to an employee of the
VideoIdent service. The service then verifies the identity of the user
by comparing the video stream to a picture of the
user~\cite{pohlmann2017}.

Prerequisites for error-free identification are a video camera with
good video quality and a high-resolution image of the user on which
the face can be clearly seen. The user should also not change their
outward appearance too much over time. For example, growing or
trimming a beard could lead to the VideoIdent-service employee not
being able to recognise a user with sufficient confidence.

For an attacker who looks similar to the user, there is a chance that
the employee incorrectly confirms the identification.

%CG: that's IMO then e-mail based verification, should not mix
%    the two: this is basically multi-factor, so I'd leave it out here!
%Therefore, some interaction of the user is needed like for example
%telling the employee a short code which has been sent right before to
%the user by mail.

In Anastasis, pictures of users for VideoIdent authentication are
considered private information stored encrypted at the providers.
During the authentication process, the user has to provide the correct
key for decryption to the service.

\subsubsection{PostIdent}

It is also possible to sent a verification code to the user by
physical mail. A major drawback of this authentication method is
that it has high latency, and there is also the possibility that
physical mail gets intercepted or lost during transmission.

Anastasis providers using PostIndent would not store the address of
their users in cleartext. Instead the address is encrypted by the user
and the provider would receive the key to decrypt the address only
during the authentication process.

\subsubsection{Biometric authentication}

Another way of authenticating is the biometric
approach~\cite{biometric_auth}. Biometric authentication is based on
``something you are'', like your iris or your fingerprint.

Biometric authentication is highly problematic because the attributes
are invariant and frequently shared involuntarily.  Unlike passphrases
or phone numbers, users cannot change their genome or fingerprint in
case their private biometric information is exposed.  Furthermore,
there are credible threats against biometric authentication, in
paritcular there are documented inexpensive attacks against
fingerprint and iris scan authentication. For example, a member of the
German CCC e.V. was able to generate replicas from Angela Merkel's
iris and Ursula von der Leyen's fingerprint~\cite{ccc_merkel}.


\subsection{Existing solutions for key recovery}

This section introduces some existing solutions for key recovery and
why they are problematic.


\subsubsection{Coinbase}

Coinbase\footnote{\url{https://www.coinbase.com/}} is a global digital
asset exchange company, providing a venue to buy and sell digital
currencies. Coinbase also uses wallets secured with private keys. To
recover this private key the user has to provide a 12 words recovery
phrase.

Coinbase offers a solution to securely deposit this recovery phrase
onto the users Google Drive or iCloud.~\cite{coinbase} The security
here lies within the Google or iCloud account and another password
used to encrypt the security phrase. The problem here is that this
approach undermines confidentiality, as encrypting a strong key with a
weak key simply reduces the security to that of the weaker key.

\subsubsection{MIDATA}

MIDATA is a project that aims to give patients back control over their
medical data and to enable them to share their data only with those
they trust.\footnote{\url{https://www.midata.coop/}} In case a patient
lost their device with the MIDATA-application and also forgot their
MIDATA password, MIDATA provides a key recovery system using the
Shamir Secret Sharing scheme (as described in
Section~\ref{sec:rel:shamir}).

In their case, a few ``persons working at MIDATA have generated a
public-private key pair (Recovery key) on their own computer. They
keep the private recovery key for themselves and never share it. The
public keys are made public so that the apps can also access
them''~\cite{midata}. Using Shamir's Secret Sharing the MIDATA
application splits the user's app private key into 5 parts which are
encrypted with one of the published recovery keys. The encrypted parts
are then stored into the MIDATA server. During the recovery process at
least two of the 5 parts need to be decrypted by the persons owning
the private key part of the recovery key. ``The decrypted parts are
sent to the server and {\em the server} may now reconstruct the app
private key if enough parts of the key have been
decrypted''~\cite{midata}. (Emphasis ours.)

The security of MIDATA as described in ``Patient empowerment in IoT
for eHealth - How to deal with lost keys?''~\cite{midata} is broken in
three ways:

\begin{enumerate}
 \item The password is reconstructed at {\em the server}, not on the
   patients device. An administrator of the server could thus
   access the recovered password at that time.  It would be better
   to reconstruct the password only on the patients device.
 \item It is not clear which authentication methods the persons
   working for MIDATA use for their decisions and activities regarding
   the key recovery. The business process used here could be
   vulnerable, and it is not clear whether multi-factor authentication
   is used. As a result, we worry that it may be possible for an attacker
   to successfully use social engineering via email (or other means)
   to illegitimately trigger a recovery process.
 \item The MIDATA system also does not offer any trust agility~\cite{marlinspike2011}.
   The user is forced to accept the 2-out-of-5 rule with trustees
   provided by MIDATA.
\end{enumerate}