depolymerization

wire gateway for Bitcoin/Ethereum

commit 074bb420b83fdb6f14e15253bc23cd0360416b8c
parent 7b6d3e75e7774175a942fcbbb3e6119b328053a7
Author: Antoine A <>
Date:   Tue, 29 Mar 2022 17:46:35 +0200

report progress

Diffstat:
M .gitignore | 1 +
M Cargo.lock | 4 ++--
M docs/figures/settlement_layer.tex | 4 ++++
M docs/report.tex | 143 ++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
A docs/tables/5-11.tex | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 208 insertions(+), 65 deletions(-)

diff --git a/.gitignore b/.gitignore @@ -8,5 +8,6 @@ log !/docs/*.bib !/docs/media !/docs/figures +!/docs/tables /tmp taler.conf \ No newline at end of file diff --git a/Cargo.lock b/Cargo.lock @@ -1422,9 +1422,9 @@ dependencies = [ [[package]] name = "redox_syscall" -version = "0.2.11" +version = "0.2.12" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "8380fe0152551244f0747b1bf41737e0f8a74f97a14ccefd1148187271634f3c" +checksum = "8ae183fc1b06c149f0c1793e1eb447c8b04bfe46d48e9e48bfb8d2d7ed64ecf0" dependencies = [ "bitflags", ] diff --git a/docs/figures/settlement_layer.tex b/docs/figures/settlement_layer.tex @@ -45,4 +45,8 @@ \node[right=50mm of D] {\small{Debit}}; \draw[dashed,-stealth] (1.north) |- (off.west); \draw[dashed,-stealth] (off.east) -| (6.north); + + %% Separation + \draw[dotted] (-2.1,-1.4) -- (9,-1.4); + \draw[dotted] (-2.1,-2.6) -- (9,-2.6); \end{tikzpicture} \ No newline at end of file diff --git a/docs/report.tex b/docs/report.tex @@ -32,6 +32,15 @@ short=RPC, long=Remote Procedure Call } +\DeclareAcronym{API}{ +short=API, +long=Application programming interface +} +\DeclareAcronym{HTTP}{ +short=HTTP, +long=Hypertext Transfer Protocol +} + \begin{document} @@ -78,9 +87,7 @@ At the heart of these currencies is a blockchain. A blockchain is an append-only \subsection{Consensus} -The blockchain itself is just a storage system. To make it a \acf{DLT}, it needs a peer-to-peer network to share its changes. But also a way for participants (nodes) to agree on a single state of the chain, to reach consensus in a network where nodes can be malicious and have an economic interest in deceiving others. There are many ways to create such consensus, but only two of them interest us: proof of work and proof of stake. - -% DLT definition ? +The blockchain itself is just a storage system. To make it a \ac{DLT}, it needs a peer-to-peer network to share its changes. 
But also a way for participants (nodes) to agree on a single state of the chain, to reach consensus in a network where nodes can be malicious and have an economic interest in deceiving others. There are many ways to create such consensus, but only two of them interest us: proof of work and proof of stake. \subsubsection*{Proof of work} @@ -102,7 +109,7 @@ Achieving consensus within the peer-to-peer network requires broadcasting the st \subsubsection*{Reorganisation} -These decentralized consensus mechanisms lead to the creation of competing blockchain states. When two miners broadcast a new valid block in a short period of time, one part of the network may receive them in a different order than another part. As nodes will follow the first valid block they found, we have a blockchain fork where two different blokchain state are followed in the network. Over time, one fork will become longer than the other, and nodes will follow the longer chain. They will replace recent blocks as necessary during a reorganization of the blockchain. A reorganization can cause a transaction previously considered mined by a node to no longer be mined. Therefore, blockchain transactions lack finality. +These decentralized consensus mechanisms lead to the creation of competing blockchain states. When two miners broadcast a new valid block in a short period of time, one part of the network may receive them in a different order than another part. As nodes will follow the first valid block they found, we have a blockchain fork where two different blokchain state are followed in the network. Over time, one fork will become longer than the other, and nodes will follow the longer chain. They will replace recent blocks as necessary during a reorganisation of the blockchain. A reorganisation can cause a transaction previously considered mined by a node to no longer be mined. Therefore, blockchain transactions lack finality. 
\subsection{Mining incentive} @@ -223,7 +230,7 @@ We know that transactions can get stuck for a long time, which can be problemati \clearpage -\section{Metadata} +\section{Metadata} \label{metadata} Metadata is needed to link a wallet to credits and allow merchants to link deposits to debits. Metadata is stored alongside transactions in the blockchain, so it is possible to recover the full transaction history of any depolymerizer from it. @@ -233,9 +240,9 @@ The goal of our metadata format is to be simple to parse and versioned for futur \subsubsection*{Incoming transaction} -Incoming transaction metadata contains a reserve public key, which is a 32B hash of a Curve25519 public key. We juste prepend a versioning byte to allow future extension. +Incoming transaction metadata contains a reserve public key, which is a 32B hash of a public key. We just prepend a versioning byte to allow future extension. -\begin{figure}[H] +\begin{figure}[h] \begin{center} \begin{bytefield}{33} \bitheader{0,1,32} \\ @@ -247,10 +254,10 @@ Incoming transaction metadata contains a reserve public key, which is a 32B hash \subsubsection*{Outgoing transaction} -Outgoing transactions can be of two types credit or bounce. Credit metadata contains the wire transfer id which is a 32B hash and the exchange base URL which is of variable size. Bounce metadata contains the bounced transaction id which is implementation-dependent but is 32B for Bitcoin and Ethereum. A prepended versioning byte differentiates the two types, 0 being a credit and 254 a bounce. +Outgoing transactions can be of two types credit or bounce. Credit metadata contains the wire transfer ID which is a 32B hash and the exchange base URL which is of variable size. Bounce metadata contains the bounced transaction ID which is implementation-dependent but is 32B for Bitcoin and Ethereum. A prepended versioning byte differentiates the two types, 0 being a credit and 254 a bounce. 
-\begin{figure}[H] +\begin{figure}[h] \begin{center} \begin{bytefield}[rightcurly=., rightcurlyspace=0pt]{33} \bitheader{0,1,32,33} \\ @@ -273,27 +280,28 @@ Debits are performed from code using OP\_RETURN to store metadata, but credits a We use the latest address type, segwit addresses, which can contain 20B of chosen data. The reserve pub key being 32B, we need two addresses. Therefore, we use two fake addresses consisting of the two key halves prepended with the same random pattern, except for the first bit, which must be 0 for the first half and 1 for the second. We then send a single transaction with three addresses as recipients. -\begin{figure}[H] - \begin{center} - \begin{bytefield}[rightcurly=., rightcurlyspace=0pt]{20} - \bitheader{0,3,4,19} \\ - \begin{rightwordgroup}{Address} - \bitbox{4}{ID} & \bitbox{16}{Half} - \end{rightwordgroup} - \end{bytefield} - - \end{center} - \begin{center} - \begin{bytefield}[rightcurly=., rightcurlyspace=0pt]{32} - \bitheader{0,1,31} \\ - \begin{rightwordgroup}{First ID} - \bitbox{1}{\tiny 0} & \bitbox{31}{Random ID} - \end{rightwordgroup} \\ \\ - \begin{rightwordgroup}{Second ID} - \bitbox{1}{\tiny 1} & \bitbox{31}{Random ID} - \end{rightwordgroup} - \end{bytefield} - \end{center} +\begin{figure}[h] + \centering + \begin{tikzpicture} + \draw[dotted,thick] (-6.4,1.13) -- (-6.4,0.185); + \draw[dotted,thick] (-5.055,1.13) -- (4.23,0.185); + \node { + \begin{bytefield}[rightcurly=., rightcurlyspace=0pt]{32} + \bitheader{0,3,4,19} \\ + \begin{rightwordgroup}{Address} + \bitbox{4}{ID} & \bitbox{16}{Half} + \end{rightwordgroup} \\ \\ + \bitheader{0,1,31} \\ + \begin{rightwordgroup}{First ID} + \bitbox{1}{\tiny 0} & \bitbox{31}{Random ID} + \end{rightwordgroup} \\ + \bitbox[]{32}{or} \\ + \begin{rightwordgroup}{Second ID} + \bitbox{1}{\tiny 1} & \bitbox{31}{Random ID} + \end{rightwordgroup} + \end{bytefield} + }; + \end{tikzpicture} \caption{Outgoing metadata format} \end{figure} @@ -305,24 +313,24 @@ Ethereum is designed around 
the concept of smart contracts. Logging inside a sma \subsection{Friendly behavior on format error} -When we receive a transaction without any metadata or with an incompatible format (bogus wallet), we want to return the money to its owner (bounce). However, this is dangerous because we have created a potential attack loophole as anyone can now make Depolymerizer do a transaction, by sending a malformed transaction. Depolymerizer takes a bounce fee to make a potential \acs{DOS} attack too costly and charges the recipient the transaction fee to ensure it can not lose money on a bounce. +When we receive a transaction without any metadata or with an incompatible format (bogus wallet), we want to return the money to its owner (bounce). However, this is dangerous because we have created a potential attack loophole as anyone can now make Depolymerizer do a transaction, by sending a malformed transaction. Depolymerizer takes a bounce fee to make a potential \ac{DOS} attack too costly and charges the recipient the transaction fee to ensure it can not lose money on a bounce. \clearpage \section{Architecture} -Each cryptocurrency uses a different \ac{DLT} with its own format and rules, which evolve over time. We do not want to manage the \ac{DLT} logic ourselves, nor do we want to rely on third-party dependencies to implement their support properly and be maintained. The simplest solution is to rely on the official clients and communicate with them via \acs{RPC}. +Each cryptocurrency uses a different \ac{DLT} with its own format and rules, which evolve over time. We do not want to manage the \ac{DLT} logic ourselves, nor do we want to rely on third-party dependencies to implement their support properly and be maintained. The simplest solution is to rely on the official clients and communicate with them via \ac{RPC}. 
-\begin{figure}[hb] +\begin{figure} \begin{center} \input{figures/depolymerizer_arch.tex} \end{center} \caption{Depolymerizer architecture} \end{figure} -While some parts of Depolymerizer are \ac{DLT} specific, much of the logic is common and we want to reuse it. We have a Wire Gateway component that implements the Taler HTTP API to enable communication with Taler exchanges. Each supported cryptocurrency has its specific adapter to communicate with the official full node client via \acs{RPC}. The Wire Gateway module and the \ac{DLT} adapter use a common database to store transactions and communicate with notifications. +While some parts of Depolymerizer are \ac{DLT} specific, much of the logic is common, and we want to reuse it. We have a Wire Gateway component that implements the Taler \ac{HTTP} \ac{API} to enable communication with Taler exchanges. Each supported cryptocurrency has its specific adapter to communicate with the official full node client via \ac{RPC}. The Wire Gateway module and the \ac{DLT} adapter use a common database to store transactions and communicate with notifications. -\subsection{\acs{DLT} adapter} +\subsection{DLT adapter} The DTL adapter uses an event-based architecture with three distinct loops. @@ -332,17 +340,17 @@ The watcher loop looks for new incoming blocks and notifies the other loops of t \paragraph*{Analysis} -The analysis loop waits for new blocks and then analyzes the behavior of the blockchain network. If a dangerous reorganization occurs, it is responsible for updating the confirmation delay. +The analysis loop waits for new blocks and then analyzes the behavior of the blockchain network. If a dangerous reorganisation occurs, it is responsible for updating the confirmation delay. \paragraph*{Worker} -The worker loop waits for new blocks or transaction requests (from the Wire Gateway API). 
When one of these events occurs, it first reconciles the local database with the \ac{DLT}, then triggers requested debits, re-issues blocked debits and bounces malformed credits. +The worker loop waits for new blocks or transaction requests (from the Wire Gateway \ac{API}). When one of these events occurs, it first reconciles the local database with the \ac{DLT}, then triggers requested debits, re-issues blocked debits and bounces malformed credits. \subsection{Worker loop in detail} -\subsubsection*{\acs{DLT} reconciliation} +\subsubsection*{DLT reconciliation} -During a \ac{DLT} reconciliation, we first list all new transactions and any transactions that have been removed in a reorganization since the last reconciliation. If any previously confirmed debits have been removed without being reinserted into another block, we notify the Wire Gateway to cease activity and wait for the next block in hopes of recovering them. All newly confirmed debits and successful credits are registered in the database. +During a \ac{DLT} reconciliation, we first list all new transactions and any transactions that have been removed in a reorganisation since the last reconciliation. If any previously confirmed debits have been removed without being reinserted into another block, we notify the Wire Gateway to cease activity and wait for the next block in hopes of recovering them. All newly confirmed debits and successful credits are registered in the database. \subsubsection*{Reconciliation inconsistency} @@ -358,9 +366,9 @@ Since we know that blockchain-based cryptocurrencies have low throughput, we do \subsection*{Ethereum amount precision} -The Taler amount format comes from RFC 8905 \cite{RFC8905}. It allows up to $2^{53}$ unit and 8 decimal digits. This format is perfectly suited for Bitcoin where the maximal amount is 21 million bitcoins and the minimum amount is the satoshi, one satoshi being worth $10^{8}$ bitcoin. 
However, the minimum amount of Ethereum is the wei, with one ether being worth $10^{18}$ wei. The amount of ether in circulation continues to grow without a cap, with over 119.000.000 ether in circulation at the time of writing those lines. Therefore, it is not possible to represent all Ethereum amounts with the current format. +The Taler amount format comes from RFC 8905 \cite{RFC8905}. It allows up to $2^{53}$ unit and 8 decimal digits. This format is perfectly suited for Bitcoin where the maximal amount is 21 million bitcoins and the minimum amount is the satoshi, one satoshi being worth $10^{8}$ bitcoin. However, the minimum amount of Ethereum is the wei, with one ether being worth $10^{18}$ wei. The amount of ether in circulation continues to grow without a cap, with over 119,000,000 ether in circulation at the time of writing those lines. Therefore, it is not possible to represent all Ethereum amounts with the current format. -A standard Ethereum transaction requires 21 000 units of gas \footnote{https://ethereum.org/en/developers/docs/gas/\#post-london}. The average gas price is currently around 30 Gwei. Therefore, a standard transaction cost about $63.10^{18}$ wei in transaction fees. Since the transaction fee is so high, even if we truncate Ethereum value to its 8 first decimal, we can still represent any amount you can send whiteout losing money on the transaction fee. +A standard Ethereum transaction requires 21 000 units of gas\footnote{https://ethereum.org/en/developers/docs/gas/\#post-london}. The average gas price is currently around 30 Gwei. Therefore, a standard transaction cost about $63.10^{18}$ wei in transaction fees. Since the transaction fee is so high, even if we truncate Ethereum value to its 8 first decimal, we can still represent any amount you can send whiteout losing money on the transaction fee. 
\subsection*{Replaceable bitcoin transaction} @@ -368,48 +376,50 @@ When some merchants wanted to allow instant payments with Bitcoin, they chose to This becomes problematic when you want to make a legitimate replacement, to unstuck a transaction by increasing its transaction fee for example. At the same time, it is always dangerous to give an easy way to attackers and scammers to change the content of a pending transaction. -A solution has been adopted in \acs{BIP} 125 \cite{BIP125}. It is now possible to encode the replaceability of a bitcoin transaction when creating it. This allows it to be replaced by a new transaction within certain rules: you cannot send less money to existing recipients and you must pay a replacement fee as a countermeasure to a \acs{DOS} attack. +A solution has been adopted in \ac{BIP} 125 \cite{BIP125}. It is now possible to encode the replaceability of a bitcoin transaction when creating it. This allows it to be replaced by a new transaction within certain rules: you cannot send less money to existing recipients, and you must pay a replacement fee as a countermeasure to a \ac{DOS} attack. \clearpage -\section{Uri packing, a compression side quest} +\section{URI packing, a compression side quest} \subsection*{The need for compact URI} -As discussed previously, storing metadata in blockchain is +As discussed previously in section \ref{metadata}, storing metadata in blockchain is expensive and limited. Therefore, we want our metadata to be as small as possible. \noindent Transactions metadata are composed of three parts: \begin{itemize} \item Version and identity metadata ($\sim$ 1B) - \item Reserve public key or wire transfer id (32B) - \item Base url (debit only, variable) + \item Reserve public key or wire transfer ID (32B) + \item Base URL (debit only, variable) \end{itemize} -The only variable, and so problematic, part is the base url. Those url have some +The only variable, and so problematic, part is the base URL. 
Those URLs have some property in common, they always use a few different scheme (http or https) and are composed of a domain and a small path. -We would normally encode the url using ASCII, but we known only a few ASCII -character are actually used. +We would normally encode the URL using ASCII, but we knew only a few ASCII characters are actually used, and we can take advantage of that. \subsection*{5 or 11 encoding} Our idea is to encode the most commonly used characters using five bits, and the -remaining characters using eleven bits. As an ASCII character take eights bits -we gain on size if more than half of the characters composing the uri are -encodable using less bits. - -\begin{center} - \begin{tabular}{l l l} - code & meaning \\ +remaining characters using eleven bits. As an ASCII character is seven bits and are commonly encoded using height, +we gain on size if more than half of the characters composing the URI are +encodable using less bits (Table~\ref{table:uri-packing}). You can find the detailed encoding table in appending \ref{5-11}. + +\begin{table}[h] + \centering + \begin{tabular}{ll} + value & encoding \\ \hline 0..30 & common character: a-z . / - \% \\ 30 0..64 & extended character, remaining graphic ascii \\ 31 & end of encoded string \\ \end{tabular} -\end{center} + \caption{URI packing encoding} + \label{table:uri-packing} +\end{table} Using this encoding format on all domains on the majestic-million\footnote{https://majestic.com/reports/majestic-million} @@ -423,22 +433,21 @@ custom format. For example, for bitcoin the maximum amount of data than is accepted in OP\_RETURN is currently 80 bytes, leaving us 47 bytes to store the URI. With our -encoding we can encode in the best case 74 characters instead of 47 which is more -enough for our use case. +encoding we can encode in the best case 74 characters instead of 47 which is more than enough for our use case. 
\clearpage \section{Taler Wire Gateway HTTP API} -Taler is a modular project where each module communicates through HTTP API. The Wire Gateway API allows the exchange to communicate to wire adaptors. The Wire Gateway module allow Depolymerizer to communicate with Taler exchanges. As the API can be exposed on the Internet it has to be resistant to easy attacks. +Taler is a modular project where each module communicates through \ac{HTTP} \ac{API}. The Wire Gateway \ac{API} allows the exchange to communicate to wire adaptors. The Wire Gateway module allow Depolymerizer to communicate with Taler exchanges. As the \ac{API} can be exposed on the Internet it has to be resistant to easy attacks. \subsection*{HTTP Authentication} -The wire API only supports the Basic HTTP Authentication method and it has to be optional. Making it optional can lead to security issues by misconfiguration. If the default behavior in case of missing configuration is to deactivate authentication, a typo could lead to an exposed API. We made the authentication method configuration mandatory to make its deactivation explicit. +The wire \ac{API} only supports the Basic \ac{HTTP} Authentication method and it has to be optional. Making it optional can lead to security issues by misconfiguration. If the default behavior in case of missing configuration is to deactivate authentication, a typo could lead to an exposed \ac{API}. We made the authentication method configuration mandatory to make its deactivation explicit. \subsection*{OOM DOS} -A common Denial Of Service attack consists of sending many requests with huge bodies to saturate a server memory and, in the worst case, create an Out Of Memory error. To be resilient against such attacks we only read body after request authentication, to prevent any person without authorization to access the API to perform such attacks. 
+A common Denial Of Service attack consists of sending many requests with huge bodies to saturate a server memory and, in the worst case, create an Out Of Memory error. To be resilient against such attacks we only read body after request authentication, to prevent any person without authorization to access the \ac{API} to perform such attacks. Then we chose an aggressive memory budget of 4kB, as all request bodies should be very small, and we only read and parse them under this budget. In the case of compressed bodies, we also apply this budget to the decompression process to protect ourselves against decompression bombs. @@ -446,7 +455,7 @@ Then we chose an aggressive memory budget of 4kB, as all request bodies should b % Move testing to its own section -The taler exchange has a taler-exchange-wire-gateway-client CLI that allowed me to test that my implementation not only conforms to the API documentation but also with how the official client handles it. I found confusion in the documentation where it was specified that timestamp should consist of time in milliseconds in epoch but the client will reject timestamps that are not rounded to second. +The taler exchange has a taler-exchange-wire-gateway-client CLI that allowed me to test that my implementation not only conforms to the \ac{API} documentation but also with how the official client handles it. I found confusion in the documentation where it was specified that timestamp should consist of time in milliseconds in epoch but the client will reject timestamps that are not rounded to second. 
\clearpage @@ -469,4 +478,12 @@ There is an opportunity for a more intelligent analysis of network behaviours to \printacronyms +\clearpage + +\appendix + +\section*{5-11 encoding table}\label{5-11} + +\input{tables/5-11.tex} + \end{document} diff --git a/docs/tables/5-11.tex b/docs/tables/5-11.tex @@ -0,0 +1,121 @@ + +\begin{table}[H] + \begin{minipage}{.5\linewidth} + \centering + \begin{tabular}{rl} + value & encoding \\ + \hline + 0 & a \\ + 1 & b \\ + 2 & c \\ + 3 & d \\ + 4 & e \\ + 5 & f \\ + 6 & g \\ + 7 & h \\ + 8 & i \\ + 9 & j \\ + 10 & k \\ + 11 & l \\ + 12 & m \\ + 13 & n \\ + 14 & o \\ + 15 & p \\ + 16 & q \\ + 17 & r \\ + 18 & s \\ + 19 & t \\ + 20 & u \\ + 21 & v \\ + 22 & w \\ + 23 & x \\ + 24 & y \\ + 25 & z \\ + 26 & . \\ + 27 & / \\ + 28 & - \\ + 29 & \% \\ + 30 & EXTENDED \\ + 31 & EOF \\ + \end{tabular} + \caption{5 bit simple encoding} + \end{minipage} + \begin{minipage}{.5\linewidth} + \centering + \begin{tabular}{rl} + value & encoding \\ + \hline + 0 & \_ \\ + 1 & A \\ + 2 & B \\ + 3 & C \\ + 4 & D \\ + 5 & E \\ + 6 & F \\ + 7 & G \\ + 8 & H \\ + 9 & I \\ + 10 & J \\ + 11 & K \\ + 12 & L \\ + 13 & M \\ + 14 & N \\ + 15 & O \\ + 16 & P \\ + 17 & Q \\ + 18 & R \\ + 19 & S \\ + 20 & T \\ + 21 & U \\ + 22 & V \\ + 23 & W \\ + 24 & X \\ + 25 & Y \\ + 26 & Z \\ + 27 & 0 \\ + 28 & 1 \\ + 29 & 2 \\ + 30 & 3 \\ + 31 & 4 \\ + \end{tabular} + \quad + \begin{tabular}{rl} + value & encoding \\ + \hline + 32 & 5 \\ + 33 & 6 \\ + 34 & 7 \\ + 35 & 8 \\ + 36 & 9 \\ + 37 & ! \\ + 38 & " \\ + 39 & \# \\ + 40 & \$ \\ + 41 & \& \\ + 42 & ' \\ + 43 & ( \\ + 44 & ) \\ + 45 & * \\ + 46 & + \\ + 47 & , \\ + 48 & : \\ + 49 & ; \\ + 50 & \textless \\ + 51 & = \\ + 52 & \textgreater \\ + 53 & ? \\ + 54 & @ \\ + 55 & $[$ \\ + 56 & \textbackslash \\ + 57 & $]$ \\ + 58 & \textasciicircum \\ + 59 & $`$ \\ + 60 & \{ \\ + 61 & \textbar \\ + 62 & \} \\ + 63 & \textasciitilde \\ + \end{tabular} + + \caption{11 bit extended encoding} + \end{minipage} +\end{table}
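The two tables added in docs/tables/5-11.tex fully determine the character codes, so the "5 or 11" packing can be sketched outside the repository. The Python below is only an illustration, not the project's implementation. The function names `pack` and `unpack`, the most-significant-bit-first packing, and the zero padding of the last byte are assumptions that the commit does not specify. The code values themselves are taken from the tables.

```python
import string

# 5-bit table from the commit: values 0..29 (30 = EXTENDED marker, 31 = end of string).
COMMON = string.ascii_lowercase + "./-%"
# 6-bit table that follows an EXTENDED marker: values 0..63.
EXTENDED = "_" + string.ascii_uppercase + string.digits + "!\"#$&'()*+,:;<=>?@[\\]^`{|}~"
EXTENDED_MARK, END_MARK = 30, 31
assert len(COMMON) == 30 and len(EXTENDED) == 64


def pack(text: str) -> bytes:
    """Pack text into 5-bit or 11-bit codes, MSB first (bit order is an assumption)."""
    acc, nbits = 0, 0
    for ch in text:
        if ch in COMMON:
            acc, nbits = (acc << 5) | COMMON.index(ch), nbits + 5
        else:
            code = EXTENDED.index(ch)  # raises ValueError for characters in neither table
            acc, nbits = (acc << 11) | (EXTENDED_MARK << 6) | code, nbits + 11
    acc, nbits = (acc << 5) | END_MARK, nbits + 5  # end-of-string code
    pad = -nbits % 8  # zero-fill the last byte (assumption)
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack(data: bytes) -> str:
    """Inverse of pack: read codes until the end-of-string code is found."""
    bits, pos, out = int.from_bytes(data, "big"), len(data) * 8, []

    def take(n: int) -> int:
        nonlocal pos
        if pos < n:
            raise ValueError("truncated input")
        pos -= n
        return (bits >> pos) & ((1 << n) - 1)

    while True:
        code = take(5)
        if code == END_MARK:
            return "".join(out)
        out.append(EXTENDED[take(6)] if code == EXTENDED_MARK else COMMON[code])
```

Under these assumptions, 74 common characters plus the end code take 375 bits and fit in 47 bytes, which matches the best case quoted in the report. A 75th character would need a 48th byte.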