response-202109.tex (13704B)
1 \documentclass[11pt]{article} 2 \oddsidemargin=0in \evensidemargin=0in 3 \textwidth=6.2in \textheight=8.7in 4 %\topmargin=-0.2in 5 6 \usepackage[ansinew]{inputenc} 7 \usepackage{makeidx,amsmath,amssymb,exscale,multicol,epsfig,graphics,url} 8 9 \begin{document} 10 \pagestyle{headings} 11 \title{Final response to the \\ GNU Taler security audit in Q2/Q3 2020} 12 \author{Christian Grothoff \and Florian Dold} 13 14 \maketitle 15 16 \section{Abstract} 17 18 This is the response to the source code audit report CodeBlau 19 created for GNU Taler in Q2/Q3 2020. 20 21 \section{Management Summary} 22 23 We thank CodeBlau for their detailed report and thorough analysis. We are 24 particularly impressed that they reported issues against components that were 25 not even in-scope, and also that they found an {\em interesting} new corner 26 case we had not previously considered. Finally, we also find several of their 27 architectural recommendations to strengthen security to be worthwhile, and 28 while some were already on our long-term roadmap, we will reprioritize our 29 roadmap given their recommendations. 30 31 Given our extensive discussions with CodeBlau, we also have the impression 32 that they really took the time to understand the system, and look forward 33 to working with CodeBlau as a competent auditor for GNU Taler in the future. 34 35 \section{Issues in the exchange} 36 37 We agree with the issues CodeBlau discovered and both parties believe that 38 they have all been addressed. 39 40 \section{Issues in the auditor} 41 42 We appreciate CodeBlau's extensive list of checks the Taler auditor performs, 43 which was previously not documented adequately by us. We agree that the 44 auditor still needs more comprehensive documentation. 45 46 As for issue \#6416, we agree with the analysis. However, the proposed fix 47 of making the primary key include the denomination would create other problems, 48 such as the exchange sometimes not having the denomination key (link, refund) 49 and the code in various places relying on the assumption of the coin's 50 public key being unique. Furthermore, allowing coin key re-use may validate 51 a terrible practice. We thus decided it is better to ``fail early'', and 52 modified the code to check that the coin public key is ``unique'' during 53 deposit, refresh and recoup and ensured that the exchange returns a proof 54 of non-uniqueness in case of a violation. The test suite was extended to 55 cover the corner case. 56 57 {\bf Update:} We have now also addressed the (``soft'') exchange online 58 signing key revocation issue (\#6161) reported originally by CodeBlau. 59 The auditor now checks for key revocations before recording deposit 60 confirmations. The impact is very minor, as this will merely prevent 61 an adversary controlling an exchange online signing key from submitting 62 false claims to the auditor. 63 64 65 66 \section{Issues in GNUnet} 67 68 We agree with the issues CodeBlau discovered and both parties believe that 69 they have all been addressed. 70 71 \section{General remarks on the code} 72 73 We understand that writing the code in another programming language may make 74 certain checks for the auditor less work to implement. However, our choice of C 75 is based on the advantages that make it superior to contemporary languages for 76 our use case: relatively low complexity of the language (compared to C++); 77 availability of mature compilers, static and dynamic analysis tools; 78 predictable performance; access to stable and battle-tested libraries; and 79 future-proofness due to portability to older systems as well as new platforms. 80 81 We believe creating a parallel implementation in other languages would provide 82 advantages, especially with respect to avoiding ``the implementation is the 83 specification''-style issues. However, given limited resources will not make 84 this a priority. 85 86 We disagree that all modern software development has embraced the idea that 87 memory errors are to be handled in ways other than terminating or restarting 88 the process. Many programming languages (Erlang, Java) hardly offer any other 89 means of handling out-of-memory situations than to terminate the process. We 90 also insist that Taler {\em does} handle out-of-memory as it does have code 91 that terminates the process (we do {\em not} simply ignore the return value 92 from {\tt malloc()} or other allocation functions!). We simply consider that 93 terminating the process (which is run by a hypervisor that will restart the 94 service) is the correct way to handle out-of-memory situations. We also have 95 limits in place that should prevent attackers from causing large amounts of 96 memory to be consumed, and also have code to automatically preemptively 97 restart the process to guard against memory exhaustion from memory 98 fragmentation. Finally, a common problem with abrupt termination may be 99 corrupted files. However, the code mostly only reads from files and limits 100 writing to the Postgres database. Hence, there is no possibility of corrupt 101 files being left behind even in the case of abnormal termination. 102 103 104 \section{More specs and documentation code} 105 106 We agree with the recommendation that the documentation should be improved, 107 and will try to improve it along the lines recommended by CodeBlau. 108 109 \section{Protocol change: API for uniformly distributed seeds} 110 111 We agree with the suggestion, have made the necessary changes, and both 112 parties believe that the suggestion has been implemented. 113 114 \section{Reduce code complexity} 115 116 \subsection{Reduce global variables} 117 118 While we do not disagree with the general goal to have few global variables, 119 we also believe that there are cases where global variables make sense. 120 121 We have already tried to minimize the scope of variables. The remaining few 122 global variables are largely ``read-only'' configuration data. The report does 123 not point out specific instances that would be particularly beneficial to 124 eliminate. As we continue to work on the code, we will of course evaluate 125 whether the removal of a particular global variable would make the code 126 cleaner. 127 128 Also, we want to point out that all global variables we introduce 129 in the exchange are indicated with a prefix {\tt TEH\_} in the code, so they 130 are easy to identify as such. 131 132 \subsection{Callbacks, type pruning} 133 134 We understand that higher order functions in C can be confusing, but this 135 is also a common pattern to enable code re-use and asynchronous execution 136 which is essential for network applications. We do not believe that we 137 use callbacks {\em excessively}. Rewriting the code in another language 138 may indeed make this part easier to understand, alas would have other 139 disadvantages as pointed out previously. 140 141 {\bf Update:} We introduced additional functions to replace 142 variadic calls to functions that cannot be type-checked by 143 the compiler (like libjansson's {\tt json\_pack()}) with 144 type-safe versions (like the new {\tt GNUNET\_JSON\_PACK()}). 145 146 147 \subsection{Initializing structs with memset} 148 149 Using {\tt memset()} first prevents compiler (or valgrind) warnings about 150 using uninitialized memory, possibly hiding bugs. We also do use struct 151 initialization in many cases. 152 153 The GNUnet-wrappers are generally designed to be ``safer'' or ``stricter'' 154 variants of the corresponding libc functions, and not merely ``the same''. 155 Hence we do not believe that renaming {\tt GNUNET\_malloc} is indicated. 156 157 The argument that {\tt memset()}ing first makes the code inherently more 158 obvious also seems fallacious, as it would commonly result in dead stores, 159 which can confuse developers and produce false-positive warnings from static 160 analysis tools. 161 162 \subsection{NULL pointer handling} 163 164 The problem with the ``goto fail'' style error handling is that it rarely 165 results in specific error handling where diagnostics are created that are 166 specific to the error. Using this style of programming encourages developers 167 to create simplistic error handling, which can result in inappropriate error 168 handling logic and also makes it harder to attribute errors to the specific 169 cause. 170 171 However, we have no prohibition on using this style of error handling either: 172 if it is appropriate, develpers should make a case-by-case decision as to how 173 to best handle a specific error. 174 175 We have made some first changes to how {\tt GNUNET\_free()} works in response 176 to the report, and will discuss further changes with the GNUnet development 177 team. 178 179 \subsection{Hidden security assumptions} 180 181 We disagree that the assumptions stated are ``hidden'', as (1) the Taler code 182 has its own checks to warrant that the requirements of the {\tt 183 GNUNET\_malloc()} API are satisfied (so enforcement is not limited to the 184 abstraction layer), and (2) the maximum allocation size limit is quite clearly 185 specified in the GNUnet documentation. Also, the GNUnet-functions are not 186 merely an abstraction layer for portability, but they provided extended 187 semantics that we rely upon. So it is not like it is possible to swap this 188 layer and expect anything to continue to work. 189 190 When we use the libjansson library, it is understood that it does not use 191 the GNUnet operations, and the code is careful about this distinction. 192 193 \subsection{Get rid of boolean function arguments} 194 195 We agree that this can make the code more readable, and have in some places 196 already changed the code in this way. 197 198 \section{Structural Recommendation} 199 200 \subsection{Least privilege} 201 202 It is wrong to say that GNU Taler has ``no work done'' on privilege separation. 203 For example, the {\tt taler-exchange-dbinit} tool is the only tool that requires 204 CREATE, ALTER and DROP rights on database tables, thus enusring that the ``main'' 205 process does not need these rights. 206 207 We also already had the {\tt taler-exchange-keyup} tool responsible for 208 initializing keys. In response to the audit, we already changed the GNUnet API 209 to make sure that tools do not create keys as a side-effect of trying to read 210 non-existent key files. 211 212 {\bf Update:} We have now implemented full privilege separation for access to the online 213 cryptographic signing keys. Details about the design are documented in the 214 section ``Exchange crypto helper design'' at \url{https://docs.taler.net/} of 215 Chapter 12. 216 217 {\bf Update:} In doing so, we also added a new type of signing key, the 218 ``security module'' signing key. This is used by the newly separated ``security 219 module`` processes to sign the public keys that they guard the private keys 220 for. The security module signatures are verified by the new 221 ``taler-exchange-offline`` tool to ensure that even if the {\tt 222 taler-exchange-httpd} process is compromised, the offline signature tool would 223 refuse to sign new public keys that do not originate from the security 224 module(s). The security module public keys can be given in the configuration, 225 or are learned TOFU-style. 226 227 228 \subsection{File system access} 229 230 The auditor helpers actually only read from the file system, only the LaTeX 231 invocation to compile the final report to PDF inherently needs write 232 access. We do not predict that we will retool LaTeX. Also, the file system 233 access is completely uncritical, as the auditor by design runs on a system 234 that is separate from the production exchange system. 235 236 Because that system will not have {\em any} crypto keys (not even the one of 237 the auditor!), CodeBlau is wrong to assume that reading from or writing to the 238 file system represents a security threat. 239 240 We have started to better document the operational requirements on running the 241 auditor. 242 243 {\bf Update:} On the exchange side, we have now moved additional information 244 from the file system into the database, in particular information about offline signatures 245 (including key revocations) and wire fees. This simplifies the deployment and 246 the interaction with offline key signing mechanism. The remaining disk accesses are for 247 quite fundamental configuration data (which ports to bind to, configuration to 248 access the database, etc.), and of course the program logic itself. 249 250 {\bf Update:} We have also restructured the configuration such that only 251 the {\tt taler-exchange-transfer} and {\tt taler-exchange-wirewatch} programs 252 need to have access to the more sensitive bank account configuration data, 253 and so that these processes can run as a separate user. 254 255 256 \subsection{Avoid dlopen} 257 258 Taler actually uses {\tt ltdlopen()} from GNU libtool, which provides 259 compiler flags to convert the dynamic linkage into static linkage. For 260 development, dynamic linkage has many advantages. 261 262 We plan to test and document how to build GNU Taler with only static 263 linkage, and will recommend this style of deployment for the Taler 264 exchange for production. 265 266 \subsection{Reduce reliance on PostgreSQL} 267 268 CodeBlau's suggestion to use an append-only transaction logging service in 269 addition to the PostgreSQL database is a reasonable suggestion for a 270 production-grade deployment of GNU Taler, as it would allow partial disaster 271 recovery even in the presence of an attacker that has gained write access to 272 the exchange's database. 273 274 We are currently still investigating whether the transaction logging should be 275 implemented directly by the exchange service, or via the database's extensible 276 replication mechanism. Any implementation of such an append-only logging 277 mechanism must be carefully designed to ensure it does not negatively impact 278 the exchange's availability and does not interfere with serializability of 279 database transactions. As such we believe that transaction logging can only be 280 provided on a best-effort basis. Fortunately, even a best-effort append-only 281 transaction log would serve to limit the financial damage incurred by the 282 exchange in an active database compromise scenario. 283 284 {\bf Update:} We have tightened the installation instructions for the 285 Taler exchange to guide users towards a more restricted Postgres setup, 286 tightening which components of the Exchange need what level of access 287 to the exchange database. 288 289 290 291 \end{document}