diff options
Diffstat (limited to 'doc/audit/response-202109.tex')
-rw-r--r-- | doc/audit/response-202109.tex | 291 |
1 files changed, 291 insertions, 0 deletions
diff --git a/doc/audit/response-202109.tex b/doc/audit/response-202109.tex new file mode 100644 index 000000000..43ed6a636 --- /dev/null +++ b/doc/audit/response-202109.tex | |||
@@ -0,0 +1,291 @@ | |||
1 | \documentclass[11pt]{article} | ||
2 | \oddsidemargin=0in \evensidemargin=0in | ||
3 | \textwidth=6.2in \textheight=8.7in | ||
4 | %\topmargin=-0.2in | ||
5 | |||
6 | \usepackage[ansinew]{inputenc} | ||
7 | \usepackage{makeidx,amsmath,amssymb,exscale,multicol,epsfig,graphics,url} | ||
8 | |||
9 | \begin{document} | ||
10 | \pagestyle{headings} | ||
11 | \title{Final response to the \\ GNU Taler security audit in Q2/Q3 2020} | ||
12 | \author{Christian Grothoff \and Florian Dold} | ||
13 | |||
14 | \maketitle | ||
15 | |||
16 | \section{Abstract} | ||
17 | |||
18 | This is the response to the source code audit report CodeBlau | ||
19 | created for GNU Taler in Q2/Q3 2020. | ||
20 | |||
21 | \section{Management Summary} | ||
22 | |||
23 | We thank CodeBlau for their detailed report and thorough analysis. We are | ||
24 | particularly impressed that they reported issues against components that were | ||
25 | not even in-scope, and also that they found an {\em interesting} new corner | ||
26 | case we had not previously considered. Finally, we also find several of their | ||
27 | architectural recommendations to strengthen security to be worthwhile, and | ||
28 | while some were already on our long-term roadmap, we will reprioritize our | ||
29 | roadmap given their recommendations. | ||
30 | |||
31 | Given our extensive discussions with CodeBlau, we also have the impression | ||
32 | that they really took the time to understand the system, and look forward | ||
33 | to working with CodeBlau as a competent auditor for GNU Taler in the future. | ||
34 | |||
35 | \section{Issues in the exchange} | ||
36 | |||
37 | We agree with the issues CodeBlau discovered and both parties believe that | ||
38 | they have all been addressed. | ||
39 | |||
40 | \section{Issues in the auditor} | ||
41 | |||
42 | We appreciate CodeBlau's extensive list of checks the Taler auditor performs, | ||
43 | which was previously not documented adequately by us. We agree that the | ||
44 | auditor still needs more comprehensive documentation. | ||
45 | |||
46 | As for issue \#6416, we agree with the analysis. However, the proposed fix | ||
47 | of making the primary key include the denomination would create other problems, | ||
48 | such as the exchange sometimes not having the denomination key (link, refund) | ||
49 | and the code in various places relying on the assumption of the coin's | ||
50 | public key being unique. Furthermore, allowing coin key re-use may validate | ||
51 | a terrible practice. We thus decided it is better to ``fail early'', and | ||
52 | modified the code to check that the coin public key is ``unique'' during | ||
53 | deposit, refresh and recoup and ensured that the exchange returns a proof | ||
54 | of non-uniqueness in case of a violation. The test suite was extended to | ||
55 | cover the corner case. | ||
56 | |||
57 | {\bf Update:} We have now also addressed the (``soft'') exchange online | ||
58 | signing key revocation issue (\#6161) reported originally by CodeBlau. | ||
59 | The auditor now checks for key revocations before recording deposit | ||
60 | confirmations. The impact is very minor, as this will merely prevent | ||
61 | an adversary controlling an exchange online signing key from submitting | ||
62 | false claims to the auditor. | ||
63 | |||
64 | |||
65 | |||
66 | \section{Issues in GNUnet} | ||
67 | |||
68 | We agree with the issues CodeBlau discovered and both parties believe that | ||
69 | they have all been addressed. | ||
70 | |||
71 | \section{General remarks on the code} | ||
72 | |||
73 | We understand that writing the code in another programming language may make | ||
74 | certain checks for the auditor less work to implement. However, our choice of C | ||
75 | is based on the advantages that make it superior to contemporary languages for | ||
76 | our use case: relatively low complexity of the language (compared to C++); | ||
77 | availability of mature compilers, static and dynamic analysis tools; | ||
78 | predictable performance; access to stable and battle-tested libraries; and | ||
79 | future-proofness due to portability to older systems as well as new platforms. | ||
80 | |||
81 | We believe creating a parallel implementation in other languages would provide | ||
82 | advantages, especially with respect to avoiding ``the implementation is the | ||
83 | specification''-style issues. However, given limited resources will not make | ||
84 | this a priority. | ||
85 | |||
86 | We disagree that all modern software development has embraced the idea that | ||
87 | memory errors are to be handled in ways other than terminating or restarting | ||
88 | the process. Many programming languages (Erlang, Java) hardly offer any other | ||
89 | means of handling out-of-memory situations than to terminate the process. We | ||
90 | also insist that Taler {\em does} handle out-of-memory as it does have code | ||
91 | that terminates the process (we do {\em not} simply ignore the return value | ||
92 | from {\tt malloc()} or other allocation functions!). We simply consider that | ||
93 | terminating the process (which is run by a hypervisor that will restart the | ||
94 | service) is the correct way to handle out-of-memory situations. We also have | ||
95 | limits in place that should prevent attackers from causing large amounts of | ||
96 | memory to be consumed, and also have code to automatically preemptively | ||
97 | restart the process to guard against memory exhaustion from memory | ||
98 | fragmentation. Finally, a common problem with abrupt termination may be | ||
99 | corrupted files. However, the code mostly only reads from files and limits | ||
100 | writing to the Postgres database. Hence, there is no possibility of corrupt | ||
101 | files being left behind even in the case of abnormal termination. | ||
102 | |||
103 | |||
104 | \section{More specs and documentation code} | ||
105 | |||
106 | We agree with the recommendation that the documentation should be improved, | ||
107 | and will try to improve it along the lines recommended by CodeBlau. | ||
108 | |||
109 | \section{Protocol change: API for uniformly distributed seeds} | ||
110 | |||
111 | We agree with the suggestion, have made the necessary changes, and both | ||
112 | parties believe that the suggestion has been implemented. | ||
113 | |||
114 | \section{Reduce code complexity} | ||
115 | |||
116 | \subsection{Reduce global variables} | ||
117 | |||
118 | While we do not disagree with the general goal to have few global variables, | ||
119 | we also believe that there are cases where global variables make sense. | ||
120 | |||
121 | We have already tried to minimize the scope of variables. The remaining few | ||
122 | global variables are largely ``read-only'' configuration data. The report does | ||
123 | not point out specific instances that would be particularly beneficial to | ||
124 | eliminate. As we continue to work on the code, we will of course evaluate | ||
125 | whether the removal of a particular global variable would make the code | ||
126 | cleaner. | ||
127 | |||
128 | Also, we want to point out that all global variables we introduce | ||
129 | in the exchange are indicated with a prefix {\tt TEH\_} in the code, so they | ||
130 | are easy to identify as such. | ||
131 | |||
132 | \subsection{Callbacks, type pruning} | ||
133 | |||
134 | We understand that higher order functions in C can be confusing, but this | ||
135 | is also a common pattern to enable code re-use and asynchronous execution | ||
136 | which is essential for network applications. We do not believe that we | ||
137 | use callbacks {\em excessively}. Rewriting the code in another language | ||
138 | may indeed make this part easier to understand, alas would have other | ||
139 | disadvantages as pointed out previously. | ||
140 | |||
141 | {\bf Update:} We introduced additional functions to replace | ||
142 | variadic calls to functions that cannot be type-checked by | ||
143 | the compiler (like libjansson's {\tt json\_pack()}) with | ||
144 | type-safe versions (like the new {\tt GNUNET\_JSON\_PACK()}). | ||
145 | |||
146 | |||
147 | \subsection{Initializing structs with memset} | ||
148 | |||
149 | Using {\tt memset()} first prevents compiler (or valgrind) warnings about | ||
150 | using uninitialized memory, possibly hiding bugs. We also do use struct | ||
151 | initialization in many cases. | ||
152 | |||
153 | The GNUnet-wrappers are generally designed to be ``safer'' or ``stricter'' | ||
154 | variants of the corresponding libc functions, and not merely ``the same''. | ||
155 | Hence we do not believe that renaming {\tt GNUNET\_malloc} is indicated. | ||
156 | |||
157 | The argument that {\tt memset()}ing first makes the code inherently more | ||
158 | obvious also seems fallacious, as it would commonly result in dead stores, | ||
159 | which can confuse developers and produce false-positive warnings from static | ||
160 | analysis tools. | ||
161 | |||
162 | \subsection{NULL pointer handling} | ||
163 | |||
164 | The problem with the ``goto fail'' style error handling is that it rarely | ||
165 | results in specific error handling where diagnostics are created that are | ||
166 | specific to the error. Using this style of programming encourages developers | ||
167 | to create simplistic error handling, which can result in inappropriate error | ||
168 | handling logic and also makes it harder to attribute errors to the specific | ||
169 | cause. | ||
170 | |||
171 | However, we have no prohibition on using this style of error handling either: | ||
172 | if it is appropriate, develpers should make a case-by-case decision as to how | ||
173 | to best handle a specific error. | ||
174 | |||
175 | We have made some first changes to how {\tt GNUNET\_free()} works in response | ||
176 | to the report, and will discuss further changes with the GNUnet development | ||
177 | team. | ||
178 | |||
179 | \subsection{Hidden security assumptions} | ||
180 | |||
181 | We disagree that the assumptions stated are ``hidden'', as (1) the Taler code | ||
182 | has its own checks to warrant that the requirements of the {\tt | ||
183 | GNUNET\_malloc()} API are satisfied (so enforcement is not limited to the | ||
184 | abstraction layer), and (2) the maximum allocation size limit is quite clearly | ||
185 | specified in the GNUnet documentation. Also, the GNUnet-functions are not | ||
186 | merely an abstraction layer for portability, but they provided extended | ||
187 | semantics that we rely upon. So it is not like it is possible to swap this | ||
188 | layer and expect anything to continue to work. | ||
189 | |||
190 | When we use the libjansson library, it is understood that it does not use | ||
191 | the GNUnet operations, and the code is careful about this distinction. | ||
192 | |||
193 | \subsection{Get rid of boolean function arguments} | ||
194 | |||
195 | We agree that this can make the code more readable, and have in some places | ||
196 | already changed the code in this way. | ||
197 | |||
198 | \section{Structural Recommendation} | ||
199 | |||
200 | \subsection{Least privilege} | ||
201 | |||
202 | It is wrong to say that GNU Taler has ``no work done'' on privilege separation. | ||
203 | For example, the {\tt taler-exchange-dbinit} tool is the only tool that requires | ||
204 | CREATE, ALTER and DROP rights on database tables, thus enusring that the ``main'' | ||
205 | process does not need these rights. | ||
206 | |||
207 | We also already had the {\tt taler-exchange-keyup} tool responsible for | ||
208 | initializing keys. In response to the audit, we already changed the GNUnet API | ||
209 | to make sure that tools do not create keys as a side-effect of trying to read | ||
210 | non-existent key files. | ||
211 | |||
212 | {\bf Update:} We have now implemented full privilege separation for access to the online | ||
213 | cryptographic signing keys. Details about the design are documented in the | ||
214 | section ``Exchange crypto helper design'' at \url{https://docs.taler.net/} of | ||
215 | Chapter 12. | ||
216 | |||
217 | {\bf Update:} In doing so, we also added a new type of signing key, the | ||
218 | ``security module'' signing key. This is used by the newly separated ``security | ||
219 | module`` processes to sign the public keys that they guard the private keys | ||
220 | for. The security module signatures are verified by the new | ||
221 | ``taler-exchange-offline`` tool to ensure that even if the {\tt | ||
222 | taler-exchange-httpd} process is compromised, the offline signature tool would | ||
223 | refuse to sign new public keys that do not originate from the security | ||
224 | module(s). The security module public keys can be given in the configuration, | ||
225 | or are learned TOFU-style. | ||
226 | |||
227 | |||
228 | \subsection{File system access} | ||
229 | |||
230 | The auditor helpers actually only read from the file system, only the LaTeX | ||
231 | invocation to compile the final report to PDF inherently needs write | ||
232 | access. We do not predict that we will retool LaTeX. Also, the file system | ||
233 | access is completely uncritical, as the auditor by design runs on a system | ||
234 | that is separate from the production exchange system. | ||
235 | |||
236 | Because that system will not have {\em any} crypto keys (not even the one of | ||
237 | the auditor!), CodeBlau is wrong to assume that reading from or writing to the | ||
238 | file system represents a security threat. | ||
239 | |||
240 | We have started to better document the operational requirements on running the | ||
241 | auditor. | ||
242 | |||
243 | {\bf Update:} On the exchange side, we have now moved additional information | ||
244 | from the file system into the database, in particular information about offline signatures | ||
245 | (including key revocations) and wire fees. This simplifies the deployment and | ||
246 | the interaction with offline key signing mechanism. The remaining disk accesses are for | ||
247 | quite fundamental configuration data (which ports to bind to, configuration to | ||
248 | access the database, etc.), and of course the program logic itself. | ||
249 | |||
250 | {\bf Update:} We have also restructured the configuration such that only | ||
251 | the {\tt taler-exchange-transfer} and {\tt taler-exchange-wirewatch} programs | ||
252 | need to have access to the more sensitive bank account configuration data, | ||
253 | and so that these processes can run as a separate user. | ||
254 | |||
255 | |||
256 | \subsection{Avoid dlopen} | ||
257 | |||
258 | Taler actually uses {\tt ltdlopen()} from GNU libtool, which provides | ||
259 | compiler flags to convert the dynamic linkage into static linkage. For | ||
260 | development, dynamic linkage has many advantages. | ||
261 | |||
262 | We plan to test and document how to build GNU Taler with only static | ||
263 | linkage, and will recommend this style of deployment for the Taler | ||
264 | exchange for production. | ||
265 | |||
266 | \subsection{Reduce reliance on PostgreSQL} | ||
267 | |||
268 | CodeBlau's suggestion to use an append-only transaction logging service in | ||
269 | addition to the PostgreSQL database is a reasonable suggestion for a | ||
270 | production-grade deployment of GNU Taler, as it would allow partial disaster | ||
271 | recovery even in the presence of an attacker that has gained write access to | ||
272 | the exchange's database. | ||
273 | |||
274 | We are currently still investigating whether the transaction logging should be | ||
275 | implemented directly by the exchange service, or via the database's extensible | ||
276 | replication mechanism. Any implementation of such an append-only logging | ||
277 | mechanism must be carefully designed to ensure it does not negatively impact | ||
278 | the exchange's availability and does not interfere with serializability of | ||
279 | database transactions. As such we believe that transaction logging can only be | ||
280 | provided on a best-effort basis. Fortunately, even a best-effort append-only | ||
281 | transaction log would serve to limit the financial damage incurred by the | ||
282 | exchange in an active database compromise scenario. | ||
283 | |||
284 | {\bf Update:} We have tightened the installation instructions for the | ||
285 | Taler exchange to guide users towards a more restricted Postgres setup, | ||
286 | tightening which components of the Exchange need what level of access | ||
287 | to the exchange database. | ||
288 | |||
289 | |||
290 | |||
291 | \end{document} | ||