\chapter[Byzantine Set-Union Consensus]{Byzantine Set-Union Consensus%
\astfootnote{%
The content of this chapter has been previously published in the EURASIP
Journal on Wireless Communications and Networking \cite{dold2017byzantine}.
}
}\label{chapter:consensus}

\section{Introduction}

Byzantine consensus is a fundamental building block for fault-tolerant
distributed systems. It allows a group of peers to reach agreement on
some value, even if a fraction of the peers are controlled by an
active adversary.   Theory-oriented work on
Byzantine consensus often focuses on finding a single agreement on a
binary flag or bit string~\cite{fitzi2006optimally}.  More recent approaches for practical
applications are mainly based on state machine replication (SMR),
wherein peers agree on a sequence of state machine transitions.  State
machine replication makes it relatively easy to lift existing,
non-fault-tolerant services to a Byzantine fault-tolerant
implementation~\cite{castro1999practical}.  Each request from a
client triggers a state transition in the replicated state machine
that provides the service.

A major shortcoming of SMR is that all requests to the service
need to be individually agreed upon in sequence by the replica peers of the
state machine.  This is undesirable since in unoptimized SMR protocols,
such as PBFT~\cite{castro1999practical}, a single
transition requires $O(n^2)$ messages to be exchanged for $n$ replicas. Some
implementations~\cite{kotla2007zyzzyva} try to address this inefficiency by optimistically processing
requests and falling back to individual Byzantine agreements only when
Byzantine behavior is detected.  In practice, this leads to very complex
implementations whose correctness is hard to verify and that have weak progress
guarantees~\cite{clement2009making}.

The canonical example for a service where this inefficiency becomes
apparent is the aggregation of values submitted by clients into a set.
This scenario is relevant for the implementation of secure multiparty
computation protocols such as electronic voting~\cite{cramer1997secure}, where ballots must be
collected, and auctions~\cite{bogetoft2009secure}, where bids must be collected.  A direct
implementation that reaches agreement on a set of $m$ elements with SMR requires $m$
sequential agreements, each consisting of $O(n^2)$ messages.


We introduce Byzantine set-union consensus (BSC) as an alternative
communication primitive that allows this aggregation to be implemented
more efficiently.  In order to implement the set aggregation service
described above, the peers first reconcile their sets using an
efficient set reconciliation protocol that is not fault-tolerant but
where the complexity is bounded even in the case of failures.  Then,
they use a variant of ByzConsensus~\cite{ben2010simple} to reach
Byzantine agreement on the union.

We assume a partially synchronous communication model, where
non-faulty peers are guaranteed to successfully receive values
transmitted by other non-faulty peers within an existing but unknown
finite time bound~\cite{dwork1988consensus}.  Peers communicate over
pairwise channels that are authenticated.  Message delivery is
reliable (i.e., messages arrive uncorrupted and in the right order) but
the receipt of messages may be delayed.  We make the same assumption
as Castro and Liskov~\cite{castro1999practical,castro2002practical}
about this delay, namely that it does not grow faster than some
(usually exponential) function of wall-clock time.  We assume a
computationally unbounded adversary that can corrupt at most
$t=\lceil n/3 \rceil - 1$ peers, creating Byzantine faults.  The
adversary is static, that is, the set of corrupted peers is fixed
before the protocol starts, but this set is not known to the correct
peers.  The actual number of faulty peers is denoted by $f$, with $f \le t$.

The BSC protocol has message complexity $O(mn + n^2)$ when no
peers show Byzantine behavior. When $f$ peers show Byzantine behavior,
the message complexity is $O(mnf+kfn^2)$, where $k$ is the number of
valid set elements exclusively available to the adversary.  We will show how
$k$ can be bounded for common practical applications, since in the
general case $k$ is only bounded by the bandwidth available to the
adversary.  In practice, we expect $kf$ to be significantly smaller
than $m$. Thus, $O(mnf+kfn^2)$ is an improvement over using SMR-PBFT
which would have complexity $O(mn^2)$.


We have created an implementation of the BSC
protocol by combining Ben-Or's protocol for Byzantine
consensus~\cite{ben2010simple} with a bounded variant of
Eppstein's protocol for efficient
set reconciliation~\cite{eppstein2011difference}.
%
We demonstrate the practical applicability of our resulting
abstraction by using BSC to implement distributed key generation,
ballot collection and cooperative decryption from the
Cramer-Gennaro-Schoenmakers remote electronic voting
scheme~\cite{cramer1997secure} in the GNUnet framework.
Our experimental results show that the behavior of the implementation
matches our predictions from theory.

In summary, we make the following contributions in this chapter:
\begin{itemize}
  \item The introduction of Byzantine Set-Union Consensus (BSC) with Byzantine Eppstein Set Reconciliation.
  \item The analysis and proof of correctness of Byzantine Set-Union Consensus.
  \item An implementation and experimental evaluation of the protocol.
  \item A discussion of practical applications to Secure Multiparty Computation.
\end{itemize}

%\newpage
\section{Background}

The Byzantine consensus problem \cite{lamport1982byzantine} is a
generalization of the consensus problem where the peers that need
to reach agreement on some value might also exhibit Byzantine faults.

Many specific variants of the agreement problem (such as
interactive consistency~\cite{fischer1981lower}, k-set
consensus~\cite{de2001k}, or leader election \cite{malpani2000leader}
and many others~\cite{fischer1986easy}) exist.  We will focus on the
consensus problem, wherein each peer in a set of peers
$\{P_1,\ldots,P_n\}$ starts with an initial value $v_i \in M$ for an
arbitrary fixed set $M$.  At some point during the execution of the
consensus protocol, each peer irrevocably decides on some output value
$\hat{v}_i \in M$.  Informally, a protocol that solves the consensus problem
must fulfill the following properties:\footnote{Different variations
  and names can be found in the literature.  We have chosen a
  definition that extends to our generalization to sets
  later on.}
\begin{itemize}
\item \emph{Agreement:} If peers $P_i$, $P_j$ are correct, then
  $\hat{v}_i = \hat{v}_j$.
\item \emph{Termination:} The protocol terminates in a finite number
  of steps.
\item \emph{Validity:} If all correct peers have the same input value
  $v$, then all correct peers decide on $\hat{v}_i=v$.
\end{itemize}

Some definitions of the consensus problem also include \emph{strong
validity}, which requires the value that is agreed upon to be
the initial value of some correct
peer~\cite{neiger1994distributed}.
The consensus protocol
presented in this chapter does not offer strong validity; in fact,
for a set union operation this is not desirable, as the
goal is to have all peers agree on the union of all of the original
sets, not on some peer's initial subset.


\subsection{The FLP Impossibility Result}

A fundamental theoretical result (often called FLP impossibility for
the initials of the authors) states, informally, that no deterministic
protocol can solve the consensus problem in the asynchronous
communication model, even in the presence of only one crash-fault
\cite{fischer1985impossibility}.

While this result initially seems discouraging, the conditions under
which FLP impossibility holds are quite specific and
subtle~\cite{aguilera2010stumbling}.  There are a number of techniques to avoid
these conditions while still resulting in a useful protocol.  For example:
\begin{itemize}
  \item \emph{Common coins:} Some protocols introduce a shared source
    of randomness that the adversary cannot predict or bias.  This avoids
    the FLP impossibility result, since the protocol is not deterministic anymore.
    In practice, these protocols are very complex and often use variants of
    secret-sharing and weaker forms of Byzantine agreement to implement the
    common coin
    \cite{feldman1988optimalphd,feldman1988optimal,mostefaoui2014signature}.
    Implementing a common coin oracle resilient against an active
    adversary is non-trivial and usually requires extra assumptions
    such as a trusted dealer in the startup
    phase~\cite{cachin2005random} or shared
    memory~\cite{aspnes1998lower}.  Recent designs to implement a
    Byzantine fault-tolerant bias-resistant public random generator
    only scale to hundreds of participants and still have relatively
    high failure rates (reported at 0.08\% for an adversary power
    bounded at $\frac{1}{3}$ and 32
    participants)~\cite{cryptoeprint:2016:1067}.
  \item \emph{Failure oracles:}
    Approaches based on unreliable failure detectors
    \cite{guerraoui2000consensus} augment the model with oracles for the
    detection of faulty nodes.  Much care has to be taken not to violate correctness
    of the protocol by classifying too many correct peers as faulty; this is a
    problem present in early systems such as Rampart
    \cite{reiter1995rampart} and SecureRing \cite{kihlstrom1998securering} as
    noted by Castro and Liskov \cite{castro1999practical,castro2002practical}.
    While the theory of failure detectors is quite established for the
    non-Byzantine case, it is not clear whether they are still useful in the
    presence of Byzantine faults.
  \item \emph{Partial synchrony:} A model where a bound on the message delay
    or clock shift exists but is unknown or is known but only holds from an
    unknown future point in time is called partial synchrony.  The FLP result
    does not hold in this model~\cite{dwork1988consensus}.
  \item \emph{Minimal synchrony:} The definition of synchrony used by
    the FLP impossibility result can be split into three types of synchrony:
    processor synchrony, communication synchrony, and message-ordering synchrony.
    Dolev et al.~\cite{dolev1987minimal} show that consensus is still possible
    if only certain subsets of these three synchrony assumptions are fulfilled.
\end{itemize}

This work follows the path of \cite{dwork1988consensus} in relaxing the
full asynchrony assumption behind the FLP impossibility result.


\subsection{Byzantine Consensus in the Partially Synchronous Model}

% See 'Signature-Free Asynchronous Byzantine Consensus ...'
% for a good list of references.

The protocols presented in this chapter operate within the constraints
of the partially synchronous model, where participants have some
approximate information about time.

A fundamental result is that no Byzantine consensus protocol with $n$
peers can support $\lceil n/3 \rceil$ or more Byzantine faults in the
partially synchronous model~\cite{dwork1988consensus}.

Early attempts at implementing Byzantine consensus with state machine
replication are SecureRing~\cite{kihlstrom1998securering} and
Rampart~\cite{reiter1995rampart}.  A popular design in the partially
synchronous model is Castro and Liskov's Practical Byzantine Fault
Tolerance (PBFT)~\cite{castro1999practical,castro2002practical}.  PBFT
uses a leader to coordinate peers (called \emph{replicas} in PBFT
terminology).  When replicas detect that the leader is faulty, they
run a leader-election protocol to appoint a new leader.

PBFT guarantees progress as long as the message delay does not grow
faster than some fixed growth function\footnote{In practice,
  exponential back-off is used.}.  The approach taken by PBFT (and
several derived protocols) has several
problems~\cite{clement2009making}: in practice, malicious participants
are able to slow down the system significantly.  When facing an adversarial
scheduler that violates PBFT's weak synchrony assumption, PBFT can
fail to make progress entirely~\cite{Miller:2016:HBB:2976749.2978399}.

Some more recent Byzantine state machine replication protocols such as
Q/U~\cite{abd2005fault} or Zyzzyva~\cite{kotla2007zyzzyva} have less
overhead per request since they optimize for the non-Byzantine case.
This comes, however, often at the expense of robustness in the
presence of Byzantine faults~\cite{clement2009making}, not to mention
that correctness proofs for the respective protocols and the
implementation of state machine replication are notoriously
difficult~\cite{aublin2015next}.


\subsection{Gradecast} \label{sec:gradecast}

A key building block for our protocol is Feldman's Gradecast
protocol~\cite{feldman1988optimal}.  In contrast to an unreliable
broadcast, Gradecast provides correctness properties to the
receivers, even if the leader is exhibiting Byzantine faults.

In a Gradecast, a leader $P_L$ broadcasts a message~$m$ among a fixed
set $\mathcal{P} = \{P_1,\dots,P_n\}$ of peers.  For notational
convenience, we assume that $P_L \in \mathcal{P}$.  These are the
communication steps for peer $P_i$:
\begin{enumerate}
  \item LEAD:  If $i = L$, send the input value $v_L$ to $\mathcal{P}$.
  \item ECHO:  Send the value received in LEAD to $\mathcal{P}$.
  \item CONFIRM:  If a common value $\overline{v}$ was received at least $n-t$ times
    in round ECHO, send $\overline{v}$ to $\mathcal{P}$.  Otherwise, send nothing.
\end{enumerate}

Afterwards, each peer assigns a confidence value $c_i \in \{0,1,2\}$
that ``grades'' the correctness of the broadcast.  The result is a
graded result tuple $\langle \hat{v}_i,c_i \rangle$ containing the
output value $\hat{v}_i$ and the confidence $c_i$.  The grading is
done with the following rules:

\begin{itemize}
  \item If some $\hat{v}$ was received at least $n-t$ times in CONFIRM,
        output $\langle \hat{v},2 \rangle$.
  \item Otherwise, if some $\hat{v}$ was received at least $t+1$ times in CONFIRM,
        output $\langle \hat{v},1 \rangle$.
  \item Otherwise, output $\langle \bot, 0 \rangle$.  Here, $\bot$ denotes
        a special value that indicates the absence of a meaningful value.
\end{itemize}
\noindent
For the $c_i$, the following correctness properties must hold:

\begin{enumerate}
  \item If $c_i \ge 1$ and $c_j \ge 1$, then $\hat{v}_i = \hat{v}_j$ for correct $P_i$ and $P_j$.
  \item If $P_L$ is correct, then $c_i=2$ and $\hat{v}_i=v_L$ for correct $P_i$.
  \item $|c_i - c_j| \le 1$ for correct $P_i$ and $P_j$.
\end{enumerate}

When a correct peer $P_i$ receives a Gradecast
with confidence $2$, it can deduce that all other correct peers received the
same message, but some of them might have received it with a
confidence of only $1$.  Receiving a Gradecast with confidence $1$ still
guarantees that every other correct peer that outputs a value received the
same message, but it indicates that the leader behaved incorrectly, and no
assumption can be made about the confidence of the other peers.  Receiving
a Gradecast with confidence $0$ indicates that the leader behaved
incorrectly and, crucially, that all other correct peers \emph{know}
that the leader behaved incorrectly.

A simple counting argument proves that the above protocol satisfies
the three Gradecast properties~\cite{feldman1988optimal}.
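
To make the grading step concrete, the following Python sketch
computes the graded output from the CONFIRM messages a peer received.
It is a minimal illustration under our own naming conventions (the
message-collection machinery is omitted), not the code of an actual
implementation:
\begin{verbatim}
from collections import Counter

def grade_confirm(confirm_msgs, n, t):
    """Compute the graded output <v, c> of a Gradecast from the
    values received in CONFIRM (one list entry per sender).
    Returns (value, confidence) with confidence in {0, 1, 2};
    None stands in for the special value "bottom"."""
    counts = Counter(confirm_msgs)
    if counts:
        v, cnt = counts.most_common(1)[0]
        if cnt >= n - t:   # strong majority
            return v, 2
        if cnt >= t + 1:   # at least one correct peer confirmed v
            return v, 1
    return None, 0
\end{verbatim}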

\subsection{ByzConsensus} \label{sec:byzconsensus}

ByzConsensus~\cite{ben2010simple} uses Gradecast to implement a
consensus protocol for simple values.  Each peer begins with a
starting value $s^{(1)}_i$ and the list of all $n$ participants
$\mathcal{P}$.  Each peer also starts with an empty blacklist of
corrupted peers. If a peer is ever blacklisted, it is henceforth
excluded from the protocol.  In ByzConsensus, Gradecast is used to
force corrupt peers to either expose themselves as faulty---and
consequently be excluded---by gradecasting a value with low confidence,
or to follow the protocol and allow all peers to reach agreement.

ByzConsensus consists of at most $f+1$ sequentially executed
super-rounds $r \in 1\ldots f+1$ where $f \le t$.  In each
super-round, each peer leads a Gradecast using their candidate value
$s^{(r)}_i$; these $n$ Gradecasts can be executed in parallel.
Leaders whose Gradecast results in a confidence of less than $2$
are put on the blacklist.  Recall that different correct peers might
receive a Gradecast with different confidence; thus, peers do not
necessarily agree on the blacklist.

At the end of each super-round, each peer computes a new candidate
value $s^{(r+1)}_i$ using the value that was received most often from
the Gradecasts with a confidence of at least $1$.  If this value was
received more than $n-t$ times, the next super-round is the
last one.

If the final candidate value does not receive a majority of at least
$2t+1$ among the $n$ Gradecasts, or if the blacklist has more than $t$
entries, then the protocol failed: either more than $t$ faults
happened or, in the partially synchronous model, correct peers did not
receive a message within the designated round due to the delayed
delivery.

ByzConsensus has message complexity $O(fn^3)$.  While the asymptotic
message complexity is obviously worse than the $O(n^2)$ of PBFT, there
is a way to use set reconciliation to benefit from the parallelism of
the Gradecast rounds and thereby reduce the complexity to $O(fn^2)$.
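
The end-of-super-round update can likewise be sketched in a few lines
of Python (again a simplified illustration; blacklisting and the
failure checks described above are omitted):
\begin{verbatim}
from collections import Counter

def next_candidate(graded, n, t):
    """Compute the next candidate value from the list of
    (value, confidence) results of the n parallel Gradecasts.
    Returns (candidate, last), where last indicates that the
    next super-round will be the final one."""
    counts = Counter(v for v, c in graded if c >= 1)
    if not counts:
        return None, False
    candidate, occurrences = counts.most_common(1)[0]
    return candidate, occurrences > n - t
\end{verbatim}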

\subsection{Set Reconciliation}

The goal of set reconciliation is to identify the differences between
two large sets, say $S_a$ and $S_b$, that are stored on two different
machines in a network.  A simple but inefficient solution would be to
transmit the smaller of the two sets, and let the receiver compute
and announce the difference.  Research has thus focused on protocols
that are more efficient than this na\"ive approach with respect to the
amount of data that needs to be communicated when the sets $S_a$ and
$S_b$ are large, but their symmetric difference $S_a \oplus S_b$ is
small.

An early attempt to efficiently reconcile sets~\cite{minsky2003set}
was based on representing sets by their characteristic polynomial over
a finite field.  Conceptually, dividing the characteristic polynomials
of two sets cancels out the common elements, leaving only the set
difference.  The characteristic polynomials are transmitted as a
sequence of sampling points, where the number of sampling points is
proportional to the size of the symmetric difference of the sets $S_a$
and $S_b$.  The number of sampling points can be approximated with an
upper bound, or increased on the fly should a peer be unable to
interpolate a polynomial.  While theoretically elegant, the protocol
is not efficient in practice.  The computational complexity of the
polynomial interpolation grows as $O(|S_a \oplus S_b|^3)$ and uses
rather expensive arithmetic operations over large finite fields.

A practical protocol was first proposed by Eppstein et al. in
2011~\cite{eppstein2011difference}. It is based on invertible Bloom
filters (IBFs), a probabilistic data structure that is related to
Bloom filters~\cite{bloom1970space}, and strata estimators for
difference estimation.

\subsubsection{Invertible Bloom Filters}

An IBF is a constant-size data structure that supports four basic
operations, {\em insert}, {\em delete}, {\em decode} and {\em
  subtract}.

{\em Insert} and {\em delete} operations are commutative operations
encoding a {\em key} that uniquely identifies a set element, typically
derived from the element via a hash function.

The {\em decode} operation can be used to extract some or all of the
updates, returning the key and the sign of the operation, that is,
either {\em insert} or {\em delete}.  Since the data structure uses
constant space, decoding cannot always succeed.  Decoding is a
probabilistic operation that is more likely to succeed when the IBF is
sparse, that is, when the number of encoded operations (\emph{excluding}
operations that canceled each other out) is small.  The decoding
process can also be partially successful, if some elements could be
extracted but the remaining IBF is non-empty.  Extracting an update by
decoding an IBF is only possible if the key was recorded exactly once in
the IBF.  Storing a deletion or insertion of the same key
twice or more (not counting operations that canceled each other out)
makes those updates impossible to decode.

IBFs of the same size can also be {\em subtracted} from each other.
When subtracting $\mathrm{IBF}_b$ from $\mathrm{IBF}_a$, the resulting
structure $\mathrm{IBF}_c := \mathrm{IBF}_a - \mathrm{IBF}_b$ contains
all insertions and deletions from $\mathrm{IBF}_a$; insertions
from $\mathrm{IBF}_b$ are recorded as deletions in $\mathrm{IBF}_c$,
and deletions from $\mathrm{IBF}_b$ as insertions.
Effectively, IBF subtraction makes it possible to compute the difference
between two sets simply by encoding each set as an IBF using only
insertion operations.

Under the hood, an IBF of size $n$ is an array of $n$ buckets.  Each
bucket holds three values:
\begin{itemize}
  \item A signed counter that handles overflow via wrap-around. Recording an
        insertion or deletion adds $+1$ or $-1$ to the counter, respectively.
  \item An $\oplus$-sum\footnote{The $\oplus$ denotes bit-wise
        exclusive or.}, called the \texttt{keySum}, over the keys that
        identify the set elements that were recorded in the bucket.
  \item An $\oplus$-sum, called the \texttt{keyHashSum}, over the hash
        $h(\cdot)$ of each key that was recorded in the bucket.
\end{itemize}

As with ordinary Bloom filters, encoding an update in an IBF records
the update in $k$ different buckets of the IBF.  The indices of the
buckets that record the update are derived via $k$ independent hash
functions from the key of the element that is the subject of the update.
We write $\mathrm{Pos}(x)$ for the set of array positions that
correspond to the element key $x$.

Before we describe the decoding process, we introduce some
terminology.  A bucket is called a \emph{candidate bucket} if its
counter is $-1$ or $+1$, which might indicate that the \texttt{keySum}
field contains the key of an element that was the subject of an
update.  Candidate buckets that contain the key of an element that was
previously updated are called \emph{pure buckets}.  Candidate buckets
are not necessarily pure buckets, since a candidate bucket could also
result from, for example, first inserting an element key $e_1$ and
then deleting $e_2$ when $\mathrm{Pos}(e_1) \cap \mathrm{Pos}(e_2)
\neq \emptyset$ and $\mathrm{Pos}(e_1) \neq \mathrm{Pos}(e_2)$.

The \texttt{keyHashSum} provides a way to detect if a candidate bucket
is not a pure bucket, namely when $h(\texttt{keySum}) \neq
\texttt{keyHashSum}$.  The probability of classifying an impure bucket
as pure with this method is dependent on the probability of a hash
collision.  Another method to check for an impure candidate bucket
with index $i$ is to check whether $i \notin
\mathrm{Pos}(\texttt{keySum})$.

The decoding process then simply searches for buckets that are, with
high probability, pure.  When the \texttt{count} field of the bucket
is $1$, the decoding procedure reports the key as ``inserted'' and
executes a deletion operation with that key.  When the
\texttt{count} field is $-1$, the key is reported as ``deleted'' and
subsequently an insertion operation is executed.

With a probability that increases with sparser IBFs, decoding one
element may cause one or more other buckets to become pure, allowing
the decoding to be repeated.  If none of the buckets is pure, the IBF
is undecodable, and a larger IBF must be used, or the reconciliation
could fall back to the na\"ive approach of sending the whole set.

The IBF decoding process is particularly suitable for reconciling
large sets with small differences.  When the symmetric difference
between the sets is small enough compared to the size of the IBFs, the
result $\mathrm{IBF}_c$ of the subtraction can be decoded, since the
common elements encoded in $\mathrm{IBF}_a$ and $\mathrm{IBF}_b$
cancel each other out. This makes it possible to obtain the elements
of the symmetric difference, even when the IBFs that represent the
full sets cannot be decoded.

As long as the symmetric difference between the original sets $S_a$
and $S_b$ can be approximated well enough, IBFs can be used for set
reconciliation by encoding $S_a$ in $\mathrm{IBF}_a$ and $S_b$ in
$\mathrm{IBF}_b$.  One of the IBFs is sent over the network, and
$\mathrm{IBF}_c = \mathrm{IBF}_a - \mathrm{IBF}_b$ is computed and
decoded.  Should the decoding (partially) fail, the same procedure is
repeated with larger IBFs.
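
The following Python sketch summarizes the four IBF operations under
the conventions described above.  The hash derivations, the choice of
$k=3$, and the decoding budget are illustrative placeholders, not the
parameters of our implementation:
\begin{verbatim}
import hashlib

K = 3  # each key is recorded in K buckets

def h(key):
    # short hash of a key, used for the keyHashSum purity check
    return int.from_bytes(
        hashlib.sha256(b'h' + key.to_bytes(8, 'big')).digest()[:4], 'big')

def pos(key, size):
    # Pos(x): the bucket indices for a key (illustrative derivation)
    d = hashlib.sha256(b'pos' + key.to_bytes(8, 'big')).digest()
    return {int.from_bytes(d[4*i:4*i+4], 'big') % size for i in range(K)}

class IBF:
    def __init__(self, size):
        self.size = size
        self.count = [0] * size         # signed counter per bucket
        self.key_sum = [0] * size       # XOR-sum of recorded keys
        self.key_hash_sum = [0] * size  # XOR-sum of h(key)

    def _record(self, key, sign):
        for i in pos(key, self.size):
            self.count[i] += sign
            self.key_sum[i] ^= key
            self.key_hash_sum[i] ^= h(key)

    def insert(self, key):
        self._record(key, +1)

    def delete(self, key):
        self._record(key, -1)

    def subtract(self, other):
        # IBF_a - IBF_b: insertions in b become deletions in the result
        c = IBF(self.size)
        for i in range(self.size):
            c.count[i] = self.count[i] - other.count[i]
            c.key_sum[i] = self.key_sum[i] ^ other.key_sum[i]
            c.key_hash_sum[i] = self.key_hash_sum[i] ^ other.key_hash_sum[i]
        return c

    def _find_pure(self):
        for i in range(self.size):
            if self.count[i] in (-1, +1) \
               and h(self.key_sum[i]) == self.key_hash_sum[i] \
               and i in pos(self.key_sum[i], self.size):
                return i
        return None

    def decode(self):
        # extract (key, sign) updates; the budget guards against
        # "looping" IBFs that decode bogus keys forever
        updates, budget = [], 10 * self.size
        while budget > 0:
            budget -= 1
            i = self._find_pure()
            if i is None:
                break
            key, sign = self.key_sum[i], self.count[i]
            updates.append((key, sign))
            self._record(key, -sign)  # peel the update off the IBF
        return updates, all(c == 0 for c in self.count)
\end{verbatim}
Reconciliation then amounts to encoding each set with \texttt{insert},
transmitting one of the IBFs, and decoding the subtraction: keys
decoded with sign $+1$ are exclusive to $S_a$, keys with sign $-1$
exclusive to $S_b$.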


\subsubsection{Difference Estimation with Strata Estimators}

In order to select the initial size of the IBF appropriately for the
set reconciliation protocol, one needs an estimate of the symmetric
difference between the sets that are being reconciled.  Eppstein et
al.~\cite{eppstein2011difference} describe a simple technique, called
strata estimation, that is accurate for small differences.  While
Eppstein et al. suggest combining the strata estimator with a
min-wise estimator, which is more accurate for large differences, our
work only requires the strata estimator.

A strata estimator is an array of fixed-size IBFs.  These fixed-size
IBFs are called \emph{strata} since each of them contains a sample of
the whole set, with increased sampling probability towards inner
strata.  Similar to how two IBFs can be subtracted, strata estimators
are subtracted by pairwise subtraction of the IBFs they consist of.

The set difference is estimated by having both peers encode their set
in a strata estimator.  One of the strata estimators is then sent
to the other peer, which subtracts the two estimators from each
other.  A decoding attempt is then made on every IBF of the resulting
strata estimator.  The number of successfully
decoded elements in each stratum allows an estimate of the
set difference to be made, which is then used to determine the size of the IBF
for the actual set reconciliation.
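
A strata estimator can be sketched on top of the \texttt{IBF} class
from the previous listing.  The stratum count, the per-stratum IBF
size, and the sampling hash below are illustrative choices, not the
constants of our implementation:
\begin{verbatim}
import hashlib

NUM_STRATA = 32
STRATUM_IBF_SIZE = 80  # fixed-size IBFs

def stratum_of(key):
    # stratum i is chosen with probability 2^-(i+1), via the
    # number of trailing zero bits of a hash of the key
    d = int.from_bytes(
        hashlib.sha256(b's' + key.to_bytes(8, 'big')).digest(), 'big')
    tz = 0
    while tz < NUM_STRATA - 1 and not (d >> tz) & 1:
        tz += 1
    return tz

def make_estimator(keys):
    se = [IBF(STRATUM_IBF_SIZE) for _ in range(NUM_STRATA)]
    for key in keys:
        se[stratum_of(key)].insert(key)
    return se

def estimate_difference(se_a, se_b):
    # subtract stratum-wise, starting from the sparsest stratum;
    # extrapolate from the first stratum that fails to decode
    total = 0
    for i in reversed(range(NUM_STRATA)):
        updates, ok = se_a[i].subtract(se_b[i]).decode()
        if not ok:
            return max(total, 1) * 2 ** (i + 1)
        total += len(updates)
    return total  # every stratum decoded: exact difference
\end{verbatim}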


\section{Our Approach}

%\subsection{Assumptions about inputs}
We now describe how to combine the previous approaches into a protocol
for Byzantine fault-tolerant set consensus.  The goal of the adversary
is to sabotage timely consensus among correct peers, e.g., by increasing
message complexity or forcing timeouts.

A major difficulty with agreeing on a set of elements as a whole is
that malicious peers can initially withhold elements from the correct
peers and later send them only to a subset of the correct peers.  This
could possibly happen at a time when it is too late to reconcile the
remaining difference caused by distributing these elements.  We assume
that the number of these elements that are initially known to the
adversary but not to all correct peers is bounded by $k$, where $k$
exists but is not necessarily known to the correct participants.

\subsection{Definition}

We now give a definition of set-union consensus that is motivated by
practical applications to secure multiparty computation protocols such
as electronic voting, which are discussed in more detail in
Section~\ref{sec:applications}.

Consider a set of $n$ peers $\mathcal{P} = \{P_1,\ldots,P_n\}$.  Fix some
(possibly infinite) universe $M$ of elements that can be represented by a
bit string.  Each peer $P_i$ has an initial set $S^{(0)}_i \subseteq M$.

Let $R: {\cal P}(M) \rightarrow {\cal P}(M)$ be an idempotent function that canonicalizes subsets of $M$
by replacing multiple conflicting elements with the lexically smallest element
in the conflict set and removing invalid elements.  What is considered conflicting or
invalid is application-specific.
During the execution of the set-union consensus protocol, after finite time
each peer $P_i$ irrevocably commits to a set $S_i$ such that:
\begin{enumerate}
  \item For any pair of correct peers $P_i$, $P_j$ it holds that $S_i = S_j$.
  \item If $P_i$ is correct and $e \in S^{(0)}_i$ then $e \in S_i$.
  \item The set $S_i$ is canonical, that is $S_i = R(S_i)$.
\end{enumerate}

The canonicalization function allows us to set an upper bound on the
number of elements that can simultaneously be in a set.  For example,
in electronic voting, canonicalization would remove malformed ballots
and combine multiple different (encrypted) ballots submitted by the
same voter into a single ``invalid'' ballot for that voter.
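
To illustrate, a canonicalization function for the voting example
could look as follows.  This is a sketch under the assumption that
each element is a pair of a voter identifier and a ballot; the
\texttt{INVALID} marker is hypothetical and not part of the scheme we
implement later:
\begin{verbatim}
INVALID = b'invalid'  # hypothetical marker replacing conflicting ballots

def canonicalize(elements):
    # R: keep one ballot per voter; multiple different ballots
    # submitted by the same voter collapse to a single marker.
    # Iterating in sorted order makes the result deterministic,
    # so R is idempotent: R(R(S)) == R(S).
    by_voter = {}
    for voter, ballot in sorted(elements):
        if voter in by_voter and by_voter[voter] != ballot:
            by_voter[voter] = INVALID
        else:
            by_voter.setdefault(voter, ballot)
    return {(v, b) for v, b in by_voter.items()}
\end{verbatim}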

\subsection{Byzantine Set-Union Consensus (BSC) Protocol}

Recall that every peer $P_i$, $0 < i \le n$ starts with a set
$S^{(0)}_i$.  The BSC protocol incorporates two subprotocols, bounded
set reconciliation and lower bound agreement, and uses those
to realize an efficient Byzantine fault-tolerant variant of
ByzConsensus.  An existing generalization of IBFs to multi-party set
reconciliation~\cite{mitzenmacher2013simple} based on network coding
is not applicable to this problem, as it requires trusted
intermediaries.

The basic problem solved by the two subprotocols is bounding the cost
of Eppstein's set reconciliation.  Given a symmetric set difference of
size $k$ between two peers, the expected cost of Eppstein's set reconciliation is
$O(k)$ if both participants are honest.  However, we need to ensure
that malicious peers cannot generally raise the complexity to
$O(m)$, where $m$ is the size of the union.

For this, we use a bounded variant of Eppstein's set reconciliation
protocol, which is given a lower bound ${\cal L}$ on the size of the
set of elements shared by all honest participants.  Given such a lower
bound, the bounded set reconciliation protocol must detect
faulty participants in $O(k + (m - {\cal L}))$.  We note that for
${\cal L} = 0$, the bounded set reconciliation is still
allowed to cost $O(m)$.


\subsubsection{Bounded Set Reconciliation} \label{sec:boundedeppstein}

In {\em bounded} set reconciliation we are thus concerned
with modifications that ensure that a set reconciliation step between
an honest and a faulty peer either succeeds after $O(k)$ traffic, or
aborts, notifying the honest peer that the other peer is faulty.  While
we use probabilities to detect faulty behavior, we note that suitable
parameters can be used to ensure that false-positives are rare, say
$1:2^{128}$, and thus as unlikely as successful brute-force attacks
against canonical cryptographic primitives, which BSC also assumes to
be safe.

To begin with, to bound the complexity of Eppstein set reconciliation
one needs to bound the number of iterations the protocol performs.
Assuming honest peers, the initial strata estimation ensures that the
IBFs will decode with high probability, resulting in Eppstein's claim
of single-round complexity.  Given aggressive choices of the
parameters to improve the balance between round-trips and bandwidth
consumption, decoding failures can happen with non-negligible
probability in practice. In this case, the process can simply be
restarted using a different set of hash functions and an IBF doubled
in size.  This addresses issues caused by conservative choices for IBF
sizes that optimize for the average case.  What is critical is that
the probability of such failures remains small enough that, if
the number of rounds exceeds some constant, we can assert faulty
behavior and overall remain within the $O(k)$ bound, assuming individual
rounds are bounded by $O(k)$.

Another problem with Eppstein's original protocol related to
aggressive parameter choices is that iterative decoding does not
always have to end with an empty or an undecodable IBF.  Specifically,
the decoding step can sometimes decode a key that was never added to
the IBF, simply because the two purity checks are also probabilistic.
This is usually not an issue: when a decoder requests the
transmission of the element corresponding to an improperly decoded key,
the presumed element's owner can indicate a decoding failure at that
time.  Another round of the protocol is unlikely to produce the
same error and would thus fix the problem. However, given reasonably
short strings for the \texttt{keyHashSum}, it is actually even
possible to obtain a looping IBF that spawns an infinite series of
``successfully'' decoded keys.  Here, the implementor has to be
careful to ensure that the iterated decoding algorithm terminates.
Instead of mandating an excessively long \texttt{keyHashSum} to
prevent this, it is in practice more efficient to handle this case by
stopping the iteration and reporting the IBF as undecodable when the
number of decoded keys exceeds a threshold proportional to the size of
the IBF.

We also need to consider the bandwidth consumption of an individual
round.  To cause more than $O(k)$ traffic, a malicious peer could
produce strata that result in a huge initial symmetric difference.  In
this case, the initial size of the IBF may exceed $O(k)$.  We address
this problem by not permitting the use of Eppstein's method if the
symmetric difference definitively exceeds $\frac{s - {\cal L}}{2}$,
where $s$ is the smaller of the two set sizes (we write $s$ to avoid
confusion with the number of peers $n$).\footnote{The optimal
  formula here depends on the size ratio of IBF element to the
  transmission size of an individual element and the estimated size of
  the set overlap.  However, to simplify the exposition, we will
  assume a simple 50\% threshold henceforth.}  Instead, once the
estimate of the symmetric difference substantially exceeds this
threshold, the reconciliation algorithm falls back to sending the
complete set.  As this creates $O(m)$ traffic, it must only be allowed
under certain conditions.

First, we consider the case where the honest peer has the larger set.
Here, the honest peer $P_i$ will only send its full set if the set
difference is no larger than $|S_i| - {\cal L}$, and otherwise report
a fault.  This ensures that a malicious peer cannot arbitrarily
request the full set from honest peers.

Second, we consider the case where the honest peer $P_i$ is facing a
faulty peer that claims to have a huge set.  This can happen either
directly from the strata estimator, or after $P_i$ observes a constant
number of successive IBF decoding failures.\footnote{Each failure
  causes the IBF size to double and thus corresponds to a doubling of
  the set difference estimate. Thus, the number of decoding failures
  could remain the threshold that causes an abort, while the set
  difference estimate substantially exceeds $2 (|S_i| - {\cal L})$.}
At this point, instead of passively accepting the transmission of
elements, the receiver $P_i$ checks that a sufficient number of the
elements received are not in $S_i$.  Let $R$ be the stream of elements
$e$ received at any point in time.  We assume that the sender is
required to transmit the elements in randomized order.  Thus, if $|R
\cap S_i| - |R \setminus S_i| \ge 128$, $P_i$ can determine that the
sender is faulty, with a false-positive probability of $2^{-128}$, as the
$\frac{s}{2}$-threshold for converting to complete set transmission
ensures that for an honest sender less than half of the elements would
be in $S_i$.
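
A sketch of this stream check (the constant $128$ mirrors the
$1:2^{128}$ false-positive bound used above; error handling is
simplified):
\begin{verbatim}
class FaultDetected(Exception):
    pass

def receive_full_set(local_set, element_stream):
    # The sender must transmit in randomized order, so for an honest
    # sender fewer than half of the received elements may already be
    # known to us.  A running difference of 128 flags the sender as
    # faulty with false-positive probability about 2^-128.
    known = new = 0
    for element in element_stream:
        if element in local_set:
            known += 1
        else:
            new += 1
        if known - new >= 128:
            raise FaultDetected("sender claimed a huge difference, "
                                "but keeps sending known elements")
        yield element  # hand accepted elements to the caller
\end{verbatim}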

Finally, we note that the individual {\em insert}, {\em delete}, {\em
  decode} and {\em subtract} operations on the IBF are all constant
time and that IBFs are also constant size.  Thus, given a constant
number of rounds and a bound on the bandwidth per round, we have
implicitly assured that memory and CPU consumption of the bounded
set reconciliation is also $O(k + (m - {\cal L}))$.


\subsubsection{Lower Bound Agreement}\label{sec:lower-bound}

To provide a lower bound on the permissible set size for set
reconciliation, BSC first executes a protocol for \emph{lower bound
  agreement} (LBA).  In this first step, every correct peer $P_i$
learns a superset $S^{(1)}_i$ of the union of all correct peers'
initial sets, as well as a lower bound $\ell_i$ for the minimum number
of elements shared by all correct peers, where $|S^{(1)}_i| - \ell_i \le k$.
Note that neither $S^{(1)}_i = S^{(1)}_j$ nor $\ell_i = \ell_j$
necessarily hold even for correct peers $P_i$ and $P_j$.  Our LBA
protocol proceeds in three steps:
%\begin{enumerate}[(i)]
\begin{enumerate}[label=(\roman*)]
  \item All peers reconcile their initial set with each other,
    using pairwise bounded set reconciliation using
    a lower bound of ${\cal L} = 0$.
  \item All peers send their current set size to each other,
    and each peer $P_i$ sets $\ell_i$ to the $(t+1)$-smallest set size
    that $P_i$ received.
  \item All peers again reconcile their sets with each other,
    using pairwise bounded set reconciliation.
\end{enumerate}

The third step is necessary to ensure that every correct $P_i$ has at
least $\ell_i$ elements, since malicious peers could use the $k$
elements initially withheld to keep an honest peer's set size below the
$(t+1)$-smallest set size.  Thanks to the repetition, even if $\ell_i$
differs between peers, it is guaranteed that $P_i$ has at least
$\ell_i$ elements in common with every other correct peer.

As a small example, suppose Alice's lower bound is $\ell = 100$ and
Alice holds $110$ elements.  If Mallory requests $10$ elements during
reconciliation, Alice sends them.  Should Mallory then request yet
another element, Alice can conclude that something is wrong: Mallory
must share at least $100$ elements with Alice (by the bound $\ell$)
and has just received $10$ more, so a correct Mallory could not be
missing an eleventh element.

In subsequent set reconciliations, $\ell_i$ can be used to bound the
traffic that malicious peers are able to cause by falsely claiming to
be missing a large number of elements.  LBA itself has complexity
$O(nmf)$: initially, each malicious peer can {\em once} claim to have
an empty set towards every other peer.  LBA ensures that for the remainder
of the protocol, a correct peer with $m_i$ elements can stop sending
elements to a malicious peer $P_M$ after $P_M$ has requested $m_i - \ell_i
\le k$ elements, reducing the complexity of bounded
set reconciliation with $P_M$ to $O(k)$ by using ${\cal L} = \ell_i$.
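
The lower-bound selection of step (ii) is a one-line computation; a
sketch:
\begin{verbatim}
def lower_bound(reported_sizes, t):
    # l_i is the (t+1)-smallest set size reported by the peers.
    # At most t reports come from faulty peers, so at least one
    # correct peer reported a size <= l_i.
    return sorted(reported_sizes)[t]  # index t = (t+1)-smallest
\end{verbatim}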


\subsubsection{Exact Set Agreement} \label{sec:complexity}

After LBA, an \emph{exact set agreement} is executed, where all peers
reach Byzantine agreement for a super-set of the set reached in LBA.
The exact set agreement is implemented by executing a variant of
ByzConsensus that reconciles sets instead of sending single values.

The Gradecast is adapted as follows:
\begin{enumerate}[label=(\roman*)]
  \item LEAD:  If $i=L$, reconcile the input set $V_L$ with $\mathcal{P}$.
  \item ECHO:  Reconcile the set received in LEAD with $\mathcal{P}$.
  \item CONFIRM:  Let ${\cal U}_E$ be the union
    of all sets received in the ECHO round, and $N_E(e)$ the number
    of times a single set element $e$ was received.

    If $\bigvee_{e \in {{\cal U}_E}} t < N_E(e) < n-t$, send $\bot$ (where $\bot \ne \emptyset$).
    Otherwise send ${\cal U}_E - \{e \mid N_E(e) \le t\}$ to $\mathcal{P}$.
\end{enumerate}
\noindent
The grading rules are also adapted to sets.  Let ${\cal U}_C$ be the union of
sets received in CONFIRM, $N^+_C(e)$ the number of times a
single element $e \in {\cal U}_C$ was received, and $N^-_C(e)$ the number of sets (not
$\bot$) received in CONFIRM that excluded $e$.

\begin{itemize}
\item If $\bigwedge_{e \in {\cal U}_C} \big( N_C^+(e) \ge n-t \lor N_C^-(e) \ge n-t \big)$, \\
      output $\langle \{e \mid N_C^+(e) \ge n-t\},2 \rangle$.
\item Otherwise, if
      $\bigwedge_{e \in {\cal U}_C} \big( (N_C^+(e) > t \land N_C^+(e) \ge N_C^-(e))
      \lor (N_C^-(e) > t \land N_C^-(e) > N_C^+(e)) \big)$, \\
      output $\langle \{e \mid N_C^+(e) > t \land N_C^+(e) \ge N_C^-(e)\},1 \rangle$.
\item Otherwise, output $\langle \bot, 0 \rangle$.
\end{itemize}
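
The adapted grading rules translate into the following sketch, where
\texttt{None} again stands in for $\bot$ and the per-element counts
$N_C^+$ and $N_C^-$ are computed from the received CONFIRM messages:
\begin{verbatim}
from collections import Counter

def grade_set_confirm(confirm_sets, n, t):
    # confirm_sets: one entry per received CONFIRM message,
    # each either a Python set or None (for "bottom")
    proper = [s for s in confirm_sets if s is not None]
    universe = set().union(*proper) if proper else set()
    n_plus = Counter()                 # N_C^+(e): sets including e
    for s in proper:
        n_plus.update(s)
    n_minus = {e: len(proper) - n_plus[e] for e in universe}

    if all(n_plus[e] >= n - t or n_minus[e] >= n - t
           for e in universe):
        return {e for e in universe if n_plus[e] >= n - t}, 2
    if all((n_plus[e] > t and n_plus[e] >= n_minus[e])
           or (n_minus[e] > t and n_minus[e] > n_plus[e])
           for e in universe):
        return {e for e in universe
                if n_plus[e] > t and n_plus[e] >= n_minus[e]}, 1
    return None, 0
\end{verbatim}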

Similar to ByzConsensus, BSC consists of at most $f+1$ super-rounds,
where $f \le t$.
Each peer $P_i$ starts with $S^{(1)}_i$ as its
current set.  In sequential super-rounds, all peers lead a Gradecast for
their candidate set.  As in ByzConsensus, if $P_i$ receives a
Gradecast with a confidence value that is not $2$, then $P_i$ puts the
leader of that Gradecast on its blacklist, and correct peers stop all
communication with peers on their blacklist.

At the end of each super-round, peers update their candidate set as
follows.  Let $n'$ be the number of leaders that gradecasted a set
with a non-zero confidence.  The new candidate set contains all set
elements that were included in at least $\lceil n' / 2 \rceil$ sets
that were gradecasted with a non-zero confidence value.  If all
elements occur with an $(n-t)$-majority, then the next
super-round is the last one.  The output of the consensus protocol is the
candidate set after the last round---or failure if $f > t$.


We give a correctness proof that generalizes Feldman's proof for
Gradecast of single values~\cite[Section 4.1]{feldman1988optimalphd}.

\begin{lemma}\label{lem:main}
  If two correct peers send sets $A \ne \bot$ and $B \ne \bot$
  respectively in CONFIRM, then $A=B$.
\end{lemma}
\begin{proof}
  Proof by contradiction and counting argument.
  Assume w.l.o.g. that $e\in A$ and $e \notin B$.
  At least $n-t$ peers must have echoed a set that includes $e$ to the first peer.
  At most $f$ of these peers are faulty, so at least $n-t-f > t$ correct peers also included $e$
  in their ECHO transmission to the second peer, hence $N_E(e) > t$ at the second peer.
  Since $e \notin B$ and $B \ne \bot$, this forces $t < N_E(e) < n-t$,
  in which case a correct second peer must send $B = \bot$. Contradiction.
\end{proof}


\begin{theorem}
The generalization of Gradecast to sets satisfies the three Gradecast properties.
\end{theorem}
\begin{proof}
We show that each property holds:
\begin{itemize}
\item Property 1 (If $c_i, c_j \ge 1$ then $\hat{V}_i = \hat{V}_j$ for correct $P_i$ and $P_j$):
    Assume w.l.o.g. that $e \in \hat{V}_i \setminus \hat{V}_j$.

    Since $e \in \hat{V}_i$, peer $P_i$ must have received $e$ more than $t$ times in
    CONFIRM, that is, $N_C^+(e) > t$.  Given $f \le t$ failures, at least one honest peer must
    thus have included $e$ in CONFIRM.  According to
    Lemma~\ref{lem:main}, all $n-f$ honest peers must then either include
    $e$ in CONFIRM or send $\bot$.

    Because $\bot$ is not a set, this leaves at most the $f \le t$ faulty peers that can send a set without $e$.
    But for $e \notin \hat{V}_j$ we need $N^-_C(e) \ge t + 1$.  Contradiction.
  \item Property 2 (If $P_L$ is correct, then $c_i=2$ and $\hat{V}_i=V_L$ for correct $P_i$):
    All $n-f \ge n-t$ correct peers ECHO and CONFIRM the same set $V_L$.  By the grading rules,
    they must output a confidence of 2.
  \item Property 3 ($|c_i - c_j| \le 1$ for correct $P_i$ and $P_j$):
    Proof by contradiction. Assume w.l.o.g. $c_i = 2$ and $c_j = 0$.
    $c_i = 2$ implies that for each $x \in \hat{V}_i$ at least $n-t$
    peers (and thus $(n-t)-f \ge t+1$ correct peers) must have sent a set in
    CONFIRM that includes $x$.
    For any $y \in {\cal U}_C \setminus \hat{V}_i$, at least $n-t$ peers (and thus $(n-t)-f \ge t+1$ correct peers) must have sent a non-$\bot$ set in
    CONFIRM that excludes $y$.

    Given $c_j = 0$, there must have been an element $e$ such that,
    $N_C^+(e) \le t$ and $N_C^-(e) \le t$ for $P_j$.
    However, we just derived that for all elements either
    $N_C^+(e) > t$ or $N_C^-(e) > t$.  Contradiction. \qed
\end{itemize}
\end{proof}

Given the Gradecast properties for sets, the correctness argument
given by Ben-Or~\cite{ben2010simple} for Byzantine consensus
carries over to BSC's generalization to sets.

As described, the protocol has complexity $O(mnf +
fkn^3)$. However, the $n$ parallel set reconciliation rounds in
each super-round can be combined by tagging the set elements that are
being reconciled in the LEAD, ECHO and CONFIRM rounds with the
respective leader $L$.  Because LBA (via $m_i-\ell_i \le k$) and bounded
set reconciliation limit the damage malicious peers can cause in a combined
super-round, each malicious peer can, as leader, {\em once} cause
bounded set reconciliation during the ECHO round to all-to-all
transmit at most $k$ extra elements, resulting in a total of $O(fkn^2)$
extra traffic over all $f+1$ rounds.  Before exposing
themselves this way, non-leading malicious peers can only cause
$O(f^2kn)$ additional traffic during all ECHO rounds.
Finally, malicious peers can also cause at most $O(fkn^2)$ traffic in
the CONFIRM round.  Thus, BSC has overall message complexity of $O(mnf
+ fkn^2)$.


\section{Implementation}

We implemented the BSC protocol in the SET and CONSENSUS services of
GNUnet~\cite{gnunet-www}.

\subsection{The GNUnet Framework}

GNUnet is composed of various components that run in separate
operating system processes and communicate via message passing.
Components that expose an interface to other components are called
\emph{services} in GNUnet.  The main service used by our
implementation is the CADET service, which offers pairwise
authenticated end-to-end encryption between all participants.  CADET
uses a variation of the Axolotl public key ratcheting scheme and
double-encrypts using both TwoFish and AES~\cite{polot2014cadet}. The
resulting encryption is relatively expensive compared to the other
operations, and thus dominates in terms of CPU consumption for the
experiments.

\subsection{Set Reconciliation}

Bounded set reconciliation is implemented in the SET service.
The SET service provides a generic interface for set operations
between two peers; the operations currently implemented are the
IBF-based set reconciliation and set
intersection~\cite{tarkoma2012theory}.

In addition to the operation-specific protocols, the following aspects
are handled generically (i.e., independent of the specific remote set
operation) in the SET service:
\begin{description}
\item[Local set operations] \ \\
    Applications need to create sets and
    perform actions (iteration, insertion, deletion) on them locally.
  \item[Concurrent modifications] \ \\
    While a local set is in use in a network
    operation, the application may still continue to mutate that set.
    To allow this without interfering with concurrent network operations,
    changes are versioned.  A network operation only sees the state of a
    set at the time the operation was started.
  \item[Lazy copying] \ \\
    Some applications building on the SET
    service---especially the CONSENSUS service described in
    the next section---manage many local sets that are large but only
    differ in a few elements.  We optimize for this case by providing
    a lazy copy operation that returns a logical copy of the set
    without duplicating the sets in memory.
  \item[Negotiating remote operations] \ \\
    In a network operation, the
    involved peers have one of two roles: the acceptor, which waits
    for remote operation requests and accepts or rejects them, and
    the initiator, which sends the request.
\end{description}

Our implementation estimates the initial difference between sets only
using {\em strata estimators} as described by
Eppstein~\cite{eppstein2011difference}.  However, we compress the
strata estimator---which is 60\,KB uncompressed---using \texttt{gzip}.  The
compression is highly effective at reducing bandwidth consumption due
to the high probability of long runs of zeros or ones in the
sparsest and densest strata, respectively.

We also use a \emph{salt} when deriving the bucket indices from the element
keys. When the decoding of an IBF fails, the IBF size is doubled and the salt
is changed.  This prevents persistent decoding failures in scenarios where
keys map to the same bucket indices even modulo a power of two, in which case
merely doubling the size of the IBF would not remove the collision.
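
The salted index derivation can be sketched as follows; the exact
derivation used in GNUnet's SET service differs, this only
illustrates the idea:
\begin{verbatim}
import hashlib

def bucket_indices(key, size, salt, k=3):
    # Mixing a salt into the hash means that, after a decoding
    # failure, the rebuilt IBF maps colliding keys to different
    # buckets with high probability, even when 'size' is again
    # a power of two.
    d = hashlib.sha256(
        salt.to_bytes(4, 'big') + key.to_bytes(8, 'big')).digest()
    return [int.from_bytes(d[4*i:4*i+4], 'big') % size
            for i in range(k)]
\end{verbatim}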


\subsection{Set-Union Consensus}

To keep the description of the set-union consensus protocol in the
previous section succinct, we merely stated that peers efficiently
transmit sets using the reconciliation protocol.  However, given that
the receiving peer usually has many sets to reconcile against, an
implementation needs to be careful to ensure that it scales to large
sets as intended.

The key goal is to avoid duplicating full sets and to instead focus on
the differences.  New sets usually differ in only a few elements, thus
our implementation avoids copying entire sets.  Instead, in the LEAD
round we just store the set of differences together with a reference to the
original set.  In the ECHO and CONFIRM rounds, we also reconcile with
respect to the set received from the leader, and not a peer's
current set.  In the ECHO round, we only store one set and annotate
each element to indicate which peer included or excluded that element.
This also allows the set that determines the $\bot$-result in the
CONFIRM round to be computed rather efficiently.

\subsection{Evaluating Malicious Behavior}

For the evaluation, our CONSENSUS service can be configured to
exhibit the following types of adversarial behavior:
\begin{itemize}
  \item \textsl{SpamAlways: }  A malicious peer adds a
    constant number of additional elements in every reconciliation.
  \item \textsl{SpamLeader: } A malicious peer adds a
    constant number of additional elements in reconciliations
    where the peer is the leader.
  \item \textsl{SpamEcho: }  A malicious peer adds a
    constant number of additional elements in echo rounds.
  \item \textsl{Idle: } Malicious peers do not participate actively in
    the protocol, which amounts to a crash fault from the start of the
    protocol.  This type of behavior is not interesting for the
    evaluation, but is used to test the implementation with regard to
    timeouts and majority counting.
\end{itemize}

For the \textsl{Spam-*} behaviors, two different variations are
implemented.  One of them (``*-replace'') always generates new
elements for every reconciliation.  This is not typical for real
applications where the number of stuffable elements ought to be
limited by set canonicalization. However, this shows the performance
impact in the worst case.  The other variation (``*-noreplace'')
reuses the same set of additional elements for all reconciliations,
which is more realistic for most cases.  We did not implement
adversarial behavior where elements are elided, since the resulting
traffic is the same as for additional elements, and memory usage would
only be reduced.
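
The two variations can be sketched as follows; the element size
matches our experimental setup, but the generation scheme is only
illustrative:

\begin{verbatim}
import os

_FIXED_ELEMENTS = None

def spam_elements(count, replace):
    # "*-replace": fresh random elements for every
    # reconciliation (worst case); "*-noreplace": the same
    # fixed extra elements each time (more realistic).
    global _FIXED_ELEMENTS
    if replace:
        return [os.urandom(64) for _ in range(count)]
    if _FIXED_ELEMENTS is None:
        _FIXED_ELEMENTS = [os.urandom(64)
                           for _ in range(count)]
    return _FIXED_ELEMENTS
\end{verbatim}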


\section{Experimental Results}

All experiments were run on a single machine with a 24-core
2.30GHz Intel Xeon E5-2630 CPU, using GNUnet SVN revision 36765.  We
used the \texttt{gnunet-\allowbreak{}consensus-\allowbreak{}profiler} tool, which is based on
GNUnet's TESTBED service~\cite{totakura2013large}, to configure and
launch multiple peers on the target system.  We configured the
profiler to emulate a network of peers connected in a clique topology
(via loopback, without artificial latency).
Elements for the set operations
are randomly generated and are always 64 bytes in size.

Bandwidth consumption was measured using the statistics that
GNUnet's CADET service~\cite{polot2014cadet} provides.
Processor time was measured using GNUnet's resource reporting
functionality, which uses the \texttt{wait3} system call for that
purpose.


\subsection{Bounded Set Reconciliation}

We now summarize the experimental results for the bounded
set reconciliation protocol between two peers.  We first
measured the behavior of the set reconciliation when identical sets were
given to both peers (Figures~\ref{fig:set-exp1-cpu}
and~\ref{fig:set-exp1-traffic}).  Figure~\ref{fig:set-exp1-cpu} shows
that total CPU utilization generally grows slowly as the set size
increases.  The sudden jump in processing time visible at
around 7,000 elements can most likely be explained by cache effects;
the effect could not be observed when we ran the experiment under
profiling tools.

\begin{figure}
  \includegraphics[width=\textwidth]{set-exp1-cpu.pdf}
  \caption[CPU time for the SET service in relation to set size.]{CPU system time for the SET service in relation to total set size.
    Average over 50 executions.}
  \label{fig:set-exp1-cpu}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{set-exp1-traffic.pdf}
  \caption[CADET traffic for the SET service in relation to set size.]{CADET traffic for the SET service in relation to total set size.
    Average over 50 executions.}
  \label{fig:set-exp1-traffic}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{set-exp2-cpu.pdf}
  \caption[CPU time for the SET service in relation to set difference.]{CPU system time for the SET service in relation to symmetric set difference.
    Average over 50 executions.}
  \label{fig:set-delta-cpu}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{set-exp2-traffic.pdf}
  \caption[CADET traffic for the set service in relation to set difference.]{CADET traffic for the SET service in relation to symmetric difference.
    Average over 50 executions.}
  \label{fig:set-delta-bandwidth}
\end{figure}

Figure~\ref{fig:set-exp1-traffic} shows that bandwidth consumption
does not grow linearly with the total set size, as long as the set
size difference between the two peers is small.  The logarithmic
increase of the traffic with larger sets can be explained by the
compression of strata estimators: the $k$-th stratum samples the set
with probability $2^{-k}$, and for small input sets the strata tend
to contain long runs of zeros that are easily compressed.

We also measured the behavior of the set reconciliation implementation
when the sets differ.  Figures~\ref{fig:set-delta-cpu}
and~\ref{fig:set-delta-bandwidth} show that---as expected---CPU time
and bandwidth grow linearly with the symmetric difference between
the two sets.


Finally, we analyzed what happens when the algorithm
switches from transmitting set differences to full sets.
Figure~\ref{fig:set-traffic-switch} shows the bandwidth in relation to
the symmetric set difference, for different total numbers of elements
in the shared set.  Up to the threshold where the algorithm
switches from IBFs to full set transmission, we expect the
transmission size to grow steeply, and afterwards to continue
growing linearly at a lower rate.  If the handover threshold is chosen
well, the two lines should meet.  This is the case for the dashed
curve in Figure~\ref{fig:set-traffic-switch}.
The small bump at a set difference of $\approx$ 800 is due
to an unlucky size estimate by the strata estimator causing the
algorithm to initially attempt set reconciliation, before switching to
full set transmission.
If the threshold between IBF and full set
transmission is picked a bit too high and IBFs are sent slightly
beyond the point where they are beneficial, the curve from the IBF
transmission will peak above the one that represents the full set
transmission.  This is the case in the solid curve in
Figure~\ref{fig:set-traffic-switch}.  Finally, the dotted curve shows
the case where the threshold is picked too low, causing expensive full
set transmission to occur when IBFs would have been more useful. Here,
we also see a lucky case of underestimating the size of the difference.
We note that given the size of an IBF entry, the average size of a set
element, and an estimate of the set overlap, near-perfect thresholds
(instead of the 50\%-heuristic we described earlier) can be computed
trivially, as the sketch below illustrates.
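
The following Python sketch makes this comparison concrete under
simplified cost assumptions; the per-bucket size and the IBF overhead
factor are illustrative parameters, not the constants used in our
implementation:

\begin{verbatim}
def use_full_transmission(estimated_diff, local_set_size,
                          element_size=64,
                          ibf_entry_size=33,  # assumed bytes/bucket
                          ibf_overhead=2.0):  # buckets/difference
    # Compare the expected IBF traffic against simply sending
    # the whole local set; pick whichever is cheaper.
    ibf_cost = estimated_diff * ibf_overhead * ibf_entry_size
    full_cost = local_set_size * element_size
    return full_cost < ibf_cost
\end{verbatim}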

\begin{figure}
  \includegraphics[width=\textwidth]{set-exp2-traffic-by-size-2.pdf}
  \caption[CADET traffic for the SET service at boundary for full transmission.]{%
    CADET traffic for the SET service in relation to symmetric difference
    at the boundary between IBF and full set transmission.  Note that we
    did cherry-pick runs for this graph.  Our goal is to illustrate how the
    curves evolve with regard to different thresholds
    between IBF and full set transmission.  We also wanted to show how
    significant deviations in set difference estimates generated by the
    strata estimator can have a minor impact on performance.}
    % Average over five executions.
  \label{fig:set-traffic-switch}
\end{figure}

\subsection{Byzantine Set Consensus}

For our experiments with the BSC implementation, all ordinary peers
start with the same set of elements; different sets would only affect
the all-to-all union phase of the protocol, which performs pairwise set
reconciliation, resulting in increased bandwidth and CPU consumption
proportional to the set difference, as shown in the previous section.

\begin{figure}
  \includegraphics[width=\textwidth]{consensus-baseline-traffic.pdf}
  \caption[CADET traffic per peer, only correct peers.]{CADET traffic for BSC per peer for 100 elements and only correct peers.
    Average over 50 executions.}
  \label{fig:consensus-baseline-traffic}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{consensus-baseline-cpu.pdf}
  \caption[CPU usage of BSC, only correct peers.]{CPU usage of BSC for 100 elements of 64 bytes and only correct peers.
    Average over 50 executions.}
  \label{fig:consensus-baseline-cpu}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{consensus-baseline-latency.pdf}
  \caption[Runtime of BSC, only correct peers.]{Runtime of BSC for 100 elements of 64 bytes and only correct peers.
    Average over 50 executions.}
  \label{fig:consensus-baseline-runtime}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{consensus-amplify-traffic.pdf}
  \caption[CADET traffic for BSC, one malicious peer.]{CADET traffic for BSC on 100 elements of 64 bytes and
           one malicious peer with the indicated mode.  Average over 50 executions.}
  \label{fig:consensus-amplify-traffic}
\end{figure}

\begin{figure}
  \includegraphics[width=\textwidth]{consensus-amplify-latency.pdf}
  \caption[Latency for BSC, one malicious peer.]{Latency for BSC with 4 peers on 100 elements of 64 bytes and one
    malicious peer with the indicated mode. Average over 50
    executions.}
  \label{fig:consensus-amplify-runtime}
\end{figure}

\begin{figure}[b!]
  \includegraphics[width=\textwidth]{consensus-amplify-stuffed.pdf}
  \caption[Total number of extra elements received with one malicious peer.]{%
    Total number of extra elements received by each peer for
    BSC on 100 elements of 64 bytes and one malicious peer with
    the indicated mode. Average over 50 executions.}
  \label{fig:consensus-amplify-stuffed}
\end{figure}

As expected, traffic increases cubically with the number of peers when
no malicious peers are present
(Figure~\ref{fig:consensus-baseline-traffic}).  Most of the CPU time
(Figure~\ref{fig:consensus-baseline-cpu}) is taken up by CADET, which
uses expensive cryptographic operations~\cite{polot2014cadet}.  Since
we ran the experiments on a multicore machine, the total runtime
follows the same pattern as the traffic (Figure~\ref{fig:consensus-baseline-runtime}).

We now consider the performance implications of the presence of
malicious peers.  Figures~\ref{fig:consensus-amplify-runtime}
and~\ref{fig:consensus-amplify-stuffed} show that bandwidth and
runtime increase linearly with the additional elements malicious peers
can exclusively supply, in contrast to the sub-linear growth for the
non-Byzantine case (Figure~\ref{fig:set-exp1-traffic}).

Figure~\ref{fig:consensus-amplify-stuffed} highlights how the
different attack strategies impact the number of additional elements
received during set reconciliations: the number of stuffed
elements for the ``SpamEcho'' behavior is significantly larger than
for ``SpamLeader'', since multiple ECHO rounds are executed for each LEAD
round, and the number of stuffed elements is fixed per reconciliation.
When malicious peers add extra elements during the LEAD round, the
effect is amplified, since all correct receivers have to
re-distribute the additional elements in the ECHO/CONFIRM round.  Even
though adding elements in the LEAD round requires the least bandwidth
from the leader, its effect on traffic and latency is the largest (see
Figures~\ref{fig:consensus-amplify-traffic} and
\ref{fig:consensus-amplify-runtime}).

As expected, when the number of stuffed elements is limited to a fixed
set, the effect on the performance is limited (``SpamAll-noreplace''
in Figures~\ref{fig:consensus-amplify-traffic},
\ref{fig:consensus-amplify-runtime},
\ref{fig:consensus-amplify-stuffed}).


\section{Opportunities for Further Improving BSC}

We now discuss some of the key limitations of the current
implementation and how it could be optimized further.

\subsection{Extension to Partial Synchrony}

The prototype used in the evaluation only works in the synchronous
model.  It would be trivial to extend it to the partially synchronous
model with synchronous clocks by using the same construction as
PBFT~\cite{castro1999practical}, namely retrying the protocol with
larger round timeouts (usually doubled on each retry) when it does not
succeed.
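
A minimal sketch of this retry loop, assuming a hypothetical
\texttt{run\_consensus} function that returns \texttt{None} when a
round times out:

\begin{verbatim}
def consensus_with_retries(run_consensus, base_timeout,
                           max_attempts=10):
    # Retry the synchronous protocol with doubled round
    # timeouts until one execution succeeds.
    timeout = base_timeout
    for _ in range(max_attempts):
        result = run_consensus(round_timeout=timeout)
        if result is not None:
            return result
        timeout *= 2
    raise RuntimeError("no successful execution")
\end{verbatim}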

It might be worthwhile to further investigate the Byzantine round
synchronization protocols discovered independently by Attiya and
Dolev~\cite{attiya1984asynchronous} as well as Dwork, Lynch and
Stockmeyer~\cite{dwork1988consensus}.  Running a Byzantine clock
synchronization protocol interleaved with the consensus protocol might
lead to a protocol with lower latency, since the timeouts are
adjusted dynamically instead of being increased for each failed
iteration.

\subsection{Persistent Data Structures}

Both the SET and the CONSENSUS service have to store
many variations of the same set when faulty peers elide or add
elements.  While the SET service API already supports lazy
copying, the underlying implementation is inefficient and based on a
log of changes per element with an associated version number.  It
might be possible to reduce the memory usage and increase the
performance of the element storage by using better-suited data
structures, such as the persistent data structures described by
Okasaki~\cite{okasaki1999purely}.

\subsection{Fast Dissemination}

Recall that in order to be included in the final set, an element must
be sent to at least $t+1$ peers, so that at least one correct peer
will receive it.  In applications of set-union consensus such
as electronic voting, the effort for the client should be minimized,
and thus in practice elements might be sent to only $t+1$ peers,
which would lead to large initial symmetric differences between peers.

A possible optimization would be to add another dissemination round
that only requires $n \log_2 n$ reconciliations to achieve perfect
element distribution when only correct peers are present.  The $n^2$
reconciliations that follow will consequently be more efficient, since
no difference has to be reconciled when all peers are correct.  In the
presence of faulty peers, the optimization adds more overhead due to
the additional dissemination round.

More concretely, in the additional dissemination round the peers
reconcile with their $2^\ell$-th neighbor (for some arbitrary, fixed
order on the peers) in the $\ell$-th subround of the dissemination
round.  After $\lceil \log_2 n \rceil$ of these subrounds, the elements
are perfectly distributed, as long as every peer passed along their
current set correctly.
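
The subround schedule can be sketched as follows; peer indices refer
to the assumed fixed order:

\begin{verbatim}
import math

def dissemination_partners(peer_index, n):
    # In subround l, peer i reconciles with peer
    # (i + 2**l) mod n; after ceil(log2 n) subrounds every
    # element has reached every peer if all peers are correct.
    for l in range(math.ceil(math.log2(n))):
        yield l, (peer_index + 2 ** l) % n
\end{verbatim}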


\section{Application to SMC} \label{sec:applications}

Secure multiparty computation (SMC) is an area of cryptography that is
concerned with protocols that allow a group of peers $\mathcal{P} =
P_1,\dots,P_n$ to jointly compute a function $y=f(x_1,\dots,x_n)$ over
private input values $x_1,\dots,x_n$ without using a trusted third
party \cite{goldwasser2005secure}.  Each peer $P_i$ contributes its
own input value $x_i$, and during the course of the SMC protocol,
$P_i$ ideally only learns the output $y$, but no additional
information about the other peers' input values.  Applications of SMC
include electronic voting, secure auctions and privacy-preserving data
mining.

SMC protocols often assume a threshold $t < n$ on the number of peers
controlled by an adversary, which is typically either
\emph{honest-but-curious} (i.e., tries to learn as much information as
possible but follows the protocol) or \emph{actively malicious}.  The
actively malicious case mandates the availability of Byzantine
consensus as a building block~\cite{saia2015recent}.\footnote{An
  attempt has been made to relax the definition of SMC to alleviate
  this requirement, resulting in a weaker definition that includes
  non-unanimous \emph{aborts} as a possible result
  \cite{goldwasser2005secure}.  This definition is mainly useful in
  scenarios without a non-faulty $2/3$ majority, where Byzantine
  consensus is not possible in the asynchronous model
  \cite{dwork1988consensus}.}

In practical applications, the inputs typically consist of sets of
values that were given to the peers $\mathcal{P}$ by external clients:
In electronic voting protocols the peers need to agree on the set of
votes; with secure auctions, the peers need to agree on bids, and so
on.

In this section, we focus on one practical problem, namely electronic
voting.  We show how BSC is useful at multiple stages of the protocol,
and discuss how our approach differs from existing solutions found in
the literature.

\subsection{Bulletin Board for Electronic Voting}

The \emph{bulletin board} is a communication abstraction commonly used
for electronic voting \cite{benaloh1987verifiable,peters2005secure}.
It is a stateful, append-only channel that participants of the
election can post messages to.  Participants of the election identify
themselves with a public signing key and must sign all messages that
they post to the bulletin board.  The posted messages are publicly
available to facilitate independent auditing of elections.

Existing work on electronic voting either does not provide a Byzantine
fault-tolerant bulletin board in the first place
\cite{adida2008helios} and instead relies on trusted third parties, or
suggests the use of state machine replication \cite{cramer1997secure}.

Some of the bulletin board protocols surveyed by Peters
\cite{peters2005secure} use threshold signatures to
certify to the voter that the vote was accepted by a sufficiently
large fraction of the peers that jointly provide the bulletin board
service.  With a na\"ive approach, the message that certifies acceptance
by $t$ peers is the concatenation of the peers' individual signatures
and thus $O(t)$ bits large.  Threshold signature schemes allow smaller
signatures, but at the expense of a more complex protocol.  Since the
number of peers is typically not very large, a linear growth in $t$ is
acceptable, which makes the simple scheme sufficient for practical
implementations.

It is easy to implement a variant of the bulletin board with set-union
consensus.  In contrast to traditional bulletin boards, this variant
has \emph{phases}, where posted messages only become visible after the
group of peers has agreed that a phase is concluded.  The concept of
phases maps well to the requirements of existing voting
protocols. Every phase is implemented with one set-union consensus
execution.  To guarantee that a message is posted to the bulletin
board, it must be sent to at least one correct peer from the group of
peers that jointly implements the bulletin board.
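
A phase-based bulletin board could be sketched as follows, assuming a
set-union consensus primitive passed in as a function; message signing
and signature verification are omitted:

\begin{verbatim}
class BulletinBoard:
    def __init__(self, set_union_consensus):
        self.consensus = set_union_consensus
        self.phases = []      # agreed, immutable message sets
        self.pending = set()  # messages posted in this phase

    def post(self, signed_message):
        self.pending.add(signed_message)

    def conclude_phase(self):
        # One set-union consensus execution per phase; posted
        # messages become visible only once the phase concludes.
        agreed = self.consensus(self.pending)
        self.phases.append(frozenset(agreed))
        self.pending = set()
        return self.phases[-1]
\end{verbatim}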

\begin{figure}
  \centering
  \includegraphics[width=\textwidth]{set-consensus-deps.pdf}
  \caption[Different subsystems related to SMC in GNUnet.]{%
    Relation of different SMC protocols and communication
    primitives in GNUnet.  Dashed arrows indicate optional
    dependencies.}
  \label{fig:arch}
\end{figure}



\subsection{Distributed Threshold Key Generation and Cooperative Decryption}

Voting schemes as well as other secure multiparty computation
protocols often rely on threshold
cryptography~\cite{desmedt1994threshold}.  The basic intuition behind
threshold cryptography is that some operations---such as signing a
message or decrypting a ciphertext---should only succeed if a large
enough fraction of some group of peers cooperate.  Typically, the
public key of the threshold cryptosystem is publicly known, while the
private key is not known by any entity but reconstructible from the
shares that are distributed among the participants, for example, with
Shamir's secret sharing scheme~\cite{shamir1979share}.

Generating this shared secret key either requires a trusted third
party, or a protocol for distributed key
generation~\cite{fouque2001one,pedersen1991threshold}.  The former is
undesirable for most practical applications since it creates a single
point of failure.

In a distributed key generation protocol, each
peer contributes a number of \emph{pre-shares}.  The
peers agree on the set of pre-shares and each
peer re-combines them in a different way, yielding
the shares of the private threshold key.
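
The overall flow could be sketched as follows, where
\texttt{set\_union\_consensus}, \texttt{verify\_proof}, and
\texttt{recombine} are hypothetical placeholders for the respective
building blocks:

\begin{verbatim}
def distributed_key_generation(my_preshares,
                               set_union_consensus,
                               verify_proof, recombine):
    # Agree on the full set of pre-shares, drop those whose
    # correctness proofs fail, and recombine the rest into
    # this peer's share of the private threshold key.
    agreed = set_union_consensus(my_preshares)
    valid = {p for p in agreed if verify_proof(p)}
    return recombine(valid)
\end{verbatim}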

In the key generation protocol used for the Cramer et al. voting
scheme, the number of pre-shares that need to be agreed upon is
quadratic in the number of peers: every peer needs to know every
pre-share, even pre-shares it does not require for reconstructing its
own share, since the pre-shares are accompanied by non-interactive
proofs of correctness.  This quadratic number of values makes the use
of set-union consensus attractive compared to individual agreement.

Even though the pre-shares can be checked for correctness, Byzantine
consensus on the set of shares is still necessary for the case where a
malicious peer submits an incorrect share to only some peers.  Without
Byzantine consensus, different correct recipients might exclude
different peers, resulting in inconsistent shares.

Similarly, when a message that was encrypted with the threshold public
key is to be decrypted, every peer contributes a \emph{partial
  decryption} with a proof of correctness.  While the number of partial
decryptions is typically linear in the number of peers, set-union
consensus is still a reasonable choice here, since this way the whole
system needs only one agreement primitive.

\subsection{Electronic Voting with Homomorphic Encryption}

Various conceptually different voting schemes use homomorphic
encryption; we look at the scheme by Cramer et
al.~\cite{cramer1997secure} as a modern and practical representative.
A fundamental mechanism of the voting scheme is that a set of voting
authorities $A_1,\dots,A_n$ establishes a threshold key pair that
allows any entity that knows the public part of the key to encrypt a
message that can only be decrypted when a threshold of the voting
authorities cooperate. The homomorphism in the cryptosystem enables
the computation of an encrypted tally from only the ciphertexts of the
submitted ballots.  Ballots represent a choice of one candidate from a
list of candidate options.  The validity of encrypted ballots is ensured
by equipping them with a non-interactive zero-knowledge proof of their
validity.
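
To illustrate the homomorphic tallying mechanism, the toy Python
sketch below uses exponential ElGamal with tiny, insecure parameters;
the actual scheme additionally employs threshold decryption and the
validity proofs discussed above:

\begin{verbatim}
import random

p, g = 2039, 7       # toy group parameters; NOT secure
x = random.randrange(2, p - 1)  # key; threshold-shared
h = pow(g, x, p)                # public key

def encrypt(m):      # m in {0, 1}: a yes/no ballot
    r = random.randrange(2, p - 1)
    return (pow(g, r, p), pow(g, m, p) * pow(h, r, p) % p)

def add(c, d):       # multiplying ciphertexts adds plaintexts
    return (c[0] * d[0] % p, c[1] * d[1] % p)

def decrypt_tally(c, max_votes):
    gm = c[1] * pow(c[0], p - 1 - x, p) % p
    for m in range(max_votes + 1):  # small discrete log
        if pow(g, m, p) == gm:
            return m

ballots = [encrypt(random.choice((0, 1))) for _ in range(10)]
tally = ballots[0]
for b in ballots[1:]:
    tally = add(tally, b)
print(decrypt_tally(tally, 10))  # number of "yes" votes
\end{verbatim}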

It is assumed that the adversary is not able to corrupt more than
$1/3$ of the authorities.  The voting process itself is then
facilitated by all voters encrypting their vote and submitting it to
the authorities.  The encrypted tally is computed by every authority
and then cooperatively decrypted by the authorities and published.
Since correct authorities will only agree to decrypt the final tally
and not individual ballots, the anonymity of the voter is preserved.
For the voting scheme to work correctly, all correct peers must agree
on exactly the same set of ballots before the cooperative decryption
process starts, otherwise the decryption of the tally will fail.

Using BSC for this final step to agree on the set of ballots again makes
sense, as the number of ballots is typically much larger than the number
of authorities.  Figure~\ref{fig:arch} summarizes the various ways in
which BSC is used in our implementation~\cite{dold2014crypto} of
Cramer-style~\cite{cramer1997secure} electronic voting.


\subsection{Other Applications of BSC}

Bitcoin~\cite{nakamoto2008bitcoin} has gained immense popularity over
the past few years.  Bitcoin solves a slight variation of Byzantine
consensus without strong
validity~\cite{miller2014anonymous,garay2015bitcoin}.  Given that a
block in Bitcoin is basically just a set of (valid) transactions,
BSC could be used to efficiently achieve agreement between participants
about the next transaction group.  Here, the most natural application
would be to use BSC to improve the efficiency of the BFT consensus run
by proof-of-stake-incentivized peers in Cosmos~\cite{cosmos}.


\section{Conclusions}

Given $m$ ballots, $n$ authorities, $f$ Byzantine faults and $k$
ballots exclusively available to the adversary, voting with BSC
achieves a complexity of $O(mn + (f+k)n^3)$, which in practice is
better than the $O(mn^2)$ complexity of using SMR as $m$ is usually
significantly larger than $n$.  Equivalent arguments hold for other
applications requiring consensus over large sets.  Furthermore, BSC
remains advantageous in the absence of Byzantine failures, and the
bounded set reconciliation makes it particularly efficient at handling
various non-Byzantine faults.

To ensure these performance bounds, BSC requires a bounded variant of
Eppstein's set reconciliation protocol in which individual steps cannot
consume unbounded amounts of bandwidth.
We are currently applying bounded set reconciliation in related
domains, as any set reconciliation can be made more robust if the
complexity of the operation is bounded.  For example, the GNU Name
System~\cite{gns2014wachs} can use bounded set reconciliation when
gossiping sets of key revocations. Here, bounded set
reconciliation protects the key revocation protocol against
denial-of-service attacks in which an attacker sends
excessively large IBFs or retransmits revocation messages
already known to the recipient.  The result is an efficient and
resilient method for disseminating key revocation data.

In future work, it would be interesting to apply bounded set
reconciliation to Byzantine consensus protocols that are more
efficient than the simple gradecast consensus.  It would also be
interesting to experimentally compare bulletin boards using BSC with
those using traditional replicated state machines.