.. _subsystems:

Subsystems
**********

This section contains a brief description of the subsystems that make up
GNUnet.
The image below gives an overview of system dependencies and interactions.

.. image:: /images/gnunet-arch-full.svg

CADET - Decentralized End-to-end Transport
==========================================

The Confidential Ad-hoc Decentralized End-to-end Transport (CADET) subsystem
in GNUnet is responsible for secure end-to-end
communications between nodes in the GNUnet overlay network. CADET builds
on the CORE subsystem, which provides for the link-layer communication,
by adding routing, forwarding, and additional security to the
connections. CADET offers the same cryptographic services as CORE, but
on an end-to-end level. This ensures that peers retransmitting traffic on
behalf of other peers cannot access the payload data.

- CADET provides confidentiality with so-called perfect forward
  secrecy; we use ECDHE powered by Curve25519 for the key exchange and
  then use symmetric encryption, encrypting with both AES-256 and
  Twofish

- authentication is achieved by signing the ephemeral keys using
  Ed25519, a deterministic EdDSA signature scheme

- integrity protection (using SHA-512 to do encrypt-then-MAC, although
  only 256 bits are sent to reduce overhead)

- replay protection (using nonces, timestamps, challenge-response,
  message counters and ephemeral keys)

- liveness (keep-alive messages, timeout)

In addition to the CORE-like security benefits, CADET offers other
properties that make it a more universal service than CORE.

- CADET can establish channels to arbitrary peers in GNUnet. If a peer
  is not immediately reachable, CADET will find a path through the
  network and ask other peers to retransmit the traffic on its behalf.

- CADET offers (optional) reliability mechanisms.
  In a reliable channel, traffic is guaranteed to arrive complete,
  unchanged and in order.

- CADET takes care of flow and congestion control, not
  allowing the sender to send more traffic than the receiver or the
  network are able to process.

.. _CORE-Subsystem:

.. index::
   double: CORE; subsystem

CORE - GNUnet link layer
========================

The CORE subsystem in GNUnet is responsible for securing link-layer
communications between nodes in the GNUnet overlay network. CORE builds
on the TRANSPORT subsystem, which provides the actual, insecure,
unreliable link-layer communication (for example, via UDP or WLAN), and
then adds fundamental security to the connections:

- confidentiality with so-called perfect forward secrecy; we use ECDHE
  (`Elliptic-curve
  Diffie–Hellman <http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman>`__)
  powered by Curve25519 (`Curve25519 <http://cr.yp.to/ecdh.html>`__)
  for the key exchange and then use symmetric encryption, encrypting
  with both AES-256
  (`AES-256 <http://en.wikipedia.org/wiki/Rijndael>`__) and Twofish
  (`Twofish <http://en.wikipedia.org/wiki/Twofish>`__)

- `authentication <http://en.wikipedia.org/wiki/Authentication>`__ is
  achieved by signing the ephemeral keys using Ed25519
  (`Ed25519 <http://ed25519.cr.yp.to/>`__), a deterministic
  EdDSA signature scheme (`EdDSA <http://en.wikipedia.org/wiki/EdDSA>`__)

- integrity protection (using SHA-512
  (`SHA-512 <http://en.wikipedia.org/wiki/SHA-2>`__) to do
  encrypt-then-MAC
  (`encrypt-then-MAC <http://en.wikipedia.org/wiki/Authenticated_encryption>`__))

- `replay <http://en.wikipedia.org/wiki/Replay_attack>`__
  protection (using nonces, timestamps, challenge-response, message
  counters and ephemeral keys)

- liveness (keep-alive messages, timeout)

.. _Limitations:

:index:`Limitations <CORE; limitations>`
Limitations
-----------

CORE does not perform
`routing <http://en.wikipedia.org/wiki/Routing>`__; using CORE it is
only possible to communicate with peers that happen to already be
"directly" connected with each other. CORE also does not have an API
to allow applications to establish such "direct" connections; for
this, applications can ask TRANSPORT, but TRANSPORT might not be able to
establish a "direct" connection. The TOPOLOGY subsystem is responsible
for trying to keep a few "direct" connections open at all times.
Applications that need to talk to particular peers should use the CADET
subsystem, as it can establish arbitrary "indirect" connections.

Because CORE does not perform routing, CORE must only be used directly
by applications that either perform their own routing logic (such as
anonymous file-sharing) or that do not require routing, for example
because they are based on flooding the network. CORE communication is
unreliable and delivery is possibly out-of-order. Applications that
require reliable communication should use the CADET service. Each
application can only queue one message per target peer with the CORE
service at any time; messages cannot be larger than approximately 63
kilobytes. If messages are small, CORE may group multiple messages
(possibly from different applications) prior to encryption. If permitted
by the application (using the `cork <http://baus.net/on-tcp_cork/>`__
option), CORE may delay transmissions to facilitate grouping of multiple
small messages. If cork is not enabled, CORE will transmit the message
as soon as TRANSPORT allows it (TRANSPORT is responsible for limiting
bandwidth and congestion control). CORE does not provide flow control;
applications are expected to process messages at line-speed.
If flow control is needed, applications should use the CADET service.

.. when is a peer connected
.. _When-is-a-peer-_0022connected_0022_003f:

When is a peer "connected"?
---------------------------

In addition to the security features mentioned above, CORE also provides
one additional key feature to applications using it: a
limited form of protocol-compatibility checking. CORE distinguishes
between TRANSPORT-level connections (which enable communication with
other peers) and application-level connections. Applications using the
CORE API will (typically) learn about application-level connections from
CORE, and not about TRANSPORT-level connections. When a typical
application uses CORE, it will specify a set of message types (from
``gnunet_protocols.h``) that it understands. CORE will then notify the
application about connections it has with other peers if and only if
the applications on those peers registered an intersecting set of message
types with their own CORE service. Thus, it is quite possible that CORE
only exposes a subset of the established direct connections to a particular
application, and different applications running above CORE might see
different sets of connections at the same time.

Applications that do not register a handler for any message type are a
special case. CORE assumes that these applications merely want to
monitor connections (or "all" messages via other callbacks) and will
notify those applications about all connections. This is used, for
example, by the ``gnunet-core`` command-line tool to display the active
connections.
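
The notification rule described above can be illustrated with a small C
sketch. The function ``should_notify`` and the representation of each
side's registered message types as a flat array of ``uint16_t`` values are
hypothetical simplifications; this is not the CORE API, and it ignores how
CORE actually exchanges type information with the other peer.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical sketch: should CORE report a connection to a local
    * application?  local_types are the message types the local
    * application registered; remote_types are the types registered
    * on the other peer. */
   bool
   should_notify (const uint16_t *local_types, size_t n_local,
                  const uint16_t *remote_types, size_t n_remote)
   {
     assert ((0 == n_local) || (NULL != local_types));
     assert ((0 == n_remote) || (NULL != remote_types));

     /* No handlers at all: treat the application as a monitor that
      * wants to see every connection. */
     if (0 == n_local)
       return true;

     /* Otherwise report the connection only if the two sets intersect. */
     for (size_t i = 0; i < n_local; i++)
       for (size_t j = 0; j < n_remote; j++)
         if (local_types[i] == remote_types[j])
           return true;
     return false;
   }

With this sketch, an application that registered the types {1, 2, 3} is
told about a peer whose applications registered {3, 4}, but not about a
peer that only registered {5}; an application that registered no types at
all is told about both.
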

Note that it is also possible that the TRANSPORT service
has more active connections than the CORE service, as the CORE service
first has to perform a key exchange with connecting peers before
exchanging information about supported message types and notifying
applications about the new connection.

.. _Distributed-Hash-Table-_0028DHT_0029:

.. index::
   double: Distributed hash table; subsystem
   see: DHT; Distributed hash table

DHT - Distributed Hash Table
============================

GNUnet includes a generic distributed hash table that can be used by
developers building P2P applications in the framework. This section
documents high-level features and how developers are expected to use the
DHT. We have a research paper [R5N2011]_ detailing how the DHT works. Also,
Nate's thesis [EVANS2011]_ includes a detailed description and performance
analysis (in chapter 6).

Key features of GNUnet's DHT include:

- stores key-value pairs with values up to (approximately) 63 kilobytes
  in size

- works with many underlay network topologies (small-world, random
  graph); the underlay does not need to be a full mesh / clique

- support for extended queries (more than just a simple 'key'),
  filtering duplicate replies within the network (Bloom filter) and
  content validation (for details, please read the subsection on the
  block library)

- can (optionally) return paths taken by the PUT and GET operations to
  the application

- provides content replication to handle churn

GNUnet's DHT is randomized and unreliable. Unreliable means that there
is no strict guarantee that a value stored in the DHT is always found;
values are only found with high probability. While this is somewhat
true in all P2P DHTs, GNUnet developers should be particularly wary of
this fact (this will help you write secure, fault-tolerant code).
Thus, when writing any application using the DHT, you should always consider
the possibility that a value stored in the DHT by you or some other peer
might simply not be returned, or might be returned with a significant delay.
Your application logic must be written to tolerate this (naturally, some loss
of performance or quality of service is expected in this case).

.. _Block-library-and-plugins:

Block library and plugins
-------------------------

.. _What-is-a-Block_003f:

What is a Block?
^^^^^^^^^^^^^^^^

Blocks are small (less than 63 kilobytes) pieces of data stored under a key
(``struct GNUNET_HashCode``). Blocks have a type (``enum GNUNET_BlockType``)
which defines their data format. Blocks are used in GNUnet as units of static
data exchanged between peers and stored (or cached) locally. Uses of
blocks include file-sharing (the files are broken up into blocks), the
VPN (DNS information is stored in blocks) and the DHT (both the data in
the DHT and the meta-information used to maintain the DHT are
stored as blocks). The block subsystem provides a few common
functions that must be available for any type of block.


.. [R5N2011] https://bib.gnunet.org/date.html#R5N
.. [EVANS2011] https://d-nb.info/1015129951

.. index::
   double: File sharing; subsystem
   see: FS; File sharing

.. _File_002dsharing-_0028FS_0029-Subsystem:

FS — File sharing over GNUnet
=============================

This chapter describes the details of how the file-sharing service
works. As with all services, it is split into an API (libgnunetfs), the
service process (gnunet-service-fs) and user interface(s). The
file-sharing service uses the datastore service to store blocks and the
DHT (and indirectly datacache) for lookups for non-anonymous
file-sharing.
Furthermore, the file-sharing service uses the block
library (and the block FS plugin) for validation of DHT operations.

In contrast to many other services, libgnunetfs is rather complex since
the client library includes a large number of high-level abstractions;
this is necessary since the FS service itself largely only operates on
the block level. The FS library is responsible for providing a
file-based abstraction to applications, including directories,
metadata, keyword search, verification, and so on.

The method used by GNUnet to break large files into blocks and to use
keyword search is called the "Encoding for Censorship-Resistant
Sharing" (ECRS). ECRS is largely implemented in the FS library; block
validation is also reflected in the block FS plugin and the FS service.
ECRS on-demand encoding is implemented in the FS service.

.. note:: The documentation in this chapter is quite incomplete.

.. _Encoding-for-Censorship_002dResistant-Sharing-_0028ECRS_0029:

.. index::
   see: Encoding for Censorship-Resistant Sharing; ECRS

:index:`ECRS — Encoding for Censorship-Resistant Sharing <single: ECRS>`
ECRS — Encoding for Censorship-Resistant Sharing
------------------------------------------------

When GNUnet shares files, it uses a content encoding that is called
ECRS, the Encoding for Censorship-Resistant Sharing. Most of ECRS is
described in the (so far unpublished) research paper attached to this
page. ECRS obsoletes the previous ESED and ESED II encodings, which were
used in GNUnet before version 0.7.0. The rest of this page assumes that
the reader is familiar with the attached paper. What follows is a
description of some minor extensions that GNUnet makes over what is
described in the paper.
The reason why these extensions are not in the
paper is that we felt that they were obvious or trivial extensions to
the original scheme and thus did not warrant space in the research
report.

.. todo:: Find missing link to file system paper.

.. index::
   double: GNU Name System; subsystem
   see: GNS; GNU Name System

.. _GNU-Name-System-_0028GNS_0029:

GNS - the GNU Name System
=========================

The GNU Name System (GNS) is a decentralized database that enables users
to securely resolve names to values. Names can be used to identify other
users (for example, in social networking), or network services (for
example, VPN services running at a peer in GNUnet, or purely IP-based
services on the Internet). Users interact with GNS by typing in a
hostname whose top-level domain is configured in the "GNS" section of
the configuration, matches one of the user's identities, or is a
Base32-encoded public key.

Videos giving an overview of most of GNS and the motivations behind
it are available here and here. The remainder of this chapter targets
developers that are familiar with the high-level concepts of GNS as
presented in these talks.

.. todo:: Link to videos and GNS talks?

GNS-aware applications should use the GNS resolver to obtain the
records that are stored under a given name in GNS. Each record
consists of a type, value, expiration time and flags.

The type specifies the format of the value. Types below 65536 correspond
to DNS record types; larger values are used for GNS-specific records.
Applications can define new GNS record types by reserving a number and
implementing a plugin (which mostly needs to convert the binary value
representation to a human-readable text format and vice versa). The
expiration time specifies how long the record is to be valid.
The GNS API ensures that applications are only given non-expired values.
The flags are typically irrelevant for applications, as GNS uses them
internally to control visibility and validity of records.

Records are stored along with a signature. The signature is generated
using the private key of the authoritative zone. This allows any GNS
resolver to verify the correctness of a name-value mapping.

Internally, GNS uses the NAMECACHE to cache information obtained from
other users, the NAMESTORE to store information specific to the local
users, and the DHT to exchange data between users. A plugin API is used
to enable applications to define new GNS record types.

.. index::
   single: GNS; name cache
   double: subsystem; NAMECACHE

.. _GNS-Namecache:

NAMECACHE — DHT caching of GNS results
======================================

The NAMECACHE subsystem is responsible for caching (encrypted)
resolution results of the GNU Name System (GNS). GNS makes zone
information available to other users via the DHT. However, as accessing
the DHT for every lookup is expensive (and as the DHT's local cache is
lost whenever the peer is restarted), GNS uses the NAMECACHE as a more
persistent cache for DHT lookups. Thus, instead of always looking up
every name in the DHT, GNS first checks if the result is already
available locally in the NAMECACHE. GNS queries the DHT only if there
is no result in the NAMECACHE. The NAMECACHE stores data in the same
(encrypted) format as the DHT. It thus makes no sense to iterate over
all items in the NAMECACHE: the NAMECACHE does not have a way to
provide the keys required to decrypt the entries.

Blocks in the NAMECACHE share the same expiration mechanism as blocks in
the DHT: the block expires as soon as any of the records in the
(encrypted) block expires.
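
In other words, a block's expiration time is the minimum of the expiration
times of the records it contains. A minimal C sketch of that rule follows;
the ``struct example_record`` type, the helper names and the use of
absolute times in microseconds are assumptions for illustration, not the
actual GNS or NAMECACHE data structures.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical, simplified record: only the fields needed here. */
   struct example_record
   {
     uint32_t record_type;
     uint64_t expiration_us; /* absolute time in microseconds (assumed) */
   };

   /* A block expires as soon as the earliest of its records expires. */
   uint64_t
   example_block_expiration (const struct example_record *records,
                             size_t count)
   {
     assert (NULL != records);
     assert (count > 0);

     uint64_t earliest = records[0].expiration_us;
     for (size_t i = 1; i < count; i++)
       if (records[i].expiration_us < earliest)
         earliest = records[i].expiration_us;
     return earliest;
   }

   /* A cached block may only be served while it has not yet expired. */
   bool
   example_block_is_fresh (uint64_t block_expiration_us, uint64_t now_us)
   {
     return now_us < block_expiration_us;
   }

Taking the minimum guarantees that a cached block is never served after
any record inside it has expired.
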

The expiration time of the block is the only
information stored in plaintext. The NAMECACHE service internally
performs all of the required work to expire blocks; clients do not have
to worry about this. Also, given that NAMECACHE stores only GNS blocks
that local users requested, there is no configuration option to limit
the size of the NAMECACHE. It is assumed to always be small enough (a
few MB) to fit on the drive.

The NAMECACHE supports the use of different database backends via a
plugin API.

.. index::
   double: subsystem; NAMESTORE

.. _NAMESTORE-Subsystem:

NAMESTORE — Storage of local GNS zones
======================================

The NAMESTORE subsystem provides persistent storage for local GNS zone
information. All local GNS zone information is managed by NAMESTORE. It
provides both the functionality to administer local GNS information
(for example, to add and delete records) and to retrieve GNS information
(for example, to list name information in a client). NAMESTORE only manages
the persistent storage of zone information belonging to the user running
the service: GNS information from other users obtained from the DHT is
stored by the NAMECACHE subsystem.

NAMESTORE uses a plugin-based database backend to store GNS information
with good performance. SQLite and PostgreSQL are supported as
database backends. NAMESTORE clients interact with the IDENTITY
subsystem to obtain cryptographic information about zones based on egos
as described in the IDENTITY section, but internally NAMESTORE
refers to zones using the respective private key.

NAMESTORE is queried and monitored by the ZONEMASTER service, which periodically
publishes public records of GNS zones.
ZONEMASTER also collaborates with the NAMECACHE subsystem: whenever local
information is modified, it stores the zone information in the NAMECACHE to
increase look-up performance for local information and to enable local access
to private records in zones through GNS.

NAMESTORE provides functionality to look up and store records, to
iterate over a specific zone or all zones and to monitor zones for changes.
NAMESTORE functionality can be accessed using the NAMESTORE C API, the NAMESTORE
REST API, or the NAMESTORE command-line tool.

.. index::
   double: HOSTLIST; subsystem

.. _HOSTLIST-Subsystem:

HOSTLIST — HELLO bootstrapping and gossip
=========================================

Peers in the GNUnet overlay network need address information so that
they can connect with other peers. GNUnet uses so-called HELLO messages
to store and exchange peer addresses. GNUnet provides several methods
for peers to obtain this information:

- out-of-band exchange of HELLO messages (manually, for example using
  ``gnunet-core``)

- HELLO messages shipped with GNUnet (included automatically in the
  distribution)

- UDP neighbor discovery in the LAN (IPv4 broadcast, IPv6 multicast)

- topology gossiping (learning from other peers we are already connected
  to), and

- the HOSTLIST daemon covered in this section, which is particularly
  relevant for bootstrapping new peers.

New peers have no existing connections (and thus cannot learn from
gossip among peers), may not have other peers in their LAN and might be
started with an outdated set of HELLO messages from the distribution. In
this case, getting new peers to connect to the network requires either
manual effort or the use of a HOSTLIST to obtain HELLOs.

.. _HELLOs:

HELLOs
------

The basic information peers require to connect to other peers is
contained in so-called HELLO messages, which you can think of as a business
card. Besides the identity of the peer (based on its cryptographic
public key), a HELLO message may contain address information that
specifies ways to contact the peer. By obtaining HELLO messages, a peer
can learn how to contact other peers.

.. _Overview-for-the-HOSTLIST-subsystem:

Overview for the HOSTLIST subsystem
-----------------------------------

The HOSTLIST subsystem provides a way to distribute and obtain contact
information to connect to other peers using a simple HTTP GET request.
Its implementation is split into three parts: the main file for the
daemon itself (``gnunet-daemon-hostlist.c``), the HTTP client used to
download peer information (``hostlist-client.c``) and the server
component used to provide this information to other peers
(``hostlist-server.c``). The server is basically a small HTTP web server
(based on GNU libmicrohttpd) which provides a list of HELLOs known to
the local peer for download. The client component is basically an HTTP
client (based on libcurl) which can download hostlists from one or more
websites. The hostlist format is a binary blob containing a sequence of
HELLO messages. Note that any HTTP server can theoretically serve a
hostlist; the built-in hostlist server simply makes it convenient to
offer this service.

.. _Features:

Features
^^^^^^^^

The HOSTLIST daemon can:

- provide HELLO messages with validated addresses obtained from
  PEERINFO for other peers to download

- download HELLO messages and forward these messages to the TRANSPORT
  subsystem for validation

- advertise the URL of this peer's hostlist to other peers via
  gossip

- automatically learn about hostlist servers from the gossip of other
  peers

.. _HOSTLIST-_002d-Limitations:

HOSTLIST - Limitations
^^^^^^^^^^^^^^^^^^^^^^

The HOSTLIST daemon does not:

- verify the cryptographic information in the HELLO messages

- verify the address information in the HELLO messages

.. _Interacting-with-the-HOSTLIST-daemon:

Interacting with the HOSTLIST daemon
------------------------------------

The HOSTLIST subsystem is currently implemented as a daemon, so there is
no need for the user to interact with it; therefore, there is no
command-line tool and no API to communicate with the daemon. In the
future, we can envision changing this to allow users to manually trigger
the download of a hostlist.

Since there is no command-line interface to interact with HOSTLIST, the
only way to interact with the hostlist is to use STATISTICS to obtain or
modify information about the status of HOSTLIST:

::

   $ gnunet-statistics -s hostlist

In particular, HOSTLIST includes a **persistent** value in statistics
that specifies when the hostlist server might be queried next. As this
value increases exponentially during runtime, developers may want to
reset or manually adjust it. Note that HOSTLIST (but not STATISTICS)
needs to be shut down if changes to this value are to have any effect on
the daemon (as HOSTLIST does not monitor STATISTICS for changes to the
download frequency).

.. _Hostlist-security-address-validation:

Hostlist security address validation
------------------------------------

Since information obtained from other parties cannot be trusted without
validation, we have to distinguish between *validated* and *not
validated* addresses. Before using (and so trusting) information from
other parties, this information has to be double-checked (validated).
Address validation is not done by HOSTLIST but by the TRANSPORT service.

The HOSTLIST component is functionally located between the PEERINFO and
the TRANSPORT subsystems. When acting as a server, the daemon obtains
valid (*validated*) peer information (HELLO messages) from the PEERINFO
service and provides it to other peers. When acting as a client, it
contacts the HOSTLIST servers specified in the configuration, downloads
the (unvalidated) list of HELLO messages and forwards this information
to the TRANSPORT service to validate the addresses.

.. _The-HOSTLIST-daemon:

:index:`The HOSTLIST daemon <double: daemon; HOSTLIST>`
The HOSTLIST daemon
-------------------

The hostlist daemon is the main component of the HOSTLIST subsystem. It
is started by the ARM service and (if configured) starts the HOSTLIST
client and server components.

GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT
   If the daemon provides a hostlist itself, it can advertise its own
   hostlist to other peers. To do so, it sends a
   ``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to other peers
   when they connect to this peer on the CORE level. This hostlist
   advertisement message contains the URL to access the HOSTLIST HTTP
   server of the sender. The daemon may also subscribe to this type of
   message from the CORE service, and then forward such messages to the
   HOSTLIST client. The client then uses all available URLs to download
   peer information when necessary.
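
The advertisement message can be pictured as a message header followed by
the URL. The C sketch below encodes and checks such a message, assuming a
header of a 16-bit total size and a 16-bit type in network byte order
(modeled on GNUnet's generic message header) followed by a NUL-terminated
URL. The type number ``EXAMPLE_TYPE_HOSTLIST_ADVERTISEMENT`` and the helper
names are hypothetical; the real layout and constants are defined in the
GNUnet sources.

.. code-block:: c

   #include <arpa/inet.h>
   #include <assert.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical type number; the real constant lives in
    * gnunet_protocols.h and is not reproduced here. */
   #define EXAMPLE_TYPE_HOSTLIST_ADVERTISEMENT 0x1234

   /* Assumed header: total message size and message type,
    * both 16-bit values in network byte order. */
   struct example_header
   {
     uint16_t size;
     uint16_t type;
   };

   /* Writes header + NUL-terminated URL into buf.
    * Returns the number of bytes written, or 0 if it does not fit. */
   size_t
   encode_advertisement (const char *url, unsigned char *buf, size_t buf_len)
   {
     struct example_header hdr;
     size_t url_len;
     size_t total;

     assert ((NULL != url) && (NULL != buf));
     url_len = strlen (url) + 1;         /* include the terminating NUL */
     total = sizeof (hdr) + url_len;
     if ((total > buf_len) || (total > UINT16_MAX))
       return 0;
     hdr.size = htons ((uint16_t) total);
     hdr.type = htons (EXAMPLE_TYPE_HOSTLIST_ADVERTISEMENT);
     memcpy (buf, &hdr, sizeof (hdr));
     memcpy (buf + sizeof (hdr), url, url_len);
     return total;
   }

   /* Returns the URL inside buf, or NULL if the message is malformed. */
   const char *
   decode_advertisement (const unsigned char *buf, size_t len)
   {
     struct example_header hdr;

     if (len < sizeof (hdr) + 1)
       return NULL;                      /* need at least one URL byte */
     memcpy (&hdr, buf, sizeof (hdr));
     if (((size_t) ntohs (hdr.size) != len) ||
         (ntohs (hdr.type) != EXAMPLE_TYPE_HOSTLIST_ADVERTISEMENT))
       return NULL;
     if ('\0' != buf[len - 1])
       return NULL;                      /* URL must be NUL-terminated */
     return (const char *) (buf + sizeof (hdr));
   }

Checking that the declared size matches the received length and that the
URL is NUL-terminated keeps a malformed advertisement from being handed to
the client code as a string.
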

When starting, the HOSTLIST daemon first connects to the CORE subsystem
and, if hostlist learning is enabled, registers a CORE handler to receive
this kind of message. Next, it starts (if configured) the client and
server. It passes pointers to CORE connect, disconnect and receive
handlers where the client and server store their functions, so the
daemon can notify them about CORE events.

To clean up on shutdown, the daemon has a cleanup task that shuts down
all subsystems and disconnects from CORE.

.. _The-HOSTLIST-server:

:index:`The HOSTLIST server <single: HOSTLIST; server>`
The HOSTLIST server
-------------------

The server provides a way for other peers to obtain HELLOs. Basically, it
is a small web server other peers can connect to and download a list of
HELLOs using standard HTTP; it may also advertise the URL of the
hostlist to other peers connecting on the CORE level.

.. _The-HTTP-Server:

The HTTP Server
^^^^^^^^^^^^^^^

During startup, the server starts a web server listening on the port
specified with the HTTPPORT value (default 8080). In addition, it
connects to the PEERINFO service to obtain peer information. The
HOSTLIST server uses the ``GNUNET_PEERINFO_iterate`` function to request
HELLO information for all peers and adds their information to a new
hostlist if they are suitable (HELLOs with expired addresses and HELLOs
without addresses are not suitable) and the maximum size for a hostlist is
not exceeded (``MAX_BYTES_PER_HOSTLISTS = 500000``). When PEERINFO finishes
(with a last NULL callback), the server destroys the previous hostlist
response available for download on the web server and replaces it with
the updated hostlist. The hostlist format is basically a sequence of
HELLO messages (as obtained from PEERINFO) without any special
tokenization.
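
The assembly step described above can be sketched in C. The sketch assumes
that each HELLO arrives already serialized and that the caller has counted
its non-expired addresses; ``struct example_hostlist`` and
``example_append_hello`` are hypothetical names, and only the 500000-byte
limit is taken from the text.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <string.h>

   /* Size limit quoted in the text above. */
   #define MAX_BYTES_PER_HOSTLISTS 500000

   /* Hypothetical accumulator for the hostlist response body. */
   struct example_hostlist
   {
     unsigned char data[MAX_BYTES_PER_HOSTLISTS];
     size_t used;
   };

   /* Appends one serialized HELLO if it is suitable and still fits.
    * valid_addresses is the number of non-expired addresses in the
    * HELLO; computing it is left to the caller in this sketch.
    * Returns 1 if the HELLO was appended, 0 if it was skipped. */
   int
   example_append_hello (struct example_hostlist *hl,
                         const unsigned char *hello, size_t hello_len,
                         unsigned int valid_addresses)
   {
     assert ((NULL != hl) && (NULL != hello));
     assert (hl->used <= MAX_BYTES_PER_HOSTLISTS);

     if (0 == valid_addresses)
       return 0;                  /* no usable address: not suitable */
     if (hello_len > MAX_BYTES_PER_HOSTLISTS - hl->used)
       return 0;                  /* would exceed the hostlist size limit */
     /* HELLOs are simply concatenated; no separator is added. */
     memcpy (hl->data + hl->used, hello, hello_len);
     hl->used += hello_len;
     return 1;
   }

In this sketch the first ``used`` bytes of ``data`` form the complete
response body, ready to be handed to the web server.
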

Since each HELLO message contains a size field, the
response can easily be split into separate HELLO messages by the client.

A HOSTLIST client connecting to the HOSTLIST server receives the
hostlist as an HTTP response with the status code ``HTTP 200 OK``, after
which the server closes the connection. The connection is
closed immediately if no hostlist is available.

.. _Advertising-the-URL:

Advertising the URL
^^^^^^^^^^^^^^^^^^^

The server also advertises the URL to download the hostlist to other
peers if hostlist advertisement is enabled. When a new peer connects and
has hostlist learning enabled, the server sends a
``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to this peer
using the CORE service.

.. _The-HOSTLIST-client:

The HOSTLIST client
-------------------

The client provides the functionality to download the list of HELLOs
from a set of URLs. It performs standard HTTP requests to URLs that are
either configured or learned from advertisement messages received from
other peers. When a HELLO is downloaded, the HOSTLIST client forwards the
HELLO to the TRANSPORT service for validation.

The client supports two modes of operation:

- download of HELLOs (bootstrapping)

- learning of URLs

.. _Bootstrapping:

Bootstrapping
^^^^^^^^^^^^^

For bootstrapping, the client schedules a task to download the hostlist
from the set of known URLs. Downloads are only performed if the number of
current connections is smaller than a minimum number of connections
(currently 4). The interval between downloads increases exponentially;
however, once the interval becomes longer than an hour, its growth is
capped at the number of connections multiplied by one hour.
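
One possible reading of this schedule, written as a small C sketch. The
starting interval of one second, the use of seconds as the unit, the helper
names and the rule that the cap never drops below one hour (so that zero
connections still yields a one-hour cap) are assumptions made here, not
details taken from ``hostlist-client.c``.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define EXAMPLE_MIN_CONNECTIONS 4   /* "currently 4" in the text */
   #define EXAMPLE_HOUR_S 3600u

   /* Bootstrap downloads only happen while there are too few connections. */
   bool
   example_should_download (unsigned int connections)
   {
     return connections < EXAMPLE_MIN_CONNECTIONS;
   }

   /* Returns the next download interval in seconds: the previous interval
    * doubled, but once it exceeds one hour it is capped at
    * connections * 1 h (the cap itself is never below one hour). */
   uint64_t
   example_next_interval (uint64_t previous_s, unsigned int connections)
   {
     uint64_t next;
     uint64_t cap;

     assert (previous_s <= UINT64_MAX / 2);  /* doubling must not overflow */
     next = (0 == previous_s) ? 1 : previous_s * 2;
     if (next > EXAMPLE_HOUR_S)
     {
       cap = (uint64_t) EXAMPLE_HOUR_S * (connections > 0 ? connections : 1);
       if (next > cap)
         next = cap;
     }
     return next;
   }

Under these assumptions, with two connections the interval grows 1 s, 2 s,
4 s, ..., 2048 s, 4096 s and then stays at 7200 s.
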

Once the decision has been taken to download HELLOs, the daemon chooses
a random URL from the list of known URLs. URLs can be specified in the
configuration or learned from advertisement messages. The client uses
an HTTP client library (libcurl) to initiate the download using the
libcurl multi interface. Libcurl passes the data to the
``callback_download`` function, which stores the data in a buffer if space is
available and the maximum size for a hostlist download is not exceeded
(``MAX_BYTES_PER_HOSTLISTS = 500000``). When a full HELLO has been downloaded,
the HOSTLIST client offers this HELLO message to the TRANSPORT service
for validation. When the download has finished or failed, statistical
information about the quality of this URL is updated.

.. _Learning:

:index:`Learning <single: HOSTLIST; learning>`
Learning
^^^^^^^^

The client also manages hostlist advertisements from other peers. The
HOSTLIST daemon forwards ``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT``
messages to the client subsystem, which extracts the URL from the
message. Next, the newly obtained URL is tested by
triggering a download from it. If the URL works correctly, it
is added to the list of working URLs.

The size of the list of URLs is restricted, so if an additional server
is added and the list is full, the URL with the worst quality ranking
(determined, for example, through successful downloads and the number of
HELLOs obtained) is discarded. During shutdown, the list of URLs is saved
to a file for persistence and loaded again on startup. URLs from the
configuration file are never discarded.

.. _Usage:

Usage
-----

To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES
section for the ARM services. The default configuration already does this.

For more information on how to configure the HOSTLIST subsystem, see the
installation handbook sections "Configuring the hostlist to bootstrap"
and "Configuring your peer to provide a hostlist".

.. index::
   double: IDENTITY; subsystem

.. _IDENTITY-Subsystem:

IDENTITY — Ego management
=========================

Identities of "users" in GNUnet are called egos. Egos can be used as
pseudonyms ("fake names") or be tied to an organization (for example,
"GNU") or even to the actual identity of a human. GNUnet users are
expected to have many egos. They might have one tied to their real
identity, some for organizations they manage, and more for different
domains where they want to operate under a pseudonym.

The IDENTITY service allows users to manage their egos. The IDENTITY
service manages the private keys of the local user's egos; it does not
manage identities of other users (public keys). Public keys of other
users need names to become manageable. GNUnet uses the GNU Name System
(GNS) to give names to other users and manage their public keys
securely. This chapter is about the IDENTITY service, which is about the
management of private keys.

On the network, an ego corresponds to an ECDSA key (over Curve25519,
using RFC 6979, as required by GNS). Thus, users can perform actions
under a particular ego by using (signing with) a particular private key.
Other users can then confirm that the action was really performed by
that ego by checking the signature against the respective public key.

The IDENTITY service allows users to associate a human-readable name
with each ego. This way, users can use names that will remind them of
the purpose of a particular ego. The IDENTITY service stores the
respective private keys and allows applications to access key
information by name. Users can change the name associated with an ego;
this name is purely local. Egos can also be deleted, which means that
the private key is removed and it will thus no longer be possible to
perform actions with that ego.

Additionally, the IDENTITY subsystem can associate service functions
with egos. For example, GNS requires the ego that should be used for the
shorten zone. GNS will ask IDENTITY for an ego for the "gns-short"
service. The IDENTITY service has a mapping of such service strings to
the name of the ego that the user wants to use for this service, for
example "my-short-zone-ego".

Finally, the IDENTITY API provides access to a special ego, the
anonymous ego. The anonymous ego is special in that its private key is
not really private, but fixed and known to everyone. Thus, anyone can
perform actions as anonymous. This trick is useful because code then
does not have to contain a special case to distinguish between
anonymous and pseudonymous egos.

.. index::
   double: subsystem; MESSENGER

.. _MESSENGER-Subsystem:

MESSENGER — Room-based end-to-end messaging
===========================================

The MESSENGER subsystem is responsible for secure end-to-end
communication in groups of nodes in the GNUnet overlay network.
MESSENGER builds on the CADET subsystem, which provides reliable and
secure end-to-end communication between the nodes inside these
groups.

In addition to the CADET security benefits, MESSENGER provides the
following properties designed for application-level usage:

- MESSENGER provides integrity by signing the messages with the ego
  provided by the user

- MESSENGER adds (optional) forward secrecy by replacing the key pair
  of the used ego and signing the propagation of the new one with the
  old one (chaining egos)

- MESSENGER provides verification of the original sender by checking
  against all egos of a member that are currently in active use
  (active use depends on the state of a member session)

- MESSENGER offers (optional) decentralized message forwarding between
  all nodes in a group to improve availability and prevent MITM attacks

- MESSENGER handles new connections and disconnections of nodes in
  the group by reconnecting them in a way that preserves an efficient
  structure for message distribution (ensuring availability and
  accountability)

- MESSENGER provides replay protection (messages can be uniquely
  identified via SHA-512, include a timestamp and the hash of the last
  message)

- MESSENGER allows detection of dropped messages by chaining them
  (messages refer to the last message by their hash), improving
  accountability

- MESSENGER allows requesting messages from other peers explicitly to
  ensure availability

- MESSENGER provides confidentiality by padding messages to a few
  different sizes (512 bytes, 4096 bytes, 32768 bytes and the maximal
  message size of CADET)

- MESSENGER adds (optional) confidentiality by using ECDHE to exchange
  keys for symmetric encryption, encrypting with both AES-256 and
  Twofish but allowing only selected members to decrypt (using the
  receiver's ego for ECDHE)

MESSENGER also provides several features designed with privacy in mind:

- MESSENGER allows the original sender to delete messages from all
  peers in the group (using the MESSENGER-provided verification)

- MESSENGER allows using the publicly known anonymous ego instead of
  any unique identifying ego

- MESSENGER allows your node to decide between acting as host of the
  used messaging room (sharing your peer's identity with all nodes in
  the group) or acting as guest (sharing your peer's identity only with
  the nodes you explicitly open a connection to)

- MESSENGER handles members independently of the peer's identity, making
  forwarded messages indistinguishable from directly received ones
  (complicating the tracking of messages and the identification of their
  origin)

- MESSENGER allows member names to be non-unique (names are also
  optional)

- MESSENGER does not include information about the selected receiver of
  an explicitly encrypted message in its header, making it harder for
  other members to draw conclusions about communication partners

.. index::
   single: subsystem; Network size estimation
   see: NSE; Network size estimation

.. _NSE-Subsystem:

NSE — Network size estimation
=============================

NSE stands for Network Size Estimation. The NSE subsystem provides other
subsystems and users with a rough estimate of the number of peers
currently participating in the GNUnet overlay. The computed value is not
a precise number, as producing a precise number in a decentralized,
efficient and secure way is impossible. While NSE's estimate is
inherently imprecise, NSE also gives the expected range. For a peer that
has been running in a stable network for a while, the real network size
will typically (99.7% of the time) be in the range of [2/3 estimate, 3/2
estimate]. We will now give an overview of the algorithm used to
calculate the estimate; all of the details can be found in the
technical report.

.. todo:: link to the report.

.. _Motivation:

Motivation
----------

Some subsystems, like DHT, need to know the size of the GNUnet network
to optimize some parameters of their own protocol. The decentralized
nature of GNUnet makes counting the exact number of peers efficiently
and securely infeasible. Although there are several decentralized
algorithms to count the number of peers in a system, so far none of them
does so securely. Other protocols may allow any malicious peer to
manipulate the final result or to take advantage of the system to
perform Denial of Service (DoS) attacks against the network. GNUnet's
NSE protocol avoids these drawbacks.

.. index::
   single: NSE; security

.. _Security:

Security
^^^^^^^^

The NSE subsystem is designed to be resilient against these attacks. It
uses `proofs of
work <http://en.wikipedia.org/wiki/Proof-of-work_system>`__ to prevent
one peer from impersonating a large number of participants, which would
otherwise allow an adversary to artificially inflate the estimate. The
DoS protection comes from the time-based nature of the protocol: the
estimates are calculated periodically and out-of-time traffic is either
ignored or stored for later retransmission by benign peers. In
particular, peers cannot trigger global network communication at will.

.. index::
   single: NSE; principle of operation

.. _Principle:

Principle
---------

The algorithm calculates the estimate by finding the globally closest
peer ID to a random, time-based value.

The idea is that the closer the ID is to the random value, the more
"densely packed" the ID space is, and therefore, the more peers there
are in the network.

.. _Example:

Example
^^^^^^^

Suppose all peers have IDs between 0 and 100 (our ID space), and the
random value is 42.
If the closest peer has the ID 70, we can imagine that the average
"distance" between peers is around 30 and therefore there are around 3
peers in the whole ID space. On the other hand, if the closest peer has
the ID 44, we can imagine that the space is rather packed with peers,
maybe as many as 50 of them. Naturally, we could have been rather
unlucky: there might be only one peer, which happens to have the ID 44.
Thus, the current estimate is calculated as the average over multiple
rounds, and not just a single sample.

.. _Algorithm:

Algorithm
^^^^^^^^^

Given that example, one can imagine that the job of the subsystem is to
efficiently communicate the ID of the peer closest to the target value
to all the other peers, who will calculate the estimate from it.

.. _Target-value:

Target value
^^^^^^^^^^^^

The target value itself is generated by hashing the current time,
rounded down to an agreed value. If the rounding amount is 1h (default)
and the time is 12:34:56, the time to hash would be 12:00:00. The
process is repeated once per rounding interval (in this example, every
hour). Every repetition is called a round.

.. _Timing:

Timing
^^^^^^

The NSE subsystem has some timing control to avoid everybody
broadcasting its ID all at once. Once each peer has the target random
value, it compares its own ID to the target and calculates the
hypothetical size of the network if that peer were to be the closest.
Then it compares the hypothetical size with the estimate from the
previous rounds. For each value there is an associated point in the
period, let's call it "broadcast time". If its own hypothetical
estimate is the same as the previous global estimate, its "broadcast
time" will be in the middle of the round. If it is bigger, the broadcast
time will be earlier, and if it is smaller (the most likely case), it
will be later.
This ensures that the peers closest to the target value start
broadcasting their IDs first.

.. _Controlled-Flooding:

Controlled Flooding
^^^^^^^^^^^^^^^^^^^

When a peer receives a value, it first verifies that the value is closer
than the closest value it has seen so far; otherwise, it answers the
incoming message with a message containing the better value. Then it
checks a proof of work that must be included in the incoming message, to
ensure that the other peer's ID is not made up (otherwise a malicious
peer could claim to have an ID of exactly the target value every round).
Once the value is validated, the peer compares the broadcast time of the
received value with the current time and, if it is not too early, sends
the received value to its neighbors. Otherwise, it stores the value until
the correct broadcast time comes. This avoids unnecessary traffic for
sub-optimal values: a better value may arrive before the broadcast time,
rendering the previous one obsolete and saving the traffic that would
have been used to broadcast it to the neighbors.

.. _Calculating-the-estimate:

Calculating the estimate
^^^^^^^^^^^^^^^^^^^^^^^^

Once the closest ID has been spread across the network, each peer
computes the exact distance between this ID and the target value of the
round and calculates the estimate with a mathematical formula described
in the technical report. The estimate generated with this method for a
single round is not very precise. Remember the example, where the only
peer has the ID 44 and the target value happens to be 42, leading us to
believe there are 50 peers in the network. Therefore, the NSE subsystem
remembers the last 64 estimates and calculates an average over them,
giving a result that usually has one bit of uncertainty (the real size
could be half of the estimate or twice as much).
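
The round structure and the 64-round average can be illustrated with the
following Python sketch. It is a simplification under stated
assumptions: SHA-512 over the decimal timestamp stands in for the time
hash, and the number of leading bits the closest peer ID shares with the
target stands in for the per-round (base-2 logarithmic) size estimate.
The exact formula from the technical report, including any correction
terms, as well as the flooding and proof-of-work checks, are omitted.

.. code-block:: python

   import hashlib
   from collections import deque

   ROUND_SECONDS = 3600   # default rounding amount from the text (1h)
   HISTORY_SIZE = 64      # NSE remembers the last 64 estimates


   def target_value(now: int, round_seconds: int = ROUND_SECONDS) -> bytes:
       """Hash the current time rounded down to the start of the round.

       Hashing the decimal timestamp with SHA-512 is an assumption.
       """
       round_start = now - now % round_seconds
       return hashlib.sha512(str(round_start).encode()).digest()


   def matching_bits(a: bytes, b: bytes) -> int:
       """Count the leading bits shared by a and b (assumed distance metric)."""
       bits = 0
       for x, y in zip(a, b):
           diff = x ^ y
           if diff == 0:
               bits += 8
               continue
           bits += 8 - diff.bit_length()   # leading zero bits of this byte
           break
       return bits


   class SizeEstimator:
       """Average per-round log2 estimates over the last 64 rounds."""

       def __init__(self):
           self.history = deque(maxlen=HISTORY_SIZE)

       def add_round(self, closest_id: bytes, target: bytes) -> None:
           # More shared leading bits means a more densely packed ID space,
           # and therefore a larger network.
           self.history.append(matching_bits(closest_id, target))

       def estimate(self) -> float:
           if not self.history:
               raise ValueError("no completed rounds yet")
           mean_bits = sum(self.history) / len(self.history)
           return 2.0 ** mean_bits   # the averaged value is a log2 size

       def likely_range(self) -> tuple:
           # "One bit of uncertainty": half to twice the estimate.
           est = self.estimate()
           return est / 2, est * 2

In this sketch, 12:34:56 and 12:00:00 on the same day hash to the same
target value, and feeding more than 64 rounds simply drops the oldest
ones.
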

Note that the actual network size is calculated as a power of two of the
raw input; thus, one bit of uncertainty means a factor of two in the size
estimate.

.. index::
   double: subsystem; PEERINFO

.. _PEERINFO-Subsystem:

PEERINFO — Persistent HELLO storage
===================================

The PEERINFO subsystem is used to store verified (validated) information
about known peers in a persistent way. It obtains these addresses, for
example, from the TRANSPORT service, which is in charge of address
validation. Validation means that the information in the HELLO message
is checked by connecting to the addresses and performing a cryptographic
handshake to authenticate the peer instance claiming to be reachable at
these addresses. PEERINFO does not validate the HELLO messages itself;
it only stores them and gives them to interested clients.

As future work, we are considering moving from storing just HELLO
messages to providing a generic persistent per-peer information store.
More and more subsystems need to store per-peer information
persistently. To avoid duplicating this functionality, we plan to
provide a PEERSTORE service.

.. _PEERINFO-_002d-Features:

PEERINFO - Features
-------------------

- Persistent storage

- Client notification mechanism on update

- Periodic clean-up of expired information

- Differentiation between public and friend-only HELLOs

.. _PEERINFO-_002d-Limitations:

PEERINFO - Limitations
----------------------

- Does not perform HELLO validation

.. _DeveloperPeer-Information:

Peer Information
----------------

The PEERINFO subsystem stores this information in the form of HELLO
messages, which you can think of as business cards.
These HELLO messages contain the public key of a peer and the addresses
under which the peer can be reached. The addresses include an expiration
date describing how long they are valid. This information is updated
regularly by the TRANSPORT service by revalidating the address. If an
address has expired and is not renewed, it can be removed from the HELLO
message.

Some peers do not want their HELLO messages distributed to other peers,
especially when GNUnet's friend-to-friend mode is enabled. To prevent
this undesired distribution, PEERINFO distinguishes between *public*
and *friend-only* HELLO messages. Public HELLO messages can be freely
distributed to other (possibly unknown) peers (for example using the
hostlist, gossiping, broadcasting), whereas friend-only HELLO messages
may not be distributed to other peers. Friend-only HELLO messages have
an additional flag ``friend_only`` set internally. For public HELLO
messages, this flag is not set. PEERINFO does not (and cannot) check
whether a client is allowed to obtain a specific HELLO type.

The HELLO messages can be managed using the GNUnet HELLO library. Other
GNUnet subsystems can obtain this information from PEERINFO and use it
for their own purposes. Examples of such clients are the HOSTLIST
component, which provides this information to other peers in the form of
a hostlist, and the TRANSPORT subsystem, which uses it to maintain
connections to other peers.

.. _Startup:

Startup
-------

During startup, the PEERINFO service loads persistent HELLOs from disk.
First, PEERINFO parses the directory configured in the ``HOSTS`` value
of the ``PEERINFO`` configuration section, which is where PEERINFO
information is stored. Valid HELLO messages are extracted from all files
found in this directory. In addition, it loads HELLO messages shipped
with the GNUnet distribution.
These HELLOs are used to simplify network bootstrapping by providing
valid peer information with the distribution. The use of these HELLOs
can be prevented by setting the ``USE_INCLUDED_HELLOS`` option in the
``PEERINFO`` configuration section to ``NO``. Files containing invalid
information are removed.

.. _Managing-Information:

Managing Information
--------------------

The PEERINFO service stores information about known peers and a single
HELLO message for every peer. A peer does not need to have a HELLO if no
information is available. HELLO information from different sources, for
example a HELLO obtained from a remote HOSTLIST and a second HELLO
stored on disk, is combined and merged into one single HELLO message per
peer, which is given to clients. During this merge process, the HELLO is
immediately written to disk to ensure persistence.

In addition, PEERINFO periodically scans the directory where information
is stored for empty HELLO messages and expired TRANSPORT addresses.
This periodic task scans all files in the directory and recreates the
HELLO messages it finds. Expired TRANSPORT addresses are removed from
the HELLO, and if the HELLO does not contain any valid addresses, it is
discarded and removed from the disk.

.. _Obtaining-Information:

Obtaining Information
---------------------

When a client requests information from PEERINFO, PEERINFO performs a
lookup for the respective peer (or for all peers, if desired) and
transmits this information to the client. The client can specify whether
friend-only HELLOs are to be included, and PEERINFO filters the HELLO
messages accordingly before transmitting them.

To notify clients about changes to PEERINFO information, PEERINFO
maintains a list of clients interested in these notifications.
Such a notification occurs if a HELLO for a peer was updated (for
example, due to a merge) or a new peer was added.

.. index::
   double: subsystem; PEERSTORE

.. _PEERSTORE-Subsystem:

PEERSTORE — Extensible local persistent data storage
====================================================

GNUnet's PEERSTORE subsystem offers persistent per-peer storage for
other GNUnet subsystems. GNUnet subsystems can use PEERSTORE to
persistently store and retrieve arbitrary data. Each data record stored
with PEERSTORE contains the following fields:

- subsystem: name of the subsystem responsible for the record.

- peerid: identity of the peer this record is related to.

- key: a key string identifying the record.

- value: binary record value.

- expiry: record expiry date.

.. _Functionality:

Functionality
-------------

Subsystems can store any type of value under a (subsystem, peerid, key)
combination. A "replace" flag set during store operations forces
PEERSTORE to replace any old values stored under the same (subsystem,
peerid, key) combination with the new value. Additionally, an expiry
date is set, after which the record is *possibly* deleted by PEERSTORE.

Subsystems can iterate over all values stored under any of the following
combinations of fields:

- (subsystem)

- (subsystem, peerid)

- (subsystem, key)

- (subsystem, peerid, key)

Subsystems can also request to be notified about any new values stored
under a (subsystem, peerid, key) combination by sending a "watch"
request to PEERSTORE.

.. _Architecture:

Architecture
------------

PEERSTORE implements the following components:

- PEERSTORE service: Handles store, iterate and watch operations.

- PEERSTORE API: API to be used by other subsystems to communicate and
  issue commands to the PEERSTORE service.

- PEERSTORE plugins: Handle the persistent storage. At the moment,
  only an "sqlite" plugin is implemented.

.. index::
   double: subsystem; REGEX

.. _REGEX-Subsystem:

REGEX — Service discovery using regular expressions
===================================================

Using the REGEX subsystem, you can discover peers that offer a
particular service using regular expressions. The peers that offer a
service specify it using regular expressions. Peers that want to
patronize a service search using a string. The REGEX subsystem will then
use the DHT to return a set of matching offerers to the patrons.

For the technical details, we have Max's defense talk and Max's Master's
thesis.

.. note:: An additional publication is under preparation and available
   to team members (in Git).

.. todo:: Missing links to Max's talk and Master's thesis

.. _How-to-run-the-regex-profiler:

How to run the regex profiler
-----------------------------

The ``gnunet-regex-profiler`` can be used to profile the usage of
mesh/regex for a given set of regular expressions and strings.
Mesh/regex allows you to announce your peer ID under a certain regex and
search for peers matching a particular regex using a string. See
`szengel2012ms <https://bib.gnunet.org/full/date.html#2012_5f2>`__ for a
full introduction.

First of all, the regex profiler uses the GNUnet testbed; thus, all the
requirements of the testbed also apply to the regex profiler (for
example, you need password-less ssh login to the machines listed in your
hosts file).

**Configuration**

Moreover, an appropriate configuration file is needed.
The important details are highlighted in the following paragraphs.

The regular expressions are announced by
``gnunet-daemon-regexprofiler``, so you have to make sure it is started
by adding it to the ``START_ON_DEMAND`` set of ARM:

::

   [regexprofiler]
   START_ON_DEMAND = YES

Furthermore, you have to specify the location of the binary:

::

   [regexprofiler]
   # Location of the gnunet-daemon-regexprofiler binary.
   BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
   # Regex prefix that will be applied to all regular expressions and
   # search strings.
   REGEX_PREFIX = "GNVPN-0001-PAD"

When running the profiler in a large-scale deployment, you probably
want to reduce the workload of each peer. Use the following options to
do this.

::

   [dht]
   # Force network size estimation
   FORCE_NSE = 1

   [dhtcache]
   DATABASE = heap
   # Disable RC-file for Bloom filter? (for benchmarking with limited IO
   # availability)
   DISABLE_BF_RC = YES
   # Disable Bloom filter entirely
   DISABLE_BF = YES

   [nse]
   # Minimize proof-of-work CPU consumption by NSE
   WORKBITS = 1

**Options**

To run the profiler, some options and the input data need to be
specified on the command line.

::

   gnunet-regex-profiler -c config-file -d log-file -n num-links \
   -p path-compression-length -s search-delay -t matching-timeout \
   -a num-search-strings hosts-file policy-dir search-strings-file

Where\...

- \... ``config-file`` is the configuration file created earlier.

- \... ``log-file`` is the file to which statistics output is written.

- \... ``num-links`` indicates the number of random links between
  started peers.

- \... ``path-compression-length`` is the maximum path compression
  length in the DFA.

- \... ``search-delay`` is the time to wait after the peers have
  finished linking before starting to match strings.

- \... ``matching-timeout`` is the timeout after which the search is
  cancelled.

- \... ``num-search-strings`` is the number of strings in the
  ``search-strings-file``.

- \... the ``hosts-file`` should contain a list of hosts for the
  testbed, one per line in the following format:

  - ``user@host_ip:port``

- \... the ``policy-dir`` is a folder containing text files, each with
  one or more regular expressions. A peer is started for each file in
  that folder, and the regular expressions in the corresponding file are
  announced by this peer.

- \... the ``search-strings-file`` is a text file containing search
  strings, one in each line.

You can create regular expressions and search strings for every AS in
the Internet using the attached scripts. You need one of the `CAIDA
routeviews
prefix2as <http://data.caida.org/datasets/routing/routeviews-prefix2as/>`__
data files for this. Run

::

   create_regex.py <filename> <output path>

to create the regular expressions and

::

   create_strings.py <input path> <outfile>

to create a search strings file from the previously created regular
expressions.

.. index::
   double: subsystem; REST

.. _REST-Subsystem:

REST — RESTful GNUnet Web APIs
==============================

.. todo:: Define REST

Using the REST subsystem, you can expose REST-based APIs or services.
The REST service is designed as a pluggable architecture.

**Configuration**

The REST service can be configured in various ways.
The reference configuration file can be found in ``src/rest/rest.conf``:

::

   [rest]
   REST_PORT=7776
   REST_ALLOW_HEADERS=Authorization,Accept,Content-Type
   REST_ALLOW_ORIGIN=*
   REST_ALLOW_CREDENTIALS=true

The port as well as the CORS (cross-origin resource sharing) headers
that are supposed to be advertised by the REST service are configurable.

.. index::
   double: subsystem; REVOCATION

.. _REVOCATION-Subsystem:

REVOCATION — Ego key revocation
===============================

The REVOCATION subsystem is responsible for key revocation of egos. If
users learn that their private key has been compromised, or if they have
lost it, they can use the REVOCATION system to inform all of the other
users that their private key is no longer valid. The subsystem thus
includes ways to query for the validity of keys and to propagate
revocation messages.

.. _Dissemination:

Dissemination
-------------

When a revocation is performed, the revocation is first of all
disseminated by flooding the overlay network. The goal is to reach every
peer, so that when a peer needs to check whether a key has been revoked,
this will be a purely local operation where the peer looks at its local
revocation list. Flooding the network is also the most robust form of
key revocation: an adversary would have to control a separator of the
overlay graph to restrict the propagation of the revocation message.
Flooding is also very easy to implement: peers that receive a
revocation message for a key that they have never seen before simply
pass the message to all of their neighbours.

Flooding can only distribute the revocation message to peers that are
online.
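
A toy Python simulation of this flooding rule also shows its
limitation: peers that are offline never receive the message. The graph
representation and the message counting are illustrative assumptions,
not REVOCATION's wire protocol. Each peer that sees a revocation for the
first time forwards it to every online neighbour, so at most two
messages cross each edge and the total grows with the number of edges.

.. code-block:: python

   from collections import deque


   def flood_revocation(graph, online, origin):
       """Simulate flooding a revocation from ``origin``.

       graph:  dict mapping each peer to the list of its neighbours
       online: set of peers that are currently online
       Returns (peers that received the message, messages sent).
       """
       reached = {origin}
       messages = 0
       queue = deque([origin])
       while queue:
           peer = queue.popleft()
           for neighbour in graph[peer]:
               if neighbour not in online:
                   continue              # offline peers are simply missed
               messages += 1             # one message per edge traversal
               if neighbour not in reached:
                   # First time this peer sees the revocation: it passes
                   # the message on to all of its neighbours in turn.
                   reached.add(neighbour)
                   queue.append(neighbour)
       return reached, messages

For a chain of peers ``a``, ``b``, ``c``, ``d`` in which ``d`` is
offline, a revocation issued by ``a`` reaches ``b`` and ``c`` with four
messages, while ``d`` still has to learn about it when it comes back.
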

In order to notify peers that join the network later, the revocation
service performs efficient set reconciliation over the sets of known
revocation messages whenever two peers (that both support REVOCATION
dissemination) connect. The SET service is used to perform this
operation efficiently.

.. _Revocation-Message-Design-Requirements:

Revocation Message Design Requirements
--------------------------------------

However, flooding is also quite costly, creating O(\|E\|) messages on a
network with \|E\| edges. Thus, revocation messages are required to
contain a proof-of-work, the result of an expensive computation (which,
however, is cheap to verify). Only peers that have expended the CPU time
necessary to provide this proof will be able to flood the network with
the revocation message. This ensures that an attacker cannot simply
flood the network with millions of revocation messages. The
proof-of-work required by GNUnet is set to take days on a typical PC to
compute; if the ability to quickly revoke a key is needed, users have
the option to pre-compute revocation messages, store them off-line and
use them instantly if their key is compromised.

Revocation messages must also be signed by the private key that is being
revoked. Thus, they can only be created while the private key is in the
possession of the respective user. This is another reason to create a
revocation message ahead of time and store it in a secure location.

.. index::
   double: subsystems; Random peer sampling
   see: RPS; Random peer sampling

.. _RPS-Subsystem:

RPS — Random peer sampling
==========================

In the literature, Random Peer Sampling (RPS) refers to the problem of
reliably [1]_ drawing random samples from an unstructured p2p network.

Doing so reliably is hard not only because of inherent problems but also
because malicious peers could try to bias the selection.

It is useful for all kinds of gossip protocols that require the
selection of random peers in the whole network, such as gathering
statistics, spreading and aggregating information in the network, load
balancing and overlay topology management.

The approach chosen in the RPS service implementation in GNUnet follows
the `Brahms <https://bib.gnunet.org/full/date.html#2009_5f0>`__ design.

The current state is "work in progress". There are a lot of things
that need to be done, primarily finishing the experimental evaluation
and a re-design of the API.

The abstract idea is to connect to (or start) the RPS service and
request random peers, which will be returned once they represent a
random selection from the whole network with high probability.

An addition to the original Brahms design is the selection of
sub-groups: the GNUnet implementation of RPS enables clients to ask for
random peers from a group that is defined by a common shared secret.
(The secret could of course also be public, depending on the use case.)

Another addition to the original protocol was made: the sampler
mechanism that was introduced in Brahms was slightly adapted and is used
to actually sample the peers that are returned to the client. This is
necessary because the original design only keeps peers connected to
random other peers in the network. In order for the peers returned to
client requests to be independently random, they cannot be drawn from
the connected peers. The adapted sampler makes sure that each request
for random peers is independent of the others.
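
The adapted sampler can be pictured as a list of independent min-wise
sampler elements of the kind described in the Brahms section: each
element draws a random HMAC key once and keeps the input with the
smallest HMAC. The Python sketch below is not the RPS service's code;
HMAC-SHA-512, 32-byte keys and the ``Sampler`` and ``SamplerElement``
names are assumptions made for illustration.

.. code-block:: python

   import hashlib
   import hmac
   import os


   class SamplerElement:
       """One min-wise sampler element with its own random HMAC key."""

       def __init__(self):
           self.key = os.urandom(32)   # key chosen at initialisation
           self.best_peer = None
           self.best_mac = None

       def update(self, peer_id: bytes) -> None:
           mac = hmac.new(self.key, peer_id, hashlib.sha512).digest()
           # Keep the input with the minimal HMAC; repeating a peer ID
           # yields the same HMAC, so repetitions never change the choice.
           if self.best_mac is None or mac < self.best_mac:
               self.best_mac = mac
               self.best_peer = peer_id


   class Sampler:
       """A set of independent elements fed from the stream of peer IDs."""

       def __init__(self, size: int):
           self.elements = [SamplerElement() for _ in range(size)]

       def update(self, peer_id: bytes) -> None:
           for element in self.elements:
               element.update(peer_id)

       def samples(self) -> list:
           return [e.best_peer for e in self.elements if e.best_peer is not None]

Feeding the same peer ID a thousand more times leaves every element's
choice unchanged, which is the frequency independence the sampler is
meant to provide.
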

.. _Brahms:

Brahms
------

The high-level concept of Brahms is twofold: it combines push-pull gossip
with locally fixing an assumed bias using cryptographic min-wise
permutations. The central data structure is the view, a peer's current
local sample. This view is used to select peers to push to and pull
from. This simple mechanism can easily be biased. For this reason, Brahms
"fixes" the bias by using the so-called sampler: a data structure that
takes a list of elements as input and outputs a random one of them,
independently of the frequency in the input set. Both an element that
was put into the sampler a single time and an element that was put into
it a million times have the same probability of being the output. This
is achieved by exploiting min-wise independent permutations. In the
RPS service we use HMACs: when a sampler element is initialised, a
key is chosen at random. On each input, the HMAC with the random key is
computed. The sampler element keeps the element with the minimal HMAC.

In order to fix the bias in the view, a fraction of the elements in the
view are sampled through the sampler from the random stream of peer IDs.

According to the theoretical analysis of Bortnikov et al., this suffices
to keep the network connected and the peers in the view random.

.. [1]
   \"Reliable\" in this context means having no bias, neither spatial,
   nor temporal, nor through malicious activity.

.. index::
   double: STATISTICS; subsystem

.. _STATISTICS-Subsystem:

STATISTICS — Runtime statistics publication
===========================================

In GNUnet, the STATISTICS subsystem offers a central place for all
subsystems to publish unsigned 64-bit integer run-time statistics.
Keeping this information centrally means that there is a unified way for
the user to obtain data on all subsystems, and individual subsystems do
not each need a custom data export method for performance
metrics and other statistics. For example, the TRANSPORT system uses
STATISTICS to update information about the number of directly connected
peers and the bandwidth that has been consumed by the various plugins.
This information is valuable for diagnosing connectivity and performance
issues.

Following the GNUnet service architecture, the STATISTICS subsystem is
divided into an API, which is exposed through the header
**gnunet_statistics_service.h**, and the STATISTICS service
**gnunet-service-statistics**. The **gnunet-statistics** command-line
tool can be used to obtain (and change) information about the values
stored by the STATISTICS service. The STATISTICS service does not
communicate with other peers.

Data is stored in the STATISTICS service in the form of tuples
**(subsystem, name, value, persistence)**. The subsystem determines to
which GNUnet subsystem the data belongs. The name is the name
under which the value is stored; it uniquely identifies the record
among other records belonging to the same subsystem. In some parts
of the code, the pair **(subsystem, name)** is called a **statistic**, as
it identifies the values stored in the STATISTICS service. The persistence
flag determines whether the record has to be preserved across service
restarts. A record is said to be persistent if this flag is set for it;
if not, the record is treated as non-persistent and is lost
after a service restart. Persistent records are written to and read from
the file **statistics.data** before shutdown and upon startup. The file
is located in the HOME directory of the peer.
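To make the tuple model concrete, the following C sketch keeps records in a fixed-size array keyed by the **(subsystem, name)** pair and drops non-persistent records on a simulated restart. It is a hypothetical illustration only: ``struct StatRecord``, ``stat_set()``, ``stat_find()`` and ``stat_restart()`` are invented names that do not correspond to the real API in **gnunet_statistics_service.h** or to the format of **statistics.data**.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory model of a STATISTICS record; NOT the layout
 * used by gnunet-service-statistics. */
struct StatRecord
{
  char subsystem[32];   /* owning subsystem, e.g. "transport" */
  char name[64];        /* unique within the subsystem */
  uint64_t value;       /* unsigned 64-bit value */
  bool persistent;      /* survives a service restart? */
};

#define MAX_RECORDS 16

struct StatStore
{
  struct StatRecord rec[MAX_RECORDS];
  size_t n;
};

/* Look up the record identified by the (subsystem, name) pair, i.e. the
 * "statistic"; returns NULL if it does not exist yet. */
static struct StatRecord *
stat_find (struct StatStore *s, const char *subsystem, const char *name)
{
  for (size_t i = 0; i < s->n; i++)
    if (0 == strcmp (s->rec[i].subsystem, subsystem) &&
        0 == strcmp (s->rec[i].name, name))
      return &s->rec[i];
  return NULL;
}

/* Create or overwrite a record; returns false if the store is full. */
static bool
stat_set (struct StatStore *s, const char *subsystem, const char *name,
          uint64_t value, bool persistent)
{
  assert (NULL != subsystem && NULL != name);
  struct StatRecord *r = stat_find (s, subsystem, name);
  if (NULL == r)
  {
    if (MAX_RECORDS == s->n)
      return false;
    r = &s->rec[s->n++];
    snprintf (r->subsystem, sizeof (r->subsystem), "%s", subsystem);
    snprintf (r->name, sizeof (r->name), "%s", name);
  }
  r->value = value;
  r->persistent = persistent;
  return true;
}

/* Simulate a service restart: only persistent records survive. */
static void
stat_restart (struct StatStore *s)
{
  size_t kept = 0;
  for (size_t i = 0; i < s->n; i++)
    if (s->rec[i].persistent)
      s->rec[kept++] = s->rec[i];
  s->n = kept;
}
```

A linear scan keeps the sketch short; a service holding many records would use a hash map keyed on the (subsystem, name) pair instead.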

An anomaly of the STATISTICS service is that it does not terminate
immediately upon receiving a shutdown signal if it has any clients
connected to it. It waits for all the clients that are not monitors to
close their connections before terminating itself. This prevents
the loss of data during peer shutdown: delaying the STATISTICS
service shutdown allows other services to store important data in
STATISTICS during their own shutdown.

.. index::
   double: TRANSPORT Next Generation; subsystem

.. _TRANSPORT_002dNG-Subsystem:

TRANSPORT-NG — Next-generation transport management
===================================================

The current GNUnet TRANSPORT architecture is rooted in the GNUnet 0.4
design of using plugins for the actual transmission operations and the
ATS subsystem to select a plugin and allocate bandwidth. The following
key issues have been identified with this design:

- Bugs in one plugin can affect the TRANSPORT service and other
  plugins. There is at least one open bug that affects sockets, where
  the origin is difficult to pinpoint due to the large code base.

- Relevant operating system default configurations often impose a limit
  of 1024 file descriptors per process. Thus, one plugin may impact
  other plugins' connectivity choices.

- Plugins are required to offer bi-directional connectivity. However,
  firewalls (incl. NAT boxes) and physical environments sometimes only
  allow uni-directional connectivity, which then currently cannot be
  utilized at all.

- Distance vector routing was implemented in 2009 but broken shortly
  afterwards; due to the complexity of implementing it as a plugin and
  of dealing with the resource allocation consequences, it was never useful.

- Most existing plugins communicate entirely in cleartext,
  exposing metadata (message size) and making it easy to fingerprint
  and possibly block GNUnet traffic.

- Various NAT traversal methods are not supported.

- The service logic is cluttered with \"manipulation\" support code for
  TESTBED to enable faking network characteristics like lossy
  connections or firewalls.

- Bandwidth allocation is done in ATS, requiring the duplication of
  state and resulting in much delayed allocation decisions. As a
  result, available bandwidth often goes unused. Users are expected to
  manually configure bandwidth limits, instead of TRANSPORT using
  congestion control to adapt automatically.

- TRANSPORT is difficult to test and has bad test coverage.

- HELLOs include an absolute expiration time. Nodes with unsynchronized
  clocks cannot connect.

- Displaying the contents of a HELLO requires the respective plugin, as
  the plugin-specific data is encoded in binary. This also complicates
  logging.

.. _Design-goals-of-TNG:

Design goals of TNG
-------------------

In order to address the above issues, we want to:

- Move plugins into separate processes which we shall call
  *communicators*. Communicators connect as clients to the transport
  service.

- TRANSPORT should be able to utilize any number of communicators to the
  same peer at the same time.

- TRANSPORT should be responsible for fragmentation, retransmission,
  flow- and congestion-control. Users should no longer have to
  configure bandwidth limits: TRANSPORT should detect what is available
  and use it.

- Communicators should be allowed to be uni-directional and
  unreliable. TRANSPORT shall create bi-directional channels from this
  whenever possible.

- DV should no longer be a plugin, but part of TRANSPORT.

- TRANSPORT should help communicators communicate, for
  example in the case of uni-directional communicators or the need for
  out-of-band signalling for NAT traversal. We call this functionality
  *backchannels*.

- Transport manipulation should be signalled to CORE on a per-message
  basis instead of as an approximate bandwidth.

- CORE should signal performance requirements (reliability, latency,
  etc.) on a per-message basis to TRANSPORT. If possible, TRANSPORT
  should consider those options when scheduling messages for
  transmission.

- HELLOs should be in a human-readable format with monotonic time
  expirations.

The new architecture is planned as follows:

.. image:: /images/tng.png

TRANSPORT's main objective is to establish bi-directional virtual links
using a variety of possibly uni-directional communicators. Links undergo
the following steps:

1. A communicator informs TRANSPORT A that a queue (direct neighbour) is
   available, or equivalently TRANSPORT A discovers a (DV) path to a
   target B.

2. TRANSPORT A sends a challenge to the target peer, trying to confirm
   that the peer can receive. FIXME: This is not implemented properly
   for DV. Here we should really take a validated DVH and send a
   challenge exactly down that path!

3. The other TRANSPORT, TRANSPORT B, receives the challenge, and sends
   back a response, possibly using a different path. If TRANSPORT B does
   not yet have a virtual link to A, it must try to establish a virtual
   link.

4. Upon receiving the response, TRANSPORT A creates the virtual link. If
   the response included a challenge, TRANSPORT A must respond to this
   challenge as well, effectively re-creating the TCP 3-way handshake
   (just with longer challenge values).
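The four steps above can be modelled as a small message exchange. The C sketch below is a hypothetical simplification: a challenge is a 64-bit nonce, a valid response simply echoes it (the real TRANSPORT uses longer challenge values, and this sketch makes no claim about how responses are authenticated), and ``struct Msg``, ``struct Peer``, ``send_challenge()`` and ``handle()`` are invented names. It shows how a peer that has no link yet piggybacks its own challenge on the response, which yields the three-message exchange familiar from TCP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the link-establishment steps; not TRANSPORT code. */
struct Msg
{
  bool has_response;   /* answers the receiver's earlier challenge */
  uint64_t response;   /* the echoed challenge value */
  bool has_challenge;  /* asks the receiver to prove it can receive */
  uint64_t challenge;
};

struct Peer
{
  uint64_t my_challenge; /* outstanding challenge we sent, 0 if none */
  bool link_up;          /* virtual link to the other peer exists */
};

/* Illustration only: rand() is not a cryptographic RNG. The "| 1"
 * keeps the nonce non-zero, since 0 means "no outstanding challenge". */
static uint64_t
nonce (void)
{
  return (((uint64_t) rand () << 32) ^ (uint64_t) rand ()) | 1;
}

/* Step 2: TRANSPORT A challenges the target peer. */
static struct Msg
send_challenge (struct Peer *self)
{
  struct Msg m = { 0 };
  self->my_challenge = nonce ();
  m.has_challenge = true;
  m.challenge = self->my_challenge;
  return m;
}

/* Steps 3 and 4: process IN and fill OUT with the reply (possibly empty).
 * Returns false if IN carries a response we did not ask for. */
static bool
handle (struct Peer *self, const struct Msg *in, struct Msg *out)
{
  assert (NULL != in && NULL != out);
  *out = (struct Msg) { 0 };
  if (in->has_response)
  {
    if (0 == self->my_challenge || in->response != self->my_challenge)
      return false;            /* unexpected or stale response */
    self->link_up = true;      /* the other peer proved it can receive */
    self->my_challenge = 0;
  }
  if (in->has_challenge)
  {
    out->has_response = true;
    out->response = in->challenge;
    if (! self->link_up && 0 == self->my_challenge)
    {
      /* No link yet: piggyback our own challenge on the response. */
      self->my_challenge = nonce ();
      out->has_challenge = true;
      out->challenge = self->my_challenge;
    }
  }
  return true;
}
```

Rejecting responses when no challenge is outstanding is what makes a replayed response useless once the link is up.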

.. _HELLO_002dNG:

HELLO-NG
--------

HELLOs change in three ways. First of all, communicators encode the
respective addresses in a human-readable URL-like string. This way, we
no longer require the communicator to print the contents of a HELLO.
Second, HELLOs no longer contain an expiration time, only a creation
time. The receiver must only compare the respective absolute values: given
a HELLO from the same sender with a larger creation time, the
old one is no longer valid. This also obsoletes the need for the
gnunet-hello binary to set HELLO expiration times to never. Third, a
peer no longer generates one big HELLO that always contains all of the
addresses. Instead, each address is signed individually and shared only
over the address scopes where it makes sense to share the address. In
particular, care should be taken not to share MAC addresses across the
Internet and to confine their use to the LAN. As each address is signed
separately, having multiple addresses valid at the same time (given the
new creation time expiration logic) requires that those addresses have
exactly the same creation time. Whenever that monotonic time is increased,
all addresses must be re-signed and re-distributed.

.. _Priorities-and-preferences:

Priorities and preferences
--------------------------

In the new design, TRANSPORT adopts a feature (which was previously
already available in CORE) of the MQ API to allow applications to
specify priorities and preferences per message (or rather, per MQ
envelope). The (updated) MQ API allows applications to specify one of
four priority levels as well as desired preferences for transmission by
setting options on an envelope. These preferences currently are:

- ``GNUNET_MQ_PREF_UNRELIABLE``: Disables TRANSPORT waiting for ACKs on
  unreliable channels like UDP.
  Such messages are fire-and-forget and
  thus cannot be used for RTT estimates either.

- ``GNUNET_MQ_PREF_LOW_LATENCY``: Directs TRANSPORT to select the
  lowest-latency transmission choices possible.

- ``GNUNET_MQ_PREF_CORK_ALLOWED``: Allows TRANSPORT to delay transmission
  to group the message with other messages into a larger batch to
  reduce the number of packets sent.

- ``GNUNET_MQ_PREF_GOODPUT``: Directs TRANSPORT to select the highest
  goodput channel available.

- ``GNUNET_MQ_PREF_OUT_OF_ORDER``: Allows TRANSPORT to reorder the messages
  as it sees fit; otherwise TRANSPORT should attempt to preserve
  transmission order.

Each MQ envelope is always able to store those options (and the
priority), and in the future this uniform API will be used by TRANSPORT,
CORE, CADET and possibly other subsystems that send messages (like
LAKE). When CORE sets preferences and priorities, it is supposed to
respect the preferences and priorities it is given from higher layers.
Similarly, CADET also simply passes on the preferences and priorities of
the layer above CADET. When a layer combines multiple smaller messages
into one larger transmission, ``GNUNET_MQ_env_combine_options()``
should be used to calculate options for the combined message. We note
that the exact semantics of the options may differ by layer. For
example, CADET will always strictly implement reliable and in-order
delivery of messages, while the same options are only advisory for
TRANSPORT and CORE: they should try (using ACKs on unreliable
communicators, not changing the message order themselves), but if
messages are lost anyway (e.g. because a TCP connection is dropped in the
middle), or if messages are reordered (e.g. because they took different
paths over the network and arrived in a different order), TRANSPORT and
CORE do not have to correct this.
Whether a preference is strict or loose is
thus defined by the respective layer.

.. _Communicators:

Communicators
-------------

The API for communicators is defined in
``gnunet_transport_communication_service.h``. Each communicator must
specify its (global) communication characteristics, which for now only
say whether the communication is reliable (e.g. TCP, HTTPS) or
unreliable (e.g. UDP, WLAN). Each communicator must specify a unique
address prefix, or NULL if the communicator cannot establish outgoing
connections (for example because it is only acting as a TCP server). A
communicator must tell TRANSPORT under which addresses it is reachable.
Addresses may be added or removed at any time. A communicator may have
zero addresses (transmission only). Addresses do not have to match the
address prefix.

TRANSPORT may ask a communicator to try to connect to another address.
TRANSPORT will only ask for connections where the address matches the
communicator's address prefix that was provided when the communicator's
connection to TRANSPORT was established. Communicators should then
attempt to establish a connection.
It is at the discretion of the communicator whether to honor this request.
Reasons for not honoring such a request include an already existing
connection or resource limitations.
No response is provided to the TRANSPORT service on failure.
The TRANSPORT service has to ask the communicator explicitly to retry.

If a communicator succeeds in establishing an outgoing connection for
transmission, or if a communicator receives an incoming bi-directional
connection, the communicator must inform the TRANSPORT service that a
message queue (MQ) for transmission is now available.
For that MQ, the communicator must provide the peer identity claimed by the other end.
It must also provide a human-readable address (for debugging) and a maximum
transmission unit (MTU). An MTU of zero means sending is not supported;
``SIZE_MAX`` should be used for no MTU. The communicator should also tell
TRANSPORT what network type is used for the queue. The communicator may tell
TRANSPORT at any time that the queue was deleted and is no longer available.

The communicator API also provides for flow control. First,
communicators exert back-pressure on TRANSPORT: the number of messages
TRANSPORT may add to a queue for transmission will be limited. So by not
draining the transmission queue, back-pressure is provided to TRANSPORT.
In the other direction, communicators may allow TRANSPORT to give
back-pressure towards the communicator by providing a non-NULL
``GNUNET_TRANSPORT_MessageCompletedCallback`` argument to the
``GNUNET_TRANSPORT_communicator_receive`` function. In this case,
TRANSPORT will only invoke this callback once it has processed the
message and is ready to receive more. Communicators should then limit
how much traffic they receive based on this back-pressure. Note that
communicators do not have to provide a
``GNUNET_TRANSPORT_MessageCompletedCallback``; for example, UDP cannot
support back-pressure due to the nature of the UDP protocol. In this
case, TRANSPORT will implement its own TRANSPORT-to-TRANSPORT flow
control to reduce the sender's data rate to acceptable levels.

TRANSPORT may notify a communicator about backchannel messages TRANSPORT
received from other peers for this communicator. Similarly,
communicators can ask TRANSPORT to try to send a backchannel message to
other communicators of other peers. The semantics of the backchannel
message are up to the communicators which use them. TRANSPORT may fail
to transmit backchannel messages, and TRANSPORT will not attempt to
retransmit them.
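The MTU rules and the communicator-to-TRANSPORT back-pressure described above can be sketched as a bounded per-queue counter. In this hypothetical C model, ``struct CommQueue``, ``queue_enqueue()``, ``queue_transmitted()`` and the capacity of four messages are all invented for illustration (the real API lives in ``gnunet_transport_communication_service.h``): TRANSPORT can only hand over a message while the queue has room, and the communicator frees room only when it actually transmits something.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one communicator queue as seen by TRANSPORT.
 * None of these names belong to gnunet_transport_communication_service.h. */
#define QUEUE_CAPACITY 4  /* max. messages TRANSPORT may have pending */

struct CommQueue
{
  size_t mtu;       /* 0: sending not supported; SIZE_MAX: no MTU */
  size_t pending;   /* handed to the communicator, not yet transmitted */
};

enum EnqueueResult
{
  ENQ_OK,
  ENQ_CANNOT_SEND,  /* MTU of zero: queue is receive-only */
  ENQ_TOO_BIG,      /* larger than the MTU: TRANSPORT must fragment */
  ENQ_BACKPRESSURE  /* queue full: TRANSPORT must wait */
};

/* TRANSPORT side: try to hand a message of SIZE bytes to the queue. */
static enum EnqueueResult
queue_enqueue (struct CommQueue *q, size_t size)
{
  if (0 == q->mtu)
    return ENQ_CANNOT_SEND;
  if (size > q->mtu)
    return ENQ_TOO_BIG;
  if (q->pending >= QUEUE_CAPACITY)
    return ENQ_BACKPRESSURE;
  q->pending++;
  return ENQ_OK;
}

/* Communicator side: one message has been put on the wire. Not calling
 * this (not draining the queue) is what creates back-pressure. */
static void
queue_transmitted (struct CommQueue *q)
{
  assert (q->pending > 0);
  q->pending--;
}
```

Because the communicator decides when to call ``queue_transmitted()``, a slow link throttles TRANSPORT without any explicit rate configuration.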

UDP communicator
^^^^^^^^^^^^^^^^

The UDP communicator implements a basic encryption layer to protect against
metadata leakage.
The layer tries to establish a shared secret using an Elliptic-Curve Diffie-Hellman
key exchange in which the initiator of a packet creates an ephemeral key pair
to encrypt a message for the target peer identity.
The communicator always offers this kind of transmission queue to a (reachable)
peer, in which messages are encrypted with dedicated keys.
The performance of this queue is not suitable for high-volume data transfer.

If the UDP connection is bi-directional, or the TRANSPORT is able to offer a
backchannel connection, the resulting key can be re-used if the receiving peer
is able to ACK the reception.
This will cause the communicator to offer a new queue (with a higher priority
than the default queue) to TRANSPORT with a limited capacity.
The capacity is increased whenever the communicator receives an ACK for a
transmission.
This queue is suitable for high-volume data transfer and TRANSPORT will likely
prioritize this queue (if available).

Communicators that try to establish a connection to a target peer authenticate
their peer ID (public key) in the first packets by signing a monotonic time
stamp, their own peer ID and the target peer ID, and send this data as well as
the signature in one of the first packets.
Receivers should keep track of (and persist) the monotonic time stamps for each
peer ID to reject possible replay attacks.

FIXME: Handshake wire format? KX, Flow.

TCP communicator
^^^^^^^^^^^^^^^^

FIXME: Handshake wire format? KX, Flow.

QUIC communicator
^^^^^^^^^^^^^^^^^

The QUIC communicator runs over a bi-directional UDP connection.
TLS layer with self-signed certificates (binding/signed with peer ID?).
Single, bi-directional stream?
FIXME: Handshake wire format? KX, Flow.

.. index::
   double: TRANSPORT; subsystem

.. _TRANSPORT-Subsystem:

TRANSPORT — Overlay transport management
========================================

This chapter documents how the GNUnet transport subsystem works. The
GNUnet transport subsystem consists of three main components: the
transport API (the interface used by the rest of the system to access
the transport service), the transport service itself (where most of the
interesting functions, such as choosing transports, happen) and
the transport plugins. A transport plugin is a concrete implementation
for how two GNUnet peers communicate; many plugins exist, for example
for communication via TCP, UDP, HTTP, HTTPS and others. Finally, the
transport subsystem uses supporting code, especially the NAT/UPnP
library, to help with tasks such as NAT traversal.

Key tasks of the transport service include:

- Create our HELLO message, notify clients and neighbours if our HELLO
  changes (using the NAT library as necessary)

- Validate HELLOs from other peers (send PING), allow other peers to
  validate our HELLO's addresses (send PONG)

- Upon request, establish connections to other peers (using address
  selection from the ATS subsystem) and maintain them (again using PINGs
  and PONGs) as long as desired

- Accept incoming connections, give the ATS service the opportunity to
  switch communication channels

- Notify clients about peers that have connected to us or that have
  been disconnected from us

- If a (stateful) connection goes down unexpectedly (without explicit
  DISCONNECT), quickly attempt to recover (without notifying clients)
  but do notify clients quickly if reconnecting fails

- Send (payload) messages arriving from clients to other peers via
  transport plugins and receive messages from other peers, forwarding
  those to clients

- Enforce inbound traffic limits (using flow-control if it is
  applicable); outbound traffic limits are enforced by CORE, not by us
  (!)

- Enforce restrictions on P2P connections as specified by the blacklist
  configuration and blacklisting clients

Note that the term \"clients\" in the list above really refers to the
GNUnet-CORE service, as CORE is typically the only client of the
transport service.

.. index::
   double: subsystem; SET

.. _SET-Subsystem:

SET — Peer to peer set operations (Deprecated)
==============================================

.. note::

   The SET subsystem is in the process of being replaced by the SETU and
   SETI subsystems, which provide basically the same functionality, just
   using two different subsystems. SETI and SETU should be used for new code.

The SET service implements efficient set operations between two peers
over a CADET tunnel. Currently, set union and set intersection are the
only supported operations. Elements of a set consist of an *element
type* and arbitrary binary *data*. The size of an element's data is
limited to around 62 KB.

.. _Local-Sets:

Local Sets
----------

Sets created by a local client can be modified and reused for multiple
operations. As each set operation requires potentially expensive special
auxiliary data to be computed for each element of a set, a set can only
participate in one type of set operation (either union or intersection).
The type of a set is determined upon its creation. If the elements of
a set are needed for an operation of a different type, all of the set's
elements must be copied to a new set of the appropriate type.
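Because sets are reused across operations while still being modified, each operation must work on a stable view of its set; the Set Modifications section explains that this is done with *generation information* rather than by copying. The following C sketch shows one way such snapshots can work, under stated assumptions: every modification bumps a generation counter, each element records the generation in which it was added and (if removed) in which it was removed, and an operation remembers the generation current at its start. ``struct GenSet``, ``op_start()``, ``set_add()``, ``set_remove()`` and ``visible()`` are hypothetical names, not the SET service's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of snapshot semantics via generation numbers. */
#define MAX_ELEMENTS 32
#define NEVER UINT32_MAX

struct Element
{
  uint64_t value;        /* stand-in for (element type, data) */
  uint32_t gen_added;    /* generation in which it was added */
  uint32_t gen_removed;  /* generation in which it was removed, or NEVER */
};

struct GenSet
{
  struct Element el[MAX_ELEMENTS];
  size_t n;
  uint32_t current_gen;  /* bumped on every modification */
};

/* An operation remembers the generation that was current at its start. */
struct Operation
{
  uint32_t gen;
};

static struct Operation
op_start (const struct GenSet *s)
{
  return (struct Operation) { .gen = s->current_gen };
}

static void
set_add (struct GenSet *s, uint64_t value)
{
  assert (s->n < MAX_ELEMENTS);
  s->current_gen++;
  s->el[s->n++] = (struct Element) { value, s->current_gen, NEVER };
}

/* Removal only marks the element; it stays visible to older operations. */
static void
set_remove (struct GenSet *s, uint64_t value)
{
  s->current_gen++;
  for (size_t i = 0; i < s->n; i++)
    if (s->el[i].value == value && NEVER == s->el[i].gen_removed)
      s->el[i].gen_removed = s->current_gen;
}

/* Visible to OP iff added no later than OP's generation and not yet
 * removed at that generation; no copy of the set is ever made. */
static bool
visible (const struct Element *e, const struct Operation *op)
{
  return e->gen_added <= op->gen && e->gen_removed > op->gen;
}

static size_t
snapshot_size (const struct GenSet *s, const struct Operation *op)
{
  size_t count = 0;
  for (size_t i = 0; i < s->n; i++)
    if (visible (&s->el[i], op))
      count++;
  return count;
}
```

Marked elements can be garbage-collected once no running operation has a generation old enough to still see them.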

.. _Set-Modifications:

Set Modifications
-----------------

Even when set operations are active, one can add elements to and remove
elements from a set. However, these changes will only be visible to operations
that have been created after the changes have taken place. That is,
every set operation only sees a snapshot of the set from the time the
operation was started. This mechanism is *not* implemented by copying
the whole set, but by attaching *generation information* to each element
and operation.

.. _Set-Operations:

Set Operations
--------------

Set operations can be started in two ways: either by accepting an
operation request from a remote peer, or by requesting a set operation
from a remote peer. Set operations are uniquely identified by the
involved *peers*, an *application id* and the *operation type*.

The client is notified of incoming set operations by *set listeners*. A
set listener listens for incoming operations of a specific operation
type and application id. Once notified of an incoming set request, the
client can accept the set request (providing a local set for the
operation) or reject it.

.. _Result-Elements:

Result Elements
---------------

The SET service has three *result modes* that determine how an
operation's result set is delivered to the client:

- **Full Result Set.** All elements of the set resulting from the set
  operation are returned to the client.

- **Added Elements.** Only elements that result from the operation and
  are not already in the local peer's set are returned. Note that for
  some operations (like set intersection) this result mode will never
  return any elements. This can be useful if only the remote peer is
  actually interested in the result of the set operation.

- **Removed Elements.** Only elements that are in the local peer's
  initial set but not in the operation's result set are returned. Note
  that for some operations (like set union) this result mode will never
  return any elements. This can be useful if only the remote peer is
  actually interested in the result of the set operation.

.. index::
   double: subsystem; SETI

.. _SETI-Subsystem:

SETI — Peer to peer set intersections
=====================================

The SETI service implements efficient set intersection between two peers
over a CADET tunnel. Elements of a set consist of an *element type* and
arbitrary binary *data*. The size of an element's data is limited to
around 62 KB.

.. _Intersection-Sets:

Intersection Sets
-----------------

Sets created by a local client can be modified (by adding additional
elements) and reused for multiple operations. If elements are to be
removed, a fresh set must be created by the client.

.. _Set-Intersection-Modifications:

Set Intersection Modifications
------------------------------

Even when set operations are active, one can add elements to a set.
However, these changes will only be visible to operations that have been
created after the changes have taken place. That is, every set operation
only sees a snapshot of the set from the time the operation was started.
This mechanism is *not* implemented by copying the whole set, but by
attaching *generation information* to each element and operation.

.. _Set-Intersection-Operations:

Set Intersection Operations
---------------------------

Set operations can be started in two ways: either by accepting an
operation request from a remote peer, or by requesting a set operation
from a remote peer.
Set operations are uniquely identified by the
involved *peers*, an *application id* and the *operation type*.

The client is notified of incoming set operations by *set listeners*. A
set listener listens for incoming operations of a specific operation
type and application id. Once notified of an incoming set request, the
client can accept the set request (providing a local set for the
operation) or reject it.

.. _Intersection-Result-Elements:

Intersection Result Elements
----------------------------

The SETI service has two *result modes* that determine how an operation's
result set is delivered to the client:

- **Return intersection.** All elements of the set resulting from the set
  intersection are returned to the client.

- **Removed Elements.** Only elements that are in the local peer's
  initial set but not in the intersection are returned.

.. index::
   double: SETU; subsystem

.. _SETU-Subsystem:

SETU — Peer to peer set unions
==============================

The SETU service implements efficient set union operations between two
peers over a CADET tunnel. Elements of a set consist of an *element
type* and arbitrary binary *data*. The size of an element's data is
limited to around 62 KB.

.. _Union-Sets:

Union Sets
----------

Sets created by a local client can be modified (by adding additional
elements) and reused for multiple operations. If elements are to be
removed, a fresh set must be created by the client.

.. _Set-Union-Modifications:

Set Union Modifications
-----------------------

Even when set operations are active, one can add elements to a set.
However, these changes will only be visible to operations that have been
created after the changes have taken place.
That is, every set operation
only sees a snapshot of the set from the time the operation was started.
This mechanism is *not* implemented by copying the whole set, but by
attaching *generation information* to each element and operation.

.. _Set-Union-Operations:

Set Union Operations
--------------------

Set operations can be started in two ways: either by accepting an
operation request from a remote peer, or by requesting a set operation
from a remote peer. Set operations are uniquely identified by the
involved *peers*, an *application id* and the *operation type*.

The client is notified of incoming set operations by *set listeners*. A
set listener listens for incoming operations of a specific operation
type and application id. Once notified of an incoming set request, the
client can accept the set request (providing a local set for the
operation) or reject it.

.. _Union-Result-Elements:

Union Result Elements
---------------------

The SETU service has the following *result modes* that determine how an
operation's result set is delivered to the client:

- **Locally added Elements.** Elements that are in the union but not
  already in the local peer's set are returned.

- **Remote added Elements.** Additionally, notify the client if the
  remote peer lacked some elements and thus also return to the local
  client those elements that we are sending to the remote peer to be
  added to its union. Obtaining these elements requires setting the
  ``GNUNET_SETU_OPTION_SYMMETRIC`` option.
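The two kinds of union results above boil down to two set differences: elements the remote peer has and we lack ("locally added"), and elements we have and the remote peer lacks ("remote added", only reported when the symmetric option is set). The C sketch below computes both on plain arrays to make that distinction concrete. It is a hypothetical illustration: the real SETU service reconciles the sets over CADET without exchanging them in full, and ``union_result()``, ``struct UnionResult`` and the ``symmetric`` flag (a stand-in for ``GNUNET_SETU_OPTION_SYMMETRIC``) are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration of which elements a SETU client learns about;
 * both sets are given in full here, unlike in the real protocol. */
#define MAX_SET 16

static bool
contains (const uint64_t *set, size_t n, uint64_t x)
{
  for (size_t i = 0; i < n; i++)
    if (set[i] == x)
      return true;
  return false;
}

/* Store in OUT the elements of FROM that are missing in OTHER and return
 * how many there are (OUT must have room for from_n entries). */
static size_t
difference (const uint64_t *from, size_t from_n,
            const uint64_t *other, size_t other_n, uint64_t *out)
{
  size_t k = 0;
  for (size_t i = 0; i < from_n; i++)
    if (! contains (other, other_n, from[i]))
      out[k++] = from[i];
  return k;
}

struct UnionResult
{
  uint64_t local_added[MAX_SET];   /* remote has them, we did not */
  size_t n_local_added;
  uint64_t remote_added[MAX_SET];  /* we send them to the remote peer */
  size_t n_remote_added;
};

/* SYMMETRIC stands in for GNUNET_SETU_OPTION_SYMMETRIC. */
static void
union_result (const uint64_t *local, size_t ln,
              const uint64_t *remote, size_t rn,
              bool symmetric, struct UnionResult *res)
{
  assert (ln <= MAX_SET && rn <= MAX_SET);
  res->n_local_added = difference (remote, rn, local, ln, res->local_added);
  res->n_remote_added = symmetric
    ? difference (local, ln, remote, rn, res->remote_added)
    : 0;
}
```

Without the symmetric flag the local client only learns what it gains; with it, the client also sees what the remote peer gains from the same operation.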