.. _subsystems:

Subsystems
**********

This section briefly describes the subsystems that make up GNUnet.
The following image gives an overview of system dependencies and interactions.

.. image:: /images/gnunet-arch-full.svg

CADET - Decentralized End-to-end Transport
==========================================

The Confidential Ad-hoc Decentralized End-to-end Transport (CADET) subsystem
in GNUnet is responsible for secure end-to-end
communications between nodes in the GNUnet overlay network. CADET builds
on the CORE subsystem, which provides the link-layer communication,
by adding routing, forwarding, and additional security to the
connections. CADET offers the same cryptographic services as CORE, but
on an end-to-end level. This is done so peers retransmitting traffic on
behalf of other peers cannot access the payload data.

-  CADET provides confidentiality with so-called perfect forward
   secrecy; we use ECDHE powered by Curve25519 for the key exchange and
   then use symmetric encryption, encrypting with both AES-256 and
   Twofish

-  authentication is achieved by signing the ephemeral keys using
   Ed25519, a deterministic variant of ECDSA

-  integrity protection (using SHA-512 to do encrypt-then-MAC, although
   only 256 bits are sent to reduce overhead)

-  replay protection (using nonces, timestamps, challenge-response,
   message counters and ephemeral keys)

-  liveness (keep-alive messages, timeout)

In addition to the CORE-like security benefits, CADET offers other
properties that make it a more universal service than CORE.

-  CADET can establish channels to arbitrary peers in GNUnet. If a peer
   is not immediately reachable, CADET will find a path through the
   network and ask other peers to retransmit the traffic on its behalf.

-  CADET offers (optional) reliability mechanisms. In a reliable channel
   traffic is guaranteed to arrive complete, unchanged and in-order.

-  CADET takes care of flow and congestion control mechanisms, not
   allowing the sender to send more traffic than the receiver or the
   network are able to process.

.. _CORE-Subsystem:

.. index::
   double: CORE; subsystem

CORE - GNUnet link layer
========================

The CORE subsystem in GNUnet is responsible for securing link-layer
communications between nodes in the GNUnet overlay network. CORE builds
on the TRANSPORT subsystem which provides for the actual, insecure,
unreliable link-layer communication (for example, via UDP or WLAN), and
then adds fundamental security to the connections:

-  confidentiality with so-called perfect forward secrecy; we use ECDHE
   (`Elliptic-curve
   Diffie-Hellman <http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman>`__)
   powered by Curve25519 (`Curve25519 <http://cr.yp.to/ecdh.html>`__)
   for the key exchange and then use symmetric encryption, encrypting
   with both AES-256
   (`AES-256 <http://en.wikipedia.org/wiki/Rijndael>`__) and Twofish
   (`Twofish <http://en.wikipedia.org/wiki/Twofish>`__)

-  `authentication <http://en.wikipedia.org/wiki/Authentication>`__ is
   achieved by signing the ephemeral keys using Ed25519
   (`Ed25519 <http://ed25519.cr.yp.to/>`__), a deterministic variant of
   ECDSA (`ECDSA <http://en.wikipedia.org/wiki/ECDSA>`__)

-  integrity protection (using SHA-512
   (`SHA-512 <http://en.wikipedia.org/wiki/SHA-2>`__) to do
   encrypt-then-MAC
   (`encrypt-then-MAC <http://en.wikipedia.org/wiki/Authenticated_encryption>`__))

-  replay (`replay <http://en.wikipedia.org/wiki/Replay_attack>`__)
   protection (using nonces, timestamps, challenge-response, message
   counters and ephemeral keys)

-  liveness (keep-alive messages, timeout)

.. _Limitations:

:index:`Limitations <CORE; limitations>`
Limitations
-----------

CORE does not perform
`routing <http://en.wikipedia.org/wiki/Routing>`__; using CORE it is
only possible to communicate with peers that happen to already be
"directly" connected with each other. CORE also does not have an API
to allow applications to establish such "direct" connections; for
this, applications can ask TRANSPORT, but TRANSPORT might not be able to
establish a "direct" connection. The TOPOLOGY subsystem is responsible
for trying to keep a few "direct" connections open at all times.
Applications that need to talk to particular peers should use the CADET
subsystem, as it can establish arbitrary "indirect" connections.

Because CORE does not perform routing, CORE must only be used directly
by applications that either perform their own routing logic (such as
anonymous file-sharing) or that do not require routing, for example
because they are based on flooding the network. CORE communication is
unreliable and delivery is possibly out-of-order. Applications that
require reliable communication should use the CADET service. Each
application can only queue one message per target peer with the CORE
service at any time; messages cannot be larger than approximately 63
kilobytes. If messages are small, CORE may group multiple messages
(possibly from different applications) prior to encryption. If permitted
by the application (using the `cork <http://baus.net/on-tcp_cork/>`__
option), CORE may delay transmissions to facilitate grouping of multiple
small messages. If cork is not enabled, CORE will transmit the message
as soon as TRANSPORT allows it (TRANSPORT is responsible for limiting
bandwidth and congestion control). CORE does not allow flow control;
applications are expected to process messages at line-speed. If flow
control is needed, applications should use the CADET service.

.. when is a peer connected
.. _When-is-a-peer-_0022connected_0022_003f:

When is a peer "connected"?
-----------------------------

In addition to the security features mentioned above, CORE also provides
one additional key feature to applications using it, and that is a
limited form of protocol-compatibility checking. CORE distinguishes
between TRANSPORT-level connections (which enable communication with
other peers) and application-level connections. Applications using the
CORE API will (typically) learn about application-level connections from
CORE, and not about TRANSPORT-level connections. When a typical
application uses CORE, it will specify a set of message types (from
``gnunet_protocols.h``) that it understands. CORE will then notify the
application about connections it has with other peers if and only if
applications on those peers registered an intersecting set of message
types with their CORE service. Thus, it is quite possible that CORE only
exposes a subset of the established direct connections to a particular
application, and different applications running above CORE might see
different sets of connections at the same time.

A special case is applications that do not register a handler for any
message type. CORE assumes that these applications merely want to
monitor connections (or "all" messages via other callbacks) and will
notify those applications about all connections. This is used, for
example, by the ``gnunet-core`` command-line tool to display the active
connections. Note that it is also possible that the TRANSPORT service
has more active connections than the CORE service, as the CORE service
first has to perform a key exchange with connecting peers before
exchanging information about supported message types and notifying
applications about the new connection.
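
The notification rule described above can be modelled in a few lines. This is
only an illustrative sketch and not the CORE API: the function name, the peer
names and the numeric message types are hypothetical stand-ins for the types an
application would take from ``gnunet_protocols.h``.

.. code-block:: python

   def visible_connections(app_types, peer_types):
       """Return the peers an application would be notified about.

       app_types:  message types the local application registered (hypothetical).
       peer_types: maps each connected peer to the types registered on that peer.
       """
       app_types = set(app_types)
       if not app_types:
           # No handlers registered: the application only monitors, so it is
           # told about every connection.
           return sorted(peer_types)
       # Otherwise a connection is reported only if the type sets intersect.
       return sorted(p for p, types in peer_types.items() if app_types & set(types))


   peers = {"peer-a": {1, 2}, "peer-b": {3}, "peer-c": {2, 3}}
   print(visible_connections({2}, peers))    # application handling type 2
   print(visible_connections(set(), peers))  # monitoring application

Two applications on the same peer can therefore be shown different sets of
connections, depending only on the message types they registered.
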

.. _Distributed-Hash-Table-_0028DHT_0029:

.. index::
   double: Distributed hash table; subsystem
   see: DHT; Distributed hash table

DHT - Distributed Hash Table
============================

GNUnet includes a generic distributed hash table that can be used by
developers building P2P applications in the framework. This section
documents high-level features and how developers are expected to use the
DHT. We have a research paper detailing how the DHT works. Also, Nate's
thesis includes a detailed description and performance analysis (in
chapter 6). [R5N2011]_ [EVANS2011]_

Key features of GNUnet's DHT include:

-  stores key-value pairs with values up to (approximately) 63k in size

-  works with many underlay network topologies (small-world, random
   graph); the underlay does not need to be a full mesh / clique

-  support for extended queries (more than just a simple 'key'),
   filtering duplicate replies within the network (bloomfilter) and
   content validation (for details, please read the subsection on the
   block library)

-  can (optionally) return paths taken by the PUT and GET operations to
   the application

-  provides content replication to handle churn

GNUnet's DHT is randomized and unreliable. Unreliable means that there
is no strict guarantee that a value stored in the DHT is always found;
values are only found with high probability. While this is somewhat
true in all P2P DHTs, GNUnet developers should be particularly wary of
this fact (this will help you write secure, fault-tolerant code). Thus,
when writing any application using the DHT, you should always consider
the possibility that a value stored in the DHT by you or some other peer
might simply not be returned, or returned with a significant delay. Your
application logic must be written to tolerate this (naturally, some loss
of performance or quality of service is expected in this case).

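One common way to tolerate a missing reply is to bound the number of attempts
and treat "nothing found" as a normal outcome. The sketch below is not the DHT
API: ``lookup`` is a hypothetical callable standing in for a GET operation that
returns ``None`` when no value arrived in time, and the retry count is an
arbitrary choice.

.. code-block:: python

   def get_with_retries(lookup, key, attempts=3):
       """Try lookup(key) up to `attempts` times; return None if nothing is found."""
       for _ in range(attempts):
           value = lookup(key)
           if value is not None:
               return value
       # The value may exist in the network and still not be returned.
       return None


   def never_found(key):
       # Hypothetical lookup that always times out.
       return None


   result = get_with_retries(never_found, "some-key")
   if result is None:
       print("no reply; falling back to degraded behaviour")

The caller, not the helper, decides what the degraded behaviour is, which keeps
the fallback visible in the application logic.
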
.. _Block-library-and-plugins:

Block library and plugins
-------------------------

.. _What-is-a-Block_003f:

What is a Block?
^^^^^^^^^^^^^^^^

Blocks are small (< 63k) pieces of data stored under a key
(``struct GNUNET_HashCode``). Blocks have a type
(``enum GNUNET_BlockType``) which defines their data format. Blocks are
used in GNUnet as units of static data exchanged between peers and
stored (or cached) locally. Uses of blocks include file-sharing (the
files are broken up into blocks), the VPN (DNS information is stored in
blocks) and the DHT (all information in the DHT and meta-information for
the maintenance of the DHT are both stored using blocks). The block
subsystem provides a few common functions that must be available for any
type of block.


.. [R5N2011] https://bib.gnunet.org/date.html#R5N
.. [EVANS2011] https://d-nb.info/1015129951

.. index::
   double: File sharing; subsystem
   see: FS; File sharing

.. _File_002dsharing-_0028FS_0029-Subsystem:

FS - File sharing over GNUnet
=============================

This chapter describes the details of how the file-sharing service
works. As with all services, it is split into an API (libgnunetfs), the
service process (gnunet-service-fs) and user interface(s). The
file-sharing service uses the datastore service to store blocks and the
DHT (and indirectly datacache) for lookups for non-anonymous
file-sharing. Furthermore, the file-sharing service uses the block
library (and the block fs plugin) for validation of DHT operations.

In contrast to many other services, libgnunetfs is rather complex since
the client library includes a large number of high-level abstractions;
this is necessary since the FS service itself largely only operates on
the block level. The FS library is responsible for providing a
file-based abstraction to applications, including directories,
metadata, keyword search, verification, and so on.

The method used by GNUnet to break large files into blocks and to use
keyword search is called the "Encoding for Censorship Resistant
Sharing" (ECRS). ECRS is largely implemented in the fs library; block
validation is also reflected in the block FS plugin and the FS service.
ECRS on-demand encoding is implemented in the FS service.

.. note:: The documentation in this chapter is quite incomplete.

.. _Encoding-for-Censorship_002dResistant-Sharing-_0028ECRS_0029:

.. index::
   see: Encoding for Censorship-Resistant Sharing; ECRS

:index:`ECRS - Encoding for Censorship-Resistant Sharing <single: ECRS>`
ECRS - Encoding for Censorship-Resistant Sharing
------------------------------------------------

When GNUnet shares files, it uses a content encoding that is called
ECRS, the Encoding for Censorship-Resistant Sharing. Most of ECRS is
described in the (so far unpublished) research paper attached to this
page. ECRS obsoletes the previous ESED and ESED II encodings which were
used in GNUnet before version 0.7.0. The rest of this page assumes that
the reader is familiar with the attached paper. What follows is a
description of some minor extensions that GNUnet makes over what is
described in the paper. The reason why these extensions are not in the
paper is that we felt that they were obvious or trivial extensions to
the original scheme and thus did not warrant space in the research
report.

.. todo:: Find missing link to file system paper.

.. index::
   double: GNU Name System; subsystem
   see: GNS; GNU Name System

.. _GNU-Name-System-_0028GNS_0029:

GNS - the GNU Name System
=========================

The GNU Name System (GNS) is a decentralized database that enables users
to securely resolve names to values. Names can be used to identify other
users (for example, in social networking), or network services (for
example, VPN services running at a peer in GNUnet, or purely IP-based
services on the Internet). Users interact with GNS by typing in a
hostname that ends in a top-level domain that is configured in the "GNS"
section, matches an identity of the user or ends in a Base32-encoded
public key.

Videos giving an overview of most of GNS and the motivations behind
it are available. The remainder of this chapter targets
developers who are familiar with the high-level concepts of GNS as
presented in these talks.

.. todo:: Link to videos and GNS talks?

GNS-aware applications should use the GNS resolver to obtain the
respective records that are stored under that name in GNS. Each record
consists of a type, value, expiration time and flags.

The type specifies the format of the value. Types below 65536 correspond
to DNS record types; larger values are used for GNS-specific records.
Applications can define new GNS record types by reserving a number and
implementing a plugin (which mostly needs to convert the binary value
representation to a human-readable text format and vice-versa). The
expiration time specifies how long the record is to be valid. The GNS
API ensures that applications are only given non-expired values. The
flags are typically irrelevant for applications, as GNS uses them
internally to control visibility and validity of records.

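The record structure and the two rules above (the DNS type range and the
filtering of expired values) can be illustrated with a small model. This is a
sketch only: the ``Record`` class, its field names and the use of seconds as
the time unit are assumptions made for the example and do not mirror the GNS
API.

.. code-block:: python

   from dataclasses import dataclass

   DNS_TYPE_LIMIT = 65536  # types below this value correspond to DNS record types


   @dataclass
   class Record:
       record_type: int
       value: bytes
       expiration: float  # absolute expiration time in seconds (assumed unit)
       flags: int = 0


   def is_dns_type(record_type):
       """True for DNS record types, False for GNS-specific ones."""
       return 0 <= record_type < DNS_TYPE_LIMIT


   def unexpired(records, now):
       """Keep only the records that are still valid at time `now`."""
       return [r for r in records if r.expiration > now]


   records = [
       Record(1, b"\x7f\x00\x00\x01", expiration=200.0),  # type 1 is a DNS A record
       Record(65536, b"zone-data", expiration=50.0),      # first GNS-specific number
   ]
   for r in unexpired(records, now=100.0):
       print(r.record_type, is_dns_type(r.record_type))
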
Records are stored along with a signature. The signature is generated
using the private key of the authoritative zone. This allows any GNS
resolver to verify the correctness of a name-value mapping.

Internally, GNS uses the NAMECACHE to cache information obtained from
other users, the NAMESTORE to store information specific to the local
users, and the DHT to exchange data between users. A plugin API is used
to enable applications to define new GNS record types.

.. index::
   single: GNS; name cache
   double: subsystem; NAMECACHE

.. _GNS-Namecache:

NAMECACHE - DHT caching of GNS results
======================================

The NAMECACHE subsystem is responsible for caching (encrypted)
resolution results of the GNU Name System (GNS). GNS makes zone
information available to other users via the DHT. However, as accessing
the DHT for every lookup is expensive (and as the DHT's local cache is
lost whenever the peer is restarted), GNS uses the NAMECACHE as a more
persistent cache for DHT lookups. Thus, instead of always looking up
every name in the DHT, GNS first checks if the result is already
available locally in the NAMECACHE. GNS queries the DHT only if there
is no result in the NAMECACHE. The NAMECACHE stores data in the same
(encrypted) format as the DHT. It thus makes no sense to iterate over
all items in the NAMECACHE, as the NAMECACHE does not have a way to
provide the keys required to decrypt the entries.

Blocks in the NAMECACHE share the same expiration mechanism as blocks in
the DHT: a block expires whenever any of the records in the
(encrypted) block expires. The expiration time of the block is the only
information stored in plaintext. The NAMECACHE service internally
performs all of the required work to expire blocks; clients do not have
to worry about this. Also, given that NAMECACHE stores only GNS blocks
that local users requested, there is no configuration option to limit
the size of the NAMECACHE. It is assumed to be always small enough (a
few MB) to fit on the drive.

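The lookup order and the expiration rule can be sketched as follows. This is a
simplified model and not the NAMECACHE API: the dictionary used as a cache, the
block layout and the ``dht_get`` callable are all hypothetical.

.. code-block:: python

   def block_expiration(record_expirations):
       """A block expires as soon as any record inside it expires."""
       return min(record_expirations)


   def resolve(query, namecache, dht_get, now):
       """Check the local cache first and ask the DHT only on a miss."""
       cached = namecache.get(query)
       if cached is not None and cached["expiration"] > now:
           return cached
       namecache.pop(query, None)  # drop an expired entry, if there is one
       block = dht_get(query)
       if block is not None and block["expiration"] > now:
           namecache[query] = block  # stored in the same encrypted form
           return block
       return None


   def example_dht_get(query):
       # Hypothetical DHT reply: an opaque ciphertext plus a plaintext expiration.
       return {"ciphertext": b"...", "expiration": block_expiration([50, 20, 90])}


   cache = {}
   first = resolve("query-hash", cache, example_dht_get, now=10)
   second = resolve("query-hash", cache, example_dht_get, now=15)  # served from cache
   print(first["expiration"], second is first)

Only the expiration time is inspected; the payload stays opaque, which matches
the point that the cache cannot decrypt its own entries.
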
The NAMECACHE supports the use of different database backends via a
plugin API.

.. index::
   double: subsystem; NAMESTORE

.. _NAMESTORE-Subsystem:

NAMESTORE - Storage of local GNS zones
======================================

The NAMESTORE subsystem provides persistent storage for local GNS zone
information. All local GNS zone information is managed by NAMESTORE. It
provides both the functionality to administer local GNS information
(e.g. delete and add records) as well as to retrieve GNS information
(e.g. to list name information in a client). NAMESTORE only manages
the persistent storage of zone information belonging to the user running
the service: GNS information from other users obtained from the DHT is
stored by the NAMECACHE subsystem.

NAMESTORE uses a plugin-based database backend to store GNS information
with good performance. SQLite and PostgreSQL are the supported
database backends. NAMESTORE clients interact with the IDENTITY
subsystem to obtain cryptographic information about zones based on egos
as described with the IDENTITY subsystem, but internally NAMESTORE
refers to zones using the respective private key.

NAMESTORE is queried and monitored by the ZONEMASTER service, which
periodically publishes public records of GNS zones. ZONEMASTER also
collaborates with the NAMECACHE subsystem: when local information is
modified, it stores the zone information in the NAMECACHE to increase
look-up performance for local information and to enable local access to
private records in zones through GNS.

NAMESTORE provides functionality to look up and store records, to
iterate over a specific or all zones and to monitor zones for changes.
NAMESTORE functionality can be accessed using the NAMESTORE C API, the NAMESTORE
REST API, or the NAMESTORE command line tool.

.. index::
   double: HOSTLIST; subsystem

.. _HOSTLIST-Subsystem:

HOSTLIST - HELLO bootstrapping and gossip
=========================================

Peers in the GNUnet overlay network need address information so that
they can connect with other peers. GNUnet uses so-called HELLO messages
to store and exchange peer addresses. GNUnet provides several methods
for peers to obtain this information:

-  out-of-band exchange of HELLO messages (manually, using for example
   gnunet-core)

-  HELLO messages shipped with GNUnet (automatic with distribution)

-  UDP neighbor discovery in LAN (IPv4 broadcast, IPv6 multicast)

-  topology gossiping (learning from other peers we already connected
   to), and

-  the HOSTLIST daemon covered in this section, which is particularly
   relevant for bootstrapping new peers.

New peers have no existing connections (and thus cannot learn from
gossip among peers), may not have other peers in their LAN and might be
started with an outdated set of HELLO messages from the distribution. In
this case, getting new peers to connect to the network requires either
manual effort or the use of a HOSTLIST to obtain HELLOs.

.. _HELLOs:

HELLOs
------

The basic information peers require to connect to other peers is
contained in so-called HELLO messages, which you can think of as
business cards. Besides the identity of the peer (based on the
cryptographic public key) a HELLO message may contain address
information that specifies ways to contact a peer. By obtaining HELLO
messages, a peer can learn how to contact other peers.

.. _Overview-for-the-HOSTLIST-subsystem:

Overview for the HOSTLIST subsystem
-----------------------------------

The HOSTLIST subsystem provides a way to distribute and obtain contact
information to connect to other peers using a simple HTTP GET request.
Its implementation is split into three parts: the main file for the
daemon itself (``gnunet-daemon-hostlist.c``), the HTTP client used to
download peer information (``hostlist-client.c``) and the server
component used to provide this information to other peers
(``hostlist-server.c``). The server is basically a small HTTP web server
(based on GNU libmicrohttpd) which provides a list of HELLOs known to
the local peer for download. The client component is basically an HTTP
client (based on libcurl) which can download hostlists from one or more
websites. The hostlist format is a binary blob containing a sequence of
HELLO messages. Note that any HTTP server can theoretically serve a
hostlist; the built-in hostlist server simply makes it convenient to
offer this service.

.. _Features:

Features
^^^^^^^^

The HOSTLIST daemon can:

-  provide HELLO messages with validated addresses obtained from
   PEERINFO for other peers to download

-  download HELLO messages and forward these messages to the TRANSPORT
   subsystem for validation

-  advertise the URL of this peer's hostlist to other peers via
   gossip

-  automatically learn about hostlist servers from the gossip of other
   peers

.. _HOSTLIST-_002d-Limitations:

HOSTLIST - Limitations
^^^^^^^^^^^^^^^^^^^^^^

The HOSTLIST daemon does not:

-  verify the cryptographic information in the HELLO messages

-  verify the address information in the HELLO messages

.. _Interacting-with-the-HOSTLIST-daemon:

Interacting with the HOSTLIST daemon
------------------------------------

The HOSTLIST subsystem is currently implemented as a daemon, so there is
no need for the user to interact with it and therefore there is no
command line tool and no API to communicate with the daemon. In the
future, we can envision changing this to allow users to manually trigger
the download of a hostlist.

Since there is no command line interface to interact with HOSTLIST, the
only way to interact with the hostlist is to use STATISTICS to obtain or
modify information about the status of HOSTLIST:

::

   $ gnunet-statistics -s hostlist

In particular, HOSTLIST includes a **persistent** value in statistics
that specifies when the hostlist server might be queried next. As this
value is exponentially increasing during runtime, developers may want to
reset or manually adjust it. Note that HOSTLIST (but not STATISTICS)
needs to be shut down if changes to this value are to have any effect on
the daemon (as HOSTLIST does not monitor STATISTICS for changes to the
download frequency).

.. _Hostlist-security-address-validation:

Hostlist security address validation
------------------------------------

Since information obtained from other parties cannot be trusted without
validation, we have to distinguish between *validated* and *not
validated* addresses. Before using (and so trusting) information from
other parties, this information has to be double-checked (validated).
Address validation is not done by HOSTLIST but by the TRANSPORT service.

The HOSTLIST component is functionally located between the PEERINFO and
the TRANSPORT subsystem. When acting as a server, the daemon obtains
valid (*validated*) peer information (HELLO messages) from the PEERINFO
service and provides it to other peers. When acting as a client, it
contacts the HOSTLIST servers specified in the configuration, downloads
the (unvalidated) list of HELLO messages and forwards this information
to the TRANSPORT service to validate the addresses.

.. _The-HOSTLIST-daemon:

:index:`The HOSTLIST daemon <double: daemon; HOSTLIST>`
The HOSTLIST daemon
-------------------

The hostlist daemon is the main component of the HOSTLIST subsystem. It
is started by the ARM service and (if configured) starts the HOSTLIST
client and server components.

If the daemon provides a hostlist itself it can advertise its own
hostlist to other peers. To do so it sends a
``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to other peers
when they connect to this peer on the CORE level. This hostlist
advertisement message contains the URL to access the HOSTLIST HTTP
server of the sender. The daemon may also subscribe to this type of
message from the CORE service, and then forward this kind of message to
the HOSTLIST client. The client then uses all available URLs to download
peer information when necessary.

When starting, the HOSTLIST daemon first connects to the CORE subsystem
and, if hostlist learning is enabled, registers a CORE handler to receive
this kind of message. Next it starts (if configured) the client and
server. It passes pointers to CORE connect and disconnect and receive
handlers where the client and server store their functions, so the
daemon can notify them about CORE events.

To clean up on shutdown, the daemon has a cleaning task, shutting down
all subsystems and disconnecting from CORE.

.. _The-HOSTLIST-server:

:index:`The HOSTLIST server <single: HOSTLIST; server>`
The HOSTLIST server
-------------------

The server provides a way for other peers to obtain HELLOs. Basically it
is a small web server other peers can connect to and download a list of
HELLOs using standard HTTP; it may also advertise the URL of the
hostlist to other peers connecting on CORE level.

.. _The-HTTP-Server:

The HTTP Server
^^^^^^^^^^^^^^^

During startup, the server starts a web server listening on the port
specified with the HTTPPORT value (default 8080). In addition it
connects to the PEERINFO service to obtain peer information. The
HOSTLIST server uses the ``GNUNET_PEERINFO_iterate`` function to request
HELLO information for all peers and adds their information to a new
hostlist if they are suitable (expired addresses and HELLOs without
addresses are both not suitable) and the maximum size for a hostlist is
not exceeded (``MAX_BYTES_PER_HOSTLISTS`` = 500000). When PEERINFO finishes
(with a last NULL callback), the server destroys the previous hostlist
response available for download on the web server and replaces it with
the updated hostlist. The hostlist format is basically a sequence of
HELLO messages (as obtained from PEERINFO) without any special
tokenization. Since each HELLO message contains a size field, the
response can easily be split into separate HELLO messages by the client.

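Building a size-limited hostlist and splitting it again on the client side can
be sketched as below. The text only states that each HELLO carries a size
field; the concrete layout used here (a 4-byte header with a 16-bit size that
includes the header, followed by a 16-bit type, both in network byte order) is
an assumption made for the example, as are the helper names and the message
type number.

.. code-block:: python

   import struct

   MAX_BYTES_PER_HOSTLISTS = 500000  # limit quoted in the text
   HEADER_SIZE = 4                   # assumed: 16-bit size + 16-bit type


   def make_message(msg_type, payload):
       """Build a message with the assumed header; size includes the header."""
       return struct.pack("!HH", HEADER_SIZE + len(payload), msg_type) + payload


   def build_hostlist(hellos, limit=MAX_BYTES_PER_HOSTLISTS):
       """Concatenate HELLOs, skipping any that would exceed the size limit."""
       out = bytearray()
       for hello in hellos:
           if len(out) + len(hello) <= limit:
               out += hello
       return bytes(out)


   def split_hostlist(blob):
       """Split a downloaded hostlist using each message's size field."""
       messages, offset = [], 0
       while offset + HEADER_SIZE <= len(blob):
           (size,) = struct.unpack_from("!H", blob, offset)
           if size < HEADER_SIZE or offset + size > len(blob):
               break  # malformed or truncated message: stop parsing
           messages.append(blob[offset:offset + size])
           offset += size
       return messages


   hello_a = make_message(1, b"address-of-a")  # type 1 is a placeholder value
   hello_b = make_message(1, b"address-of-b")
   blob = build_hostlist([hello_a, hello_b])
   print(len(split_hostlist(blob)))

No delimiter is needed between messages, which is what "without any special
tokenization" amounts to in practice.
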
A HOSTLIST client connecting to the HOSTLIST server will receive the
hostlist as an HTTP response and the server will terminate the
connection with the result code ``HTTP 200 OK``. The connection will be
closed immediately if no hostlist is available.

.. _Advertising-the-URL:

Advertising the URL
^^^^^^^^^^^^^^^^^^^

The server also advertises the URL to download the hostlist to other
peers if hostlist advertisement is enabled. When a new peer connects and
has hostlist learning enabled, the server sends a
``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT`` message to this peer
using the CORE service.

    620 .. _The-HOSTLIST-client:
    621 
    622 The HOSTLIST client
    623 -------------------
    624 
    625 The client provides the functionality to download the list of HELLOs
    626 from a set of URLs. It performs a standard HTTP request to the URLs
    627 configured and learned from advertisement messages received from other
    628 peers. When a HELLO is downloaded, the HOSTLIST client forwards the
    629 HELLO to the TRANSPORT service for validation.
    630 
    631 The client supports two modes of operation:
    632 
    633 -  download of HELLOs (bootstrapping)
    634 
    635 -  learning of URLs
    636 
    637 .. _Bootstrapping:
    638 
    639 Bootstrapping
    640 ^^^^^^^^^^^^^
    641 
    642 For bootstrapping, the client schedules a task to download the hostlist
    643 from the set of known URLs. Downloads are only performed if the number
    644 of current connections is smaller than a minimum number of connections
    645 (at the moment 4). The interval between downloads increases
    646 exponentially; however, this growth is limited once the interval becomes
    647 longer than an hour. From that point on, the interval is capped at
    648 (number of connections \* 1h).
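
The download schedule can be sketched as follows. The doubling factor
and the handling of zero connections are assumptions made for this
illustration; only the threshold of 4 connections and the cap come from
the description above.

.. code-block:: python

   MIN_CONNECTIONS = 4   # "at the moment 4" in the text above
   HOUR = 3600.0         # seconds


   def should_download(connections):
       """Bootstrapping only runs while the peer has too few connections."""
       return connections < MIN_CONNECTIONS


   def next_interval(current, connections, factor=2.0):
       """Grow the delay exponentially, then apply the per-connection cap."""
       grown = current * factor
       # Assumption: treat zero connections like one so the cap is never 0.
       cap = max(1, connections) * HOUR
       return min(grown, cap)

With this rule a well-connected peer backs off further than a poorly
connected one, which keeps bootstrapping traffic low once it is no
longer needed.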
    649 
    650 Once the decision has been taken to download HELLOs, the daemon chooses
    651 a random URL from the list of known URLs. URLs can be configured in the
    652 configuration or be learned from advertisement messages. The client uses
    653 an HTTP client library (libcurl) to initiate the download using the
    654 libcurl multi interface. Libcurl passes the data to the
    655 callback_download function, which stores the data in a buffer if space
    656 is available and the maximum size for a hostlist download is not
    657 exceeded (MAX_BYTES_PER_HOSTLISTS = 500000). Once a full HELLO has been
    658 downloaded, the HOSTLIST client offers this HELLO message to the
    659 TRANSPORT service for validation. When the download finishes or fails,
    660 statistical information about the quality of this URL is updated.
    661 
    662 .. _Learning:
    663 
    664 :index:`Learning <single: HOSTLIST; learning>`
    665 Learning
    666 ^^^^^^^^
    667 
    668 The client also manages hostlist advertisements from other peers. The
    669 HOSTLIST daemon forwards ``GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT``
    670 messages to the client subsystem, which extracts the URL from the
    671 message. Next, a test of the newly obtained URL is performed by
    672 triggering a download from the new URL. If the URL works correctly, it
    673 is added to the list of working URLs.
    674 
    675 The size of the list of URLs is restricted, so if an additional server
    676 is added and the list is full, the URL with the worst quality ranking
    677 (determined, for example, by the number of successful downloads and the
    678 number of HELLOs obtained) is discarded. During shutdown, the list of
    679 URLs is saved to a file for persistence; it is loaded again on startup.
    680 URLs from the configuration file are never discarded.
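
A minimal sketch of this eviction rule is shown below. The ``quality``
formula is hypothetical (the text only names its inputs), and the class
and function names are made up for the example.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class HostlistUrl:
       url: str
       configured: bool = False  # configured URLs are never discarded
       successes: int = 0        # successful downloads
       hellos: int = 0           # HELLOs obtained from this URL

       def quality(self):
           # Hypothetical ranking; the real weighting may differ.
           return self.successes + self.hellos


   def add_learned_url(urls, new, max_size):
       """Add a tested URL, evicting the worst learned URL if needed."""
       if any(u.url == new.url for u in urls):
           return
       if len(urls) >= max_size:
           candidates = [u for u in urls if not u.configured]
           if not candidates:
               return  # nothing may be evicted, so the new URL is dropped
           urls.remove(min(candidates, key=HostlistUrl.quality))
       urls.append(new)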
    681 
    682 .. _Usage:
    683 
    684 Usage
    685 -----
    686 
    687 To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES
    688 section for the ARM services. This is done in the default configuration.
    689 
    690 For more information on how to configure the HOSTLIST subsystem, see the
    691 sections \"Configuring the hostlist to bootstrap\" and \"Configuring
    692 your peer to provide a hostlist\" in the installation handbook.
    693 
    694 .. index::
    695    double: IDENTITY; subsystem 
    696 
    697 .. _IDENTITY-Subsystem:
    698 
    699 IDENTITY — Ego management
    700 =========================
    701 
    702 Identities of \"users\" in GNUnet are called egos. Egos can be used as
    703 pseudonyms (\"fake names\") or be tied to an organization (for example,
    704 \"GNU\") or even the actual identity of a human. GNUnet users are
    705 expected to have many egos. They might have one tied to their real
    706 identity, some for organizations they manage, and more for different
    707 domains where they want to operate under a pseudonym.
    708 
    709 The IDENTITY service allows users to manage their egos. The identity
    710 service manages the private keys of the egos of the local user; it does
    711 not manage identities of other users (public keys). Public keys for
    712 other users need names to become manageable. GNUnet uses the GNU Name
    713 System (GNS) to give names to other users and manage their public keys
    714 securely. This chapter is about the IDENTITY service, which handles the
    715 management of private keys.
    716 
    717 On the network, an ego corresponds to an ECDSA key (over Curve25519,
    718 using RFC 6979, as required by GNS). Thus, users can perform actions
    719 under a particular ego by using (signing with) a particular private key.
    720 Other users can then confirm that the action was really performed by
    721 that ego by checking the signature against the respective public key.
    722 
    723 The IDENTITY service allows users to associate a human-readable name
    724 with each ego. This way, users can use names that will remind them of
    725 the purpose of a particular ego. The IDENTITY service will store the
    726 respective private keys and allows applications to access key
    727 information by name. Users can change the name that is locally (!)
    728 associated with an ego. Egos can also be deleted, which means that the
    729 private key will be removed and it thus will not be possible to perform
    730 actions with that ego in the future.
    731 
    732 Additionally, the IDENTITY subsystem can associate service functions
    733 with egos. For example, GNS requires the ego that should be used for the
    734 shorten zone. GNS will ask IDENTITY for an ego for the \"gns-short\"
    735 service. The IDENTITY service has a mapping of such service strings to
    736 the name of the ego that the user wants to use for this service, for
    737 example \"my-short-zone-ego\".
    738 
    739 Finally, the IDENTITY API provides access to a special ego, the
    740 anonymous ego. The anonymous ego is special in that its private key is
    741 not really private, but fixed and known to everyone. Thus, anyone can
    742 perform actions as anonymous. This is useful because, with this trick,
    743 code does not have to contain a special case to distinguish between
    744 anonymous and pseudonymous egos.
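
The bookkeeping described in this section can be modelled with two
mappings. The sketch below is a toy model and not the IDENTITY API: the
class, the random bytes standing in for key generation, the placeholder
anonymous key and the fallback to the anonymous ego are all assumptions
made for illustration.

.. code-block:: python

   import os

   ANONYMOUS = ("anonymous", bytes(32))  # placeholder for the fixed key


   class EgoStore:
       """Toy model: local names, private keys and service defaults."""

       def __init__(self):
           self.egos = {}      # local name -> private key bytes
           self.services = {}  # service string -> local ego name

       def create(self, name):
           if name in self.egos:
               raise ValueError("name already in use")
           self.egos[name] = os.urandom(32)  # stand-in for key generation

       def rename(self, old, new):
           if new in self.egos:
               raise ValueError("name already in use")
           self.egos[new] = self.egos.pop(old)  # the key stays the same
           for service, name in self.services.items():
               if name == old:
                   self.services[service] = new

       def delete(self, name):
           del self.egos[name]  # the private key is gone for good
           self.services = {s: n for s, n in self.services.items()
                            if n != name}

       def set_service(self, service, name):
           if name not in self.egos:
               raise KeyError(name)
           self.services[service] = name

       def ego_for_service(self, service):
           """Return (name, key); assumed fallback is the anonymous ego."""
           name = self.services.get(service)
           return ANONYMOUS if name is None else (name, self.egos[name])

Renaming only changes the local name, so signatures made with the ego
remain valid; deleting removes the key and every service mapping that
pointed to it.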
    745 
    746 .. index::
    747    double: subsystem; MESSENGER
    748 
    749 .. _MESSENGER-Subsystem:
    750 
    751 MESSENGER — Room-based end-to-end messaging 
    752 ===========================================
    753 
    754 The MESSENGER subsystem is responsible for secure end-to-end
    755 communication in groups of nodes in the GNUnet overlay network.
    756 MESSENGER builds on the CADET subsystem which provides a reliable and
    757 secure end-to-end communication between the nodes inside of these
    758 groups.
    759 
    760 In addition to the CADET security benefits, MESSENGER provides the
    761 following properties designed for application-level usage:
    762 
    763 -  MESSENGER provides integrity by signing the messages with the user's
    764    provided ego
    765 
    766 -  MESSENGER adds (optional) forward secrecy by replacing the key pair
    767    of the used ego and signing the propagation of the new one with the
    768    old one (chaining egos)
    769 
    770 -  MESSENGER provides verification of the original sender by checking
    771    against all used egos from a member which are currently in active use
    772    (active use depends on the state of a member session)
    773 
    774 -  MESSENGER offers (optional) decentralized message forwarding between
    775    all nodes in a group to improve availability and prevent MITM-attacks
    776 
    777 -  MESSENGER handles new connections and disconnections from nodes in
    778    the group by reconnecting them preserving an efficient structure for
    779    message distribution (ensuring availability and accountability)
    780 
    781 -  MESSENGER provides replay protection (messages can be uniquely
    782    identified via SHA-512, include a timestamp and the hash of the last
    783    message)
    784 
    785 -  MESSENGER allows detection of dropped messages by chaining them
    786    (messages refer to the last message by their hash) improving
    787    accountability
    788 
    789 -  MESSENGER allows requesting messages from other peers explicitly to
    790    ensure availability
    791 
    792 -  MESSENGER provides confidentiality by padding messages to a few
    793    different sizes (512 bytes, 4096 bytes, 32768 bytes and the maximal
    794    message size from CADET)
    795 
    796 -  MESSENGER adds (optional) confidentiality by using ECDHE for the key
    797    exchange and then symmetric encryption, encrypting with both AES-256
    798    and Twofish but allowing only selected members to decrypt (using the
    799    receiver's ego for ECDHE)
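
Two of the properties above, padding to a small set of sizes and
detecting dropped messages through hash chaining, can be illustrated
with a short sketch. The byte layout that is hashed, the all-zero
"no previous message" hash and the function names are assumptions for
this example; only the padded sizes and the use of SHA-512 are taken
from the list.

.. code-block:: python

   import hashlib

   PADDED_SIZES = (512, 4096, 32768)  # sizes listed above
   GENESIS = bytes(64)                # placeholder "no previous message"


   def padded_size(length, max_size):
       """Smallest permitted size that can hold `length` bytes."""
       sizes = sorted(s for s in PADDED_SIZES + (max_size,) if s <= max_size)
       for size in sizes:
           if length <= size:
               return size
       raise ValueError("message exceeds the maximal message size")


   def message_hash(body, timestamp, previous):
       """SHA-512 over timestamp, previous hash and body (assumed layout)."""
       data = timestamp.to_bytes(8, "big") + previous + body
       return hashlib.sha512(data).digest()


   def missing_predecessors(known):
       """`known` maps message hash -> previous hash; report the gaps."""
       return {prev for prev in known.values()
               if prev != GENESIS and prev not in known}

A hash returned by ``missing_predecessors`` identifies a message that
was referenced but never received, which is the kind of message a peer
could then request explicitly.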
    800 
    801 MESSENGER also provides multiple features with privacy in mind:
    802 
    803 -  MESSENGER allows deleting messages from all peers in the group by the
    804    original sender (uses the MESSENGER provided verification)
    805 
    806 -  MESSENGER allows using the publicly known anonymous ego instead of
    807    any uniquely identifying ego
    808 
    809 -  MESSENGER allows your node to decide between acting as host of the
    810    used messaging room (sharing your peer's identity with all nodes in
    811    the group) or acting as guest (sharing your peer's identity only with
    812    the nodes you explicitly open a connection to)
    813 
    814 -  MESSENGER handles members independently of the peer's identity making
    815    forwarded messages indistinguishable from directly received ones
    816    (complicating the tracking of messages and identifying their origin)
    817 
    818 -  MESSENGER allows names of members to be non-unique (names are also
    819    optional)
    820 
    821 -  MESSENGER does not include information about the selected receiver of
    822    an explicitly encrypted message in its header, making it harder for
    823    other members to draw conclusions about communication partners
    824 
    825 
    826 
    827 .. index::
    828    single: subsystem; Network size estimation
    829    see: NSE; Network size estimation
    830 
    831 .. _NSE-Subsystem:
    832 
    833 NSE — Network size estimation
    834 =============================
    835 
    836 NSE stands for Network Size Estimation. The NSE subsystem provides other
    837 subsystems and users with a rough estimate of the number of peers
    838 currently participating in the GNUnet overlay. The computed value is not
    839 a precise number as producing a precise number in a decentralized,
    840 efficient and secure way is impossible. While NSE's estimate is
    841 inherently imprecise, NSE also gives the expected range. For a peer that
    842 has been running in a stable network for a while, the real network size
    843 will typically (99.7% of the time) be in the range of [2/3 estimate, 3/2
    844 estimate]. We will now give an overview of the algorithm used to
    845 calculate the estimate; all of the details can be found in this
    846 technical report.
    847 
    848 .. todo:: link to the report.
    849 
    850 .. _Motivation:
    851 
    852 Motivation
    853 ----------
    854 
    855 Some subsystems, like DHT, need to know the size of the GNUnet network
    856 to optimize some parameters of their own protocol. The decentralized
    857 nature of GNUnet makes efficiently and securely counting the exact number
    858 of peers infeasible. Although there are several decentralized algorithms
    859 to count the number of peers in a system, so far there is none to do so
    860 securely. Other protocols may allow any malicious peer to manipulate the
    861 final result or to take advantage of the system to perform Denial of
    862 Service (DoS) attacks against the network. GNUnet's NSE protocol avoids
    863 these drawbacks.
    864 
    866 .. _Security:
    867 
    868 :index:`Security <single: NSE; security>`
    869 Security
    870 ^^^^^^^^
    871 
    872 The NSE subsystem is designed to be resilient against these attacks. It
    873 uses `proofs of
    874 work <http://en.wikipedia.org/wiki/Proof-of-work_system>`__ to prevent
    875 one peer from impersonating a large number of participants, which would
    876 otherwise allow an adversary to artificially inflate the estimate. The
    877 DoS protection comes from the time-based nature of the protocol: the
    878 estimates are calculated periodically and out-of-time traffic is either
    879 ignored or stored for later retransmission by benign peers. In
    880 particular, peers cannot trigger global network communication at will.
    881 
    882 .. _Principle:
    883 
    884 :index:`Principle <single: NSE; principle of operation>`
    885 Principle
    886 ---------
    887 
    888 The algorithm calculates the estimate by finding the globally closest
    889 peer ID to a random, time-based value.
    890 
    891 The idea is that the closer the ID is to the random value, the more
    892 \"densely packed\" the ID space is, and therefore, more peers are in the
    893 network.
    894 
    895 .. _Example:
    896 
    897 Example
    898 ^^^^^^^
    899 
    900 Suppose all peers have IDs between 0 and 100 (our ID space), and the
    901 random value is 42. If the closest peer has the ID 70 we can imagine
    902 that the average \"distance\" between peers is around 30 and therefore
    903 there are around 3 peers in the whole ID space. On the other hand, if
    904 the closest peer has the ID 44, we can imagine that the space is rather
    905 packed with peers, maybe as many as 50 of them. Naturally, we could have
    906 been rather unlucky, and there is only one peer and it happens to have
    907 the ID 44. Thus, the current estimate is calculated as the average over
    908 multiple rounds, and not just a single sample.
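
The arithmetic of this toy example can be written down directly. The
function below is only a restatement of the example (ID space divided by
the distance of the closest peer) and not the formula used by NSE; its
name and the guard against a distance of zero are assumptions.

.. code-block:: python

   def single_round_estimate(target, closest_id, id_space=100):
       """Estimate the peer count from the distance of the closest ID."""
       distance = max(abs(closest_id - target), 1)  # avoid division by zero
       return id_space / distance

For a target of 42, a closest ID of 70 gives a value between 3 and 4,
while a closest ID of 44 gives 50, matching the two cases in the text.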
    909 
    910 .. _Algorithm:
    911 
    912 Algorithm
    913 ^^^^^^^^^
    914 
    915 Given that example, one can imagine that the job of the subsystem is to
    916 efficiently communicate the ID of the closest peer to the target value
    917 to all the other peers, who will calculate the estimate from it.
    918 
    919 .. _Target-value:
    920 
    921 Target value
    922 ^^^^^^^^^^^^
    923 
    924 The target value itself is generated by hashing the current time,
    925 rounded down to an agreed value. If the rounding amount is 1h (default)
    926 and the time is 12:34:56, the time to hash would be 12:00:00. The
    927 process is repeated once per rounding amount (in this example, every
    928 hour). Every repetition is called a round.
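
A minimal sketch of this derivation follows. The use of SHA-512 and the
8-byte big-endian encoding of the timestamp are assumptions chosen for
the example; the point is only that every peer with a roughly correct
clock derives the same value for the whole round.

.. code-block:: python

   import hashlib


   def round_start(timestamp, interval=3600):
       """Round a timestamp (in seconds) down to the start of its round."""
       return timestamp - (timestamp % interval)


   def target_value(timestamp, interval=3600):
       """Hash the rounded time; the encoding used here is hypothetical."""
       start = round_start(timestamp, interval)
       return hashlib.sha512(start.to_bytes(8, "big")).digest()

Here 12:34:56 corresponds to 45296 seconds after midnight, which rounds
down to 43200 seconds (12:00:00).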
    929 
    930 .. _Timing:
    931 
    932 Timing
    933 ^^^^^^
    934 
    935 The NSE subsystem has some timing control to avoid everybody
    936 broadcasting its ID all at once. Once each peer has the target random
    937 value, it compares its own ID to the target and calculates the
    938 hypothetical size of the network if that peer were to be the closest.
    939 Then it compares the hypothetical size with the estimate from the
    940 previous rounds. For each value there is an associated point in the
    941 period, let's call it \"broadcast time\". If its own hypothetical
    942 estimate is the same as the previous global estimate, its \"broadcast
    943 time\" will be in the middle of the round. If it is bigger, it will be
    944 earlier, and if it is smaller (the most likely case), it will be later.
    945 This ensures that the peers closest to the target value start
    946 broadcasting their ID first.
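
One way to obtain a broadcast time with these properties is a smooth,
monotone mapping such as the arctangent. The sketch below is an
assumption made for illustration (estimates are taken on a logarithmic
scale); the function actually used is described in the technical report
and is not reproduced here.

.. code-block:: python

   import math


   def broadcast_offset(own_estimate, previous_estimate, interval=3600.0):
       """Seconds into the round at which a peer would start broadcasting."""
       difference = own_estimate - previous_estimate
       return interval / 2 - (interval / math.pi) * math.atan(difference)

An estimate equal to the previous one lands exactly in the middle of the
round, larger estimates land earlier and smaller ones later, and the
result always stays inside the round.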
    947 
    948 .. _Controlled-Flooding:
    949 
    950 Controlled Flooding
    951 ^^^^^^^^^^^^^^^^^^^
    952 
    953 When a peer receives a value, first it verifies that it is closer than
    954 the closest value it had so far, otherwise it answers the incoming
    955 message with a message containing the better value. Then it checks a
    956 proof of work that must be included in the incoming message, to ensure
    957 that the other peer's ID is not made up (otherwise a malicious peer
    958 could claim to have an ID of exactly the target value every round). Once
    959 validated, it compares the broadcast time of the received value with the
    960 current time and if it's not too early, sends the received value to its
    961 neighbors. Otherwise it stores the value until the correct broadcast
    962 time comes. This prevents unnecessary traffic of sub-optimal values,
    963 since a better value can come before the broadcast time, rendering the
    964 previous one obsolete and saving the traffic that would have been used
    965 to broadcast it to the neighbors.
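
The decision sequence above can be summarized in a small sketch. The
proof of work shown here (leading zero bits of a SHA-512 hash over the
peer ID and a nonce), the ``WORKBITS`` value and all class and function
names are hypothetical; only the order of the checks follows the text.

.. code-block:: python

   import hashlib
   from dataclasses import dataclass

   WORKBITS = 8  # hypothetical difficulty, kept low for the example


   def leading_zero_bits(digest):
       bits = 0
       for byte in digest:
           if byte == 0:
               bits += 8
               continue
           bits += 8 - byte.bit_length()
           break
       return bits


   def valid_proof(peer_id, nonce):
       digest = hashlib.sha512(peer_id + nonce.to_bytes(8, "big")).digest()
       return leading_zero_bits(digest) >= WORKBITS


   def find_nonce(peer_id):
       """Brute-force a nonce; this is the work a peer has to invest."""
       nonce = 0
       while not valid_proof(peer_id, nonce):
           nonce += 1
       return nonce


   @dataclass
   class Flood:
       peer_id: bytes
       nonce: int
       distance: int          # distance of peer_id to the round's target
       broadcast_time: float  # seconds into the round


   class Node:
       def __init__(self):
           self.best = None     # closest value seen so far
           self.pending = None  # value waiting for its broadcast time

       def receive(self, msg, now):
           if self.best is not None and msg.distance > self.best.distance:
               return ("reply", self.best)  # tell the sender the better value
           if self.best is not None and msg.distance == self.best.distance:
               return ("ignore", None)      # nothing new
           if not valid_proof(msg.peer_id, msg.nonce):
               return ("drop", None)        # made-up ID
           self.best, self.pending = msg, None
           if now >= msg.broadcast_time:
               return ("forward", msg)      # send to neighbours now
           self.pending = msg               # hold until broadcast time
           return ("store", msg)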
    966 
    967 .. _Calculating-the-estimate:
    968 
    969 Calculating the estimate
    970 ^^^^^^^^^^^^^^^^^^^^^^^^
    971 
    972 Once the closest ID has been spread across the network each peer gets
    973 the exact distance between this ID and the target value of the round and
    974 calculates the estimate with a mathematical formula described in the
    975 tech report. The estimate generated with this method for a single round
    976 is not very precise. Remember the case of the example, where the only
    977 peer has the ID 44 and we happen to generate the target value 42,
    978 thinking there are 50 peers in the network. Therefore, the NSE subsystem
    979 remembers the last 64 estimates and calculates an average over them,
    980 giving a result which usually has one bit of uncertainty (the real
    981 size could be half of the estimate or twice as much). Note that the
    982 actual network size is calculated in powers of two of the raw input,
    983 thus one bit of uncertainty means a factor of two in the size estimate.
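
The averaging step can be sketched as a sliding window over per-round
values. The sketch assumes that each round contributes a base-2
logarithm of the network size and that a plain arithmetic mean is used;
the exact formula is in the technical report, and the class name is made
up for the example.

.. code-block:: python

   from collections import deque
   from statistics import mean

   HISTORY = 64  # number of rounds remembered, as stated above


   class SizeEstimator:
       def __init__(self):
           self.rounds = deque(maxlen=HISTORY)  # oldest rounds fall out

       def add_round(self, log2_estimate):
           self.rounds.append(log2_estimate)

       def estimate(self):
           """Average on the logarithmic scale, then convert to a size."""
           return 2 ** mean(self.rounds)

       def plausible_range(self):
           """One bit of uncertainty: half to twice the estimate."""
           size = self.estimate()
           return size / 2, size * 2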
    984 
    985 .. index::
    986    double: subsystem; PEERINFO
    987 
    988 .. _PEERINFO-Subsystem:
    989 
    990 PEERINFO — Persistent HELLO storage
    991 ===================================
    992 
    993 The PEERINFO subsystem is used to store verified (validated) information
    994 about known peers in a persistent way. It obtains these addresses, for
    995 example, from the TRANSPORT service, which is in charge of address
    996 validation. Validation means that the information in the HELLO message
    997 is checked by connecting to the addresses and performing a cryptographic
    998 handshake to authenticate the peer instance claiming to be reachable at
    999 these addresses. PEERINFO does not validate the HELLO messages itself
   1000 but only stores them and gives them to interested clients.
   1001 
   1002 As future work, we think about moving from storing just HELLO messages
   1003 to providing a generic persistent per-peer information store. More and
   1004 more subsystems tend to need to store per-peer information in a
   1005 persistent way. To avoid duplicating this functionality, we plan to
   1006 provide a PEERSTORE service offering it.
   1007 
   1008 .. _PEERINFO-_002d-Features:
   1009 
   1010 PEERINFO - Features
   1011 -------------------
   1012 
   1013 -  Persistent storage
   1014 
   1015 -  Client notification mechanism on update
   1016 
   1017 -  Periodic clean up for expired information
   1018 
   1019 -  Differentiation between public and friend-only HELLO
   1020 
   1021 .. _PEERINFO-_002d-Limitations:
   1022 
   1023 PEERINFO - Limitations
   1024 ----------------------
   1025 
   1026 -  Does not perform HELLO validation
   1027 
   1028 .. _DeveloperPeer-Information:
   1029 
   1030 Peer Information
   1031 ----------------
   1032 
   1033 The PEERINFO subsystem stores this information in the form of HELLO
   1034 messages, which you can think of as business cards. These HELLO messages
   1035 contain the public key of a peer and the addresses a peer can be reached
   1036 under. The addresses include an expiration date describing how long they
   1037 are valid. This information is updated regularly by the TRANSPORT
   1038 service by revalidating the address. If an address is expired and not
   1039 renewed, it can be removed from the HELLO message.
   1040 
   1041 Some peers do not want to have their HELLO messages distributed to other
   1042 peers, especially when GNUnet's friend-to-friend mode is enabled. To
   1043 prevent this undesired distribution, PEERINFO distinguishes between
   1044 *public* and *friend-only* HELLO messages. Public HELLO messages can be
   1045 freely distributed to other (possibly unknown) peers (for example using
   1046 the hostlist, gossiping, broadcasting), whereas friend-only HELLO
   1047 messages may not be distributed to other peers. Friend-only HELLO
   1048 messages have an additional flag ``friend_only`` set internally. For
   1049 public HELLO messages this flag is not set. PEERINFO does not and cannot
   1050 check if a client is allowed to obtain a specific HELLO type.
   1051 
   1052 The HELLO messages can be managed using the GNUnet HELLO library. Other
   1053 GNUnet systems can obtain this information from PEERINFO and use it for
   1054 their purposes. Clients are for example the HOSTLIST component providing
   1055 this information to other peers in form of a hostlist or the TRANSPORT
   1056 subsystem using this information to maintain connections to other
   1057 peers.
   1058 
   1059 .. _Startup:
   1060 
   1061 Startup
   1062 -------
   1063 
   1064 During startup the PEERINFO service loads persistent HELLOs from disk.
   1065 First PEERINFO parses the directory configured in the HOSTS value of the
   1066 ``PEERINFO`` configuration section to store PEERINFO information. For
   1067 all files found in this directory valid HELLO messages are extracted. In
   1068 addition it loads HELLO messages shipped with the GNUnet distribution.
   1069 These HELLOs are used to simplify network bootstrapping by providing
   1070 valid peer information with the distribution. The use of these HELLOs
   1071 can be prevented by setting the ``USE_INCLUDED_HELLOS`` option in the
   1072 ``PEERINFO`` configuration section to ``NO``. Files containing invalid
   1073 information are removed.
   1074 
   1075 .. _Managing-Information:
   1076 
   1077 Managing Information
   1078 --------------------
   1079 
   1080 The PEERINFO service stores information about known peers and a single
   1081 HELLO message for every peer. A peer does not need to have a HELLO if no
   1082 information is available. HELLO information from different sources, for
   1083 example a HELLO obtained from a remote HOSTLIST and a second HELLO
   1084 stored on disk, are combined and merged into one single HELLO message
   1085 per peer which will be given to clients. During this merge process the
   1086 HELLO is immediately written to disk to ensure persistence.
   1087 
   1088 In addition, PEERINFO periodically scans the directory where information
   1089 is stored for empty HELLO messages with expired TRANSPORT addresses.
   1090 This periodic task scans all files in the directory and recreates the
   1091 HELLO messages it finds. Expired TRANSPORT addresses are removed from
   1092 the HELLO and if the HELLO does not contain any valid addresses, it is
   1093 discarded and removed from the disk.
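
Merging and expiry can be sketched with a HELLO modelled as a mapping
from address to expiration time. This model, the rule of keeping the
later expiration for an address present in both HELLOs, and the function
names are assumptions made for illustration.

.. code-block:: python

   def merge_hellos(first, second):
       """Combine two address -> expiration maps into a single HELLO."""
       merged = dict(first)
       for address, expires in second.items():
           # Assumption: the later expiration wins for a shared address.
           merged[address] = max(expires, merged.get(address, expires))
       return merged


   def drop_expired(hello, now):
       """Remove expired addresses; None means the HELLO is discarded."""
       valid = {addr: exp for addr, exp in hello.items() if exp > now}
       return valid or None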
   1094 
   1095 .. _Obtaining-Information:
   1096 
   1097 Obtaining Information
   1098 ---------------------
   1099 
   1100 When a client requests information from PEERINFO, PEERINFO performs a
   1101 lookup for the respective peer or all peers if desired and transmits
   1102 this information to the client. The client can specify if friend-only
   1103 HELLOs have to be included or not and PEERINFO filters the respective
   1104 HELLO messages before transmitting information.
   1105 
   1106 To notify clients about changes to PEERINFO information, PEERINFO
   1107 maintains a list of clients interested in these notifications. Such a
   1108 notification occurs if a HELLO for a peer was updated (due to a merge
   1109 for example) or a new peer was added.
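
The lookup and notification behaviour can be modelled as follows. This
is a toy in-memory model, not the PEERINFO client API; the class and
method names are hypothetical, and it assumes that notifications are
filtered by the same friend-only preference as lookups.

.. code-block:: python

   class PeerInfoModel:
       def __init__(self):
           self.hellos = {}     # peer id -> (hello, friend_only flag)
           self.listeners = []  # (callback, include_friend_only)

       def notify(self, callback, include_friend_only=False):
           """Register a client interested in updates."""
           self.listeners.append((callback, include_friend_only))

       def update(self, peer, hello, friend_only=False):
           """Store a new or merged HELLO and inform interested clients."""
           self.hellos[peer] = (hello, friend_only)
           for callback, include in self.listeners:
               if include or not friend_only:
                   callback(peer, hello)

       def iterate(self, peer=None, include_friend_only=False):
           """Yield HELLOs for one peer, or for all peers if peer is None."""
           for pid, (hello, friend_only) in self.hellos.items():
               if peer is not None and pid != peer:
                   continue
               if friend_only and not include_friend_only:
                   continue
               yield pid, hello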
   1110 
   1111 .. index::
   1112    double: subsystem; PEERSTORE
   1113 
   1114 .. _PEERSTORE-Subsystem:
   1115 
   1116 PEERSTORE — Extensible local persistent data storage
   1117 ====================================================
   1118 
   1119 GNUnet's PEERSTORE subsystem offers persistent per-peer storage for
   1120 other GNUnet subsystems. GNUnet subsystems can use PEERSTORE to
   1121 persistently store and retrieve arbitrary data. Each data record stored
   1122 with PEERSTORE contains the following fields:
   1123 
   1124 -  subsystem: Name of the subsystem responsible for the record.
   1125 
   1126 -  peerid: Identity of the peer this record is related to.
   1127 
   1128 -  key: a key string identifying the record.
   1129 
   1130 -  value: binary record value.
   1131 
   1132 -  expiry: record expiry date.
   1133 
   1134 .. _Functionality:
   1135 
   1136 Functionality
   1137 -------------
   1138 
   1139 Subsystems can store any type of value under a (subsystem, peerid, key)
   1140 combination. A \"replace\" flag set during store operations forces the
   1141 PEERSTORE to replace any old values stored under the same (subsystem,
   1142 peerid, key) combination with the new value. Additionally, an expiry
   1143 date is set after which the record is \*possibly\* deleted by PEERSTORE.
   1144 
   1145 Subsystems can iterate over all values stored under any of the following
   1146 combinations of fields:
   1147 
   1148 -  (subsystem)
   1149 
   1150 -  (subsystem, peerid)
   1151 
   1152 -  (subsystem, key)
   1153 
   1154 -  (subsystem, peerid, key)
   1155 
   1156 Subsystems can also request to be notified about any new values stored
   1157 under a (subsystem, peerid, key) combination by sending a \"watch\"
   1158 request to PEERSTORE.
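
The store, iterate and watch operations can be illustrated with an
in-memory model. It is a sketch of the semantics described above and
not the PEERSTORE API; the class, the method names and the explicit
``expire`` call are assumptions.

.. code-block:: python

   class PeerStoreModel:
       def __init__(self):
           self.records = []   # (subsystem, peerid, key, value, expiry)
           self.watchers = {}  # (subsystem, peerid, key) -> callbacks

       def store(self, subsystem, peerid, key, value, expiry, replace=False):
           ident = (subsystem, peerid, key)
           if replace:  # drop old values stored under the same combination
               self.records = [r for r in self.records if r[:3] != ident]
           record = ident + (value, expiry)
           self.records.append(record)
           for callback in self.watchers.get(ident, []):
               callback(record)

       def iterate(self, subsystem, peerid=None, key=None):
           """Match on subsystem and, optionally, on peerid and/or key."""
           return [r for r in self.records
                   if r[0] == subsystem
                   and (peerid is None or r[1] == peerid)
                   and (key is None or r[2] == key)]

       def watch(self, subsystem, peerid, key, callback):
           ident = (subsystem, peerid, key)
           self.watchers.setdefault(ident, []).append(callback)

       def expire(self, now):
           """Records past their expiry may be deleted at any later time."""
           self.records = [r for r in self.records if r[4] > now]

Without the replace flag, several values can coexist under the same
(subsystem, peerid, key) combination, which is why ``iterate`` returns a
list.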
   1159 
   1160 .. _Architecture:
   1161 
   1162 Architecture
   1163 ------------
   1164 
   1165 PEERSTORE implements the following components:
   1166 
   1167 -  PEERSTORE service: Handles store, iterate and watch operations.
   1168 
   1169 -  PEERSTORE API: API to be used by other subsystems to communicate and
   1170    issue commands to the PEERSTORE service.
   1171 
   1172 -  PEERSTORE plugins: Handles the persistent storage. At the moment,
   1173    only an \"sqlite\" plugin is implemented.
   1174 
   1175 .. index::
   1176    double: subsystem; REGEX
   1177 
   1178 .. _REGEX-Subsystem:
   1179 
   1180 REGEX — Service discovery using regular expressions
   1181 ===================================================
   1182 
   1183 Using the REGEX subsystem, you can discover peers that offer a
   1184 particular service using regular expressions. The peers that offer a
   1185 service specify it using a regular expression. Peers that want to
   1186 patronize a service search using a string. The REGEX subsystem will then
   1187 use the DHT to return a set of matching offerers to the patrons.
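
The announce and search roles can be illustrated with a purely local
sketch. It uses Python's ``re`` module and a list in place of the DHT,
so neither the regular expression syntax nor the lookup mechanism
matches the real subsystem; all names are hypothetical.

.. code-block:: python

   import re


   class RegexDirectory:
       """Local stand-in for the distributed lookup."""

       def __init__(self):
           self.announcements = []  # (peer id, compiled regex)

       def announce(self, peer, regex):
           """Called by a peer that offers a service."""
           self.announcements.append((peer, re.compile(regex)))

       def search(self, string):
           """Called by a patron; returns the set of matching offerers."""
           return {peer for peer, rx in self.announcements
                   if rx.fullmatch(string)}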
   1188 
   1189 For the technical details, we have Max's defense talk and Max's Master's
   1190 thesis.
   1191 
   1192 .. note:: An additional publication is under preparation and available
   1193    to team members (in Git).
   1194 
   1195 .. todo:: Missing links to Max's talk and Master's thesis
   1196 
   1197 .. _How-to-run-the-regex-profiler:
   1198 
   1199 How to run the regex profiler
   1200 -----------------------------
   1201 
   1202 The gnunet-regex-profiler can be used to profile the usage of mesh/regex
   1203 for a given set of regular expressions and strings. Mesh/regex allows
   1204 you to announce your peer ID under a certain regex and search for peers
   1205 matching a particular regex using a string. See
   1206 `szengel2012ms <https://bib.gnunet.org/full/date.html#2012_5f2>`__ for a
   1207 full introduction.
   1208 
   1209 First of all, the regex profiler uses GNUnet testbed, thus all the
   1210 implications for testbed also apply to the regex profiler (for example
   1211 you need password-less ssh login to the machines listed in your hosts
   1212 file).
   1213 
   1214 **Configuration**
   1215 
   1216 Moreover, an appropriate configuration file is needed. In the following
   1217 paragraph the important details are highlighted.
   1218 
   1219 Announcing of the regular expressions is done by the
   1220 gnunet-daemon-regexprofiler, therefore you have to make sure it is
   1221 started, by adding it to the START_ON_DEMAND set of ARM:
   1222 
   1223 ::
   1224 
   1225    [regexprofiler]
   1226    START_ON_DEMAND = YES
   1227 
   1228 Furthermore you have to specify the location of the binary:
   1229 
   1230 ::
   1231 
   1232    [regexprofiler]
   1233    # Location of the gnunet-daemon-regexprofiler binary.
   1234    BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
   1235    # Regex prefix that will be applied to all regular expressions and
   1236    # search string.
   1237    REGEX_PREFIX = "GNVPN-0001-PAD"
   1238 
   1239 When running the profiler with a large scale deployment, you probably
   1240 want to reduce the workload of each peer. Use the following options to
   1241 do this.
   1242 
   1243 ::
   1244 
   1245    [dht]
   1246    # Force network size estimation
   1247    FORCE_NSE = 1
   1248 
   1249    [dhtcache]
   1250    DATABASE = heap
   1251    # Disable RC-file for Bloom filter? (for benchmarking with limited IO
   1252    # availability)
   1253    DISABLE_BF_RC = YES
   1254    # Disable Bloom filter entirely
   1255    DISABLE_BF = YES
   1256 
   1257    [nse]
   1258    # Minimize proof-of-work CPU consumption by NSE
   1259    WORKBITS = 1
   1260 
   1261 **Options**
   1262 
   1263 To finally run the profiler some options and the input data need to be
   1264 specified on the command line.
   1265 
   1266 ::
   1267 
   1268    gnunet-regex-profiler -c config-file -d log-file -n num-links \
   1269    -p path-compression-length -s search-delay -t matching-timeout \
   1270    -a num-search-strings hosts-file policy-dir search-strings-file
   1271 
   1272 Where\...
   1273 
   1274 -  \... ``config-file`` means the configuration file created earlier.
   1275 
   1276 -  \... ``log-file`` is the file where to write statistics output.
   1277 
   1278 -  \... ``num-links`` indicates the number of random links between
   1279    started peers.
   1280 
   1281 -  \... ``path-compression-length`` is the maximum path compression
   1282    length in the DFA.
   1283 
   1284 -  \... ``search-delay`` is the time to wait between the peers finishing
   1285    linking and starting to match strings.
   1286 
   1287 -  \... ``matching-timeout`` is the timeout after which the search is
   1288    cancelled.
   1289 
   1290 -  \... ``num-search-strings`` is the number of strings in the
   1291    search-strings-file.
   1292 
   1293 -  \... the ``hosts-file`` should contain a list of hosts for the
   1294    testbed, one per line in the following format:
   1295 
   1296    -  ``user@host_ip:port``
   1297 
   1298 -  \... the ``policy-dir`` is a folder containing text files containing
   1299    one or more regular expressions. A peer is started for each file in
   1300    that folder and the regular expressions in the corresponding file are
   1301    announced by this peer.
   1302 
   1303 -  \... the ``search-strings-file`` is a text file containing search
   1304    strings, one in each line.
   1305 
   1306 You can create regular expressions and search strings for every AS in
   1307 the Internet using the attached scripts. You need one of the `CAIDA
   1308 routeviews
   1309 prefix2as <http://data.caida.org/datasets/routing/routeviews-prefix2as/>`__
   1310 data files for this. Run
   1311 
   1312 ::
   1313 
   1314    create_regex.py <filename> <output path>
   1315 
   1316 to create the regular expressions and
   1317 
   1318 ::
   1319 
   1320    create_strings.py <input path> <outfile>
   1321 
   1322 to create a search strings file from the previously created regular
   1323 expressions.
   1324 
   1325 
   1326 
   1327 .. index::
   1328   double: subsystem; REST
   1329 
   1330 .. _REST-Subsystem:
   1331 
   1332 REST — RESTful GNUnet Web APIs
   1333 ==============================
   1334 
   1335 .. todo:: Define REST
   1336 
   1337 Using the REST subsystem, you can expose REST-based APIs or services.
   1338 The REST service is designed as a pluggable architecture.
   1339 
   1340 **Configuration**
   1341 
   1342 The REST service can be configured in various ways. The reference config
   1343 file can be found in ``src/rest/rest.conf``:
   1344 
   1345 ::
   1346 
   1347    [rest]
   1348    REST_PORT=7776
   1349    REST_ALLOW_HEADERS=Authorization,Accept,Content-Type
   1350    REST_ALLOW_ORIGIN=*
   1351    REST_ALLOW_CREDENTIALS=true
   1352 
   1353 The port as well as the CORS (cross-origin resource sharing) headers
   1354 advertised by the REST service are configurable.
   1355 
   1356 .. index::
   1357    double: subsystem; REVOCATION
   1358 
   1359 .. _REVOCATION-Subsystem:
   1360 
   1361 REVOCATION — Ego key revocation
   1362 ===============================
   1363 
   1364 The REVOCATION subsystem is responsible for key revocation of Egos. If a
   1365 user learns that their private key has been compromised, or has lost it,
   1366 they can use the REVOCATION system to inform all of the other users that
   1367 their private key is no longer valid. The subsystem thus includes ways
   1368 to query for the validity of keys and to propagate revocation messages.
   1369 
   1370 .. _Dissemination:
   1371 
   1372 Dissemination
   1373 -------------
   1374 
   1375 When a revocation is performed, the revocation is first of all
   1376 disseminated by flooding the overlay network. The goal is to reach every
   1377 peer, so that when a peer needs to check if a key has been revoked, this
   1378 will be purely a local operation where the peer looks at its local
   1379 revocation list. Flooding the network is also the most robust form of
   1380 key revocation --- an adversary would have to control a separator of the
   1381 overlay graph to restrict the propagation of the revocation message.
   1382 Flooding is also very easy to implement --- peers that receive a
   1383 revocation message for a key that they have never seen before simply
   1384 pass the message to all of their neighbours.
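
The flooding rule described above can be illustrated with a small
simulation. This is only a sketch: the overlay is a hypothetical
adjacency map, the function name ``flood`` is made up for this example,
and no GNUnet API is involved.

.. code-block:: python

   from collections import deque


   def flood(neighbours, origin):
       """Simulate flooding one revocation; return (reached peers, messages sent)."""
       seen = {origin}            # peers that already know the revocation
       pending = deque([origin])  # peers that still have to forward it
       messages = 0
       while pending:
           peer = pending.popleft()
           # A peer that sees a revocation for the first time passes it
           # on to all of its neighbours.
           for neighbour in neighbours[peer]:
               messages += 1
               if neighbour not in seen:
                   seen.add(neighbour)
                   pending.append(neighbour)
       return seen, messages


   # Hypothetical overlay with four peers and four edges.
   overlay = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
   reached, messages = flood(overlay, "A")

In this model every reached peer sends one message per neighbour, so a
connected overlay sees two messages per edge in total.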
   1385 
   1386 Flooding can only distribute the revocation message to peers that are
   1387 online. In order to notify peers that join the network later, the
   1388 revocation service performs efficient set reconciliation over the sets
   1389 of known revocation messages whenever two peers (that both support
   1390 REVOCATION dissemination) connect. The SET service is used to perform
   1391 this operation efficiently.
   1392 
   1393 .. _Revocation-Message-Design-Requirements:
   1394 
   1395 Revocation Message Design Requirements
   1396 --------------------------------------
   1397 
   1398 However, flooding is also quite costly, creating O(\|E\|) messages on a
   1399 network with \|E\| edges. Thus, revocation messages are required to
   1400 contain a proof-of-work, the result of an expensive computation (which,
   1401 however, is cheap to verify). Only peers that have expended the CPU time
   1402 necessary to provide this proof will be able to flood the network with
   1403 the revocation message. This ensures that an attacker cannot simply
   1404 flood the network with millions of revocation messages. The
   1405 proof-of-work required by GNUnet is set to take days on a typical PC to
   1406 compute; if the ability to quickly revoke a key is needed, users have
   1407 the option to pre-compute revocation messages to store off-line and use
   1408 instantly after their key has been compromised.
   1409 
   1410 Revocation messages must also be signed by the private key that is being
   1411 revoked. Thus, they can only be created while the private key is in the
   1412 possession of the respective user. This is another reason to create a
   1413 revocation message ahead of time and store it in a secure location.
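
The asymmetry between computing and verifying a proof-of-work can be
sketched as follows. This is a generic hash-based scheme chosen for
illustration; the use of SHA-256, the 8-byte nonce and the "leading zero
bits" difficulty measure are assumptions of this example and do not
reproduce the function or message format that GNUnet actually uses.

.. code-block:: python

   import hashlib


   def leading_zero_bits(digest: bytes) -> int:
       """Count the zero bits at the start of a digest."""
       value = int.from_bytes(digest, "big")
       return len(digest) * 8 - value.bit_length()


   def verify(key: bytes, nonce: int, difficulty: int) -> bool:
       """Cheap check: a single hash computation."""
       digest = hashlib.sha256(key + nonce.to_bytes(8, "big")).digest()
       return leading_zero_bits(digest) >= difficulty


   def compute(key: bytes, difficulty: int) -> int:
       """Expensive search: try nonces until one satisfies the difficulty."""
       nonce = 0
       while not verify(key, nonce, difficulty):
           nonce += 1
       return nonce


   # Toy difficulty so that the example finishes quickly.
   demo_key = b"hypothetical-public-key"
   demo_nonce = compute(demo_key, 12)

Each additional bit of difficulty doubles the expected search effort,
while verification stays a single hash. This is why a revocation can be
computed ahead of time and stored until it is needed.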
   1414 
   1415 .. index::
   1416    double: subsystems; Random peer sampling
   1417    see: RPS; Random peer sampling
   1418 
   1419 .. _RPS-Subsystem:
   1420 
   1421 RPS — Random peer sampling
   1422 ==========================
   1423 
   1424 In the literature, Random Peer Sampling (RPS) refers to the problem of
   1425 reliably [1]_ drawing random samples from an unstructured p2p network.
   1426 
   1427 Doing so in a reliable manner is not only hard because of inherent
   1428 problems but also because of possible malicious peers that could try to
   1429 bias the selection.
   1430 
   1431 It is useful for all kinds of gossip protocols that require the
   1432 selection of random peers in the whole network, such as gathering
   1433 statistics, spreading and aggregating information in the network, load
   1434 balancing and overlay topology management.
   1435 
   1436 The approach chosen in the RPS service implementation in GNUnet follows
   1437 the `Brahms <https://bib.gnunet.org/full/date.html\#2009_5f0>`__ design.
   1438 
   1439 The current state is \"work in progress\". There are a lot of things
   1440 that need to be done, primarily finishing the experimental evaluation
   1441 and a re-design of the API.
   1442 
   1443 The abstract idea is to connect to (or start) the RPS service and
   1444 request random peers, which are returned once they represent a
   1445 random selection from the whole network with high probability.
   1446 
   1447 An additional feature to the original Brahms-design is the selection of
   1448 sub-groups: The GNUnet implementation of RPS enables clients to ask for
   1449 random peers from a group that is defined by a common shared secret.
   1450 (The secret could of course also be public, depending on the use-case.)
   1451 
   1452 Another addition to the original protocol was made: The sampler
   1453 mechanism that was introduced in Brahms was slightly adapted and is used
   1454 to actually sample the peers that are returned to the client. This is
   1455 necessary as the original design only keeps peers connected to random
   1456 other peers in the network. In order to answer client requests with
   1457 independently random peers, they cannot be drawn from the connected
   1458 peers. The adapted sampler makes sure that each request for random peers
   1459 is independent from the others.
   1460 
   1461 .. _Brahms:
   1462 
   1463 Brahms
   1464 ------
   1465 
   1466 The high-level concept of Brahms is two-fold: combining push-pull gossip
   1467 with locally fixing an assumed bias using cryptographic min-wise
   1468 permutations. The central data structure is the view, a peer's current
   1469 local sample. This view is used to select peers to push to and pull
   1470 from. This simple mechanism can be biased easily. For this reason Brahms
   1471 'fixes' the bias by using the so-called sampler, a data structure that
   1472 takes a list of elements as input and outputs a random one of them
   1473 independently of the frequency in the input set. Both an element that
   1474 was put into the sampler a single time and an element that was put into
   1475 it a million times have the same probability of being the output. This
   1476 is achieved by exploiting min-wise independent permutations. In the
   1477 RPS service we use HMACs: On the initialisation of a sampler element, a
   1478 key is chosen at random. On each input the HMAC with the random key is
   1479 computed. The sampler element keeps the element with the minimal HMAC.
   1480 
   1481 In order to fix the bias in the view, a fraction of the elements in the
   1482 view are sampled through the sampler from the random stream of peer IDs.
   1483 
   1484 According to the theoretical analysis of Bortnikov et al., this suffices
   1485 to keep the network connected and to keep random peers in the view.
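
A sampler element as described above can be sketched in a few lines.
The class name, the choice of SHA-512 for the HMAC and the byte-string
peer IDs are assumptions of this example; it illustrates the idea and is
not the code of the RPS service.

.. code-block:: python

   import hashlib
   import hmac
   import os


   class SamplerElement:
       """Keeps the input whose HMAC under a random key is minimal."""

       def __init__(self, key=None):
           # The key is chosen at random when the element is initialised.
           self.key = key if key is not None else os.urandom(32)
           self.peer = None
           self.min_tag = None

       def update(self, peer_id: bytes) -> None:
           tag = hmac.new(self.key, peer_id, hashlib.sha512).digest()
           # Seeing the same peer again yields the same tag, so repeated
           # inputs cannot improve that peer's chance of being kept.
           if self.min_tag is None or tag < self.min_tag:
               self.peer = peer_id
               self.min_tag = tag


   sampler = SamplerElement()
   for peer_id in [b"peer-a"] * 1000 + [b"peer-b", b"peer-c"]:
       sampler.update(peer_id)

Because the outcome depends only on the set of distinct inputs and on
the random key, ``peer-a`` is no more likely to be kept than ``peer-b``
or ``peer-c``, although it was pushed a thousand times.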
   1486 
   1487 .. [1]
   1488    \"Reliable\" in this context means having no bias, neither spatial,
   1489    nor temporal, nor through malicious activity.
   1490 
   1491 .. index::
   1492    double: STATISTICS; subsystem
   1493 
   1494 .. _STATISTICS-Subsystem:
   1495 
   1496 STATISTICS — Runtime statistics publication
   1497 ===========================================
   1498 
   1499 In GNUnet, the STATISTICS subsystem offers a central place for all
   1500 subsystems to publish unsigned 64-bit integer run-time statistics.
   1501 Keeping this information centrally means that there is a unified way for
   1502 the user to obtain data on all subsystems, and individual subsystems do
   1503 not have to always include a custom data export method for performance
   1504 metrics and other statistics. For example, the TRANSPORT system uses
   1505 STATISTICS to update information about the number of directly connected
   1506 peers and the bandwidth that has been consumed by the various plugins.
   1507 This information is valuable for diagnosing connectivity and performance
   1508 issues.
   1509 
   1510 Following the GNUnet service architecture, the STATISTICS subsystem is
   1511 divided into an API which is exposed through the header
   1512 **gnunet_statistics_service.h** and the STATISTICS service
   1513 **gnunet-service-statistics**. The **gnunet-statistics** command-line
   1514 tool can be used to obtain (and change) information about the values
   1515 stored by the STATISTICS service. The STATISTICS service does not
   1516 communicate with other peers.
   1517 
   1518 Data is stored in the STATISTICS service in the form of tuples
   1519 **(subsystem, name, value, persistence)**. The subsystem determines
   1520 which GNUnet subsystem the data belongs to. The name is the label under
   1521 which the value is stored; it uniquely identifies the record
   1522 among the other records belonging to the same subsystem. In some parts
   1523 of the code, the pair **(subsystem, name)** is called a **statistic** as
   1524 it identifies the values stored in the STATISTICS service. The persistence
   1525 flag determines if the record has to be preserved across service
   1526 restarts. A record is said to be persistent if this flag is set for it;
   1527 if not, the record is treated as a non-persistent record and it is lost
   1528 after service restart. Persistent records are written to and read from
   1529 the file **statistics.data** before shutdown and upon startup. The file
   1530 is located in the HOME directory of the peer.
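
The record model can be pictured with the following sketch. The class,
its method names and the JSON serialization are hypothetical and chosen
for brevity; they do not reflect the service's API or the actual format
of **statistics.data**.

.. code-block:: python

   import json

   UINT64_MAX = 2**64 - 1


   class StatisticsStore:
       """Maps (subsystem, name) to (value, persistence)."""

       def __init__(self):
           self.records = {}

       def set(self, subsystem, name, value, persistent=False):
           if not 0 <= value <= UINT64_MAX:
               raise ValueError("value must fit into an unsigned 64-bit integer")
           self.records[(subsystem, name)] = (value, persistent)

       def get(self, subsystem, name):
           return self.records[(subsystem, name)][0]

       def save(self):
           """Serialize only the records flagged as persistent."""
           rows = [[s, n, v] for (s, n), (v, p) in self.records.items() if p]
           return json.dumps(rows)

       @classmethod
       def load(cls, text):
           """Restore the persistent records after a restart."""
           store = cls()
           for subsystem, name, value in json.loads(text):
               store.set(subsystem, name, value, persistent=True)
           return store


   # Hypothetical subsystem and statistic names.
   store = StatisticsStore()
   store.set("example-a", "# transient counter", 3)
   store.set("example-b", "# persistent counter", 42, persistent=True)
   restarted = StatisticsStore.load(store.save())

After the simulated restart only the persistent record is still present.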
   1531 
   1532 An anomaly of the STATISTICS service is that it does not terminate
   1533 immediately upon receiving a shutdown signal if it has any clients
   1534 connected to it. It waits for all the clients that are not monitors to
   1535 close their connections before terminating itself. This is to prevent
   1536 the loss of data during peer shutdown — delaying the STATISTICS
   1537 service shutdown helps other services to store important data to
   1538 STATISTICS during shutdown.
   1539 
   1540 .. index:: 
   1541    double: TRANSPORT Next Generation; subsystem
   1542 
   1543 .. _TRANSPORT_002dNG-Subsystem:
   1544 
   1545 TRANSPORT-NG — Next-generation transport management
   1546 ===================================================
   1547 
   1548 The current GNUnet TRANSPORT architecture is rooted in the GNUnet 0.4
   1549 design of using plugins for the actual transmission operations and the
   1550 ATS subsystem to select a plugin and allocate bandwidth. The following
   1551 key issues have been identified with this design:
   1552 
   1553 -  Bugs in one plugin can affect the TRANSPORT service and other
   1554    plugins. There is at least one open bug that affects sockets, where
   1555    the origin is difficult to pinpoint due to the large code base.
   1556 
   1557 -  Relevant operating system default configurations often impose a limit
   1558    of 1024 file descriptors per process. Thus, one plugin may impact
   1559    other plugins' connectivity choices.
   1560 
   1561 -  Plugins are required to offer bi-directional connectivity. However,
   1562    firewalls (incl. NAT boxes) and physical environments sometimes only
   1563    allow uni-directional connectivity, which then currently cannot be
   1564    utilized at all.
   1565 
   1566 -  Distance vector routing was implemented in 2009 but broke shortly
   1567    afterwards. Due to the complexity of implementing it as a plugin and
   1568    dealing with the resource allocation consequences, it was never useful.
   1569 
   1570 -  Most existing plugins communicate completely using cleartext,
   1571    exposing metadata (message size) and making it easy to fingerprint
   1572    and possibly block GNUnet traffic.
   1573 
   1574 -  Various NAT traversal methods are not supported.
   1575 
   1576 -  The service logic is cluttered with \"manipulation\" support code for
   1577    TESTBED to enable faking network characteristics like lossy
   1578    connections or firewalls.
   1579 
   1580 -  Bandwidth allocation is done in ATS, requiring the duplication of
   1581    state and resulting in much delayed allocation decisions. As a
   1582    result, often available bandwidth goes unused. Users are expected to
   1583    manually configure bandwidth limits, instead of TRANSPORT using
   1584    congestion control to adapt automatically.
   1585 
   1586 -  TRANSPORT is difficult to test and has bad test coverage.
   1587 
   1588 -  HELLOs include an absolute expiration time. Nodes with unsynchronized
   1589    clocks cannot connect.
   1590 
   1591 -  Displaying the contents of a HELLO requires the respective plugin as
   1592    the plugin-specific data is encoded in binary. This also complicates
   1593    logging.
   1594 
   1595 .. _Design-goals-of-TNG:
   1596 
   1597 Design goals of TNG
   1598 -------------------
   1599 
   1600 In order to address the above issues, we want to:
   1601 
   1602 -  Move plugins into separate processes which we shall call
   1603    *communicators*. Communicators connect as clients to the transport
   1604    service.
   1605 
   1606 -  TRANSPORT should be able to utilize any number of communicators to the
   1607    same peer at the same time.
   1608 
   1609 -  TRANSPORT should be responsible for fragmentation, retransmission,
   1610    flow- and congestion-control. Users should no longer have to
   1611    configure bandwidth limits: TRANSPORT should detect what is available
   1612    and use it.
   1613 
   1614 -  Communicators should be allowed to be uni-directional and
   1615    unreliable. TRANSPORT shall create bi-directional channels from this
   1616    whenever possible.
   1617 
   1618 -  DV should no longer be a plugin, but part of TRANSPORT.
   1619 
   1620 -  TRANSPORT should help communicators communicate, for
   1621    example in the case of uni-directional communicators or the need for
   1622    out-of-band signalling for NAT traversal. We call this functionality
   1623    *backchannels*.
   1624 
   1625 -  Transport manipulation should be signalled to CORE on a per-message
   1626    basis instead of an approximate bandwidth.
   1627 
   1628 -  CORE should signal performance requirements (reliability, latency,
   1629    etc.) on a per-message basis to TRANSPORT. If possible, TRANSPORT
   1630    should consider those options when scheduling messages for
   1631    transmission.
   1632 
   1633 -  HELLOs should be in a human-readable format with monotonic time
   1634    expirations.
   1635 
   1636 The new architecture is planned as follows:
   1637 
   1638 .. image:: /images/tng.png
   1639 
   1640 TRANSPORT's main objective is to establish bi-directional virtual links
   1641 using a variety of possibly uni-directional communicators. Links undergo
   1642 the following steps:
   1643 
   1644 1. Communicator informs TRANSPORT A that a queue (direct neighbour) is
   1645    available, or equivalently TRANSPORT A discovers a (DV) path to a
   1646    target B.
   1647 
   1648 2. TRANSPORT A sends a challenge to the target peer, trying to confirm
   1649    that the peer can receive. FIXME: This is not implemented properly
   1650    for DV. Here we should really take a validated DVH and send a
   1651    challenge exactly down that path!
   1652 
   1653 3. The other TRANSPORT, TRANSPORT B, receives the challenge, and sends
   1654    back a response, possibly using a different path. If TRANSPORT B does
   1655    not yet have a virtual link to A, it must try to establish a virtual
   1656    link.
   1657 
   1658 4. Upon receiving the response, TRANSPORT A creates the virtual link. If
   1659    the response included a challenge, TRANSPORT A must respond to this
   1660    challenge as well, effectively re-creating the TCP 3-way handshake
   1661    (just with longer challenge values).
   1662 
   1663 .. _HELLO_002dNG:
   1664 
   1665 HELLO-NG
   1666 --------
   1667 
   1668 HELLOs change in three ways. First of all, communicators encode the
   1669 respective addresses in a human-readable URL-like string. This way, we
   1670 no longer require the communicator to print the contents of a HELLO.
   1671 Second, HELLOs no longer contain an expiration time, only a creation
   1672 time. The receiver only needs to compare the respective absolute values:
   1673 given a HELLO from the same sender with a larger creation time, the
   1674 old one is no longer valid. This also obsoletes the need for the
   1675 gnunet-hello binary to set HELLO expiration times to never. Third, a
   1676 peer no longer generates one big HELLO that always contains all of the
   1677 addresses. Instead, each address is signed individually and shared only
   1678 over the address scopes where it makes sense to share the address. In
   1679 particular, care should be taken to not share MACs across the Internet
   1680 and confine their use to the LAN. As each address is signed separately,
   1681 having multiple addresses valid at the same time (given the new creation
   1682 time expiration logic) requires that those addresses have exactly
   1683 the same creation time. Whenever that monotonic time is increased, all
   1684 addresses must be re-signed and re-distributed.
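
The creation-time rule can be sketched as follows. The ``SignedAddress``
type and the ``learn`` function are hypothetical, signature verification
is omitted, and the address strings are placeholders.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class SignedAddress:
       peer: str            # identity of the sender (placeholder string)
       address: str         # human-readable, URL-like address
       creation_time: int   # monotonic creation time chosen by the sender


   def learn(known, incoming):
       """Update `known` (peer -> list of addresses) with a received address."""
       current = known.get(incoming.peer, [])
       if not current or incoming.creation_time > current[0].creation_time:
           # A larger creation time invalidates everything learned before.
           known[incoming.peer] = [incoming]
       elif incoming.creation_time == current[0].creation_time:
           # Same creation time: valid alongside the other addresses.
           if incoming not in current:
               current.append(incoming)
       # A smaller creation time means the address is outdated; ignore it.
       return known


   known = {}
   learn(known, SignedAddress("P", "addr-a", 10))
   learn(known, SignedAddress("P", "addr-b", 10))
   learn(known, SignedAddress("P", "addr-old", 5))

At this point both ``addr-a`` and ``addr-b`` are valid for peer ``P``. A
later address with creation time 11 would replace both, which is why a
sender has to re-sign all of its addresses when it increases the time.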
   1685 
   1686 .. _Priorities-and-preferences:
   1687 
   1688 Priorities and preferences
   1689 --------------------------
   1690 
   1691 In the new design, TRANSPORT adopts a feature (which was previously
   1692 already available in CORE) of the MQ API to allow applications to
   1693 specify priorities and preferences per message (or rather, per MQ
   1694 envelope). The (updated) MQ API allows applications to specify one of
   1695 four priority levels as well as desired preferences for transmission by
   1696 setting options on an envelope. These preferences currently are:
   1697 
   1698 -  GNUNET_MQ_PREF_UNRELIABLE: Disables TRANSPORT waiting for ACKs on
   1699    unreliable channels like UDP. Now it is fire and forget. These
   1700    messages then cannot be used for RTT estimates either.
   1701 
   1702 -  GNUNET_MQ_PREF_LOW_LATENCY: Directs TRANSPORT to select the
   1703    lowest-latency transmission choices possible.
   1704 
   1705 -  GNUNET_MQ_PREF_CORK_ALLOWED: Allows TRANSPORT to delay transmission
   1706    to group the message with other messages into a larger batch to
   1707    reduce the number of packets sent.
   1708 
   1709 -  GNUNET_MQ_PREF_GOODPUT: Directs TRANSPORT to select the highest
   1710    goodput channel available.
   1711 
   1712 -  GNUNET_MQ_PREF_OUT_OF_ORDER: Allows TRANSPORT to reorder the messages
   1713    as it sees fit, otherwise TRANSPORT should attempt to preserve
   1714    transmission order.
   1715 
   1716 Each MQ envelope is always able to store those options (and the
   1717 priority), and in the future this uniform API will be used by TRANSPORT,
   1718 CORE, CADET and possibly other subsystems that send messages (like
   1719 LAKE). When CORE sets preferences and priorities, it is supposed to
   1720 respect the preferences and priorities it is given from higher layers.
   1721 Similarly, CADET also simply passes on the preferences and priorities of
   1722 the layer above CADET. When a layer combines multiple smaller messages
   1723 into one larger transmission, ``GNUNET_MQ_env_combine_options()``
   1724 should be used to calculate options for the combined message. We note
   1725 that the exact semantics of the options may differ by layer. For
   1726 example, CADET will always strictly implement reliable and in-order
   1727 delivery of messages, while the same options are only advisory for
   1728 TRANSPORT and CORE: they should try (using ACKs on unreliable
   1729 communicators, not changing the message order themselves), but if
   1730 messages are lost anyway (e.g. because a TCP connection is dropped in the
   1731 middle), or if messages are reordered (e.g. because they took different
   1732 paths over the network and arrived in a different order) TRANSPORT and
   1733 CORE do not have to correct this. Whether a preference is strict or
   1734 loose is thus defined by the respective layer.
   1735 
   1736 .. _Communicators:
   1737 
   1738 Communicators
   1739 -------------
   1740 
   1741 The API for communicators is defined in
   1742 ``gnunet_transport_communication_service.h``. Each communicator must
   1743 specify its (global) communication characteristics, which for now only
   1744 say whether the communication is reliable (e.g. TCP, HTTPS) or
   1745 unreliable (e.g. UDP, WLAN). Each communicator must specify a unique
   1746 address prefix, or NULL if the communicator cannot establish outgoing
   1747 connections (for example because it is only acting as a TCP server). A
   1748 communicator must tell TRANSPORT which addresses it is reachable under.
   1749 Addresses may be added or removed at any time. A communicator may have
   1750 zero addresses (transmission only). Addresses do not have to match the
   1751 address prefix.
   1752 
   1753 TRANSPORT may ask a communicator to try to connect to another address.
   1754 TRANSPORT will only ask for connections where the address matches the
   1755 communicator's address prefix that was provided when the connection was
   1756 established. Communicators should then attempt to establish a
   1757 connection.
   1758 It is at the discretion of the communicator whether to honor this request.
   1759 Reasons for not honoring such a request may be that a connection already exists
   1760 or that resources are limited.
   1761 No response is provided to the TRANSPORT service on failure.
   1762 The TRANSPORT service has to ask the communicator explicitly to retry.
   1763 
   1764 If a communicator succeeds in establishing an outgoing connection for
   1765 transmission, or if a communicator receives an incoming bi-directional
   1766 connection, the communicator must inform the TRANSPORT service that a
   1767 message queue (MQ) for transmission is now available.
   1768 For that MQ, the communicator must provide the peer identity claimed by the other end.
   1769 It must also provide a human-readable address (for debugging) and a maximum transfer unit
   1770 (MTU). An MTU of zero means sending is not supported; SIZE_MAX should be
   1771 used for no MTU. The communicator should also tell TRANSPORT what
   1772 network type is used for the queue. The communicator may tell TRANSPORT
   1773 anytime that the queue was deleted and is no longer available.
   1774 
   1775 The communicator API also provides for flow control. First,
   1776 communicators exert back-pressure on TRANSPORT: the number of messages
   1777 TRANSPORT may add to a queue for transmission will be limited. So by not
   1778 draining the transmission queue, back-pressure is provided to TRANSPORT.
   1779 In the other direction, communicators may allow TRANSPORT to give
   1780 back-pressure towards the communicator by providing a non-NULL
   1781 ``GNUNET_TRANSPORT_MessageCompletedCallback`` argument to the
   1782 ``GNUNET_TRANSPORT_communicator_receive`` function. In this case,
   1783 TRANSPORT will only invoke this callback once it has processed the
   1784 message and is ready to receive more. Communicators should then limit
   1785 how much traffic they receive based on this back-pressure. Note that
   1786 communicators do not have to provide a
   1787 ``GNUNET_TRANSPORT_MessageCompletedCallback``; for example, UDP cannot
   1788 support back-pressure due to the nature of the UDP protocol. In this
   1789 case, TRANSPORT will implement its own TRANSPORT-to-TRANSPORT flow
   1790 control to reduce the sender's data rate to acceptable levels.
   1791 
   1792 TRANSPORT may notify a communicator about backchannel messages TRANSPORT
   1793 received from other peers for this communicator. Similarly,
   1794 communicators can ask TRANSPORT to try to send a backchannel message to
   1795 other communicators of other peers. The semantics of the backchannel
   1796 message are up to the communicators which use them. TRANSPORT may fail
   1797 transmitting backchannel messages, and TRANSPORT will not attempt to
   1798 retransmit them.
   1799 
   1800 UDP communicator
   1801 ^^^^^^^^^^^^^^^^
   1802 
   1803 The UDP communicator implements a basic encryption layer to protect from
   1804 metadata leakage.
   1805 The layer tries to establish a shared secret using an Elliptic-Curve Diffie-Hellman
   1806 key exchange in which the initiator of a packet creates an ephemeral key pair
   1807 to encrypt a message for the target peer identity.
   1808 The communicator always offers this kind of transmission queue to a (reachable)
   1809 peer in which messages are encrypted with dedicated keys.
   1810 The performance of this queue is not suitable for high volume data transfer.
   1811 
   1812 If the UDP connection is bi-directional, or the TRANSPORT is able to offer a
   1813 backchannel connection, the resulting key can be re-used if the receiving peer
   1814 is able to ACK the reception.
   1815 This will cause the communicator to offer a new queue (with a higher priority
   1816 than the default queue) to TRANSPORT with a limited capacity.
   1817 The capacity is increased whenever the communicator receives an ACK for a
   1818 transmission.
   1819 This queue is suitable for high-volume data transfer and TRANSPORT will likely
   1820 prioritize this queue (if available).
   1821 
   1822 Communicators that try to establish a connection to a target peer authenticate
   1823 their peer ID (public key) in the first packets by signing a monotonic time
   1824 stamp, their peer ID, and the target peer ID, and sending this data as well as
   1825 the signature in one of the first packets.
   1826 Receivers should keep track of (persist) the monotonic time stamps for each
   1827 peer ID to reject possible replay attacks.
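
The receiver-side bookkeeping for replay protection can be sketched like
this. The ``ReplayGuard`` class is hypothetical; signature verification
and persisting the table to disk are deliberately left out.

.. code-block:: python

   class ReplayGuard:
       """Remembers the largest monotonic time stamp seen per peer ID."""

       def __init__(self):
           self.last_seen = {}

       def accept(self, peer_id: str, monotonic_time: int) -> bool:
           # A handshake is only fresh if its time stamp is strictly
           # larger than any time stamp seen from this peer before.
           if monotonic_time <= self.last_seen.get(peer_id, -1):
               return False
           self.last_seen[peer_id] = monotonic_time
           return True


   guard = ReplayGuard()
   first = guard.accept("peer-a", 100)     # fresh handshake
   replayed = guard.accept("peer-a", 100)  # same handshake sent again
   newer = guard.accept("peer-a", 101)     # sender advanced its clock

If the table were kept only in memory, a restart of the receiver would
make old handshakes acceptable again, which is why the time stamps
should be persisted.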
   1828 
   1829 FIXME: Handshake wire format? KX, Flow.
   1830 
   1831 TCP communicator
   1832 ^^^^^^^^^^^^^^^^
   1833 
   1834 FIXME: Handshake wire format? KX, Flow.
   1835 
   1836 QUIC communicator
   1837 ^^^^^^^^^^^^^^^^^
   1838 The QUIC communicator runs over a bi-directional UDP connection.
   1839 TLS layer with self-signed certificates (binding/signed with peer ID?).
   1840 Single, bi-directional stream?
   1841 FIXME: Handshake wire format? KX, Flow.
   1842 
   1843 .. index::
   1844    double: TRANSPORT; subsystem
   1845 
   1846 .. _TRANSPORT-Subsystem:
   1847 
   1848 TRANSPORT — Overlay transport management
   1849 ========================================
   1850 
   1851 This chapter documents how the GNUnet transport subsystem works. The
   1852 GNUnet transport subsystem consists of three main components: the
   1853 transport API (the interface used by the rest of the system to access
   1854 the transport service), the transport service itself (most of the
   1855 interesting functions, such as choosing transports, happen here) and
   1856 the transport plugins. A transport plugin is a concrete implementation
   1857 for how two GNUnet peers communicate; many plugins exist, for example
   1858 for communication via TCP, UDP, HTTP, HTTPS and others. Finally, the
   1859 transport subsystem uses supporting code, especially the NAT/UPnP
   1860 library to help with tasks such as NAT traversal.
   1861 
   1862 Key tasks of the transport service include:
   1863 
   1864 -  Create our HELLO message, notify clients and neighbours if our HELLO
   1865    changes (using NAT library as necessary)
   1866 
   1867 -  Validate HELLOs from other peers (send PING), allow other peers to
   1868    validate our HELLO's addresses (send PONG)
   1869 
   1870 -  Upon request, establish connections to other peers (using address
   1871    selection from ATS subsystem) and maintain them (again using PINGs
   1872    and PONGs) as long as desired
   1873 
   1874 -  Accept incoming connections, give ATS service the opportunity to
   1875    switch communication channels
   1876 
   1877 -  Notify clients about peers that have connected to us or that have
   1878    been disconnected from us
   1879 
   1880 -  If a (stateful) connection goes down unexpectedly (without explicit
   1881    DISCONNECT), quickly attempt to recover (without notifying clients)
   1882    but do notify clients quickly if reconnecting fails
   1883 
   1884 -  Send (payload) messages arriving from clients to other peers via
   1885    transport plugins and receive messages from other peers, forwarding
   1886    those to clients
   1887 
   1888 -  Enforce inbound traffic limits (using flow-control if it is
   1889    applicable); outbound traffic limits are enforced by CORE, not by us
   1890    (!)
   1891 
   1892 -  Enforce restrictions on P2P connections as specified by the blacklist
   1893    configuration and blacklisting clients
   1894 
   1895 Note that the term \"clients\" in the list above really refers to the
   1896 GNUnet-CORE service, as CORE is typically the only client of the
   1897 transport service.
   1898 
   1899 .. index::
   1900    double: subsystem; SET
   1901 
   1902 .. _SET-Subsystem:
   1903 
   1904 SET — Peer to peer set operations (Deprecated)
   1905 ==============================================
   1906 
   1907 .. note:: 
   1908 
   1909    The SET subsystem is in the process of being replaced by the SETU and SETI
   1910    subsystems, which provide basically the same functionality, just using
   1911    two different subsystems. SETI and SETU should be used for new code.
   1912 
   1913 The SET service implements efficient set operations between two peers
   1914 over a CADET tunnel. Currently, set union and set intersection are the
   1915 only supported operations. Elements of a set consist of an *element
   1916 type* and arbitrary binary *data*. The size of an element's data is
   1917 limited to around 62 KB.
   1918 
   1919 .. _Local-Sets:
   1920 
   1921 Local Sets
   1922 ----------
   1923 
   1924 Sets created by a local client can be modified and reused for multiple
   1925 operations. As each set operation requires potentially expensive special
   1926 auxiliary data to be computed for each element of a set, a set can only
   1927 participate in one type of set operation (either union or intersection).
   1928 The type of a set is determined upon its creation. If the elements of
   1929 a set are needed for an operation of a different type, all of the set's
   1930 elements must be copied to a new set of the appropriate type.
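
As a rough illustration of the rule above, the following Python sketch
models a set whose operation type is fixed at creation. The class
``LocalSet`` and its method names are hypothetical and are not part of the
GNUnet API; the real service is written in C and is not shown here.

```python
VALID_OPERATIONS = ("union", "intersection")


class LocalSet:
    """Toy model of a local set bound to one operation type."""

    def __init__(self, operation, elements=()):
        if operation not in VALID_OPERATIONS:
            raise ValueError(f"unknown operation type: {operation}")
        self.operation = operation      # fixed for the lifetime of the set
        self.elements = set(elements)

    def require(self, operation):
        """Refuse to take part in an operation of a different type."""
        if operation != self.operation:
            raise TypeError(
                f"set was created for {self.operation}, not {operation}")

    def copy_as(self, operation):
        """Copy every element into a new set of another type."""
        return LocalSet(operation, self.elements)
```

In this model, a client that needs the elements of a union set for an
intersection calls ``copy_as("intersection")`` and uses the returned set.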
   1931 
   1932 .. _Set-Modifications:
   1933 
   1934 Set Modifications
   1935 -----------------
   1936 
   1937 Even when set operations are active, one can add elements to and remove
   1938 elements from a set. However, these changes will only be visible to operations
   1939 that have been created after the changes have taken place. That is,
   1940 every set operation only sees a snapshot of the set from the time the
   1941 operation was started. This mechanism is *not* implemented by copying
   1942 the whole set, but by attaching *generation information* to each element
   1943 and operation.
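
One way to picture generation information is the Python sketch below. It
is a simplified model under stated assumptions, not the data structure
used by the service: the class ``GenerationSet``, the per-element lifetime
spans, and the rule that each new operation advances the generation
counter are all invented for illustration.

```python
class GenerationSet:
    """Toy model of a set whose elements carry generation information."""

    def __init__(self):
        self.generation = 0
        # element -> list of [added_generation, removed_generation or None]
        self._lifetimes = {}

    def add(self, element):
        spans = self._lifetimes.setdefault(element, [])
        if spans and spans[-1][1] is None:
            return  # element is already present
        spans.append([self.generation, None])

    def remove(self, element):
        spans = self._lifetimes.get(element, [])
        if spans and spans[-1][1] is None:
            spans[-1][1] = self.generation  # close the current lifetime

    def start_operation(self):
        """Pin a new operation to the current generation, then advance."""
        pinned = self.generation
        self.generation += 1
        return pinned

    def visible(self, pinned):
        """Elements an operation pinned to generation `pinned` can see."""
        return {
            element
            for element, spans in self._lifetimes.items()
            if any(added <= pinned and (removed is None or removed > pinned)
                   for added, removed in spans)
        }
```

No element data is copied when an operation starts; an operation only
remembers the generation number it was pinned to, and later additions and
removals are filtered out by comparing generations.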
   1944 
   1945 .. _Set-Operations:
   1946 
   1947 Set Operations
   1948 --------------
   1949 
   1950 Set operations can be started in two ways: either by accepting an
   1951 operation request from a remote peer, or by requesting a set operation
   1952 from a remote peer. Set operations are uniquely identified by the
   1953 involved *peers*, an *application id* and the *operation type*.
   1954 
   1955 The client is notified of incoming set operations by *set listeners*. A
   1956 set listener listens for incoming operations of a specific operation
   1957 type and application id. Once notified of an incoming set request, the
   1958 client can accept the set request (providing a local set for the
   1959 operation) or reject it.
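
The listener mechanism can be sketched as a lookup keyed by operation
type and application id. The Python below is a minimal model only; the
names ``ListenerRegistry``, ``listen``, ``incoming`` and
``accept_known_peer_only`` are hypothetical and do not correspond to
functions of the GNUnet API.

```python
class ListenerRegistry:
    """Toy dispatcher: one listener per (operation type, application id)."""

    def __init__(self):
        self._listeners = {}

    def listen(self, operation, app_id, on_request):
        # on_request(peer) returns a local set to accept, or None to reject.
        self._listeners[(operation, app_id)] = on_request

    def incoming(self, peer, operation, app_id):
        """Return the set offered for the operation, or None if not accepted."""
        on_request = self._listeners.get((operation, app_id))
        if on_request is None:
            return None          # nobody listens for this type and id
        return on_request(peer)  # the client decides: accept or reject


def accept_known_peer_only(peer):
    # Hypothetical policy: accept requests from one known peer only.
    return {b"x", b"y"} if peer == "peer-A" else None
```

A request is delivered to the client only if both the operation type and
the application id match a registered listener; the client's callback then
either supplies a local set (accept) or returns nothing (reject).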
   1960 
   1961 .. _Result-Elements:
   1962 
   1963 Result Elements
   1964 ---------------
   1965 
   1966 The SET service has three *result modes* that determine how an
   1967 operation's result set is delivered to the client:
   1968 
   1969 -  **Full Result Set.** All elements of the set resulting from the set
   1970    operation are returned to the client.
   1971 
   1972 -  **Added Elements.** Only elements that result from the operation and
   1973    are not already in the local peer's set are returned. Note that for
   1974    some operations (like set intersection) this result mode will never
   1975    return any elements. This can be useful if only the remote peer is
   1976    actually interested in the result of the set operation.
   1977 
   1978 -  **Removed Elements.** Only elements that are in the local peer's
   1979    initial set but not in the operation's result set are returned. Note
   1980    that for some operations (like set union) this result mode will never
   1981    return any elements. This can be useful if only the remote peer is
   1982    actually interested in the result of the set operation.
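
The three result modes can be expressed with ordinary set arithmetic. The
following Python sketch uses made-up mode names (``"full"``, ``"added"``,
``"removed"``) and a hypothetical helper ``deliver``; it only illustrates
which elements each mode would hand to the client.

```python
def deliver(local, result, mode):
    """Return the elements reported to the client for a given result mode.

    `local` is the local peer's initial set, `result` is the outcome of the
    set operation (for example the union or the intersection).
    """
    if mode == "full":
        return set(result)       # the whole result set
    if mode == "added":
        return result - local    # in the result, but new to the local peer
    if mode == "removed":
        return local - result    # in the local set, but not in the result
    raise ValueError(f"unknown result mode: {mode}")


local = {1, 2, 3}
remote = {3, 4}
union = local | remote
intersection = local & remote
```

With these inputs, ``deliver(local, intersection, "added")`` and
``deliver(local, union, "removed")`` are both empty, which matches the
notes in the list above: an intersection never adds elements to the local
set, and a union never removes any.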
   1983 
   1984 .. index::
   1985    double: subsystem; SETI
   1986 
   1987 .. _SETI-Subsystem:
   1988 
   1989 SETI — Peer to peer set intersections
   1990 =====================================
   1991 
   1992 The SETI service implements efficient set intersection between two peers
   1993 over a CADET tunnel. Elements of a set consist of an *element type* and
   1994 arbitrary binary *data*. The size of an element's data is limited to
   1995 around 62 KB.
   1996 
   1997 .. _Intersection-Sets:
   1998 
   1999 Intersection Sets
   2000 -----------------
   2001 
   2002 Sets created by a local client can be modified (by adding additional
   2003 elements) and reused for multiple operations. If elements are to be
   2004 removed, a fresh set must be created by the client.
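
The add-only behaviour can be modelled as follows. This is a sketch under
assumptions: ``GrowOnlySet`` and ``fresh_without`` are hypothetical names
chosen for the example and are not part of the SETI API.

```python
class GrowOnlySet:
    """Toy model of a set that supports additions but no removals."""

    def __init__(self, elements=()):
        self._elements = set(elements)

    def add(self, element):
        self._elements.add(element)

    def elements(self):
        return frozenset(self._elements)

    def fresh_without(self, unwanted):
        """'Remove' elements by building a new set that omits them."""
        return GrowOnlySet(self._elements - set(unwanted))
```

The original set is left untouched by ``fresh_without``, so operations
that already use it keep seeing the elements they started with.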
   2005 
   2006 .. _Set-Intersection-Modifications:
   2007 
   2008 Set Intersection Modifications
   2009 ------------------------------
   2010 
   2011 Even when set operations are active, one can add elements to a set.
   2012 However, these changes will only be visible to operations that have been
   2013 created after the changes have taken place. That is, every set operation
   2014 only sees a snapshot of the set from the time the operation was started.
   2015 This mechanism is *not* implemented by copying the whole set, but by
   2016 attaching *generation information* to each element and operation.
   2017 
   2018 .. _Set-Intersection-Operations:
   2019 
   2020 Set Intersection Operations
   2021 ---------------------------
   2022 
   2023 Set operations can be started in two ways: either by accepting an
   2024 operation request from a remote peer, or by requesting a set operation
   2025 from a remote peer. Set operations are uniquely identified by the
   2026 involved *peers*, an *application id* and the *operation type*.
   2027 
   2028 The client is notified of incoming set operations by *set listeners*. A
   2029 set listener listens for incoming operations of a specific operation
   2030 type and application id. Once notified of an incoming set request, the
   2031 client can accept the set request (providing a local set for the
   2032 operation) or reject it.
   2033 
   2034 .. _Intersection-Result-Elements:
   2035 
   2036 Intersection Result Elements
   2037 ----------------------------
   2038 
   2039 The SETI service has two *result modes* that determine how an operation's
   2040 result set is delivered to the client:
   2041 
   2042 -  **Return intersection.** All elements of the set resulting from the set
   2043    intersection are returned to the client.
   2044 
   2045 -  **Removed Elements.** Only elements that are in the local peer's
   2046    initial set but not in the intersection are returned.
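
A small worked example may help to compare the two modes. The Python
below is illustrative only; the function ``intersection_result`` and the
mode strings are invented for this sketch.

```python
def intersection_result(local, remote, mode):
    """Return what the local client receives for a set intersection."""
    common = local & remote
    if mode == "return-intersection":
        return common            # every element of the intersection
    if mode == "removed-elements":
        return local - common    # local elements that did not survive
    raise ValueError(f"unknown result mode: {mode}")


local = {b"apple", b"pear", b"plum"}
remote = {b"pear", b"plum", b"fig"}
kept = intersection_result(local, remote, "return-intersection")
dropped = intersection_result(local, remote, "removed-elements")
```

Both modes carry the same information from the local peer's point of
view, since ``local - dropped`` equals ``kept``. Which one involves fewer
elements depends on how much of the local set survives the intersection.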
   2047 
   2048 
   2049 
   2050 
   2051 .. index::
   2052    double: subsystem; SETU
   2053 
   2054 .. _SETU-Subsystem:
   2055 
   2056 SETU — Peer to peer set unions
   2057 ==============================
   2058 
   2059 The SETU service implements efficient set union operations between two
   2060 peers over a CADET tunnel. Elements of a set consist of an *element
   2061 type* and arbitrary binary *data*. The size of an element's data is
   2062 limited to around 62 KB.
   2063 
   2064 .. _Union-Sets:
   2065 
   2066 Union Sets
   2067 ----------
   2068 
   2069 Sets created by a local client can be modified (by adding additional
   2070 elements) and reused for multiple operations. If elements are to be
   2071 removed, a fresh set must be created by the client.
   2072 
   2073 .. _Set-Union-Modifications:
   2074 
   2075 Set Union Modifications
   2076 -----------------------
   2077 
   2078 Even when set operations are active, one can add elements to a set.
   2079 However, these changes will only be visible to operations that have been
   2080 created after the changes have taken place. That is, every set operation
   2081 only sees a snapshot of the set from the time the operation was started.
   2082 This mechanism is *not* implemented by copying the whole set, but by
   2083 attaching *generation information* to each element and operation.
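
Because elements can only be added, the snapshot rule reduces to a single
comparison per element. The Python sketch below is a simplified model, not
the service's implementation; ``SnapshotSet`` and the convention that each
started operation advances the generation counter are assumptions made for
the example.

```python
class SnapshotSet:
    """Toy add-only set that tags each element with its generation."""

    def __init__(self):
        self.generation = 0
        self._added_in = {}   # element -> generation in which it was added

    def add(self, element):
        # Keep the earliest generation if the element is added twice.
        self._added_in.setdefault(element, self.generation)

    def start_operation(self):
        """Pin a new operation to the current generation, then advance."""
        pinned = self.generation
        self.generation += 1
        return pinned

    def snapshot(self, pinned):
        """Elements visible to an operation pinned to `pinned`."""
        return {e for e, g in self._added_in.items() if g <= pinned}
```

An operation started before an element was added never sees that element,
while an operation started afterwards does, without any copy of the set
being made.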
   2084 
   2085 .. _Set-Union-Operations:
   2086 
   2087 Set Union Operations
   2088 --------------------
   2089 
   2090 Set operations can be started in two ways: either by accepting an
   2091 operation request from a remote peer, or by requesting a set operation
   2092 from a remote peer. Set operations are uniquely identified by the
   2093 involved *peers*, an *application id* and the *operation type*.
   2094 
   2095 The client is notified of incoming set operations by *set listeners*. A
   2096 set listener listens for incoming operations of a specific operation
   2097 type and application id. Once notified of an incoming set request, the
   2098 client can accept the set request (providing a local set for the
   2099 operation) or reject it.
   2100 
   2101 .. _Union-Result-Elements:
   2102 
   2103 Union Result Elements
   2104 ---------------------
   2105 
   2106 The SETU service has three *result modes* that determine how an
   2107 operation's result set is delivered to the client:
   2108 
   2109 -  **Locally added Elements.** Elements that are in the union but not
   2110    already in the local peer's set are returned.
   2111 
   2112 -  **Remote added Elements.** Additionally, notify the client if the
   2113    remote peer lacked some elements and thus also return to the local
   2114    client those elements that we are sending to the remote peer to be
   2115    added to its union. Obtaining these elements requires setting the
   2116    ``GNUNET_SETU_OPTION_SYMMETRIC`` option.