quickjs-tart

quickjs-based runtime for wallet-core logic
Log | Files | Refs | README | LICENSE

UINT_SETS.md (5567B)


      1 <!--
      2 Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
      3 
      4 SPDX-License-Identifier: curl
      5 -->
      6 
      7 # Unsigned Int Sets
      8 
      9 The multi handle tracks added easy handles via an unsigned int
     10 it calls an `mid`. There are four data structures for unsigned int
     11 optimized for the multi use case.
     12 
     13 ## `uint_tbl`
     14 
     15 `uint_table`, implemented in `uint-table.[ch]` manages an array
     16 of `void *`. The unsigned int are the index into this array. It is
     17 created with a *capacity* which can be *resized*. The table assigns
     18 the index when a `void *` is *added*. It keeps track of the last
     19 assigned index and uses the next available larger index for a
     20 subsequent add. Reaching *capacity* it wraps around.
     21 
     22 The table *can not* store `NULL` values. The largest possible index
     23 is `UINT_MAX - 1`.
     24 
     25 The table is iterated over by asking for the *first* existing index,
     26 meaning the smallest number that has an entry, if the table is not
     27 empty. To get the *next* entry, one passes the index of the previous
     28 iteration step. It does not matter if the previous index is still
     29 in the table. Sample code for a table iteration would look like this:
     30 
     31 ```c
     32 unsigned int mid;
     33 void *entry;
     34 
     35 if(Curl_uint_tbl_first(tbl, &mid, &entry)) {
     36   do {
     37      /* operate on entry with index mid */
     38   }
     39   while(Curl_uint_tbl_next(tbl, mid, &mid, &entry));
     40 }
     41 
     42 ```
     43 
     44 This iteration has the following properties:
     45 
     46 * entries in the table can be added/removed safely.
     47 * all entries that are not removed during the iteration are visited.
     48 * the table may be resized to a larger capacity without affecting visited entries.
     49 * entries added with a larger index than the current are visited.
     50 
     51 ### Memory
     52 
     53 For storing 1000 entries, the table would allocate one block of 8KB on a 64-bit system,
     54 plus the 2 pointers and 3 unsigned int in its base `struct uint_tbl`. A resize
     55 allocates a completely new pointer array, copy the existing entries and free the previous one.
     56 
     57 ### Performance
     58 
     59 Lookups of entries are only an index into the array, O(1) with a tiny 1. Adding
     60 entries and iterations are more work:
     61 
     62 1. adding an entry means "find the first free index larger than the previous assigned
     63   one". Worst case for this is a table with only a single free index where `capacity - 1`
     64   checks on `NULL` values would be performed, O(N). If the single free index is randomly
     65   distributed, this would be O(N/2).
     66 2. iterating a table scans for the first not `NULL` entry after the start index. This
     67   makes a complete iteration O(N) work.
     68 
     69 In the multi use case, point 1 is remedied by growing the table so that a good chunk
     70 of free entries always exists.
     71 
     72 Point 2 is less of an issue for a multi, since it does not really matter when the
     73 number of transfer is relatively small. A multi managing a larger set needs to operate
     74 event based anyway and table iterations rarely are needed.
     75 
     76 For these reasons, the simple implementation was preferred. Should this become
     77 a concern, there are options like "free index lists" or, alternatively, an internal
     78 bitset that scans better.
     79 
     80 ## `uint_bset`
     81 
     82 A bitset for unsigned integers, allowing fast add/remove operations. It is initialized
     83 with a *capacity*, meaning it can store only the numbers in the range `[0, capacity-1]`.
     84 It can be *resized* and safely *iterated*. `uint_bset` is designed to operate in combination with `uint_tbl`.
     85 
     86 The bitset keeps an array of `curl_uint64_t`. The first array entry keeps the numbers 0 to 63, the
     87 second 64 to 127 and so on. A bitset with capacity 1024 would therefore allocate an array
     88 of 16 64-bit values (128 bytes). Operations for an unsigned int divide it by 64 for the array index and then check/set/clear the bit of the remainder.
     89 
     90 Iterator works the same as with `uint_tbl`: ask the bitset for the *first* number present and
     91 then use that to get the *next* higher number present. Like the table, this safe for
     92 adds/removes and growing the set while iterating.
     93 
     94 ### Memory
     95 
     96 The set only needs 1 bit for each possible number.
     97 A bitset for 40000 transfers occupies 5KB of memory.
     98 
     99 ## Performance
    100 
    101 Operations for add/remove/check are O(1). Iteration needs to scan for the next bit set. The
    102 number of scans is small (see memory footprint) and, for checking bits, many compilers
    103 offer primitives for special CPU instructions.
    104 
    105 ## `uint_spbset`
    106 
    107 While the memory footprint of `uint_bset` is good, it still needs 5KB to store the single number 40000. This
    108 is not optimal when many are needed. For example, in event based processing, each socket needs to
    109 keep track of the transfers involved. There are many sockets potentially, but each one mostly tracks
    110 a single transfer or few (on HTTP/2 connection borderline up to 100).
    111 
    112 For such uses cases, the `uint_spbset` is intended: track a small number of unsigned int, potentially
    113 rather "close" together. It keeps "chunks" with an offset and has no capacity limit.
    114 
    115 Example: adding the number 40000 to an empty sparse bitset would have one chunk with offset 39936, keeping
    116 track of the numbers 39936 to 40192 (a chunk has 4 64-bit values). The numbers in that range can be handled
    117 without further allocations.
    118 
    119 The worst case is then storing 100 numbers that lie in separate intervals. Then 100 chunks
    120 would need to be allocated and linked, resulting in overall 4 KB of memory used.
    121 
    122 Iterating a sparse bitset works the same as for bitset and table.
    123 
    124 ## `uint_hash`
    125 
    126 At last, there are places in libcurl such as the HTTP/2 and HTTP/3 protocol implementations that need
    127 to store their own data related to a transfer. `uint_hash` allows then to associate an unsigned int,
    128 e.g. the transfer's `mid`, to their own data.