diff --git a/benchmark/writing-and-running-benchmarks.md b/benchmark/writing-and-running-benchmarks.md
new file mode 100644
index 0000000000..6097830aee
--- /dev/null
+++ b/benchmark/writing-and-running-benchmarks.md
@@ -0,0 +1,475 @@
+# How to Write and Run Benchmarks in Node.js Core
+
+## Table of Contents
+
+* [Prerequisites](#prerequisites)
+ * [HTTP Benchmark Requirements](#http-benchmark-requirements)
+ * [Benchmark Analysis Requirements](#benchmark-analysis-requirements)
+* [Running benchmarks](#running-benchmarks)
+ * [Running individual benchmarks](#running-individual-benchmarks)
+ * [Running all benchmarks](#running-all-benchmarks)
+ * [Comparing Node.js versions](#comparing-nodejs-versions)
+ * [Comparing parameters](#comparing-parameters)
+ * [Running Benchmarks on the CI](#running-benchmarks-on-the-ci)
+* [Creating a benchmark](#creating-a-benchmark)
+ * [Basics of a benchmark](#basics-of-a-benchmark)
+ * [Creating an HTTP benchmark](#creating-an-http-benchmark)
+
+## Prerequisites
+
+Basic Unix tools are required for some benchmarks.
+[Git for Windows][git-for-windows] includes Git Bash and the necessary tools,
+which need to be included in the global Windows `PATH`.
+
+### HTTP Benchmark Requirements
+
+Most of the HTTP benchmarks require a benchmarker to be installed. This can be
+either [`wrk`][wrk] or [`autocannon`][autocannon].
+
+`autocannon` is a Node.js script that can be installed using
+`npm install -g autocannon`. It will use the Node.js executable that is in the
+path. In order to compare two HTTP benchmark runs, make sure that the
+Node.js version in the path is not changed between them.
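+
+For example, a minimal setup might look like the following (illustrative;
+`which` is the standard Unix utility):
+
+```console
+$ npm install -g autocannon
+$ which node    # the Node.js executable that autocannon will use
+```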
+
+`wrk` may be available through your package manager. If not, it can easily be
+built [from source][wrk] via `make`.
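+
+A typical build from source (assuming `git` and `make` are available) is:
+
+```console
+$ git clone https://github.com/wg/wrk.git
+$ cd wrk
+$ make
+```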
+
+By default, `wrk` will be used as the benchmarker. If it is not available,
+`autocannon` will be used in its place. When running an HTTP benchmark, the
+benchmarker to use can be specified by providing it as an argument:
+
+`node benchmark/run.js --set benchmarker=autocannon http`
+
+`node benchmark/http/simple.js benchmarker=autocannon`
+
+#### HTTP/2 Benchmark Requirements
+
+To run the `http2` benchmarks, the `h2load` benchmarker must be used. The
+`h2load` tool is a component of the `nghttp2` project and may be installed
+from [nghttp2.org][] or built from source.
+
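+Depending on the platform, a packaged build of `nghttp2` that includes
+`h2load` may be available (the package names below are illustrative):
+
+```console
+$ brew install nghttp2                 # macOS (Homebrew)
+$ sudo apt-get install nghttp2-client  # Debian/Ubuntu
+```
+
+Once `h2load` is on the `PATH`, specify it as the benchmarker:
+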
+`node benchmark/http2/simple.js benchmarker=h2load`
+
+### Benchmark Analysis Requirements
+
+To analyze the results, `R` should be installed. Use one of the available
+package managers or download it from https://www.r-project.org/.
+
+The R packages `ggplot2` and `plyr` are also used and can be installed using
+the R REPL.
+
+```R
+$ R
+install.packages("ggplot2")
+install.packages("plyr")
+```
+
+If a message reports that a CRAN mirror must be selected first, specify a
+mirror with the `repo` parameter.
+
+Using the "http://cran.us.r-project.org" mirror, it could look something
+like this:
+
+```R
+install.packages("ggplot2", repo="http://cran.us.r-project.org")
+```
+
+Of course, use an appropriate mirror based on location.
+A list of mirrors is [located here](https://cran.r-project.org/mirrors.html).
+
+## Running benchmarks
+
+### Running individual benchmarks
+
+This can be useful for debugging a benchmark or doing a quick performance
+measurement. However, it does not provide the statistical information needed
+to draw any conclusions about the performance.
+
+Individual benchmarks can be executed by simply running the benchmark script
+with node.
+
+```console
+$ node benchmark/buffers/buffer-tostring.js
+
+buffers/buffer-tostring.js n=10000000 len=0 arg=true: 62710590.393305704
+buffers/buffer-tostring.js n=10000000 len=1 arg=true: 9178624.591787899
+buffers/buffer-tostring.js n=10000000 len=64 arg=true: 7658962.8891432695
+buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 4136904.4060201733
+buffers/buffer-tostring.js n=10000000 len=0 arg=false: 22974354.231509723
+buffers/buffer-tostring.js n=10000000 len=1 arg=false: 11485945.656765845
+buffers/buffer-tostring.js n=10000000 len=64 arg=false: 8718280.70650129
+buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 4103857.0726124765
+```
+
+Each line represents a single benchmark with parameters specified as
+`${variable}=${value}`. Each configuration combination is executed in a separate
+process. This ensures that benchmark results aren't affected by the execution
+order due to V8 optimizations. **The last number is the rate of operations
+measured in ops/sec (higher is better).**
+
+Furthermore, a subset of the configurations can be specified by setting them in
+the process arguments:
+
+```console
+$ node benchmark/buffers/buffer-tostring.js len=1024
+
+buffers/buffer-tostring.js n=10000000 len=1024 arg=true: 3498295.68561504
+buffers/buffer-tostring.js n=10000000 len=1024 arg=false: 3783071.1678948295
+```
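+
+Multiple parameters can be combined in the same way; for example (values are
+illustrative):
+
+```console
+$ node benchmark/buffers/buffer-tostring.js len=1024 arg=true
+```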
+
+### Running all benchmarks
+
+Similar to running individual benchmarks, a group of benchmarks can be executed
+by using the `run.js` tool. To see how to use this script,
+run `node benchmark/run.js`. Again, this does not provide the statistical
+information needed to draw any conclusions.
+
+```console
+$ node benchmark/run.js assert
+
+assert/deepequal-buffer.js
+assert/deepequal-buffer.js method="deepEqual" strict=0 len=100 n=20000: 773,200.4995493788
+assert/deepequal-buffer.js method="notDeepEqual" strict=0 len=100 n=20000: 964,411.712953848
+...
+
+assert/deepequal-map.js
+assert/deepequal-map.js method="deepEqual_primitiveOnly" strict=0 len=500 n=500: 20,445.06368453332
+assert/deepequal-map.js method="deepEqual_objectOnly" strict=0 len=500 n=500: 1,393.3481642240833
+...
+
+assert/deepequal-object.js
+assert/deepequal-object.js method="deepEqual" strict=0 size=100 n=5000: 1,053.1950937538475
+assert/deepequal-object.js method="notDeepEqual" strict=0 size=100 n=5000: 9,734.193251965213
+...
+```
+
+It is possible to execute multiple groups by adding extra process arguments.
+
+```console
+$ node benchmark/run.js assert async_hooks
+```
+
+### Comparing Node.js versions
+
+To compare the effect of a new Node.js version use the `compare.js` tool. This
+will run each benchmark multiple times, making it possible to calculate
+statistics on the performance measures. To see how to use this script,
+run `node benchmark/compare.js`.
+
+To show how to check for a possible performance improvement, the
+[#5134](https://github.com/nodejs/node/pull/5134) pull request will be used as
+an example. This pull request _claims_ to improve the performance of the
+`string_decoder` module.
+
+First build two versions of Node.js, one from the master branch (here called
+`./node-master`) and another with the pull request applied (here called
+`./node-pr-5134`).
+
+To run multiple compiled versions in parallel, you need to copy the output of
+the build: `cp ./out/Release/node ./node-master`. Check out the following
+example:
+
+```console
+$ git checkout master
+$ ./configure && make -j4
+$ cp ./out/Release/node ./node-master
+
+$ git checkout pr-5134
+$ ./configure && make -j4
+$ cp ./out/Release/node ./node-pr-5134
+```
+
+The `compare.js` tool will then produce a csv file with the benchmark results.
+
+```console
+$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 string_decoder > compare-pr-5134.csv
+```
+
+*Tip: `benchmark/compare.js` has some useful options. For example, to compare
+only a single benchmark script instead of a whole module, use the `--filter`
+option:*
+
+```console
+ --new ./new-node-binary new node binary (required)
+ --old ./old-node-binary old node binary (required)
+ --runs 30 number of samples
+ --filter pattern string to filter benchmark scripts
+ --set variable=value set benchmark variable (can be repeated)
+ --no-progress don't show benchmark progress indicator
+```
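+
+For instance, a hypothetical invocation that only runs the `string-decoder`
+script with fewer samples could look like:
+
+```console
+$ node benchmark/compare.js --old ./node-master --new ./node-pr-5134 \
+  --filter string-decoder --runs 10 string_decoder > compare-pr-5134.csv
+```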
+
+For analyzing the benchmark results, use the `compare.R` tool.
+
+```console
+$ cat compare-pr-5134.csv | Rscript benchmark/compare.R
+
+ confidence improvement accuracy (*) (**) (***)
+ string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii' *** -3.76 % ±1.36% ±1.82% ±2.40%
+ string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='utf8' ** -0.81 % ±0.53% ±0.71% ±0.93%
+ string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii' *** -2.70 % ±0.83% ±1.11% ±1.45%
+ string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='base64-ascii' *** -1.57 % ±0.83% ±1.11% ±1.46%
+...
+```
+
+In the output, _improvement_ is the relative improvement of the new version,
+which is hopefully positive. _confidence_ tells whether there is enough
+statistical evidence to validate the _improvement_. If there is enough
+evidence, there will be at least one star (`*`); more stars are better.
+**However, if there are no stars, don't draw any conclusions based on the
+_improvement_.** Sometimes this is fine; for example, if no improvements are
+expected, there shouldn't be any stars.
+
+**A word of caution:** Statistics is not a foolproof tool. If a benchmark shows
+a statistically significant difference, there is a 5% risk that this
+difference doesn't actually exist. For a single benchmark this is not an
+issue, but when considering 20 benchmarks it is normal for one of them
+to show significance when it shouldn't. A possible solution is to instead
+consider at least two stars (`**`) as the threshold, in which case the risk
+is 1%. If three stars (`***`) are required, the risk is 0.1%. However, this
+may require more runs to obtain (configurable with `--runs`).
+
+_For the statistically minded, the R script performs an [independent/unpaired
+2-group t-test][t-test], with the null hypothesis that the performance is the
+same for both versions. The confidence field will show a star if the p-value
+is less than `0.05`._
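+
+_As a rough sketch (not the exact code in `compare.R`), the unequal-variances
+form of that test computes a statistic of the following shape, where x̄ is the
+mean rate, s² the sample variance, and n the number of runs for each binary:_
+
+```latex
+t = \frac{\bar{x}_{new} - \bar{x}_{old}}
+         {\sqrt{s^2_{new} / n_{new} + s^2_{old} / n_{old}}}
+```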
+
+The `compare.R` tool can also produce a box plot by using the `--plot filename`
+option. In this case there are 48 different benchmark combinations, and there
+may be a need to filter the csv file. This can be done while benchmarking
+using the `--set` parameter (e.g. `--set encoding=ascii`) or by filtering
+results afterwards using tools such as `sed` or `grep`. In the `sed` case be
+sure to keep the first line since that contains the header information.
+
+```console
+$ cat compare-pr-5134.csv | sed '1p;/encoding='"'"ascii"'"'/!d' | Rscript benchmark/compare.R --plot compare-plot.png
+
+ confidence improvement accuracy (*) (**) (***)
+ string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=128 encoding='ascii' *** -3.76 % ±1.36% ±1.82% ±2.40%
+ string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=32 encoding='ascii' *** -2.70 % ±0.83% ±1.11% ±1.45%
+ string_decoder/string-decoder.js n=2500000 chunkLen=16 inLen=4096 encoding='ascii' *** -4.06 % ±0.31% ±0.41% ±0.54%
+ string_decoder/string-decoder.js n=2500000 chunkLen=256 inLen=1024 encoding='ascii' *** -1.42 % ±0.58% ±0.77% ±1.01%
+...
+```
+
+![compare tool boxplot](doc_img/compare-boxplot.png)
+
+### Comparing parameters
+
+It can be useful to compare the performance for different parameters, for
+example to analyze the time complexity.
+
+To do this, use the `scatter.js` tool; it will run a benchmark multiple times
+and generate a csv with the results. To see how to use this script,
+run `node benchmark/scatter.js`.
+
+```console
+$ node benchmark/scatter.js benchmark/string_decoder/string-decoder.js > scatter.csv
+```
+
+After generating the csv, a comparison table can be created using the
+`scatter.R` tool. Even more useful, it can create an actual scatter plot when
+the `--plot filename` option is used.
+
+```console
+$ cat scatter.csv | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log
+
+aggregating variable: inLen
+
+chunkLen encoding rate confidence.interval
+ 16 ascii 1515855.1 334492.68
+ 16 base64-ascii 403527.2 89677.70
+ 16 base64-utf8 322352.8 70792.93
+ 16 utf16le 1714567.5 388439.81
+ 16 utf8 1100181.6 254141.32
+ 64 ascii 3550402.0 661277.65
+ 64 base64-ascii 1093660.3 229976.34
+ 64 base64-utf8 997804.8 227238.04
+ 64 utf16le 3372234.0 647274.88
+ 64 utf8 1731941.2 360854.04
+ 256 ascii 5033793.9 723354.30
+ 256 base64-ascii 1447962.1 236625.96
+ 256 base64-utf8 1357269.2 231045.70
+ 256 utf16le 4039581.5 655483.16
+ 256 utf8 1828672.9 360311.55
+ 1024 ascii 5677592.7 624771.56
+ 1024 base64-ascii 1494171.7 227302.34
+ 1024 base64-utf8 1399218.9 224584.79
+ 1024 utf16le 4157452.0 630416.28
+ 1024 utf8 1824266.6 359628.52
+```
+
+Because the scatter plot can only show two variables (in this case _chunkLen_
+and _encoding_) the rest is aggregated. Sometimes aggregating is a problem; this
+can be solved by filtering. This can be done while benchmarking using the
+`--set` parameter (e.g. `--set encoding=ascii`) or by filtering results
+afterwards using tools such as `sed` or `grep`. In the `sed` case, be
+sure to keep the first line since that contains the header information.
+
+```console
+$ cat scatter.csv | sed -E '1p;/([^,]+, ){3}128,/!d' | Rscript benchmark/scatter.R --xaxis chunkLen --category encoding --plot scatter-plot.png --log
+
+chunkLen encoding rate confidence.interval
+ 16 ascii 1302078.5 71692.27
+ 16 base64-ascii 338669.1 15159.54
+ 16 base64-utf8 281904.2 20326.75
+ 16 utf16le 1381515.5 58533.61
+ 16 utf8 831183.2 33631.01
+ 64 ascii 4363402.8 224030.00
+ 64 base64-ascii 1036825.9 48644.72
+ 64 base64-utf8 780059.3 60994.98
+ 64 utf16le 3900749.5 158366.84
+ 64 utf8 1723710.6 80665.65
+ 256 ascii 8472896.1 511822.51
+ 256 base64-ascii 2215884.6 104347.53
+ 256 base64-utf8 1996230.3 131778.47
+ 256 utf16le 5824147.6 234550.82
+ 256 utf8 2019428.8 100913.36
+ 1024 ascii 8340189.4 598855.08
+ 1024 base64-ascii 2201316.2 111777.68
+ 1024 base64-utf8 2002272.9 128843.11
+ 1024 utf16le 5789281.7 240642.77
+ 1024 utf8 2025551.2 81770.69
+```
+
+![scatter plot](doc_img/scatter-plot.png)
+
+### Running Benchmarks on the CI
+
+To see the performance impact of a Pull Request by running benchmarks on
+the CI, check out [How to: Running core benchmarks on Node.js CI][benchmark-ci].
+
+## Creating a benchmark
+
+### Basics of a benchmark
+
+All benchmarks use the `require('../common.js')` module. This contains the
+`createBenchmark(main, configs[, options])` method which will set up the
+benchmark.
+
+The arguments of `createBenchmark` are:
+
+* `main` {Function} The benchmark function,
+ where the code running operations and controlling timers should go
+* `configs` {Object} The benchmark parameters. `createBenchmark` will run all
+ possible combinations of these parameters, unless specified otherwise.
+ Each configuration is a property with an array of possible values.
+ The configuration values can only be strings or numbers.
+* `options` {Object} The benchmark options. At the moment only the `flags`
+ option for specifying command line flags is supported.
+
+`createBenchmark` returns a `bench` object, which is used for timing
+the runtime of the benchmark. Run `bench.start()` after the initialization
+and `bench.end(n)` when the benchmark is done. `n` is the number of operations
+performed in the benchmark.
+
+The benchmark script will be run twice:
+
+The first pass will configure the benchmark with the combination of
+parameters specified in `configs`, and WILL NOT run the `main` function.
+In this pass, no flags except the ones passed directly on the command line
+when running the benchmarks will be used.
+
+In the second pass, the `main` function will be run, and the process
+will be launched with:
+
+* The flags passed into `createBenchmark` (the third argument)
+* The flags in the command passed when the benchmark was run
+
+Beware that any code outside the `main` function will be run twice
+in different processes. This could be troublesome if the code
+outside the `main` function has side effects. In general, prefer putting
+the code inside the `main` function if it is more than just a declaration.
+
+```js
+'use strict';
+const common = require('../common.js');
+const { SlowBuffer } = require('buffer');
+
+const configs = {
+ // Number of operations, specified here so they show up in the report.
+ // Most benchmarks just use one value for all runs.
+ n: [1024],
+ type: ['fast', 'slow'], // Custom configurations
+ size: [16, 128, 1024] // Custom configurations
+};
+
+const options = {
+ // Add --expose-internals in order to require internal modules in main
+ flags: ['--zero-fill-buffers']
+};
+
+// `main` and `configs` are required, `options` is optional.
+const bench = common.createBenchmark(main, configs, options);
+
+// Any code outside main will be run twice,
+// in different processes, with different command line arguments.
+
+function main(conf) {
+  // Only the flags that have been passed to createBenchmark
+  // earlier will be in effect when main is run.
+ // In order to benchmark the internal modules, require them here. For example:
+ // const URL = require('internal/url').URL
+
+ // Start the timer
+ bench.start();
+
+ // Do operations here
+ const BufferConstructor = conf.type === 'fast' ? Buffer : SlowBuffer;
+
+ for (let i = 0; i < conf.n; i++) {
+ new BufferConstructor(conf.size);
+ }
+
+ // End the timer, pass in the number of operations
+ bench.end(conf.n);
+}
+```
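+
+Assuming the example above were saved as, say,
+`benchmark/buffers/buffer-creation.js` (a hypothetical path), it could be run
+directly, optionally pinning a subset of the configurations:
+
+```console
+$ node benchmark/buffers/buffer-creation.js type=fast size=1024
+```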
+
+### Creating an HTTP benchmark
+
+The `bench` object returned by `createBenchmark` implements an
+`http(options, callback)` method. It can be used to run an external tool to
+benchmark HTTP servers.
+
+```js
+'use strict';
+
+const common = require('../common.js');
+
+const bench = common.createBenchmark(main, {
+ kb: [64, 128, 256, 1024],
+ connections: [100, 500]
+});
+
+function main(conf) {
+ const http = require('http');
+ const len = conf.kb * 1024;
+ const chunk = Buffer.alloc(len, 'x');
+ const server = http.createServer((req, res) => {
+ res.end(chunk);
+ });
+
+ server.listen(common.PORT, () => {
+ bench.http({
+ connections: conf.connections,
+ }, () => {
+ server.close();
+ });
+ });
+}
+```
+
+Supported option keys are:
+
+* `port` - defaults to `common.PORT`
+* `path` - defaults to `/`
+* `connections` - number of concurrent connections to use, defaults to 100
+* `duration` - duration of the benchmark in seconds, defaults to 10
+* `benchmarker` - benchmarker to use, defaults to
+  `common.default_http_benchmarker`
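+
+Any of these keys can be overridden in the call to `bench.http`. For example
+(the values below are purely illustrative):
+
+```js
+bench.http({
+  path: '/',
+  connections: conf.connections,
+  duration: 30,
+}, () => {
+  server.close();
+});
+```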
+
+[autocannon]: https://github.com/mcollina/autocannon
+[wrk]: https://github.com/wg/wrk
+[t-test]: https://en.wikipedia.org/wiki/Student%27s_t-test#Equal_or_unequal_sample_sizes.2C_unequal_variances
+[git-for-windows]: https://git-scm.com/download/win
+[nghttp2.org]: https://nghttp2.org
+[benchmark-ci]: https://github.com/nodejs/benchmarking/blob/master/docs/core_benchmarks.md