Skip to content

Conversation

@addaleax
Copy link
Member

@addaleax addaleax commented Feb 2, 2018

Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

Benchmark results
$ ./node benchmark/compare.js --new ./node --old ./node-before --set n=25e4 string_decoder | Rscript benchmark/compare.R
[03:07:14|% 100| 2/2 files | 60/60 runs | 80/80 configs]: Done
                                                                                             confidence improvement accuracy (*)    (**)   (***)
 string_decoder/string-decoder-create.js n=250000 encoding="ascii"                                 ***    -77.32 %       ±4.32%  ±5.80%  ±7.68%
 string_decoder/string-decoder-create.js n=250000 encoding="AscII"                                 ***    -74.15 %       ±2.55%  ±3.43%  ±4.53%
 string_decoder/string-decoder-create.js n=250000 encoding="base64"                                ***    -56.88 %       ±4.75%  ±6.36%  ±8.37%
 string_decoder/string-decoder-create.js n=250000 encoding="ucs2"                                  ***    -64.00 %       ±3.53%  ±4.72%  ±6.21%
 string_decoder/string-decoder-create.js n=250000 encoding="UTF-16LE"                              ***    -54.64 %       ±3.82%  ±5.12%  ±6.74%
 string_decoder/string-decoder-create.js n=250000 encoding="utf-8"                                 ***    -66.98 %       ±4.92%  ±6.62%  ±8.75%
 string_decoder/string-decoder-create.js n=250000 encoding="UTF-8"                                 ***    -59.94 %       ±3.45%  ±4.61%  ±6.03%
 string_decoder/string-decoder-create.js n=250000 encoding="utf8"                                  ***    -66.55 %       ±4.65%  ±6.26%  ±8.28%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="ascii"                       12.89 %      ±14.67% ±19.54% ±25.46%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-ascii"         **     12.83 %       ±9.20% ±12.24% ±15.93%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-utf8"          **     12.19 %       ±8.97% ±11.93% ±15.53%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf16le"                      7.69 %      ±11.02% ±14.67% ±19.12%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf8"                         0.55 %       ±8.17% ±10.88% ±14.17%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="ascii"                  *     23.55 %      ±17.71% ±23.57% ±30.69%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-ascii"         ***     45.47 %      ±13.60% ±18.13% ±23.66%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-utf8"          ***     39.64 %      ±14.04% ±18.70% ±24.37%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf16le"                *     13.59 %      ±13.43% ±17.86% ±23.26%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf8"                         11.52 %      ±12.00% ±15.98% ±20.81%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="ascii"                   *     14.10 %      ±13.51% ±18.01% ±23.50%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-ascii"          ***     67.77 %      ±17.03% ±22.72% ±29.69%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-utf8"           ***     54.15 %      ±16.54% ±22.08% ±28.87%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf16le"                       15.89 %      ±16.84% ±22.44% ±29.26%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf8"                           9.28 %      ±12.42% ±16.52% ±21.51%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="ascii"                       10.09 %      ±10.59% ±14.09% ±18.34%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-ascii"                 6.93 %       ±8.86% ±11.79% ±15.36%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-utf8"           *      9.24 %       ±7.98% ±10.62% ±13.83%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf16le"                     -0.67 %       ±9.40% ±12.51% ±16.31%
 string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf8"                         2.27 %       ±8.52% ±11.34% ±14.76%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="ascii"                          8.53 %      ±10.65% ±14.18% ±18.48%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-ascii"          ***     54.82 %      ±10.50% ±14.00% ±18.29%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-utf8"           ***     51.75 %      ±10.83% ±14.43% ±18.85%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf16le"                       -5.01 %       ±9.15% ±12.18% ±15.85%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf8"                           4.55 %       ±8.20% ±10.92% ±14.21%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="ascii"                    *     13.88 %      ±12.62% ±16.81% ±21.91%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-ascii"           ***     52.79 %      ±10.56% ±14.08% ±18.39%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-utf8"            ***     52.56 %      ±11.39% ±15.21% ±19.90%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf16le"                         0.43 %      ±10.06% ±13.38% ±17.42%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf8"                            3.33 %      ±10.17% ±13.53% ±17.63%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="ascii"                   ***     31.81 %      ±14.71% ±19.58% ±25.52%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-ascii"            ***     51.38 %      ±13.23% ±17.65% ±23.09%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-utf8"             ***     48.89 %      ±14.06% ±18.82% ±24.71%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf16le"                          8.70 %      ±12.70% ±16.90% ±22.01%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf8"                      *     17.94 %      ±14.58% ±19.40% ±25.26%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="ascii"                  **     13.42 %       ±9.78% ±13.03% ±17.00%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-ascii"          ***     49.02 %      ±10.60% ±14.13% ±18.45%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-utf8"           ***     52.44 %      ±11.03% ±14.72% ±19.23%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf16le"                       -5.55 %       ±9.18% ±12.22% ±15.92%
 string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf8"                           6.32 %       ±8.12% ±10.80% ±14.07%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="ascii"                         8.21 %      ±12.47% ±16.61% ±21.65%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-ascii"         ***     28.39 %       ±9.32% ±12.42% ±16.19%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-utf8"          ***     21.83 %       ±9.94% ±13.24% ±17.26%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf16le"                       5.92 %      ±10.21% ±13.60% ±17.71%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf8"                          2.83 %       ±8.47% ±11.27% ±14.67%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="ascii"                   *     16.92 %      ±16.02% ±21.34% ±27.81%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-ascii"          ***     36.24 %      ±14.01% ±18.68% ±24.37%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-utf8"           ***     24.74 %      ±13.22% ±17.65% ±23.10%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf16le"                        6.26 %      ±12.01% ±15.98% ±20.81%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf8"                           1.51 %      ±10.89% ±14.50% ±18.88%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="ascii"                   **     25.58 %      ±19.19% ±25.54% ±33.28%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-ascii"           ***     67.36 %      ±15.88% ±21.21% ±27.76%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-utf8"            ***     55.56 %      ±17.89% ±23.92% ±31.36%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf16le"                 **     22.91 %      ±15.06% ±20.03% ±26.07%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf8"                           11.45 %      ±12.80% ±17.03% ±22.18%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="ascii"                  *     10.71 %      ±10.00% ±13.32% ±17.35%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-ascii"         ***     21.56 %       ±9.11% ±12.12% ±15.77%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-utf8"          ***     25.72 %       ±9.04% ±12.03% ±15.68%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf16le"                       0.64 %       ±9.49% ±12.64% ±16.48%
 string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf8"                          1.64 %       ±8.54% ±11.37% ±14.80%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="ascii"                         11.38 %      ±11.58% ±15.44% ±20.15%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-ascii"          ***     44.38 %      ±10.72% ±14.31% ±18.70%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-utf8"           ***     40.75 %      ±10.92% ±14.57% ±19.03%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf16le"                        0.01 %       ±9.09% ±12.09% ±15.73%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf8"                           4.25 %       ±8.23% ±10.95% ±14.26%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="ascii"                    *     21.15 %      ±16.30% ±21.71% ±28.31%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-ascii"           ***     50.31 %      ±11.71% ±15.64% ±20.46%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-utf8"            ***     42.86 %      ±11.50% ±15.34% ±20.07%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf16le"                         3.08 %      ±14.43% ±19.21% ±25.01%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf8"                            5.19 %      ±10.15% ±13.51% ±17.59%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="ascii"                           12.04 %      ±16.01% ±21.32% ±27.79%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-ascii"            ***     52.38 %      ±15.90% ±21.27% ±27.92%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-utf8"             ***     38.86 %      ±16.88% ±22.57% ±29.60%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf16le"                  **     19.55 %      ±13.93% ±18.54% ±24.15%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf8"                     **     19.12 %      ±12.42% ±16.52% ±21.51%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="ascii"                  **     14.49 %       ±9.70% ±12.91% ±16.81%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-ascii"          ***     37.39 %      ±10.67% ±14.21% ±18.52%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-utf8"           ***     46.33 %      ±10.02% ±13.34% ±17.40%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf16le"                       -1.83 %       ±9.42% ±12.54% ±16.32%
 string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf8"                           2.81 %       ±8.96% ±11.93% ±15.52%
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

string_decoder

CI: https://ci.nodejs.org/job/node-test-pull-request/12913
CI: https://ci.nodejs.org/job/node-test-commit/15905/

@addaleax addaleax added the string_decoder Issues and PRs related to the string_decoder subsystem. label Feb 2, 2018
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Feb 2, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

    $ ./node benchmark/compare.js --new ./node --old ./node-before --set n=25e4 string_decoder | Rscript benchmark/compare.R
    [03:07:14|% 100| 2/2 files | 60/60 runs | 80/80 configs]: Done
                                                                                                 confidence improvement accuracy (*)    (**)   (***)
     string_decoder/string-decoder-create.js n=250000 encoding="ascii"                                 ***    -77.32 %       ±4.32%  ±5.80%  ±7.68%
     string_decoder/string-decoder-create.js n=250000 encoding="AscII"                                 ***    -74.15 %       ±2.55%  ±3.43%  ±4.53%
     string_decoder/string-decoder-create.js n=250000 encoding="base64"                                ***    -56.88 %       ±4.75%  ±6.36%  ±8.37%
     string_decoder/string-decoder-create.js n=250000 encoding="ucs2"                                  ***    -64.00 %       ±3.53%  ±4.72%  ±6.21%
     string_decoder/string-decoder-create.js n=250000 encoding="UTF-16LE"                              ***    -54.64 %       ±3.82%  ±5.12%  ±6.74%
     string_decoder/string-decoder-create.js n=250000 encoding="utf-8"                                 ***    -66.98 %       ±4.92%  ±6.62%  ±8.75%
     string_decoder/string-decoder-create.js n=250000 encoding="UTF-8"                                 ***    -59.94 %       ±3.45%  ±4.61%  ±6.03%
     string_decoder/string-decoder-create.js n=250000 encoding="utf8"                                  ***    -66.55 %       ±4.65%  ±6.26%  ±8.28%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="ascii"                       12.89 %      ±14.67% ±19.54% ±25.46%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-ascii"         **     12.83 %       ±9.20% ±12.24% ±15.93%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="base64-utf8"          **     12.19 %       ±8.97% ±11.93% ±15.53%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf16le"                      7.69 %      ±11.02% ±14.67% ±19.12%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=1024 encoding="utf8"                         0.55 %       ±8.17% ±10.88% ±14.17%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="ascii"                  *     23.55 %      ±17.71% ±23.57% ±30.69%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-ascii"         ***     45.47 %      ±13.60% ±18.13% ±23.66%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="base64-utf8"          ***     39.64 %      ±14.04% ±18.70% ±24.37%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf16le"                *     13.59 %      ±13.43% ±17.86% ±23.26%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=128 encoding="utf8"                         11.52 %      ±12.00% ±15.98% ±20.81%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="ascii"                   *     14.10 %      ±13.51% ±18.01% ±23.50%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-ascii"          ***     67.77 %      ±17.03% ±22.72% ±29.69%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="base64-utf8"           ***     54.15 %      ±16.54% ±22.08% ±28.87%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf16le"                       15.89 %      ±16.84% ±22.44% ±29.26%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=32 encoding="utf8"                           9.28 %      ±12.42% ±16.52% ±21.51%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="ascii"                       10.09 %      ±10.59% ±14.09% ±18.34%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-ascii"                 6.93 %       ±8.86% ±11.79% ±15.36%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="base64-utf8"           *      9.24 %       ±7.98% ±10.62% ±13.83%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf16le"                     -0.67 %       ±9.40% ±12.51% ±16.31%
     string_decoder/string-decoder.js n=250000 chunkLen=1024 inLen=4096 encoding="utf8"                         2.27 %       ±8.52% ±11.34% ±14.76%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="ascii"                          8.53 %      ±10.65% ±14.18% ±18.48%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-ascii"          ***     54.82 %      ±10.50% ±14.00% ±18.29%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="base64-utf8"           ***     51.75 %      ±10.83% ±14.43% ±18.85%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf16le"                       -5.01 %       ±9.15% ±12.18% ±15.85%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=1024 encoding="utf8"                           4.55 %       ±8.20% ±10.92% ±14.21%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="ascii"                    *     13.88 %      ±12.62% ±16.81% ±21.91%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-ascii"           ***     52.79 %      ±10.56% ±14.08% ±18.39%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="base64-utf8"            ***     52.56 %      ±11.39% ±15.21% ±19.90%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf16le"                         0.43 %      ±10.06% ±13.38% ±17.42%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=128 encoding="utf8"                            3.33 %      ±10.17% ±13.53% ±17.63%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="ascii"                   ***     31.81 %      ±14.71% ±19.58% ±25.52%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-ascii"            ***     51.38 %      ±13.23% ±17.65% ±23.09%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="base64-utf8"             ***     48.89 %      ±14.06% ±18.82% ±24.71%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf16le"                          8.70 %      ±12.70% ±16.90% ±22.01%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=32 encoding="utf8"                      *     17.94 %      ±14.58% ±19.40% ±25.26%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="ascii"                  **     13.42 %       ±9.78% ±13.03% ±17.00%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-ascii"          ***     49.02 %      ±10.60% ±14.13% ±18.45%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="base64-utf8"           ***     52.44 %      ±11.03% ±14.72% ±19.23%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf16le"                       -5.55 %       ±9.18% ±12.22% ±15.92%
     string_decoder/string-decoder.js n=250000 chunkLen=16 inLen=4096 encoding="utf8"                           6.32 %       ±8.12% ±10.80% ±14.07%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="ascii"                         8.21 %      ±12.47% ±16.61% ±21.65%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-ascii"         ***     28.39 %       ±9.32% ±12.42% ±16.19%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="base64-utf8"          ***     21.83 %       ±9.94% ±13.24% ±17.26%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf16le"                       5.92 %      ±10.21% ±13.60% ±17.71%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=1024 encoding="utf8"                          2.83 %       ±8.47% ±11.27% ±14.67%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="ascii"                   *     16.92 %      ±16.02% ±21.34% ±27.81%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-ascii"          ***     36.24 %      ±14.01% ±18.68% ±24.37%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="base64-utf8"           ***     24.74 %      ±13.22% ±17.65% ±23.10%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf16le"                        6.26 %      ±12.01% ±15.98% ±20.81%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=128 encoding="utf8"                           1.51 %      ±10.89% ±14.50% ±18.88%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="ascii"                   **     25.58 %      ±19.19% ±25.54% ±33.28%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-ascii"           ***     67.36 %      ±15.88% ±21.21% ±27.76%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="base64-utf8"            ***     55.56 %      ±17.89% ±23.92% ±31.36%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf16le"                 **     22.91 %      ±15.06% ±20.03% ±26.07%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=32 encoding="utf8"                           11.45 %      ±12.80% ±17.03% ±22.18%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="ascii"                  *     10.71 %      ±10.00% ±13.32% ±17.35%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-ascii"         ***     21.56 %       ±9.11% ±12.12% ±15.77%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="base64-utf8"          ***     25.72 %       ±9.04% ±12.03% ±15.68%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf16le"                       0.64 %       ±9.49% ±12.64% ±16.48%
     string_decoder/string-decoder.js n=250000 chunkLen=256 inLen=4096 encoding="utf8"                          1.64 %       ±8.54% ±11.37% ±14.80%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="ascii"                         11.38 %      ±11.58% ±15.44% ±20.15%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-ascii"          ***     44.38 %      ±10.72% ±14.31% ±18.70%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="base64-utf8"           ***     40.75 %      ±10.92% ±14.57% ±19.03%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf16le"                        0.01 %       ±9.09% ±12.09% ±15.73%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=1024 encoding="utf8"                           4.25 %       ±8.23% ±10.95% ±14.26%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="ascii"                    *     21.15 %      ±16.30% ±21.71% ±28.31%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-ascii"           ***     50.31 %      ±11.71% ±15.64% ±20.46%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="base64-utf8"            ***     42.86 %      ±11.50% ±15.34% ±20.07%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf16le"                         3.08 %      ±14.43% ±19.21% ±25.01%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=128 encoding="utf8"                            5.19 %      ±10.15% ±13.51% ±17.59%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="ascii"                           12.04 %      ±16.01% ±21.32% ±27.79%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-ascii"            ***     52.38 %      ±15.90% ±21.27% ±27.92%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="base64-utf8"             ***     38.86 %      ±16.88% ±22.57% ±29.60%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf16le"                  **     19.55 %      ±13.93% ±18.54% ±24.15%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=32 encoding="utf8"                     **     19.12 %      ±12.42% ±16.52% ±21.51%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="ascii"                  **     14.49 %       ±9.70% ±12.91% ±16.81%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-ascii"          ***     37.39 %      ±10.67% ±14.21% ±18.52%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="base64-utf8"           ***     46.33 %      ±10.02% ±13.34% ±17.40%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf16le"                       -1.83 %       ±9.42% ±12.54% ±16.32%
     string_decoder/string-decoder.js n=250000 chunkLen=64 inLen=4096 encoding="utf8"                           2.81 %       ±8.96% ±11.93% ±15.52%
@addaleax addaleax force-pushed the stream-decoder-native branch from 5cea9aa to 7eae61d Compare February 2, 2018 22:50
@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 6, 2018
ssize_t* nread_ptr) {
Local<String> prepend, body;

size_t nread = *nread_ptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nread_ptr should probably be a size_t* (see comment further down) but otherwise do a CHECK_GE(*nread_ptr, 0) sanity check first.

#endif
state_[kBufferedBytes]++;
if ((data[i] & 0xC0) == 0x80) {
// This byte does not start a character (a "trailing" bytes).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: s/bytes/byte/

}
}

if (!prepend.IsEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably reads a bit more natural if you invert (or un-invert really) the logic:

if (prepend.IsEmpty()) return body;
return String::Concat(prepend, body);

CHECK_EQ(BufferedBytes(), 0);
}

if (Encoding() == UCS2 && (BufferedBytes() % 2) == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous parens.

StringDecoder* decoder =
reinterpret_cast<StringDecoder*>(Buffer::Data(args[0]));
CHECK_NE(decoder, nullptr);
ssize_t nread = Buffer::Length(args[1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer::Length() returns a size_t.

class StringDecoder {
public:
StringDecoder() { state_[kEncodingField] = BUFFER; }
void SetEncoding(enum encoding encoding);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods that are implemented in string_decoder-inl.h should be marked inline here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnoordhuis I’m happy to do that, but is there a specific reason for that style?

It should be transparent to consumers of the header file whether a function is provided inline or not, right? And e.g. our HTTP2 code is a wonderful example of how these would get out of sync anyway when code is being moved around…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a style issue, at least not exactly; it's to give the methods inline linkage right off the bat.

Right now, if you include string_decoder.h (but not string_decoder-inl.h) and call SetEncoding(), it will compile just fine but generate a link-time error.

If instead you mark it inline in the class definition, it's a compile-time error, or at least a warning.

@addaleax addaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 9, 2018
@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 9, 2018
@addaleax
Copy link
Member Author

addaleax commented Feb 9, 2018

@bnoordhuis Okay, should have applied everything now.

CI: https://ci.nodejs.org/job/node-test-commit/16099/

Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only did a quick scan this time around but LGTM.

@BridgeAR
Copy link
Member

Landed in 180af17 🎉

@BridgeAR BridgeAR closed this Feb 10, 2018
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Feb 10, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
@TimothyGu
Copy link
Member

Hmm, it would be a fun experiment to try implementing this in WebAssembly…

@addaleax addaleax deleted the stream-decoder-native branch February 10, 2018 21:26
@addaleax
Copy link
Member Author

@TimothyGu As long as you have no way to decode/encode strings as a JS or WASM primitive (which doesn’t call into native code), it isn’t really going to be faster… I think the main speedup here is from the reduced # of JS/C++ boundary calls?

@apapirovski
Copy link
Contributor

@addaleax It appears this broke https://github.com/expressjs/body-parser in CitGM. I confirmed locally by reverting this commit and running it successfully against the test suite. Would you have time to look into it?

apapirovski added a commit that referenced this pull request Feb 13, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: #18723
Refs: #18537
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
@MylesBorins
Copy link
Contributor

Should this be backported to v9.x-staging? If yes please follow the guide and raise a backport PR, if not let me know or add the dont-land-on label.

@addaleax addaleax added backport-requested-v9.x and removed backport-requested-v9.x author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Feb 21, 2018
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request May 1, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request May 1, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: nodejs#18723
Refs: nodejs#18537
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
Implement string decoder in C++. The perks are a decent speed boost
(for decoding, whereas creation show some performance degradation),
that this can now be used more easily to add native decoding support
to C++ streams and (arguably) more readable variable names.

PR-URL: nodejs#18537
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
MayaLekova pushed a commit to MayaLekova/node that referenced this pull request May 8, 2018
There are libraries which invoke StringDecoder using .call and
.inherits, which directly conflicts with making StringDecoder
be a class which can only be invoked with the new keyword.
Revert to declaring it as a function.

StringDecoder#lastNeed was not defined, redefine it using
the new interface and fix StringDecoder#lastTotal.

PR-URL: nodejs#18723
Refs: nodejs#18537
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. string_decoder Issues and PRs related to the string_decoder subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants