@albe
Last active August 23, 2020 12:20
Benchmark nodejs Array vs Uint32Array
const Benchmark = require('benchmark');
const benchmarks = require('beautify-benchmark');
const Suite = new Benchmark.Suite('fixed array');
Suite.on('cycle', (event) => benchmarks.add(event.target));
Suite.on('complete', () => benchmarks.log());
const fileBuffer = Buffer.alloc(16 * 128);
let offs = 0;
for (let i = 0; i < 128; i++, offs += 16) {
    fileBuffer.writeUInt32LE(1 + i*4, offs);
    fileBuffer.writeUInt32LE(2 + i*4, offs+4);
    fileBuffer.writeUInt32LE(3 + i*4, offs+8);
    fileBuffer.writeUInt32LE(4 + i*4, offs+12);
}
Suite.add('entry read', () => {
    for (let i = 0; i < 128; i++) {
        let entry = new Array(4);
        entry[0] = fileBuffer.readUInt32LE(i*16);
        entry[1] = fileBuffer.readUInt32LE(i*16 + 4);
        entry[2] = fileBuffer.readUInt32LE(i*16 + 8);
        entry[3] = fileBuffer.readUInt32LE(i*16 + 12);
        if (entry[0] !== i * 4 + 1) {
            console.log('Invalid data at entry #' + i, entry[0]);
        }
    }
});
Suite.add('entryview read', () => {
    for (let i = 0; i < 128; i++) {
        let entry = new Uint32Array(fileBuffer.buffer, fileBuffer.byteOffset + i * 16, 4);
        if (entry[0] !== i * 4 + 1) {
            console.log('Invalid data at entry #' + i, entry[0]);
        }
    }
});
Suite.run();
> node --version
v12.18.3
> node bench-array
2 tests completed.
entry read x 1,180,097 ops/sec ±1.49% (92 runs sampled)
entryview read x 198,301 ops/sec ±1.00% (90 runs sampled)
> WHY?!
albe commented Aug 16, 2020

Rough use case:

  • a binary file of uint32LE values that needs to be read and "parsed" in blocks of 16 bytes

According to the docs, the Uint32Array should only create a view on top of the fileBuffer, without any copying.
https://nodejs.org/api/buffer.html#buffer_buffers_and_typedarrays
Hence, the only work is the construction of the Uint32Array object itself, so this should be super fast.

In the array case we create a new Array object, then read the 16 bytes from the buffer and write them into the array for each entry. How can this be roughly an order of magnitude faster?

The only explanation I have:

  • Array is super optimized in nodejs, both in terms of instantiation and write performance
  • the Uint32Array constructor is super slow for some unknown reason and dominates over the zero-copy view

albe commented Aug 16, 2020

Addendum: The same with DataView instead of Uint32Array is slower by yet another factor:

Suite.add('entry dataview', () => {
    for (let i = 0; i < 128; i++) {
        let entry = new DataView(fileBuffer.buffer, fileBuffer.byteOffset + i*16, 16);
        if (entry.getUint32(0, true) !== i * 4 + 1) {
            console.log('Invalid data at entry #' + i, entry.getUint32(0, true));
        }
    }
});

entry dataview x 75,273 ops/sec ±1.07% (89 runs sampled)

And the read check is not really a factor in any of the three cases. Why is DataView so much slower than Uint32Array?

Also noteworthy: Doing let entry = fileBuffer.slice(i*16, (i+1)*16); results in roughly the same performance as the Uint32Array case, which makes me believe they might do similar things. However, according to the docs, a slice on a Buffer should again result in only a view of the region inside the buffer's memory. So the performance penalty is not really understandable, unless some allocation+memcpy is taking place.

albe commented Aug 23, 2020

nodejs/help#2926

Maybe to clarify a bit: While JS engines probably could optimize out the typed array creation, at least V8 currently doesn’t. That comes with allocation and object creation overhead.

I once overheard a JS engine developer say (slightly tongue-in-cheek) that the only thing typed arrays are really better at than plain Arrays is fast passing of data between JS and native code :)

albe commented Aug 23, 2020

Another interesting observation: Creating a custom class that just holds a reference to the buffer and an offset, then uses readUInt32LE() on demand, performs at ~2.8M ops/sec for the above benchmark. So whatever Uint32Array does, it's more than just the object creation and referencing of the buffer for access (maybe some ref counting that involves an additional allocation?).
Note though that its relative performance degrades with the number of value accesses: it becomes slower than the array solution once each of the four values is accessed more than once.
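A minimal sketch of such a class (the names are illustrative, not from the gist): it stores only a reference to the buffer plus an offset, and pays for a readUInt32LE() call each time a value is actually needed.

```javascript
// Lazy entry: holds buffer + offset, decodes a field only when asked for it.
class EntryRef {
    constructor(buffer, offset) {
        this.buffer = buffer;
        this.offset = offset;
    }
    value(n) {
        // Each call costs one buffer read; nothing is decoded up front.
        return this.buffer.readUInt32LE(this.offset + n * 4);
    }
}

// In the benchmark loop this would replace the Array/Uint32Array creation:
// let entry = new EntryRef(fileBuffer, i * 16);
// if (entry.value(0) !== i * 4 + 1) { ... }
```

This is fast when only one of the four fields is checked, but every repeated access pays the read cost again, which is why it falls behind the eager Array version under heavier access.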

So that means: a buffer read is cheaper than an object property access, and typed array instantiation is more expensive than generic object instantiation.
