Bit packing binary in JavaScript and JSON
I like to help out on the Glitch support forum, which is a bit like a friendlier StackOverflow.
Recently, a young person was trying to train a neural net on a 256×256 square of grayscale pixels with random values from 0 to 255, but body-parser wouldn’t accept the payload because it was too large.
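To put a number on “too large”: in a JSON array, each small integer costs its digits plus a comma, about 3.6 bytes per value on average. The sketch below assumes every value 0-255 appears equally often, which random data will approximate, and compares the result with body-parser’s documented default `limit` of `'100kb'` for JSON bodies. It uses Node’s `Buffer`, so it is Node-specific.

```javascript
// Rough size check for the original problem. Assumption: a 256*256 grid in
// which every value 0-255 appears equally often (random data comes close).
const pixels = Array.from({ length: 256 * 256 }, (_, i) => i % 256);
const json = JSON.stringify(pixels);

// The JSON text is pure ASCII, so one character is one byte.
const jsonBytes = Buffer.byteLength(json, "utf8");
const rawBytes = pixels.length; // one byte per pixel if sent as binary

console.log(jsonBytes);              // 233985
console.log(rawBytes);               // 65536
console.log(jsonBytes > 100 * 1024); // true: over a 100kb limit
```

So the same data is roughly 3.6 times bigger as JSON text than as raw bytes.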
Below is my latest, best, biggest reply…
This has been really bothering me (in a good way). Thank you for posting this problem; it’s a good intersection of JS and computer science.
I did some experiments. Essentially, we are bumping into fundamental limits of what JSON lets you transmit: apart from arrays and objects, everything boils down to the primitives string, number, boolean, and null. There is no binary type.
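You can see the missing binary type directly: hand `JSON.stringify` a real byte container and it has no choice but to flatten it into decimal numbers. A small sketch (the `Buffer` part is Node-specific):

```javascript
// JSON has no binary type, so binary containers get flattened into
// objects or arrays of decimal numbers when stringified.
const bytes = new Uint8Array([43, 16, 163]);

// A typed array is not an Array, so it serializes as a plain object keyed by index.
console.log(JSON.stringify(bytes)); // {"0":43,"1":16,"2":163}

// Node's Buffer defines toJSON(), which produces an array of decimal numbers.
console.log(JSON.stringify(Buffer.from(bytes))); // {"type":"Buffer","data":[43,16,163]}
```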
What’s been bothering me is that you have data like [43,16,163], which are all small integers that can be stored in a single 8-bit byte each. In binary, they look like this:
00101011,
00010000,
10100011,
(Each 0/1 is a bit, hence 8 bits)
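Those bit patterns are easy to print yourself, which also confirms that each value fits in one byte:

```javascript
// Print each value as its 8-bit binary pattern, padded with leading zeros.
const values = [43, 16, 163];
const bits = values.map((n) => n.toString(2).padStart(8, "0"));
console.log(bits); // [ '00101011', '00010000', '10100011' ]

// Each value fits in one byte, so a Uint8Array holds them in exactly 3 bytes.
console.log(new Uint8Array(values).byteLength); // 3
```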
But to store them in JSON, they have to go as text one way or another: even if we use the number type, each digit takes up a whole 8-bit byte, plus the commas between values. So instead of storing the three bytes above, we transmit, at minimum:
00110100 00110011 00101100
00110001 00110110 00101100
00110001 00110110 00110011
…which is 4, 3, comma, 1, 6, comma, 1, 6, 3: nine bytes instead of three.
It SUCKS that you have to use 3 bytes to store the number 163 in JSON. It turns out this is one reason BSON (Binary JSON) exists (but I won’t go into that here…)
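You can check that listing by asking Node for the actual bytes of the JSON text. This sketch strips the [ ] brackets to match the nine bytes listed above; with the brackets included, the payload is 11 bytes. `Buffer` makes it Node-specific.

```javascript
const json = JSON.stringify([43, 16, 163]); // "[43,16,163]"
const inner = json.slice(1, -1);            // "43,16,163", without the brackets

// Every character of this text is one byte in UTF-8; print each as 8 bits.
const textBits = [...Buffer.from(inner, "utf8")]
  .map((b) => b.toString(2).padStart(8, "0"))
  .join(" ");

console.log(textBits);
console.log(Buffer.byteLength(inner, "utf8")); // 9 bytes for three values
console.log(Buffer.byteLength(json, "utf8"));  // 11 with the brackets
```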
About base64
Base64 doesn’t really make things smaller. It reduces your character set from any possible ASCII or Unicode character down to 64 “safe” characters. So in the case of your numbers, going from the string array to base64 will naturally make the data bigger: with only 64 symbols, each character carries 6 bits instead of 8, so every 3 bytes of input become 4 characters, about a third larger.
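The sketch below reproduces the base64 numbers from the test run later in this post, using the same 32 values, and shows the 4-characters-per-3-bytes rule. For comparison, it also base64-encodes the raw bytes themselves, which is a different (and much smaller) thing than base64-encoding their decimal text. Node-specific because of `Buffer`.

```javascript
// The 32 sample values from the test output in this post.
const data = [43, 16, 163, 14, 248, 184, 59, 227, 243, 146, 7, 218, 100, 90, 204, 129,
  118, 86, 28, 235, 131, 114, 87, 60, 197, 223, 168, 61, 212, 101, 233, 203];

const decimalText = data.join(",");
const base64OfText = Buffer.from(decimalText, "utf8").toString("base64");
const base64OfBytes = Buffer.from(data).toString("base64");

console.log(decimalText.length);   // 115
console.log(base64OfText.length);  // 156, i.e. 4 * Math.ceil(115 / 3)
console.log(base64OfBytes.length); // 44, i.e. 4 * Math.ceil(32 / 3)
```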
What can we do?
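The smallest representation of your data would be to say, well, which character has code 43, which has 16, which has 163? But you quickly run into problems because the first 32 codes (0-31) are “control characters”, i.e. weird invisible garbage, and JSON won’t let those appear raw inside a string: each one has to be escaped as something like \u0010, which costs six characters. (Strictly speaking, ASCII also stops at 127, so 163 is really a Unicode character, and it takes two bytes in UTF-8.)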
(Find an ASCII table at: https://duckduckgo.com/?t=ffab&q=ascii+table&ia=answer&iax=answer)
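Here is what that costs in practice for the three example values, one character per byte. `JSON.stringify` escapes the control character, and the UTF-8 byte count (via Node’s `Buffer`) shows the extra byte for “£”:

```javascript
// One character per byte: code units 43, 16 and 163.
const oneCharPerByte = String.fromCharCode(43, 16, 163); // "+", U+0010, "£"
const json = JSON.stringify(oneCharPerByte);

console.log(json);                    // "+\u0010£" (including the quotes)
console.log(json.length);             // 10 characters: the escape alone is 6
console.log(Buffer.byteLength(json)); // 11 bytes in UTF-8, since "£" takes 2
```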
So!
Let’s look at some disgusting magic: https://codereview.stackexchange.com/questions/3569/pack-and-unpack-bytes-to-strings
This guy is looking for a way to do what you’re doing: pack bytes (integers 0-255) into a Unicode string.
The accepted answer has final code like this:
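```javascript
// Packs an array of bytes (0-255) into a string, two bytes per UTF-16 code unit:
// the first byte goes in the high 8 bits, the second in the low 8 bits.
// Note: for an odd-length array, bytes[i++] reads undefined, and
// (undefined & 0xff) is 0, so the last code unit is padded with a 0 byte.
function pack(bytes) {
    var chars = [];
    for(var i = 0, n = bytes.length; i < n;) {
        chars.push(((bytes[i++] & 0xff) << 8) | (bytes[i++] & 0xff));
    }
    // apply() passes every code unit as a separate argument, which can hit
    // engine argument limits for very large arrays.
    return String.fromCharCode.apply(null, chars);
}

// Reverses pack(): splits each code unit back into its high and low bytes.
// For odd-length input, the result has one extra trailing 0.
function unpack(str) {
    var bytes = [];
    for(var i = 0, n = str.length; i < n; i++) {
        var char = str.charCodeAt(i);
        bytes.push(char >>> 8, char & 0xFF);
    }
    return bytes;
}
```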
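If you adapt this code, two rough edges are worth knowing about: an odd number of bytes comes back with an extra trailing 0, and `String.fromCharCode.apply` can run into engine argument-count limits on very large arrays. A hypothetical variant (`packBytes`/`unpackBytes` are my names, not from the answer) that records the original length and builds the string in chunks might look like this; the 8192 chunk size is an arbitrary choice.

```javascript
// Hypothetical variant of the pack/unpack idea (not the original answer's code).
function packBytes(bytes) {
  const chars = [];
  for (let i = 0; i < bytes.length; i += 2) {
    const hi = bytes[i] & 0xff;
    const lo = i + 1 < bytes.length ? bytes[i + 1] & 0xff : 0; // pad odd length with 0
    chars.push((hi << 8) | lo);
  }
  // Build the string in chunks so no single call gets too many arguments.
  const CHUNK = 8192;
  let packed = "";
  for (let i = 0; i < chars.length; i += CHUNK) {
    packed += String.fromCharCode(...chars.slice(i, i + CHUNK));
  }
  // Keep the true byte count so unpacking can drop the padding byte.
  return { length: bytes.length, packed };
}

function unpackBytes({ length, packed }) {
  const bytes = new Uint8Array(length);
  for (let i = 0; i < packed.length; i++) {
    const code = packed.charCodeAt(i);
    bytes[2 * i] = code >>> 8;
    if (2 * i + 1 < length) bytes[2 * i + 1] = code & 0xff; // skip the padding byte
  }
  return bytes;
}

// Usage: the wrapper object is what you would put in a JSON body.
const body = JSON.stringify(packBytes([43, 16, 163]));
console.log(Array.from(unpackBytes(JSON.parse(body)))); // [ 43, 16, 163 ]
```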
Here’s an output from a program I made to test various packing methods on your data:
```
$ node ui8a.js
intArrayString: 43,16,163,14,248,184,59,227,243,146,7,218,100,90,204,129,118,86,28,235,131,114,87,60,197,223,168,61,212,101,233,203
intArrayString - Length: 115
intArrayBase64: NDMsMTYsMTYzLDE0LDI0OCwxODQsNTksMjI3LDI0MywxNDYsNywyMTgsMTAwLDkwLDIwNCwxMjksMTE4LDg2LDI4LDIzNSwxMzEsMTE0LDg3LDYwLDE5NywyMjMsMTY4LDYxLDIxMiwxMDEsMjMzLDIwMw==
intArrayBase64 - Length: 156
packIntArray: ⬐ꌎ㯣ߚ摚첁癖ᳫ荲圼엟푥
packIntArray - Length: 16
unpackIntArrayString: 43,16,163,14,248,184,59,227,243,146,7,218,100,90,204,129,118,86,28,235,131,114,87,60,197,223,168,61,212,101,233,203
unpackIntArrayString - Length: 115
```
Remember I said “disgusting magic”, so this isn’t necessarily the way to go.
HOWEVER, what we’re seeing here is that we can pack 32 ints into just 16 characters, because each character of a JavaScript string is a 16-bit (UTF-16) code unit that can hold two bytes. It comes out as a dreadful mix of Asian scripts and characters your font probably can’t display: ⬐ꌎ㯣ߚ摚첁癖ᳫ荲圼엟푥
Unpacking the data, we get back exactly what you started with, i.e. it “round-trips” correctly: 43,16,163,14,248,184,59,227,243,146,7,218,100,90,204,129,118,86,28,235,131,114,87,60,197,223,168,61,212,101,233,203
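That round trip happens in memory. Over HTTP, the string also has to survive being encoded as UTF-8, and some byte pairs produce characters UTF-8 can’t represent on their own: any pair whose first byte is 0xD8 to 0xDF becomes a lone UTF-16 surrogate. The sketch below suggests that `JSON.stringify` rescues this in modern engines, since ES2019 (Node 12+) it escapes lone surrogates as `\udXXX`, whereas encoding the raw string directly replaces them with U+FFFD. The input bytes are made up to hit those cases, and `Buffer` makes it Node-specific.

```javascript
// Reproduces pack/unpack from the Code Review answer (modernized syntax).
function pack(bytes) {
  const chars = [];
  for (let i = 0; i < bytes.length; i += 2) {
    chars.push(((bytes[i] & 0xff) << 8) | (bytes[i + 1] & 0xff));
  }
  return String.fromCharCode(...chars);
}
function unpack(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    bytes.push(code >>> 8, code & 0xff);
  }
  return bytes;
}

// Made-up input: 0xD8 0x00 becomes a lone surrogate, 0x00 0x01 a control char.
const input = [0xd8, 0x00, 0x00, 0x01, 0x41, 0x42];

// Simulate sending over HTTP: JSON text -> UTF-8 bytes -> JSON text.
const wire = Buffer.from(JSON.stringify(pack(input)), "utf8");
const received = unpack(JSON.parse(wire.toString("utf8")));
console.log(received); // [ 216, 0, 0, 1, 65, 66 ] on Node 12+

// Without JSON's escaping, the lone surrogate does not survive UTF-8 encoding.
const raw = Buffer.from(pack(input), "utf8").toString("utf8");
console.log(unpack(raw)[0] === 0xd8); // false: it was replaced by U+FFFD
```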
I think this is promising: if your JSON libraries allow this string through, it lets you truly encode your binary data as binary instead of as a massive waste of space!
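One caveat worth measuring before relying on this: “16 characters” is not “16 bytes”. JSON bodies are normally sent as UTF-8, where a code unit from U+0800 upward takes three bytes, so the packed string costs roughly 1.5 bytes per input byte on the wire. The sketch below (Node-specific, same 32 sample values and the same packing idea as the code above) measures each option as JSON text encoded in UTF-8. On this data, base64 of the raw bytes (not of the decimal string) comes out slightly smaller than the packed string, and both are far smaller than the decimal array.

```javascript
const data = [43, 16, 163, 14, 248, 184, 59, 227, 243, 146, 7, 218, 100, 90, 204, 129,
  118, 86, 28, 235, 131, 114, 87, 60, 197, 223, 168, 61, 212, 101, 233, 203];

// Same packing idea as above: two bytes per UTF-16 code unit.
function pack(bytes) {
  const chars = [];
  for (let i = 0; i < bytes.length; i += 2) {
    chars.push(((bytes[i] & 0xff) << 8) | (bytes[i + 1] & 0xff));
  }
  return String.fromCharCode(...chars);
}

// Measure each candidate as it would actually be sent: JSON text encoded as UTF-8.
const wireBytes = (value) => Buffer.byteLength(JSON.stringify(value), "utf8");

console.log(wireBytes(data));                                 // 117: decimal array
console.log(wireBytes(pack(data)));                           // 49: 16 chars, mostly 3 bytes each
console.log(wireBytes(Buffer.from(data).toString("base64"))); // 46: 44 base64 chars + quotes
```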
I’ve dropped my test program source code in a gist: https://gist.github.com/SteGriff/a546cb4056c0c3ae501f8da176320f38
You need to run `npm install abab` for the test program to work locally.
Hope it helps someone!