Lately, I’ve been working on a web application that needs to save binary blobs inside JSON objects. Looking around the web, it seems that base64 encoding is the method of choice in these cases. However, base64 adds roughly 33% overhead (every 3 bytes become 4 characters), and decoding large base64 strings to JavaScript typed arrays (ArrayBuffer) is an expensive task.
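To make the overhead concrete, here is a minimal sketch of a base64 round trip, assuming an environment where btoa and atob are available as globals (browsers and recent Node.js versions). The helper names bytesToBase64 and base64ToArrayBuffer are made up for this illustration.

```javascript
// Encode raw bytes as a base64 string (btoa operates on "binary strings").
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Decode a base64 string back into an ArrayBuffer, one byte at a time.
function base64ToArrayBuffer(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}

// A 3000-byte blob filled with a repeating 0..255 pattern.
const blob = new Uint8Array(3000).map((_, i) => i % 256);
const encoded = bytesToBase64(blob);
const overhead = encoded.length / blob.length - 1;
console.log(blob.length, encoded.length, overhead.toFixed(3)); // 3000 4000 0.333
```

Note that the decoder has to touch every byte in script code, which is where the cost for large blobs comes from.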
So I’ve been looking at different binary data formats: BSON, Protocol Buffers, Smile Format, UBJSON, BJSON and others. Eventually, I decided to give BJSON a try for the following reasons:
- A lightweight BJSON implementation is easy to write
- It can encapsulate any JSON object
- BJSON documents can be represented as JSON objects with ArrayBuffers for binary blobs.
My primary motivation is the fact that BJSON can serialize ArrayBuffers. As an added bonus, the BJSON encoding of a JSON object is typically smaller than the traditional string encoding produced by JSON.stringify(). Now, I’m sure there are valid arguments for using another binary encoding of JSON objects, so I’m going to stop with the arguments and talk code instead…
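As a quick illustration of why plain JSON is not enough here, the sketch below shows what happens when an object containing an ArrayBuffer goes through JSON.stringify. The object and its field names are made up for the example.

```javascript
// A JSON-like object carrying a binary blob (field names are hypothetical).
const doc = {
  name: "thumbnail",
  data: new Uint8Array([1, 2, 3, 4]).buffer, // an ArrayBuffer
};

// An ArrayBuffer has no enumerable own properties,
// so JSON.stringify collapses it to an empty object.
const text = JSON.stringify(doc);
console.log(text); // {"name":"thumbnail","data":{}}

// After a round trip the bytes are gone.
const restored = JSON.parse(text);
console.log(restored.data instanceof ArrayBuffer); // false
```

So any string-based approach needs an extra convention (such as base64) for the blob, whereas a binary format can carry it directly.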
Well, time to introduce BJSON.coffee, a CoffeeScript implementation of BJSON for modern browsers. Apart from null, booleans, numbers, arrays and dictionaries, which are also available in JSON, the BJSON specification also defines the inclusion of binary data. The specification notes that “this is not fully transcodable”, but as you might have guessed, BJSON.coffee uses ArrayBuffers to represent binary data.
Essentially, BJSON.serialize takes a JSON object that is allowed to contain ArrayBuffers and serializes it to a single ArrayBuffer, while BJSON.parse takes an ArrayBuffer and returns a JSON object which may contain ArrayBuffers.
For those interested in using BJSON instead of a normal string encoding of JSON objects, there is both good and bad news. The bad news is that UTF-8 string encoding in modern browsers is so slow that BJSON is slower than a conventional string encoding of JSON objects. This might change when/if the string encoding specification is implemented.
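To show the kind of work script-level UTF-8 encoding involves, here is a rough sketch of a hand-written encoder. It is not the code from BJSON.coffee; encodeUtf8 is a hypothetical helper written for this illustration. For comparison it is checked against TextEncoder, which I am assuming is roughly what the string encoding specification mentioned above turned into, and which is only available in newer environments.

```javascript
// Hypothetical hand-written UTF-8 encoder: string -> Uint8Array.
function encodeUtf8(str) {
  const bytes = [];
  for (const ch of str) { // iterates by code point, not by UTF-16 unit
    let cp = ch.codePointAt(0);
    // Replace lone surrogates with U+FFFD, as TextEncoder does.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;
    if (cp < 0x80) {
      bytes.push(cp); // 1 byte
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      bytes.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
                 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 4 bytes
    }
  }
  return new Uint8Array(bytes);
}

const sample = "héllo €😀";
const manual = encodeUtf8(sample);
const native = new TextEncoder().encode(sample); // assumes TextEncoder exists
console.log(manual.length, native.length); // 14 14
```

Every character goes through branching and bit twiddling in script code, which gives a feel for why a native encoder could make a real difference.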
The good news is that the BJSON encoding is 5-10% smaller than the conventional string encoding of JSON objects. The table below, taken from the terminal output of my testing script, shows the results for some JSON objects harvested from common web APIs.
| Test | Size (JSON) | Size (BJSON) | Compression | Success |
|---|---|---|---|---|
| bitly-burstphrases.json | 17714 | 16372 | 16% | true |
| bitly-clickrate.json | 104 | 88 | 24% | true |
| bitly-hotphrases.json | 65665 | 60592 | 16% | true |
| bitly-linkinfo.json | 497 | 468 | 10% | true |
| complex.json | 424 | 384 | 25% | true |
| flickr-hottags.json | 3610 | 3205 | 11% | true |
| twitter-search.json | 12968 | 11994 | 10% | true |
| yahoo-weather.json | 1241 | 1172 | 5% | true |
| youtube-comments.json | 26296 | 24938 | 5% | true |
| youtube-featured.json | 110873 | 104422 | 5% | true |
| youtube-search.json | 93202 | 88473 | 5% | true |
BJSON.coffee is available at github.com/jonasfj/BJSON.coffee. It should work in all modern browsers with support for typed arrays: Firefox 15+, Chrome 22+, IE 10+, Opera 12.1+ and Safari 5.1+. I have also pushed a GitHub page which runs the unit tests in the browser and shows compatibility results from other browsers using Browserscope. So please visit it here, click “run tests” and help figure out where BJSON.coffee works.
Update: Being bored today, I decided to do a quick jsPerf benchmark of JSON.stringify and BJSON.serialize to see how much slower BJSON.serialize is. You can find the test here; it seems to suggest that BJSON.serialize might be unreasonably slow at the moment. However, slow UTF-8 encoding seems to be responsible for much of this, and I believe it is possible to improve the current UTF-8 encoding speed.
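For anyone who wants to repeat this kind of comparison outside jsPerf, a crude timing harness could look like the sketch below. It is not the benchmark linked above; timeIt and the payload are made up for illustration, and I am assuming performance.now() is available as a global. Micro-benchmarks like this are noisy, so treat the numbers as a rough indication only.

```javascript
// Time `fn` over `iterations` calls and return the mean time per call in ms.
function timeIt(fn, iterations = 1000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

// Made-up payload; real measurements should use representative documents.
const payload = {
  items: Array.from({ length: 200 }, (_, i) => ({ id: i, label: "item " + i })),
};

const stringifyMs = timeIt(() => JSON.stringify(payload));
console.log("JSON.stringify:", stringifyMs.toFixed(4), "ms per call");

// With BJSON.coffee loaded, BJSON.serialize could be timed the same way:
// const bjsonMs = timeIt(() => BJSON.serialize(payload));
```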