Data Serialization Options

Basics

But, why?

Why? (contd.)

Thoughts going into this:

  • save on bandwidth
  • save on CPU usage

Does Gzip solve all these problems?

Not quite.

Gzip from the client to server is not recommended. https://stackoverflow.com/a/48572297
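A rough way to see why gzip is only a partial answer: it shrinks the bytes on the wire but adds CPU work on both ends. The sketch below is illustrative only. The records are hypothetical stand-ins, not the data measured in this talk, and timings will vary by machine.

```python
import gzip
import json
import time

# Hypothetical payload: 200 similar records (an assumption, not the talk's test data).
records = [
    {"name": f"Something {i}", "weight": i, "contents": ["widgets", "cogs"]}
    for i in range(200)
]
raw = json.dumps(records).encode("utf-8")

# Time the extra CPU work gzip adds on the sending side.
start = time.perf_counter()
compressed = gzip.compress(raw)
compress_seconds = time.perf_counter() - start

# Time the extra CPU work gzip adds on the receiving side.
start = time.perf_counter()
restored = gzip.decompress(compressed)
decompress_seconds = time.perf_counter() - start

print(f"raw bytes:      {len(raw)}")
print(f"gzipped bytes:  {len(compressed)}")
print(f"compress time:  {compress_seconds * 1000:.3f} ms")
print(f"decompress time: {decompress_seconds * 1000:.3f} ms")
```

Repetitive synthetic data like this compresses far better than most real payloads, so treat the byte savings as an upper bound.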

Examples of Scale

Formats

JSON

https://www.json.org/

XML

https://w3.org/XML/

Warning

There be dragons: binary

The following string representations are only for visualization purposes.

Avro

https://avro.apache.org/

Something(Bwidgetscogs

Avro Schema

Avro Schema (contd.)

Bencode

https://en.wikipedia.org/wiki/Bencode

  • Pronounced “B-encode”
  • Used by the BitTorrent protocol

d8:contentsl7:widgets4:cogse4:name9:Something6:weighti42ee
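The string above can be reproduced with a few lines of code. This is a minimal sketch of a Bencode encoder, not a production library. The sample object is an assumption inferred from the encoded string itself: a name, an integer weight of 42, and a two-item contents list.

```python
def bencode(value) -> bytes:
    """Encode ints, strings, lists, and dicts using Bencode rules."""
    if isinstance(value, bool):
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        # Integers: i<digits>e
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings: <length>:<bytes>
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        # Lists: l<items>e
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionaries: d<key><value>...e, with keys sorted as raw bytes
        pairs = [
            (key.encode("utf-8") if isinstance(key, str) else key, item)
            for key, item in value.items()
        ]
        pairs.sort(key=lambda pair: pair[0])
        body = b"".join(bencode(key) + bencode(item) for key, item in pairs)
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


# Assumed sample object, reconstructed from the encoded string on the slide.
sample = {"name": "Something", "weight": 42, "contents": ["widgets", "cogs"]}
encoded = bencode(sample)
print(encoded.decode("ascii"))
# d8:contentsl7:widgets4:cogse4:name9:Something6:weighti42ee
```

Note that the keys come out sorted (contents, name, weight), which is why the field order differs from the other formats.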

BSON

http://bsonspec.org/

  • Used by MongoDB
  • Pronounced [bee · sahn]
  • Stands for “Binary JSON”
Oname
Somethingweight*contents 0widgets1cogs
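The rendering above hides the length prefixes and type bytes. A hand-rolled sketch of the BSON layout makes them visible. It covers only strings, 32-bit integers, arrays, and nested documents, and the sample object is an assumption inferred from the rendering. For real work, use a BSON library.

```python
import struct


def _cstring(text: str) -> bytes:
    """BSON element names are NUL-terminated UTF-8 strings."""
    return text.encode("utf-8") + b"\x00"


def bson_element(name: str, value) -> bytes:
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # 0x02 = string: int32 byte length (including the NUL), then the bytes
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, int):
        # 0x10 = int32, little-endian (struct raises if the value does not fit)
        return b"\x10" + _cstring(name) + struct.pack("<i", value)
    if isinstance(value, list):
        # 0x04 = array: an embedded document keyed by "0", "1", ...
        as_doc = {str(index): item for index, item in enumerate(value)}
        return b"\x04" + _cstring(name) + bson_document(as_doc)
    if isinstance(value, dict):
        # 0x03 = embedded document
        return b"\x03" + _cstring(name) + bson_document(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_document(doc: dict) -> bytes:
    body = b"".join(bson_element(key, item) for key, item in doc.items())
    # Total size counts the 4-byte size field and the trailing NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


# Assumed sample object, in the field order seen in the rendering.
sample = {"name": "Something", "weight": 42, "contents": ["widgets", "cogs"]}
encoded = bson_document(sample)
print(len(encoded))      # 79
print(encoded[:1])       # b'O' (0x4F = 79, the document length)
```

The leading "O" in the rendering is the document length (79), and the "0" and "1" before widgets and cogs are the array indices stored as keys.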

MessagePack

https://msgpack.org/index.html

��name�Something�weight*�contents��widgets�cogs
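The unprintable bytes in the rendering above are MessagePack type-and-length markers. The sketch below hand-encodes only the compact "fix" forms (small maps, arrays, strings, and non-negative integers up to 127) to show where they come from. The sample object is an assumption inferred from the rendering, and a real project would use a MessagePack library instead.

```python
def msgpack_small(value) -> bytes:
    """Encode a value using only MessagePack's single-byte 'fix' headers."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, int):
        if not 0 <= value <= 127:
            raise ValueError("only positive fixint (0..127) is covered")
        # positive fixint: the value is its own byte
        return bytes([value])
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) > 31:
            raise ValueError("only fixstr (up to 31 bytes) is covered")
        # fixstr: 0xA0 | length
        return bytes([0xA0 | len(data)]) + data
    if isinstance(value, list):
        if len(value) > 15:
            raise ValueError("only fixarray (up to 15 items) is covered")
        # fixarray: 0x90 | item count
        return bytes([0x90 | len(value)]) + b"".join(msgpack_small(v) for v in value)
    if isinstance(value, dict):
        if len(value) > 15:
            raise ValueError("only fixmap (up to 15 pairs) is covered")
        # fixmap: 0x80 | pair count, then alternating keys and values
        out = bytes([0x80 | len(value)])
        for key, item in value.items():
            out += msgpack_small(key) + msgpack_small(item)
        return out
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Assumed sample object, in the field order seen in the rendering.
sample = {"name": "Something", "weight": 42, "contents": ["widgets", "cogs"]}
encoded = msgpack_small(sample)
print(encoded.hex(" "))
print(len(encoded))  # 47
```

The "*" after weight in the rendering is the single byte 0x2A, which is 42.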

Protobuf

https://developers.google.com/protocol-buffers/

  • By Google
Something(Bwidgetscogs

Protobuf Schema (proto)

Protobuf Schema (json)

Others (unimplemented here)

  • Flatbuffers
  • Thrift
  • YAML
  • SOAP

Testing Methodology

Different Object Sizes

  • medium (~10 KB)
  • large (~100 KB)

Metrics

Content-Length (Uncompressed)

Content-Length (Gzipped)
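The two Content-Length metrics can be approximated offline. This sketch builds payloads of roughly the two sizes and reports raw and gzipped byte counts for JSON only. The record shape and the make_payload helper are hypothetical and are not the test fixtures behind the charts in this talk.

```python
import gzip
import json


def make_payload(target_bytes: int) -> list:
    """Append hypothetical records until the JSON encoding reaches the target size."""
    records = []
    size = 2  # the enclosing "[]"
    while size < target_bytes:
        record = {
            "name": f"Something {len(records)}",
            "weight": len(records) % 100,
            "contents": ["widgets", "cogs"],
        }
        records.append(record)
        size += len(json.dumps(record)) + 2  # record plus ", " separator
    return records


results = {}
for label, target in (("medium", 10_000), ("large", 100_000)):
    body = json.dumps(make_payload(target)).encode("utf-8")
    results[label] = (len(body), len(gzip.compress(body)))

for label, (raw_len, gzip_len) in results.items():
    print(f"{label}: Content-Length {raw_len} uncompressed, {gzip_len} gzipped")
```

Synthetic records this uniform compress unusually well, so real payloads will show a smaller gap between the two numbers.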

Library overhead

Browser Runtimes

Browser: Encoding 10K

Browser: Encoding 100K

Browser: Decoding 10K

Browser: Decoding 100K

Node CPU Runtime

Node: Encoding 10K

Node: Encoding 100K

Node: Decoding 10K

Node: Decoding 100K

Other Languages

Go

https://github.com/alecthomas/go_serialization_benchmarks

Java

https://dzone.com/articles/is-protobuf-5x-faster-than-json

Summary

Browser Speed

Nothing beats JSON

Desktop Browser Inbound Data

You can save 8% with Avro (after gzip)

Desktop Browser Outbound Data

You can save up to ~30% with Avro
but that data is free anyway

Mobile Data

Data is not free for the user (in most cases)

  • mobile games
  • realtime stock or cryptocurrency data
  • bandwidth limited IoT

Server to Server

  • Speed* depends on your server languages
  • AWS still charges for outbound “Inter-region” data transfer
  • Gzip can almost always help (when the source is trusted)
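On the "trusted source" caveat: one known risk of accepting compressed input from an untrusted sender is that a tiny body can expand to an enormous one. Assuming that is the concern here, a receiver can cap the decompressed size. The gunzip_bounded helper below is a hypothetical name for a minimal sketch built on Python's standard zlib module.

```python
import gzip
import zlib


def gunzip_bounded(data: bytes, limit: int) -> bytes:
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # Ask for one byte more than the limit so an overrun is detectable.
    output = decompressor.decompress(data, limit + 1)
    if len(output) > limit:
        raise ValueError(f"decompressed body exceeds {limit} bytes")
    if not decompressor.eof:
        raise ValueError("gzip stream is truncated")
    return output


# A small body passes through unchanged.
small = gzip.compress(b'{"name": "Something"}')
print(gunzip_bounded(small, 1_000_000))

# 10 MB of zeros compresses to a few KB; the cap rejects it.
oversized = gzip.compress(b"\x00" * 10_000_000)
try:
    gunzip_bounded(oversized, 1_000_000)
except ValueError as error:
    print("rejected:", error)
```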

Citations