Improve JSON Performance of Unstructured Structs in Golang


A Go-related story about how PerimeterX trimmed 70% of their JSON parsing cloud costs by writing an open-source Golang library called marshmallow. 🍬🚀

Project Motivation

When we think about JSON parsing, the first use case that pops to mind usually involves a predetermined structure. It’s the simplest way for two components to interact: they both agree on a message schema and then use JSON to carry it around – super easy.

It's a popular and intuitive choice, although not the most efficient one.

When the structure of the message is fully known in advance, JSON libraries that leverage code generation perform best. However, switching to a different protocol improves performance far more. For instance, strict binary protocols like Protobuf and Avro improve networking efficiency through reduced message size, and trim CPU usage through much more efficient encoding and decoding. 👏🏾

This is to be expected. Strict schema environments are not where JSON really shines. JSON, however, possesses something that the others don’t: it fully specifies the name and the type of each field in the data. In other words, it carries both the data and the structure itself. This makes it the go-to choice for document databases and other non-schema-strict use cases. Our use case was unstructured structs: objects in which some of the fields are known and some aren’t. We’ll provide an example shortly.

Performance-Driven

The journey began with an important Stack Overflow question. It simply asks for the best way to parse a JSON object when some of the fields are known and some aren’t. We were facing the same problem and actively looking for a solution, so we started digging into it, exploring the proposed solutions and any others we could find. 👩‍💻🕵🏾‍♀️

Using a Map

The first thing we can do is use a native map[string]any. This captures all the data and allows you to access it. However, it’s inefficient, inconvenient, and unsafe. Consider the following use case: to determine whether a user is allowed to drive, you need to reference two specific fields from the data (age and has_drivers_license), then iterate over the rest of the fields and look for prior convictions.
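To make the map approach concrete, here is a minimal sketch of that use case using only the standard library. The helper name canDrive, the minimum age of 18, and the convention that prior convictions arrive as boolean fields prefixed with prior_conviction are all assumptions made for illustration; the real payload may look different.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// minDrivingAge is an assumed threshold, used only for this illustration.
const minDrivingAge = 18

// canDrive is a hypothetical helper. It decodes the whole payload into a
// map[string]any, reads the two known fields, then scans the remaining ones.
func canDrive(payload []byte) (bool, error) {
	var data map[string]any
	if err := json.Unmarshal(payload, &data); err != nil {
		return false, err
	}

	// Known fields need runtime type assertions; a forgotten check would panic.
	// encoding/json decodes every JSON number into float64 when the target is any.
	age, ok := data["age"].(float64)
	if !ok {
		return false, errors.New("age is missing or not a number")
	}
	hasLicense, ok := data["has_drivers_license"].(bool)
	if !ok {
		return false, errors.New("has_drivers_license is missing or not a boolean")
	}
	if age < minDrivingAge || !hasLicense {
		return false, nil
	}

	// Unknown fields: iterate over everything else and look for convictions.
	// The "prior_conviction" key prefix is an assumed naming convention.
	for key, value := range data {
		if !strings.HasPrefix(key, "prior_conviction") {
			continue
		}
		if convicted, isBool := value.(bool); isBool && convicted {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	payload := []byte(`{"age": 30, "has_drivers_license": true, "favorite_color": "blue"}`)
	allowed, err := canDrive(payload)
	fmt.Println(allowed, err)
}
```

Even in this small sketch, the known fields lose their static types, every access requires a type assertion, and the entire object is decoded into interface values whether or not each field is ever used.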