blaubart.com

Fast JSON Schema validation in Ruby

Aug 30, 2018
Walter White in a Chrysler 300 as seen in "Fifty-One" (S05E04, Breaking Bad)

Defining JSON APIs using JSON Schema is growing in popularity, and for good reason. A schema can help with validation, documentation, and type definitions. When it becomes an integral part of your stack, it’s important to make sure that validation in particular is performing well enough not to slow down your API.

What is JSON Schema?

JSON Schema is a vocabulary that you can use to annotate and validate JSON documents. It is particularly useful when implementing JSON (REST) APIs in a web application backend and offers several advantages over framework- or language-specific tools: you can formulate standardized schemas that are easy to share across service and application boundaries, and you can even use them as the basis for a public API definition.

Here are a few JSON Schema examples. Unlike XSD (XML Schema), a JSON Schema is relatively straightforward for a human to read and understand; after all, it’s just another JSON document.
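To make this concrete, here is a minimal sketch of a schema for a hypothetical "user" object, written as a Ruby hash and serialized with the standard library. The property names and the `check` helper are assumptions made for illustration. The helper only understands `required` and three `type` values on a flat object, so it is a toy and not a substitute for a real validator.

```ruby
require "json"

# Hypothetical schema for a "user" object, using draft-04 keywords only.
USER_SCHEMA = {
  "$schema"    => "http://json-schema.org/draft-04/schema#",
  "type"       => "object",
  "required"   => ["id", "email"],
  "properties" => {
    "id"    => { "type" => "integer" },
    "email" => { "type" => "string" },
    "admin" => { "type" => "boolean" }
  }
}.freeze

# Ruby classes corresponding to a few JSON Schema primitive types.
TYPE_CLASSES = {
  "integer" => [Integer],
  "string"  => [String],
  "boolean" => [TrueClass, FalseClass]
}.freeze

# Toy check covering only `required` and `type` on a flat object.
# Returns an array of error messages (empty when the document passes).
def check(schema, document)
  errors = []
  schema.fetch("required", []).each do |key|
    errors << "missing required property '#{key}'" unless document.key?(key)
  end
  schema.fetch("properties", {}).each do |key, rule|
    next unless document.key?(key)
    allowed = TYPE_CLASSES[rule["type"]]
    next if allowed.nil? # type not covered by this sketch
    next if allowed.any? { |klass| document[key].is_a?(klass) }
    errors << "property '#{key}' is not of type #{rule['type']}"
  end
  errors
end

# The schema itself is just another JSON document.
puts JSON.pretty_generate(USER_SCHEMA)

p check(USER_SCHEMA, JSON.parse('{"id": 1, "email": "a@example.com"}'))
p check(USER_SCHEMA, JSON.parse('{"id": "1"}'))
```

The first call returns an empty list. The second reports both the missing `email` property and the wrongly typed `id`.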

Although the specification is still under active development, it’s mature enough to be used in production systems. Libraries that support validation and more exist for all major programming languages. I’ve gradually introduced JSON Schema to several of the web applications I work with on a regular basis, and it has become one of my go-to tools whenever I build a new JSON API.

JSON Schema validation in a Ruby web app

I routinely work with Ruby web application backends using either Sinatra or Ruby on Rails. The most commonly used gem for JSON Schema validation in Ruby, at the time of writing, is Ruby JSON Schema Validator (json-schema). It can be used in multiple ways and comes with a number of advanced options. Bindings for ActiveRecord to integrate json-schema into Rails exist as well.

It worked perfectly for me and my team until we started noticing major slowdowns in some of our API endpoints, sometimes upwards of 150ms. Using our profiler, we narrowed the problem down to the schema validation of the endpoint’s JSON response body. As a performance-minded engineer, I found this unacceptable: the library incurred significant overhead when validating large JSON payloads, which are common enough in our apps to have a negative impact on user experience. Alternative gems didn’t perform any better, and many of them, it turned out, didn’t even implement the specification correctly.
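A rough way to confirm a finding like this without a full profiler is to time the validation step in isolation. The sketch below uses only Ruby’s standard `Benchmark` module. `slow_validate` is a hypothetical stand-in for whatever validator call an endpoint makes, and the payload is synthetic, so the absolute numbers it prints say nothing about any real gem.

```ruby
require "benchmark"
require "json"

# Hypothetical stand-in for a real schema validator call: it walks the
# document recursively, so its cost grows with the size of the payload.
def slow_validate(document)
  case document
  when Hash  then document.all? { |key, value| key.is_a?(String) && slow_validate(value) }
  when Array then document.all? { |item| slow_validate(item) }
  else true
  end
end

# Runs the block and returns [result, elapsed_milliseconds].
def timed
  result = nil
  seconds = Benchmark.realtime { result = yield }
  [result, seconds * 1000.0]
end

# Synthetic "large response body": 10,000 small records.
records = Array.new(10_000) { |i| { "id" => i, "name" => "item-#{i}" } }
payload = JSON.generate({ "items" => records })
document = JSON.parse(payload)

valid, elapsed_ms = timed { slow_validate(document) }
puts format("valid=%s, validation took %.2f ms for %d bytes", valid, elapsed_ms, payload.bytesize)
```

Wrapping the real validator call in `timed` inside a single endpoint, and logging the result next to the total response time, is usually enough to tell whether validation dominates.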

Boosting performance by harnessing RapidJSON

I realized that, instead of writing a better-performing JSON Schema implementation in Ruby, using an existing native code solution and writing a corresponding Ruby binding might be the way to go. After all, programming languages such as C and C++ are well-suited to high performance demands. After a quick search, I stumbled upon RapidJSON, which is not only one of the fastest JSON parsers out there (if not the fastest), but also comes with an implementation of JSON Schema draft-04 (the same draft that json-schema supports).

Even though documentation on how to write a Ruby binding to a native library was hard to come by, I managed to create a functioning gem, rj_schema, that interfaces with the schema validation capabilities of RapidJSON over the course of a weekend. Here are the results of a benchmark script comparing the performance of rj_schema with other gems:

Report                            i/s      Relative speed
rj_schema (valid?) (cached)       370.9    1x (baseline)
rj_schema (validate) (cached)     187.5    1.98x slower
json_schemer (valid?) (cached)    135.6    2.73x slower
rj_schema (valid?)                132.7    2.79x slower
rj_schema (validate)              96.9     3.83x slower
json-schema                       10.6     34.92x slower
json_schema                       3.7      101.26x slower

Based on the official JSON Schema test suite, rj_schema is roughly 30 times faster than json-schema, the de facto standard Ruby gem for JSON Schema. The 150ms overhead melts down to roughly 5ms, which we confirmed after integrating rj_schema into our apps. While it arguably lacks the advanced options of json-schema and requires native code compilation, it reduces the cost in terms of speed and memory consumption to an acceptable level, eliminating a bottleneck one ideally shouldn’t have in the first place.
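The arithmetic behind these figures can be reproduced from the benchmark results. The following sketch recomputes the slowdown factors from the rounded iterations-per-second values listed above, so its output differs slightly from the printed factors, which were presumably derived from unrounded measurements. It then applies the stated factor of roughly 30 to the 150ms overhead. The `slowdown` helper is only an illustration and not part of any gem.

```ruby
# Iterations per second, copied from the benchmark results above.
IPS = {
  "rj_schema (valid?) (cached)"    => 370.9,
  "rj_schema (validate) (cached)"  => 187.5,
  "json_schemer (valid?) (cached)" => 135.6,
  "rj_schema (valid?)"             => 132.7,
  "rj_schema (validate)"           => 96.9,
  "json-schema"                    => 10.6,
  "json_schema"                    => 3.7
}.freeze

# How many times slower an entry with `ips` is than the `fastest` entry.
def slowdown(fastest, ips)
  fastest / ips
end

fastest = IPS.values.max
IPS.each do |report, ips|
  puts format("%-32s %6.1f i/s  %7.2fx", report, ips, slowdown(fastest, ips))
end

# A 150 ms overhead divided by a ~30x speedup leaves about 5 ms.
overhead_ms = 150.0
speedup     = 30.0
puts format("estimated remaining overhead: %.1f ms", overhead_ms / speedup)
```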

In my opinion, rj_schema is a great example of how native code can be leveraged even when working on web applications in interpreted languages such as Ruby. I’ve open-sourced the gem, so please feel free to give it a try yourself. Feedback and pull requests are welcome!