Previous: (Part 2) Sanitize code-first OpenAPIs

When I receive a JSON that I have to document, I want to only edit relevant fields, but still see the hierarchy. This “annotator” tool will be primarily for wrangling OpenAPI schemas, but it must also be schema-agnostic, meaning that I can apply these principles to any JSON.

Background Context

Despite claims that OpenAPI “is both human and machine readable” because it can be output in YAML, I would argue that it’s not human-friendly at all. Just look at the extended Petstore example: an incomprehensible, exhaustive contract that specifies an API without ambiguity, written more for the machine’s benefit than the reader’s.

Someone asked in the Write The Docs Slack chat:

👩🏽‍💼 Hey folks.
A question about JSON: What would be a good way to document a ton of JSON parameters (type, possible values, dependencies, etc.)? Is there any tool that can help generate an API-reference-docs-like document?

Some of the replies:

👨🏻‍🦲 Yeah, i also recall having the same question for YAML, with the same answer sadly :disappointed:
They are intentionally and annoyingly designed for simplicity…

👩🏼‍🦰 … However, I have found https://stoplight.io/ is a really helpful tool for helping documenting JSON …

🙋🏻‍♂️ JSON Schema is really great, but it’s not documentation as such. I haven’t had a chance to try it on anything real yet, but this tool looks promising https://coveooss.github.io/json-schema-for-humans/

👨‍🏫 This is a real struggle. I usually step completely outside the object and just maintain a separately sourced nested structure of term/definitions, or else I use a dot-notation to indicate nesting, or both…

After reading the replies, it is good to know I’ve come to similar conclusions as the rest of the peeps. 👨‍🏫 had an implementation that was most similar to the direction I was taking.

I was working on “The Annotator,” a CLI program to flatten JSONs and output Markdown from OAS. I wrote it in JavaScript. It was hilariously over-engineered, and I was not happy. Back to the drawing board.

We want a format that looks like:

# ---------- This is the head ---------------
paths./store/order/{orderId}.get:

  # ↓↓↓ These are the leaf nodes ↓↓↓
  summary: Find purchase order by ID
  description: For valid response try integer IDs with value <= 5 or > 10. Other values will generated exceptions
  operationId: getOrderById

Extraction

The first step is to extract information that already exists from the schema. My requirements:

  1. Use dot notation to flatten schemas
  2. Strip superfluous info (get leaf nodes)

Dot Notation

Dot notation is basically how programmers reach into nested data. Wikipedia describes it as “Object-oriented programming as syntactic sugar for accessing properties.” It is a breadcrumb that communicates hierarchical relationships.

Initially, I wanted to flatten a JSON into YAML. This is so that editing is linear, and I don’t have to traverse endless brackets.

This JSON

{
  "summary": "An example.",
  "item": {
    "name": "Trinket of gold",
    "description": "A hypothetical trinket",
    "components": [
      {
        "name": "a",
        "description": "b",
        "value": 8000
      },
      {
        "name": "test grounds",
        "description": "Lorem ipsum",
        "value": 69
      },
      {
        "some": {
          "deeply": {
            "nested": {
              "object": {
                "description": "Yikes"
              }
            }
          }
        }
      }
    ]
  }
}

… becomes a flat YAML in dot notation:

summary: An example.
item.name: Trinket of gold
item.description: A hypothetical trinket
item.components.0.name: a
item.components.0.description: b
item.components.0.value: 8000
item.components.1.name: test grounds
item.components.1.description: Lorem ipsum
item.components.1.value: 69
item.components.2.some.deeply.nested.object.description: Yikes
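
As a rough illustration of the flattening step, here is a minimal Python sketch. It is not the jq implementation used later in this post, and the function name `flatten` is my own. It assumes array indexes are written as plain numbers in the path and that empty objects and arrays can simply be dropped.

```python
import json


def flatten(node, prefix=""):
    """Return a dict mapping dotted paths to scalar values."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        # A scalar ends the recursion: the accumulated prefix is its full path.
        return {prefix: node}

    flat = {}
    for key, value in children:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


doc = json.loads("""
{
  "summary": "An example.",
  "item": {
    "name": "Trinket of gold",
    "components": [
      {"name": "a", "value": 8000},
      {"some": {"deeply": {"nested": {"description": "Yikes"}}}}
    ]
  }
}
""")

flat = flatten(doc)
for path, value in flat.items():
    print(f"{path}: {value}")
```

Note that a key which itself contains a dot would make the resulting path ambiguous. The sketch ignores that case.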

However, there is a lot of tedious redundancy. How much redundancy is acceptable? In OAS, almost any object can have a .description, and that’s all I want.

JSON with description only:

{
  "item": {
    "description": "A hypothetical trinket",
    "components": [
      {
        "description": "b"
      },
      {
        "description": "Lorem ipsum"
      },
      {
        "some": {
          "deeply": {
            "nested": {
              "object": {
                "description": "Yikes"
              }
            }
          }
        }
      }
    ]
  }
}

As YAML:

item:
  description: A hypothetical trinket
  components:
    - description: b
    - description: Lorem ipsum
    - some:
        deeply:
          nested:
            object:
              description: Yikes

This is okay, but with a larger schema, there will be many more properties that aren’t related to description. Too many nested indents are hard to read. I am only concerned with leaf nodes, the last items in a tree.

Leaf Nodes

With this structure, the leaf nodes are always at a single level of indentation, a linear convenience. The hierarchy can still be gleaned at once.

summary: An example.
item:
  description: A hypothetical trinket
item.components.0:
  description: b
item.components.1:
  description: Lorem ipsum
item.components.2.some.deeply.nested.object:
  description: Yikes

And if we want to gather additional properties, such as .summary and .name, they will be displayed too:

summary: An example.
item:
  name: Trinket of gold
  description: A hypothetical trinket
item.components.0:
  name: a
  description: b
item.components.1:
  name: test grounds
  description: Lorem ipsum
item.components.2.some.deeply.nested.object:
  description: Yikes
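
The grouping above can be sketched in a few lines of Python. This is only an illustration of the idea, not the jq script described below. The names `extract` and `WANTED` are hypothetical, and I am assuming that only scalar values count as leaves and that root-level leaves stay at the top level without a head.

```python
WANTED = ("operationId", "name", "summary", "description")


def extract(node, wanted=WANTED, path=""):
    """Map each dotted parent path to the wanted scalar properties found there."""
    found = {}
    if isinstance(node, dict):
        leaves = {
            key: value
            for key, value in node.items()
            if key in wanted and not isinstance(value, (dict, list))
        }
        if leaves and path:
            found[path] = leaves
        elif leaves:
            found.update(leaves)  # root-level leaves have no head
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        return found  # scalars were already collected by their parent

    for key, value in children:
        child = f"{path}.{key}" if path else str(key)
        found.update(extract(value, wanted, child))
    return found


doc = {
    "summary": "An example.",
    "item": {
        "name": "Trinket of gold",
        "description": "A hypothetical trinket",
        "components": [
            {"name": "a", "description": "b", "value": 8000},
            {"name": "test grounds", "description": "Lorem ipsum", "value": 69},
            {"some": {"deeply": {"nested": {"object": {"description": "Yikes"}}}}},
        ],
    },
}

result = extract(doc)
for head, leaves in result.items():
    if isinstance(leaves, dict):
        print(f"{head}:")
        for key, value in leaves.items():
            print(f"  {key}: {value}")
    else:
        print(f"{head}: {leaves}")
```

The printing loop is deliberately naive: it does no YAML quoting or escaping, so treat its output as a preview rather than a file to parse.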

Implementation

As explained in my previous article, I chose jq for its simplicity and cross-platform compatibility. The rest of this post assumes that you know how to use the CLI.

To use these scripts, clone the OpenAPI Utils Repository.

Usage

jq -rf extract.jq 'schemas/petstore.json'

By default, it will extract operationId, name, summary and description into JSON:

{ ...
  "paths./pet.put": {
    "summary": "Update an existing pet",
    "description": "Update an existing pet by Id",
    "operationId": "updatePet"
  },
  "paths./pet.put.responses.200": {
    "description": "Successful operation"
  },
  "paths./pet.put.responses.200.content.application/xml.example.value": {
    "name": "doggie"
  },
  "paths./pet.put.responses.200.content.application/xml.example.value.category": {
    "name": "Dogs"
  },
  "paths./pet.put.responses.200.content.application/xml.example.value.tags.0": {
    "name": "string"
  },
  "paths./pet.put.responses.400": {
    "description": "Invalid ID supplied"
  },
  "paths./pet.put.responses.404": {
    "description": "Pet not found"
  },
  "paths./pet.put.responses.405": {
    "description": "Validation exception"
  },
  "paths./pet.put.requestBody": {
    "description": "Update an existent pet in the store"
  },
  ...
}

(extract.jq source)

Specifying --arg yaml true will output YAML.

jq -rf extract.jq --arg yaml true 'schemas/petstore.json'

We get the desired structure:

paths./store/order/{orderId}.get.parameters.0:
  name: orderId
  description: ID of order that needs to be fetched
paths./store/order/{orderId}.get.responses.200:
  description: successful operation
paths./store/order/{orderId}.get.responses.400:
  description: Invalid ID supplied
paths./store/order/{orderId}.get.responses.404:
  description: Order not found
paths./store/order/{orderId}.delete:
  summary: Delete purchase order by ID
  description: For valid response try integer IDs with value < 1000. Anything above 1000 or nonintegers will generate API errors
  operationId: deleteOrder
paths./store/order/{orderId}.delete.parameters.0:
  name: orderId
  description: ID of the order that needs to be deleted
paths./store/order/{orderId}.delete.responses.400:
  description: Invalid ID supplied
paths./store/order/{orderId}.delete.responses.404:
  description: Order not found

Specifying --args (with an s) at the end of the command, followed by the desired properties, extracts only those properties.

Get version and tags:

jq -rf extract.jq --arg yaml true 'schemas/petstore.json' --args 'version' 'tags'

info:
  version: 1.0.6
paths./pet/findByStatus.get:
  tags: [
    "pet"
  ]
paths./pet/findByTags.get:
  tags: [
    "pet"
  ]
paths./pet/{petId}.get:
  tags: [
    "pet"
  ]
...

Using the yq utility

My YAML formatting is pretty bad, so use yq instead:

jq -rf extract.jq "schemas/petstore.json" --args "tags" "version" |
yq eval -P > tagged.yml

This creates a prettier file called tagged.yml:

info:
  version: 1.0.6
paths./pet.post:
  tags:
    - pet
paths./pet.post.responses.200.content.application/xml.examples.example-1.value:
  tags:
    - id: 0
      name: string
...

Insertion

Once I have my YAML file, I can edit it however I want. Then, I need to update the schema with my edits.

In tagged.yml, change the version to 1.0.7:

info:
  version: 1.0.7
paths./pet.post:
  tags:
    - pet
paths./pet.post.responses.200.content.application/xml.examples.example-1.value:
  tags:
    - id: 0
      name: string

Use yq for converting YAML to JSON:

yq eval -j "tagged.yml" |
jq -f insert.jq --slurpfile oas "schemas/petstore.json"

The JSON will be updated with the new value:

"info": {
  "description": "This is a sample Pet Store Server...",
  "version": "1.0.7"
}

(insert.jq source)
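
To make the insertion step concrete, here is a hedged Python sketch of the same idea. It is not a port of insert.jq, and the function name `insert` is hypothetical. Instead of splitting annotation keys on dots (which is ambiguous when a key contains a dot), it walks the schema, rebuilds each node's dotted path, and merges any annotation filed under that path. It assumes annotation heads are objects and that top-level non-object entries belong to the root.

```python
import json


def insert(node, annotations, path=""):
    """Return a copy of `node` with annotations merged in at matching dotted paths."""
    if isinstance(node, list):
        return [
            insert(value, annotations, f"{path}.{index}" if path else str(index))
            for index, value in enumerate(node)
        ]
    if not isinstance(node, dict):
        return node  # scalars are only changed through their parent object

    updated = {}
    for key, value in node.items():
        child = f"{path}.{key}" if path else str(key)
        updated[key] = insert(value, annotations, child)

    if path:
        patch = annotations.get(path)
    else:
        # Root-level leaves appear as top-level, non-object annotation entries.
        patch = {k: v for k, v in annotations.items() if not isinstance(v, dict)}
    if isinstance(patch, dict):
        updated.update(patch)  # edited values win, new keys are added
    return updated


oas = {
    "info": {"description": "This is a sample Pet Store Server...", "version": "1.0.6"},
    "paths": {"/pet": {"post": {"tags": ["pet"]}}},
}
annotations = {
    "info": {"version": "1.0.7"},
    "paths./pet.post": {"tags": ["pet"]},
}

updated = insert(oas, annotations)
print(json.dumps(updated["info"], indent=2))
```

One limitation of this approach: an annotation whose path does not exist in the schema is silently ignored, so it can add properties to existing objects but cannot create new branches.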

Multiple annotation files

First merge multiple YAMLs with yq, then pipe to jq.

yq eval-all '. as $item ireduce ({}; . * $item )' tagged.yml anotherAnnotation.yml |
jq -f insert.jq --slurpfile oas "schemas/petstore.json"
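
For readers less familiar with yq, the reduce expression above folds every file into one object by merging them in order. Below is a small Python sketch of that kind of deep merge. The annotation contents are made up for illustration, and I am assuming that later files win and that lists are replaced rather than concatenated, which may not match every yq merge option.

```python
from functools import reduce


def deep_merge(base, override):
    """Merge `override` into a copy of `base`, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced outright
    return merged


# Hypothetical contents of tagged.yml and anotherAnnotation.yml, already parsed.
tagged = {
    "info": {"version": "1.0.7"},
    "paths./pet.post": {"tags": ["pet"]},
}
another = {
    "info": {"x-afterword": "Thanks for reading"},
    "paths./pet.post": {"summary": "Add a new pet"},
}

combined = reduce(deep_merge, [tagged, another], {})
for head, leaves in combined.items():
    print(head, leaves)
```

Because the merge happens before insertion, two files can annotate the same head without overwriting each other's properties.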

The End

That’s “The Annotator” for you, which is really more of a JSON leaf extractor. It achieves part of my overwrite mechanism.

The annotation file can be a way to add custom properties. For example, I may want to add an x-afterword property. Many have created their own extensions for custom data design, as proven by this list of OAS x-tensions found in the wild.

Next up, rendering an OAS into a documentation page in Markdown. I am eyeing RapiDoc, as it looks nice and it’s easily customizable. Cheers! 🍻