JSON output for sq and also acceptance/integration tests for sq
(Related to sequoia!1116 (merged))
I would like to add JSON output to sq, and these are my thoughts on
how to approach that. I would like feedback before I dig myself deep
into a hole.
Currently, sq has the following subcommands:
encrypt    Encrypts a message
decrypt    Decrypts a message
sign       Signs messages or data files
verify     Verifies signed messages or detached signatures
key        Manages keys
keyring    Manages collections of keys or certs
certify    Certifies a User ID for a Certificate
autocrypt  Communicates certificates using Autocrypt
keyserver  Interacts with keyservers
wkd        Interacts with Web Key Directories
armor      Converts binary to ASCII
dearmor    Converts ASCII to binary
inspect    Inspects data, like file(1)
packet     Low-level packet manipulation
help       Prints this message or the help of the given subcommand(s)
The output from these commands is either nothing, some status or progress messages, specific file formats such as ASCII armored data, or semi-structured ad hoc plain text output.
The output of sq should be easy to use from programs (shell scripts,
Python scripts, Ruby scripts, etc.). Writing a custom Rust program that
uses the Sequoia crate is not always feasible. Supporting script
writers well seems like an important thing for Sequoia to do. Thus,
JSON output for sq. JSON is the obvious first choice for structured
output, as it's so very well supported by all scripting languages.
I propose to add an option --output-format=json to each subcommand
that produces structured output. The option will also accept the value
default, to produce the current output, which may or may not be textual.
Other formats can be added later, if there's use for them.
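To make the proposal concrete from a script writer's point of view, here is a minimal Python sketch of a wrapper around the proposed option. The option does not exist in sq yet; its placement after the subcommand, and the helper names sq_command and run_sq_json, are assumptions for illustration only.

```python
import json
import subprocess


def sq_command(subcommand, files, output_format="json"):
    """Build the argument list for an sq invocation.

    `subcommand` is a list of words such as ["inspect"] or
    ["packet", "dump"]. The --output-format option is the one proposed
    in this issue; it is not implemented in sq at the time of writing.
    """
    return ["sq", *subcommand, f"--output-format={output_format}", *files]


def run_sq_json(subcommand, files):
    """Run sq and parse its standard output as JSON (hypothetical usage)."""
    result = subprocess.run(
        sq_command(subcommand, files),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output to str
    )
    return json.loads(result.stdout)
```

A script would then call something like run_sq_json(["inspect"], ["liw.key.pgp"]) and work with the returned dict instead of scraping text.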
For future-proofing, the JSON output will be versioned. For this, the
output will always be a JSON object (i.e., dict, hashmap, set of
key/value pairs), and there will always be the key
sequoia-json-format that specifies the version of the JSON schema
being used. For this proposal, the version is a list of the integer 1
(i.e., [1]). It's a list of integers to allow, say, version 1.2
(list [1,2]) or 3.14.15 ([3,14,15]) later on, if that makes sense
in the future.
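A consumer could check the version field roughly as follows. This is a sketch under two assumptions that the proposal does not settle: the key is spelled as in the sample output (sequoia_json_version), and a change in the first integer signals an incompatible schema.

```python
import json

# Assumption: key name as spelled in the sample output; the final name is open.
VERSION_KEY = "sequoia_json_version"

# Assumption: only the first integer must match for the schema to be usable.
SUPPORTED_MAJOR = 1


def parse_versioned(text):
    """Parse sq JSON output and return (version_tuple, document).

    Raises ValueError if the top level is not an object, the version
    field is missing or malformed, or the major version is unsupported.
    """
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the top level")
    version = doc.get(VERSION_KEY)
    well_formed = (
        isinstance(version, list)
        and len(version) > 0
        # bool is a subclass of int in Python, so exclude it explicitly
        and all(isinstance(n, int) and not isinstance(n, bool) for n in version)
    )
    if not well_formed:
        raise ValueError(f"missing or malformed {VERSION_KEY!r}")
    if version[0] != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported major version {version[0]}")
    return tuple(version), doc


version, doc = parse_versioned('{"sequoia_json_version": [1, 2]}')
# Tuples compare element by element, so version ordering needs no extra code.
newer_than_1 = version > (1,)
older_than_3_14_15 = version < (3, 14, 15)
```

Representing the version as a list of integers means a script can compare versions directly, without parsing a dotted string.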
I propose to add JSON output to at least inspect and packet dump
to start with. Once we have JSON support for those, adding support for
other commands should be straightforward. (For example, listing keys
in a key ring.)
The output of sq inspect might look like this:
{
  "sequoia_json_version": [1],
  "sq_operation": "inspect",
  "filename": "liw.key.pgp",
  "file_type": "transferable-secret-key",
  "user_ids": [
    "Lars Wirzenius <liw@liw.fi>"
  ],
  "main_key": {
    "fingerprint": "F8D3A8621B90C8589CE5B919627D12E85029C0A3",
    "algorithm": "RSA",
    "usage": ["encrypt", "sign"],
    "key_size": 4096,
    "secret_key_encrypted": false,
    "creation_time": "2021-08-03 11:53:16 UTC",
    "expiration_time": "2024-08-03 05:19:37 UTC (creation time + P1095DT62781S)",
    "key_flags": ["certification"]
  },
  "subkeys": [
    {
      "fingerprint": "D00EDB7017C9EAE3C6D438B7BFFD4B52F08C7AB1",
      "algorithm": "RSA",
      "usage": ["encrypt", "sign"],
      "key_size": 4096,
      "secret_key_encrypted": false,
      "creation_time": "2021-08-03 11:53:16 UTC",
      "expiration_time": "2024-08-03 05:19:37 UTC (creation time + P1095DT62781S)",
      "key_flags": ["sign"]
    }
  ]
}
Notes on the above example:
- I added a field for the sq subcommand being used. Not sure that's useful, but just in case.
- Timestamps are textual so that they are human-readable. Additional fields with the same timestamp as a Unix timestamp (seconds since 1970 started, in UTC) might also be good for scripting.
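To illustrate why an extra numeric field could help scripts, here is a Python sketch of what a consumer has to do with the textual timestamps from the sample. The helper name to_unix is hypothetical, and the parsing assumes the exact layout shown in the sample, including the parenthesised note after the expiration time.

```python
from datetime import datetime, timezone


def to_unix(timestamp_text):
    """Convert a sample-style timestamp to seconds since 1970-01-01 UTC.

    Accepts "YYYY-MM-DD HH:MM:SS UTC", optionally followed by a
    parenthesised note as in the sample's expiration_time field.
    """
    # Drop anything from " (" onwards, e.g. "(creation time + P...)".
    head = timestamp_text.split(" (", 1)[0]
    parsed = datetime.strptime(head, "%Y-%m-%d %H:%M:%S UTC")
    # strptime returns a naive datetime; mark it as UTC before converting.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


created = to_unix("2021-08-03 11:53:16 UTC")
expires = to_unix("2024-08-03 05:19:37 UTC (creation time + P1095DT62781S)")

# P1095DT62781S is an ISO 8601 duration: 1095 days plus 62781 seconds.
lifetime_seconds = expires - created
```

With a numeric field next to the textual one, the string splitting and format parsing above would not be needed.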
Implementation thoughts
Currently inspect and packet dump produce output via write!
calls interspersed in the code that examines the parsed input. This is
an obvious way to implement the functionality, but makes supporting
another output format painful.
I propose to change this to form a data structure that can be
serialized to JSON using serde_json. This would make JSON output
trivial, and would make it quite hard to produce invalid JSON, such as
accidentally leaving a trailing comma. It would also allow "pretty
printed JSON" for free.
The current human-oriented textual output should remain. It would probably mean writing a custom serde serializer, but that is not very difficult.
Using serde would make it fairly easy to support other output formats as well, should that become interesting: YAML, TOML, or S-expressions, for example.
I can see two approaches to using serde: either implement serde serialization for existing types for keys, packets, etc, or introduce new, lightweight types just for output purposes. I'm not yet familiar enough with the Sequoia types and code structure to know which approach would be better.
Approach 1: Implement serialization for sequoia_openpgp::packet::Key,
sequoia_openpgp::packet::Signature, and so on, and change inspect
and packet dump to use those to produce output.
Approach 2: Create new types for the things that we want to output,
and serialization for those. For example, a struct for the inspect
output, and additional structs for the key output.
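The shape of Approach 2 can be sketched independently of Rust. The Python below stands in for the proposed Rust structs and serde serializers: one lightweight output type is filled in first, and separate renderers turn it into JSON or human-oriented text. All type and function names are hypothetical, the fields are a subset of the sample output, and the text layout is invented for illustration, not sq's actual output.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class KeyInfo:
    """Output-only description of one key (hypothetical type)."""
    fingerprint: str
    algorithm: str
    key_size: int


@dataclass
class InspectOutput:
    """Output-only description of an inspect result (hypothetical type)."""
    filename: str
    file_type: str
    main_key: KeyInfo
    user_ids: list = field(default_factory=list)


def render_json(report):
    """Serialize the report, adding the version and operation fields."""
    doc = {"sequoia_json_version": [1], "sq_operation": "inspect"}
    doc.update(asdict(report))  # asdict also converts the nested KeyInfo
    return json.dumps(doc, indent=2)


def render_text(report):
    """Render the same data for humans (layout invented for this sketch)."""
    lines = [f"{report.filename}: {report.file_type}"]
    lines.append(f"  Fingerprint: {report.main_key.fingerprint}")
    lines.append(
        f"  Algorithm: {report.main_key.algorithm} {report.main_key.key_size}"
    )
    lines.extend(f"  UserID: {uid}" for uid in report.user_ids)
    return "\n".join(lines)


report = InspectOutput(
    filename="liw.key.pgp",
    file_type="transferable-secret-key",
    main_key=KeyInfo("F8D3A8621B90C8589CE5B919627D12E85029C0A3", "RSA", 4096),
    user_ids=["Lars Wirzenius <liw@liw.fi>"],
)
```

The code that examines the parsed input only has to fill in the report; choosing between render_json and render_text happens afterwards, which is the separation the write! calls currently lack.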