JSON output for sq and also acceptance/integration tests for sq

(Related to !1116 (merged))

I would like to add JSON output to sq, and these are my thoughts on how to approach that. I would like feedback before I dig myself deep in a hole.

Currently, sq has the following subcommands:

encrypt      Encrypts a message
decrypt      Decrypts a message
sign         Signs messages or data files
verify       Verifies signed messages or detached signatures
key          Manages keys
keyring      Manages collections of keys or certs
certify      Certifies a User ID for a Certificate
autocrypt    Communicates certificates using Autocrypt
keyserver    Interacts with keyservers
wkd          Interacts with Web Key Directories
armor        Converts binary to ASCII
dearmor      Converts ASCII to binary
inspect      Inspects data, like file(1)
packet       Low-level packet manipulation
help         Prints this message or the help of the given subcommand(s)

The output from these commands is either nothing, status or progress messages, a specific file format such as ASCII-armored data, or semi-structured ad hoc plain text.

The output of sq should be easy to use from programs (shell scripts, Python scripts, Ruby scripts, etc). Writing a custom Rust program that uses the Sequoia crate is not always feasible. Supporting script writers well seems like an important thing for Sequoia to do. Thus, JSON output for sq. JSON is the obvious first choice for structured output, as it's so very well supported by all scripting languages.

I propose to add an option --output-format=json to each subcommand that produces structured output. The option will also accept the value default, which produces the current output (which may or may not be textual). Other formats can be added later, if there's use for them.
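As a rough illustration of how the two option values could be parsed, here is a minimal sketch using only the standard library. The OutputFormat enum and its error message are hypothetical names made up for this example; they are not part of sq today.

```rust
use std::str::FromStr;

/// Hypothetical enum for the values accepted by `--output-format`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    /// Whatever the subcommand prints today (may or may not be textual).
    Default,
    /// Versioned JSON output.
    Json,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Parses the option value; anything other than the two known
    /// values is rejected with an error message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(OutputFormat::Default),
            "json" => Ok(OutputFormat::Json),
            other => Err(format!("unknown output format: {}", other)),
        }
    }
}
```

With this in place, "json".parse::<OutputFormat>() yields OutputFormat::Json, and adding another format later means adding one variant and one match arm.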

For future-proofing, the JSON output will be versioned. To this end, the output will always be a JSON object (i.e., dict, hashmap, set of key/value pairs) with a key sequoia_json_version that specifies the version of the JSON schema being used. For this proposal, the version is a list containing the integer 1 (i.e., [1]). Using a list of integers allows, say, version 1.2 ([1,2]) or 3.14.15 ([3,14,15]) later on, if that turns out to make sense.
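To show why a list of integers is convenient for consumers, here is a small sketch of how a script or program might compare version lists. The compatibility rule (same first component means compatible) is an assumption made for this example, not something the proposal specifies, and both function names are hypothetical.

```rust
/// Assumed rule, not part of the proposal: output is readable by a
/// consumer if the first (major) component of both versions matches.
/// An empty version list is never compatible.
fn is_compatible(reported: &[u32], supported: &[u32]) -> bool {
    match (reported.first(), supported.first()) {
        (Some(r), Some(s)) => r == s,
        _ => false,
    }
}

/// Returns true if version `a` is older than version `b`.
/// Rust compares slices lexicographically, which matches the usual
/// ordering of dotted versions: [1] < [1, 2] < [3, 14, 15].
fn is_older(a: &[u32], b: &[u32]) -> bool {
    a < b
}
```

A consumer written against [1, 2] would, under this assumed rule, accept output labelled [1] and reject output labelled [3, 14, 15].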

I propose to add JSON output to at least inspect and packet dump to start with. Once we have JSON support for those, adding support for other commands should be straightforward. (For example, listing keys in a key ring.)

The output of sq inspect might look like this:

{
    "sequoia_json_version": [1],
    "sq_operation": "inspect",
    "filename": "liw.key.pgp",
    "file_type": "transferable-secret-key",
    "user_ids": [
        "Lars Wirzenius <liw@liw.fi>"
    ],
    "main_key": {
        "fingerprint": "F8D3A8621B90C8589CE5B919627D12E85029C0A3",
        "algorithm": "RSA",
        "usage": ["encrypt", "sign"],
        "key_size": 4096,
        "secret_key_encrypted": false,
        "creation_time": "2021-08-03 11:53:16 UTC",
        "expiration_time": "2024-08-03 05:19:37 UTC (creation time + P1095DT62781S)",
        "key_flags": ["certification"]
    },
    "subkeys": [
        {
            "fingerprint": "D00EDB7017C9EAE3C6D438B7BFFD4B52F08C7AB1",
            "algorithm": "RSA",
            "usage": ["encrypt", "sign"],
            "key_size": 4096,
            "secret_key_encrypted": false,
            "creation_time": "2021-08-03 11:53:16 UTC",
            "expiration_time": "2024-08-03 05:19:37 UTC (creation time + P1095DT62781S)",
            "key_flags": ["sign"]
        }
    ]
}

Notes on the above example:

  • I added a field for the sq subcommand being used. Not sure that's useful, but just in case.
  • Timestamps are textual so that they are human-readable. Additional fields giving the same timestamp as a Unix timestamp (seconds since the start of 1970, in UTC) might also be good for scripting.

Implementation thoughts

Currently inspect and packet dump produce output via write! calls interspersed in the code that examines the parsed input. This is an obvious way to implement the functionality, but makes supporting another output format painful.

I propose to change this to form a data structure that can be serialized to JSON using serde_json. This would make JSON output trivial, and would make it quite hard to produce invalid JSON, such as accidentally leaving a trailing comma. It would also allow "pretty printed JSON" for free.

The current human-oriented textual output should remain. It would probably mean writing a custom serde serializer, but that is not very difficult.

Using serde would make it fairly easy to support other output formats as well, should that become interesting: YAML, TOML, or S-expressions, for example.

I can see two approaches to using serde: either implement serde serialization for existing types for keys, packets, etc, or introduce new, lightweight types just for output purposes. I'm not yet familiar enough with the Sequoia types and code structure to know which approach would be better.

Approach 1: Implement serialization for sequoia_openpgp::packet::Key, sequoia_openpgp::packet::Signature, and so on, and change inspect and packet dump to use those to produce output.

Approach 2: Create new types for the things that we want to output, and serialization for those. For example, a struct for the inspect output, and additional structs for the key output.
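To make approach 2 a bit more concrete, here is a minimal sketch using only the standard library. The type names InspectOutput and KeyOutput, their fields, and the text layout are all hypothetical; the layout in particular is invented for the example and is not sq's current output. With serde, the same structs would additionally derive Serialize to get JSON. The sketch implements Display for the human-oriented text, which is one possible alternative to a custom serde serializer.

```rust
use std::fmt;

/// Hypothetical output-only type for one key, with a few of the
/// fields from the example JSON. With serde, this would also carry
/// `#[derive(serde::Serialize)]`.
struct KeyOutput {
    fingerprint: String,
    algorithm: String,
    key_size: u32,
    key_flags: Vec<String>,
}

/// Hypothetical output-only type for the result of `sq inspect`.
struct InspectOutput {
    filename: String,
    user_ids: Vec<String>,
    main_key: KeyOutput,
    subkeys: Vec<KeyOutput>,
}

impl fmt::Display for KeyOutput {
    /// Renders one key as human-oriented text (made-up layout).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "Fingerprint: {}", self.fingerprint)?;
        writeln!(f, "Algorithm: {} ({} bits)", self.algorithm, self.key_size)?;
        writeln!(f, "Key flags: {}", self.key_flags.join(", "))
    }
}

impl fmt::Display for InspectOutput {
    /// Renders the whole result: filename, user IDs, main key, subkeys.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "{}:", self.filename)?;
        for uid in &self.user_ids {
            writeln!(f, "UserID: {}", uid)?;
        }
        write!(f, "{}", self.main_key)?;
        for subkey in &self.subkeys {
            write!(f, "Subkey:\n{}", subkey)?;
        }
        Ok(())
    }
}
```

The code that examines the parsed input would then only fill in an InspectOutput value, and the choice between text and JSON would be made in one place, when the value is printed.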