Data-encoding-V2
Slack channels: #data-encoding #data-encoding-dev
Motivation
Objective: Provide a modern, consistent, and usable data de/serialisation library to Octez
Key Result: Measurably more efficient de/serialisation
Key Result: Library users (Octez developers) empowered
The current state of the data-encoding library is causing friction in the development of Octez. Incremental changes to the library are difficult because of architectural and design issues and because of its tight coupling with Octez (each incremental change requires a full Octez tree smash).
Scope
The goal of this milestone is to implement a new version of data-encoding:
- with clean architecture and design from the get-go;
- with a dedicated legacy-compatibility mechanism to ease integration;
- with more efficient de/serialisation primitives.
Work Breakdown
Basic library skeleton with just enough flesh to try it out
- project boilerplate: build files, CI files, etc.
- library breakdown with symmetry and clear separation of concerns: data-encoding, binary-data-encoding, json-data-encoding
- test framework: PBT, expect tests
- 80% functionality: support for common types, but not for everything (no recursive encodings)
Underlying buffer library with support for suspend-resume
- bytes destinations with suspend/resume
- string sources with suspend/resume
- pushing state down into destinations and sources: size-limits, size-header info, offsets
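To make the suspend/resume idea concrete, here is a minimal sketch of a destination that suspends when its buffer is full and resumes into a fresh buffer. This is an illustration only: the types and names (`destination`, `write_string`, `Suspended`) are hypothetical stand-ins, not the actual buffer library's API.

```ocaml
(* A destination is a byte buffer plus a write offset. *)
type destination = { buffer : bytes; mutable offset : int }

(* A write either completes, or suspends with a continuation that
   resumes the write against a fresh destination. *)
type 'a result =
  | Done of 'a
  | Suspended of (destination -> 'a result)

(* Write [s] (from [pos]) into [dst]; suspend when the buffer is full. *)
let rec write_string (s : string) (pos : int) (dst : destination) : unit result =
  let available = Bytes.length dst.buffer - dst.offset in
  let needed = String.length s - pos in
  if needed <= available then begin
    Bytes.blit_string s pos dst.buffer dst.offset needed;
    dst.offset <- dst.offset + needed;
    Done ()
  end
  else begin
    Bytes.blit_string s pos dst.buffer dst.offset available;
    dst.offset <- dst.offset + available;
    Suspended (write_string s (pos + available))
  end
```

The caller drains each full buffer (to a socket, a hasher, etc.) and then invokes the continuation with a new destination, which is what lets writers operate over fixed-size buffers without materialising the whole output.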
Optimised writing and reading functions
- (1 day) benchmarks before (with improvements to the benchmarking framework)
- (1 day) partial-application to avoid interpretation during reading
- (1 day) partial-application to avoid interpretation during writing
- (1 day) benchmarks after (with additional tweaks for the partial-applications)
- (2 days) better benchmarks with dedicated library and ppx
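The partial-application idea can be sketched as follows: instead of re-interpreting the encoding description on every call, the encoding is traversed once to build a writer closure, which is then applied to many values. The `encoding` GADT below is a simplified stand-in for illustration, not the library's actual type.

```ocaml
(* Simplified stand-in for an encoding description. *)
type 'a encoding =
  | Int64 : int64 encoding
  | String : string encoding
  | Pair : 'a encoding * 'b encoding -> ('a * 'b) encoding

(* Traverse the encoding ONCE, returning a closure; the match on the
   encoding is paid at compile-time, not on every write. *)
let rec compile : type a. a encoding -> Buffer.t -> a -> unit = function
  | Int64 -> fun buf v -> Buffer.add_int64_be buf v
  | String -> fun buf v -> Buffer.add_string buf v
  | Pair (l, r) ->
      let wl = compile l and wr = compile r in
      fun buf (a, b) ->
        wl buf a;
        wr buf b
```

Usage: `let write = compile (Pair (Int64, String))` interprets the description once; subsequent `write buf value` calls only run the pre-built closures.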
JSON one-shot parsing
Read JSON data directly into the final OCaml representation (rather than in two steps).
- (2 days) Improve support for UTF-8
- (3 days) Simple JSON parsing
- (4 days) Inlined parsing-and-deserialisation
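As a toy illustration of the one-shot approach, the parser below reads a JSON array of integers directly into an `int list`, without building an intermediate JSON tree first. It is a sketch with minimal error handling, not the library's actual parser.

```ocaml
(* Parse a JSON array of integers, e.g. "[1, 2, -3]", straight into an
   [int list]: parsing and deserialisation are inlined into one pass. *)
let parse_int_array (s : string) : int list =
  let n = String.length s in
  let pos = ref 0 in
  let skip_ws () = while !pos < n && s.[!pos] = ' ' do incr pos done in
  let expect c =
    skip_ws ();
    if !pos < n && s.[!pos] = c then incr pos else failwith "parse error"
  in
  let parse_int () =
    skip_ws ();
    let start = !pos in
    if !pos < n && s.[!pos] = '-' then incr pos;
    while !pos < n && s.[!pos] >= '0' && s.[!pos] <= '9' do incr pos done;
    int_of_string (String.sub s start (!pos - start))
  in
  expect '[';
  skip_ws ();
  if !pos < n && s.[!pos] = ']' then (incr pos; [])
  else
    let rec loop acc =
      let i = parse_int () in
      skip_ws ();
      if !pos < n && s.[!pos] = ',' then (incr pos; loop (i :: acc))
      else (expect ']'; List.rev (i :: acc))
    in
    loop []
```

The two-step alternative would first produce a generic JSON value (array of number nodes) and then destruct it; inlining the two avoids allocating that intermediate tree.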
Full coverage of OCaml types and type constructors
- (2 days) remaining 20% functionality:
  - recursive encodings
  - uniform metadata for custom encodings
  - int types (uint8, uint16, uint30, uint62, int8, int16, int31, int32, int63, int64)
  - arbitrary precision integers (N, Z)
  - extended int types (uint32? uint64? uint24? int24?)
  - floats
- (2 days) common abstraction layer
- (2 days) error messages (including tests)
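As a small sketch of why the uint30/int31 and uint62/int63 types exist: they are the ranges that fit in OCaml's native `int` (31-bit on 32-bit platforms, 63-bit on 64-bit platforms), so values can be range-checked at encoding time. The exact bounds below are assumptions derived from the type names, not taken from the library.

```ocaml
(* Range checks for the OCaml-[int]-backed integer encodings.
   uint30/int31 fit in a native [int] even on 32-bit platforms;
   uint62/int63 need a 64-bit platform. *)
let in_uint30_range i = 0 <= i && i <= (1 lsl 30) - 1
let in_int31_range i = -(1 lsl 30) <= i && i <= (1 lsl 30) - 1
```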
V1 compatibility mechanism
Allows using V1 encodings inside of V2 (and/or vice versa).
Encoding.(list `Uint32 (tuple [ int32; of_v1 operation_encoding ]))
- (2 days) merging the two development trees
- (1 day) use V2 buffers (buffy) in V1 backend
- (1 day) call-out simple implementation
- (2 days) simple translation for simple encodings
- (?? days) explore the possibility of more complete translations
Compact
Similar to V1's compact mode, but with a simpler API.
Encoding.(obj ~compact:true (req "small" (union `Uint2 …)) (req "smaller" (union `Uint1 …)))
- (1 day) adding support for `Uint1 to `Uint7 in union tags (but not in general)
- (2 days) adding support for compact binary tups
- (1 day) adding support for high-level API
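The idea behind compact union tags is that several sub-byte tags can share a single tag byte. The toy functions below pack a 2-bit tag (`` `Uint2 ``) and a 1-bit tag (`` `Uint1 ``) into one byte; the concrete packing scheme is an assumption for illustration, not the library's actual layout.

```ocaml
(* Pack a 2-bit tag and a 1-bit tag into one byte: 0b00000_tt_u. *)
let pack ~tag2 ~tag1 =
  assert (0 <= tag2 && tag2 < 4);
  assert (0 <= tag1 && tag1 < 2);
  (tag2 lsl 1) lor tag1

(* Recover both tags from the shared byte. *)
let unpack byte = ((byte lsr 1) land 0b11, byte land 0b1)
```

Without compaction, the same two unions would each spend a whole byte on their tag; sharing saves space in objects made of many small unions.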
Slicing and hashing
Allows feeding the bytes directly to an incremental hashing function (or another function).
let hasher = Hash.Incremental.init () in
let dst = Buffy.Dst.of_consumer (Hash.Incremental.feed hasher) in
Binary_data_encoding.Writer.writek encoding dst value;
Hash.Incremental.current hasher
- more generic destination interfaces (decouple from bytes)
- hasher destinations
- slicer destinations
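A minimal sketch of a destination decoupled from bytes: each chunk the writer emits is handed to a consumer function, which could be an incremental hasher or a slicer. The names here (`consumer_dst`, `of_consumer`, `feed`) are hypothetical, not Buffy's actual interface.

```ocaml
(* A destination that forwards every emitted chunk to a consumer. *)
type consumer_dst = { feed : string -> unit }

let of_consumer feed = { feed }

(* A toy "incremental hasher" that just counts the bytes it is fed;
   a real hasher would update its digest state instead. *)
let byte_counter () =
  let total = ref 0 in
  ( of_consumer (fun chunk -> total := !total + String.length chunk),
    fun () -> !total )
```

A writer targeting such a destination never needs a contiguous output buffer, which is what makes feeding an incremental hash function possible.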
Partial getter with masks
Avoids decoding parts of the data we are not interested in.
let mask = Mask.array (tuple [ ignore; ignore; int64; ignore ]) in
Binary_data_encoding.Reader.with_mask mask encoding data
- basic support for tuples
- basic support for lists, arrays, and other collections
- basic support for unions
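A toy sketch of mask-driven reading: over a tuple of fixed-width fields, masked-out fields are skipped by advancing the offset instead of being decoded. The layout assumed here (consecutive 8-byte big-endian int64 fields) is for illustration only.

```ocaml
(* Read only the fields enabled in [mask], skipping the rest.
   Assumes every field is a big-endian int64 (8 bytes). *)
let read_with_mask (mask : bool array) (data : string) : int64 list =
  let acc = ref [] in
  Array.iteri
    (fun i keep ->
      (* Skipped fields cost nothing but an offset computation. *)
      if keep then acc := String.get_int64_be data (i * 8) :: !acc)
    mask;
  List.rev !acc
```

With variable-width fields the skip would instead read just enough (e.g. a length header) to know how far to jump, which is still cheaper than full decoding.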
Schema generation
- Schema definitions
- Schema pretty-printing
- Schema production
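A minimal sketch of what schema definitions and pretty-printing could look like: a schema value mirrors the shape of an encoding and can be rendered for documentation. The constructors below are assumptions for illustration, not the library's actual schema type.

```ocaml
(* A tiny schema representation mirroring encoding shapes. *)
type schema =
  | Int64
  | Str of int  (* fixed-length string *)
  | Tuple of schema list

(* Pretty-print a schema in a compact, human-readable notation. *)
let rec pp fmt = function
  | Int64 -> Format.fprintf fmt "int64"
  | Str n -> Format.fprintf fmt "string[%d]" n
  | Tuple fields ->
      Format.fprintf fmt "(%a)"
        (Format.pp_print_list
           ~pp_sep:(fun fmt () -> Format.pp_print_string fmt ", ")
           pp)
        fields
```

Schema production would then be a traversal from an encoding value to such a `schema`, so that the documented wire format always matches the code.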
Follow-ups
Integration in Octez (Note: An important reason for choosing a full rewrite of data-encoding (rather than incremental changes) is that integration is easier: each component of Octez can be upgraded to use the new library one after the other, which works better with our MR review and merging process.)
Deprecation of V1