Data-encoding-V2
Slack channels: #data-encoding #data-encoding-dev
Motivation
Objective: Provide a modern, consistent, and usable data de/serialisation library to Octez
Key Result: Measurably more efficient de/serialisation
Key Result: Library users (Octez developers) empowered
The current state of the data-encoding library is causing friction in the development of Octez. Incremental changes to the library are difficult because of architectural and design issues and because of its tight coupling with Octez (each incremental change requires a full Octez tree smash).
Scope
The goal of this milestone is to implement a new version of data-encoding:
- with clean architecture and design from the get-go;
- with a dedicated legacy-compatibility mechanism to ease integration;
- with more efficient de/serialisation primitives.
Work Breakdown
Basic library skeleton with just enough flesh to try it out
- project boilerplate: build files, CI files, etc.
- library breakdown with symmetry and clear separation of concerns: data-encoding, binary-data-encoding, json-data-encoding
- test framework: PBT, expect tests
- 80% functionality: support for common types, but not for everything (no recursive encodings)
Underlying buffer library with support for suspend-resume
- bytes destinations with suspend/resume
- string sources with suspend/resume
- pushing state down into destinations and sources: size-limits, size-header info, offsets
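To make the suspend/resume idea concrete, here is a minimal sketch of a destination that suspends when its buffer is full and resumes into a fresh buffer. This is an illustration only: the types and names (`destination`, `write_string`, `Suspended`) are hypothetical stand-ins, not the actual buffer library's API.

```ocaml
(* A destination is a byte buffer plus a write offset. *)
type destination = { buffer : bytes; mutable offset : int }

(* A write either completes, or suspends with a continuation that
   resumes the write against a fresh destination. *)
type 'a result =
  | Done of 'a
  | Suspended of (destination -> 'a result)

(* Write [s] (from [pos]) into [dst]; suspend when the buffer is full. *)
let rec write_string (s : string) (pos : int) (dst : destination) : unit result =
  let available = Bytes.length dst.buffer - dst.offset in
  let needed = String.length s - pos in
  if needed <= available then begin
    Bytes.blit_string s pos dst.buffer dst.offset needed;
    dst.offset <- dst.offset + needed;
    Done ()
  end
  else begin
    Bytes.blit_string s pos dst.buffer dst.offset available;
    dst.offset <- dst.offset + available;
    Suspended (write_string s (pos + available))
  end
```

The caller drains each full buffer (to a socket, a hasher, etc.) and then invokes the continuation with a new destination, which is what lets writers operate over fixed-size buffers without materialising the whole output.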
Optimised writing and reading functions
- (1 day) benchmarks before (with improvements to the benchmarking framework)
- (1 day) partial-application to avoid interpretation during reading
- (1 day) partial-application to avoid interpretation during writing
- (1 day) benchmarks after (with additional tweaks for the partial-applications)
- (2 days) better benchmarks with dedicated library and ppx
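The partial-application idea can be sketched as follows: instead of re-interpreting the encoding description on every call, the encoding is traversed once to build a writer closure, which is then applied to many values. The `encoding` GADT below is a simplified stand-in for illustration, not the library's actual type.

```ocaml
(* Simplified stand-in for an encoding description. *)
type 'a encoding =
  | Int64 : int64 encoding
  | String : string encoding
  | Pair : 'a encoding * 'b encoding -> ('a * 'b) encoding

(* Traverse the encoding ONCE, returning a closure; the match on the
   encoding is paid at compile-time, not on every write. *)
let rec compile : type a. a encoding -> Buffer.t -> a -> unit = function
  | Int64 -> fun buf v -> Buffer.add_int64_be buf v
  | String -> fun buf v -> Buffer.add_string buf v
  | Pair (l, r) ->
      let wl = compile l and wr = compile r in
      fun buf (a, b) ->
        wl buf a;
        wr buf b
```

Usage: `let write = compile (Pair (Int64, String))` interprets the description once; subsequent `write buf value` calls only run the pre-built closures.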
JSON one-shot parsing
Read JSON data directly into the final OCaml representation (rather than in two steps).
- (2 days) Improve support for UTF-8
- (3 days) Simple JSON parsing
- (4 days) Inlined parsing-and-deserialisation
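As a toy illustration of the one-shot approach, the parser below reads a JSON array of integers directly into an `int list`, without building an intermediate JSON tree first. It is a sketch with minimal error handling, not the library's actual parser.

```ocaml
(* Parse a JSON array of integers, e.g. "[1, 2, -3]", straight into an
   [int list]: parsing and deserialisation are inlined into one pass. *)
let parse_int_array (s : string) : int list =
  let n = String.length s in
  let pos = ref 0 in
  let skip_ws () = while !pos < n && s.[!pos] = ' ' do incr pos done in
  let expect c =
    skip_ws ();
    if !pos < n && s.[!pos] = c then incr pos else failwith "parse error"
  in
  let parse_int () =
    skip_ws ();
    let start = !pos in
    if !pos < n && s.[!pos] = '-' then incr pos;
    while !pos < n && s.[!pos] >= '0' && s.[!pos] <= '9' do incr pos done;
    int_of_string (String.sub s start (!pos - start))
  in
  expect '[';
  skip_ws ();
  if !pos < n && s.[!pos] = ']' then (incr pos; [])
  else
    let rec loop acc =
      let i = parse_int () in
      skip_ws ();
      if !pos < n && s.[!pos] = ',' then (incr pos; loop (i :: acc))
      else (expect ']'; List.rev (i :: acc))
    in
    loop []
```

The two-step alternative would first produce a generic JSON value (array of number nodes) and then destruct it; inlining the two avoids allocating that intermediate tree.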
Full coverage of OCaml types and type constructors
- (2 days) remaining 20% functionality:
  - recursive encodings
  - uniform metadata for custom encodings
  - int types (uint8, uint16, uint30, uint62, int8, int16, int31, int32, int63, int64)
  - arbitrary precision integers (N, Z)
  - extended int types (uint32? uint64? uint24? int24?)
  - floats
- (2 days) common abstraction layer
- (2 days) error messages (including tests)
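As a small sketch of why the uint30/int31 and uint62/int63 types exist: they are the ranges that fit in OCaml's native `int` (31-bit on 32-bit platforms, 63-bit on 64-bit platforms), so values can be range-checked at encoding time. The exact bounds below are assumptions derived from the type names, not taken from the library.

```ocaml
(* Range checks for the OCaml-[int]-backed integer encodings.
   uint30/int31 fit in a native [int] even on 32-bit platforms;
   uint62/int63 need a 64-bit platform. *)
let in_uint30_range i = 0 <= i && i <= (1 lsl 30) - 1
let in_int31_range i = -(1 lsl 30) <= i && i <= (1 lsl 30) - 1
```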
V1 compatibility mechanism
Allows using V1 encodings inside of V2 (and/or vice versa).
Encoding.(list `Uint32 (tuple [ int32; of_v1 operation_encoding ]))
- (2 days) merging the two development trees
- (1 day) use V2 buffers (buffy) in V1 backend
- (1 day) call-out simple implementation
- (2 days) simple translation for simple encodings
- (?? days) explore the possibility of more complete translations
Compact
Similar to V1's compact mode, but with a simpler API.
Encoding.(obj ~compact:true (req "small" (union `Uint2 …)) (req "smaller" (union `Uint1 …)))
- (1 day) adding support for `Uint1 to `Uint7 in union tags (but not in general)
- (2 days) adding support for compact binary tups
- (1 day) adding support for high-level API
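The idea behind compact union tags is that several sub-byte tags can share a single tag byte. The toy functions below pack a 2-bit tag (`` `Uint2 ``) and a 1-bit tag (`` `Uint1 ``) into one byte; the concrete packing scheme is an assumption for illustration, not the library's actual layout.

```ocaml
(* Pack a 2-bit tag and a 1-bit tag into one byte: 0b00000_tt_u. *)
let pack ~tag2 ~tag1 =
  assert (0 <= tag2 && tag2 < 4);
  assert (0 <= tag1 && tag1 < 2);
  (tag2 lsl 1) lor tag1

(* Recover both tags from the shared byte. *)
let unpack byte = ((byte lsr 1) land 0b11, byte land 0b1)
```

Without compaction, the same two unions would each spend a whole byte on their tag; sharing saves space in objects made of many small unions.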
Slicing and hashing
Allows feeding the bytes directly to an incremental hashing function (or another function).
let hasher = Hash.Incremental.init () in
let dst = Buffy.Dst.of_consumer (Hash.Incremental.feed hasher) in
Binary_data_encoding.Writer.writek encoding dst value;
Hash.Incremental.current hasher
- more generic destination interfaces (decouple from bytes)
- hasher destinations
- slicer destinations
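A minimal sketch of a destination decoupled from bytes: each chunk the writer emits is handed to a consumer function, which could be an incremental hasher or a slicer. The names here (`consumer_dst`, `of_consumer`, `feed`) are hypothetical, not Buffy's actual interface.

```ocaml
(* A destination that forwards every emitted chunk to a consumer. *)
type consumer_dst = { feed : string -> unit }

let of_consumer feed = { feed }

(* A toy "incremental hasher" that just counts the bytes it is fed;
   a real hasher would update its digest state instead. *)
let byte_counter () =
  let total = ref 0 in
  ( of_consumer (fun chunk -> total := !total + String.length chunk),
    fun () -> !total )
```

A writer targeting such a destination never needs a contiguous output buffer, which is what makes feeding an incremental hash function possible.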
Partial getter with masks
Avoids decoding parts of the data we are not interested in.
let mask = Mask.array (tuple [ ignore; ignore; int64; ignore ]) in
Binary_data_encoding.Reader.with_mask mask encoding data
- basic support for tuples
- basic support for lists, arrays, and other collections
- basic support for unions
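A toy sketch of mask-driven reading: over a tuple of fixed-width fields, masked-out fields are skipped by advancing the offset instead of being decoded. The layout assumed here (consecutive 8-byte big-endian int64 fields) is for illustration only.

```ocaml
(* Read only the fields enabled in [mask], skipping the rest.
   Assumes every field is a big-endian int64 (8 bytes). *)
let read_with_mask (mask : bool array) (data : string) : int64 list =
  let acc = ref [] in
  Array.iteri
    (fun i keep ->
      (* Skipped fields cost nothing but an offset computation. *)
      if keep then acc := String.get_int64_be data (i * 8) :: !acc)
    mask;
  List.rev !acc
```

With variable-width fields the skip would instead read just enough (e.g. a length header) to know how far to jump, which is still cheaper than full decoding.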
Schema generation
- Schema definitions
- Schema pretty-printing
- Schema production
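A minimal sketch of what schema definitions and pretty-printing could look like: a schema value mirrors the shape of an encoding and can be rendered for documentation. The constructors below are assumptions for illustration, not the library's actual schema type.

```ocaml
(* A tiny schema representation mirroring encoding shapes. *)
type schema =
  | Int64
  | Str of int  (* fixed-length string *)
  | Tuple of schema list

(* Pretty-print a schema in a compact, human-readable notation. *)
let rec pp fmt = function
  | Int64 -> Format.fprintf fmt "int64"
  | Str n -> Format.fprintf fmt "string[%d]" n
  | Tuple fields ->
      Format.fprintf fmt "(%a)"
        (Format.pp_print_list
           ~pp_sep:(fun fmt () -> Format.pp_print_string fmt ", ")
           pp)
        fields
```

Schema production would then be a traversal from an encoding value to such a `schema`, so that the documented wire format always matches the code.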
Follow-ups
Integration in Octez (Note: An important reason for choosing a full rewrite of data-encoding (rather than incremental changes) is that integration is easier: each component of Octez can be upgraded to use the new library one after the other, which works better with our MR review and merging process.)
Deprecation of V1