
Manifest: generate .opam and dune files

Romain requested to merge nomadic-labs/tezos:romain-manifest-opam into master

Status

Ready! We just want to announce it on #devteam before we merge.

Context

Publishing opam packages requires a lot of manual work, mostly editing the .opam files (see #1453 (closed), and in particular #1453 (comment 625126630)). This takes about a day for each release, and only if you're used to doing it. We want to make this process more automatic.

This MR proposes to generate .opam and dune files from an OCaml file, named manifest/main.ml, which contains a merge of all the information that is present in those .opam and dune files. With this, we would be able to generate alternative versions when we release opam packages, with all the manual work done automatically in a robust way (no magic seds :p).

Other Benefits

Generating .opam and dune files from an .ml file has benefits beyond generating alternative versions for releases (which is still the biggest selling point).

  • Almost no risk of having a dependency that is in the .opam file but not in the dune file, or vice versa.
  • Can prevent the same dependency being declared multiple times (there were a few occurrences of this in master).
  • Error-prone, complicated patterns like the void-for-linking trick in src/bin_node/dune can be factorised (see below).
  • We don't have to copy-paste the author, repository URL, etc. (there was at least one package, lib_sapling, for which the author was different, probably for no good reason).
  • If we want we can do analyses to, say, remove dependencies that are already present transitively, or generate a dependency graph.
  • We get a type-checker and merlin (e.g. to jump to a package definition) when we update the manifest.
  • Prevents forgetting important stanzas like (package) in tests.
  • One could generate the opam tests in the CI from this .ml file (more robust than a shell script).
  • One could generate an opam lock file from the manifest.
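
As an illustration of the kind of check that becomes possible once dependencies are OCaml values, here is a minimal sketch of duplicate detection. The function names and the use of plain strings for dependencies are assumptions made for this example; they are not taken from the manifest API.

```ocaml
(* Hypothetical sketch: names are illustrative, not the manifest API. *)

(* Return the dependencies that appear more than once, each reported once,
   in alphabetical order. *)
let duplicates (deps : string list) : string list =
  let sorted = List.sort String.compare deps in
  let rec go acc = function
    | a :: (b :: _ as rest) ->
        if String.equal a b && not (List.mem a acc) then go (a :: acc) rest
        else go acc rest
    | _ -> List.rev acc
  in
  go [] sorted

(* Report an error instead of silently generating a redundant file. *)
let check_no_duplicates ~package deps =
  match duplicates deps with
  | [] -> Ok ()
  | dups ->
      Error
        (Printf.sprintf "%s: duplicate dependencies: %s" package
           (String.concat ", " dups))
```

A generator could run such a check on every package before writing any file, so that a duplicate is caught when the manifest is run rather than during review.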

The key take-away here is that dune and .opam files are DSLs (domain-specific languages). Like most DSLs, they are not made to be Turing-complete. They are made to be descriptive, which is nice for simple use cases and not so nice for complex ones, especially those that were not predicted by the authors of the DSL. This MR replaces those DSLs with an eDSL (embedded DSL). As it is embedded (in OCaml), you can declare functions to share code, use if statements, for loops or List.map, you can use the type system to perform checks at compile time, you can write runtime checks, etc.
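
To make the "functions to share code" point concrete, here is a small sketch of what an embedded description can look like. The record type, the make function, the package names and the author string are all hypothetical; the actual types in manifest.mli are different.

```ocaml
(* Hypothetical package description; the real manifest types differ. *)
type package = {
  name : string;
  authors : string;
  deps : string list;
}

(* Shared metadata is written once instead of being copy-pasted. *)
let make ?(deps = []) name = { name; authors = "Example Authors"; deps }

(* One function describes the packages of any protocol... *)
let protocol_packages proto =
  let main = make ~deps:["protocol-environment"] ("protocol-" ^ proto) in
  let client = make ~deps:[main.name] ("client-" ^ proto) in
  [main; client]

(* ... and List.map instantiates it for each protocol. *)
let all_packages =
  List.concat (List.map protocol_packages ["alpha"; "beta"])
```

Adding a protocol then amounts to adding one string to the list, which a purely descriptive DSL cannot express without repeating each stanza.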

In Tezos, we stopped being a simple use case a long time ago. The Dune documentation (https://dune.readthedocs.io/en/stable/overview.html#project-layout) says:

It is recommended to organize your project so that you have exactly one library per directory.

But for a single subdirectory in src, we can have X opam packages declared in Y dune files which themselves declare Z (library) stanzas, with X ≠ Y ≠ Z.

Another issue is how we use the select stanza to declare optional packages to link with (in src/bin_node in particular). Using a (select) stanza for that is a very error-prone hack; see src/bin_node/dune in master. In fact, it is so error-prone that in some releases we forgot to link 2 protocols (!) even though there was actually a trace of them in the part of the dune file that generates the .empty files (!!). Also, those dependencies were optional in dune but not in .opam, as they were declared in the depends section, not the depopts section. By using an eDSL, we are able to share the code that handles this, and now we just have to add Optional in front of the package so that everything is handled mechanically.
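
The following sketch shows how a single Optional marker can drive both outputs consistently. Only the constructor name Optional comes from the text above; the dependency type, the function names and the exact rendering are assumptions for illustration, and the select form only mimics the void-for-linking pattern rather than reproducing the generator's real output.

```ocaml
(* Hypothetical types: only the name Optional comes from the MR text. *)
type dependency =
  | Required of string
  | Optional of string

(* Required dependencies go to the opam "depends" field... *)
let depends deps =
  List.filter_map (function Required n -> Some n | Optional _ -> None) deps

(* ... and optional ones go to "depopts". *)
let depopts deps =
  List.filter_map (function Optional n -> Some n | Required _ -> None) deps

(* Render one opam field, e.g. depends: [ "a" "b" ] *)
let opam_field name values =
  Printf.sprintf "%s: [ %s ]" name
    (String.concat " " (List.map (Printf.sprintf "%S") values))

(* Render the entry for the dune (libraries) field. An optional dependency
   becomes a select form that falls back to an empty file when the
   library is not installed. *)
let dune_library = function
  | Required n -> n
  | Optional n ->
      Printf.sprintf
        "(select void_for_linking-%s from (%s -> void_for_linking-%s.empty) \
         (-> void_for_linking-%s.empty))"
        n n n n
```

Because both renderings are derived from the same value, a dependency cannot be optional in dune while being mandatory in the .opam file.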

Satellite MRs

The dune and .opam files that the manifest generates initially led to a very large MR. It has been split so that those generated dune and .opam files could be merged separately. This is the list of such past MRs:

How to Review

I suggest that you first get a taste of what it looks like to declare packages in manifest/main.ml. It contains the definition for almost all packages of src. Try to answer the following questions:

  • does it look easy enough to update dependencies and add new packages?
  • should the file be organized differently? do we want to split it?
  • how much of a benefit is it to use an eDSL instead of a DSL? (for this one in particular, look at the Protocol submodule)

Then, look at manifest.mli. This is the main API that we will use to declare our packages. Try to answer the following questions:

  • is it well-documented?
  • is the semantics too obscure?
  • are there things that should be generalized (possibly later)?

In particular, look at the Dune submodule of manifest.mli. We use it to define Dune stanzas that are too specific to be put in the Package submodule of manifest.mli. For those specific cases, Package lets you write Dune stanzas as Dune.s_expr values. So, try to answer the following questions:

  • is it easy to write Dune stanzas (the Dune.s_expr type), especially given that it abuses the list constructors? you should already have had a taste of it if you read main.ml (search for Dune.);
  • some functions in the Dune module build stanzas that we use often, and they may not be fully ironed out yet; do we want to improve their interface before merging, or is it ok to do it later?
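
For readers unfamiliar with the trick, here is a simplified sketch of what "abusing the list constructors" means. This is not the actual Dune.s_expr definition from manifest.mli, which may have more constructors; it only shows that redefining [] and (::) in a type lets s-expressions be written with OCaml's list syntax.

```ocaml
(* Hypothetical, simplified s-expression type. Redefining [] and (::)
   lets values be written with OCaml's list syntax. *)
type s_expr =
  | S of string (* an atom *)
  | [] (* end of a parenthesised list *)
  | ( :: ) of s_expr * s_expr (* one element followed by the rest *)

let rec to_string = function
  | S atom -> atom
  | l -> "(" ^ items l ^ ")"

(* Render the elements of a list, separated by spaces. *)
and items = function
  | [] -> ""
  | [x] -> to_string x
  | x :: rest -> to_string x ^ " " ^ items rest
  | S _ as atom -> to_string atom (* improper tail, not produced by list syntax *)

(* A stanza written as a nested "list" of atoms. *)
let stanza = [S "library"; [S "name"; S "foo"]]
```

With this definition, to_string stanza yields (library (name foo)). The price is that, in scope of this type, [] and :: no longer denote ordinary lists unless the expected type says so, which is the readability question raised above.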

Then, you can have a look at the implementation details, in particular the GENERATOR part of manifest.ml, which contains the conversion from Package.t to Dune.t and Opam.t and generates the files.

Finally, take a step back and try to answer general questions:

  • Would it confuse devs too much to have to update main.ml instead of dune and .opam files?
  • Are the benefits worth the cost to learn how to update main.ml?
  • Should we instead revamp our packages completely to try to respect the rule "one library per directory", put .opam files at the root, etc. so that we no longer have to do manual work to update the .opam files?
  • Is there something else that could solve the problem instead? (E.g. generate .opam files from dune-project files; in my opinion this is quite useless, as it actually amounts to writing the exact same information in a different file.)
  • Should we use existing libs to output dune and .opam files instead? (My opinion is no, because this is actually very easy to do and since this code is needed to bootstrap the project, it's better if it has absolutely no dependency except on ocamlc.)

Manually testing the MR

Run make -C manifest from the root of the tezos repository. You don't need make build-deps to do that; you only need ocamlc and make in the PATH. This generates all dune and .opam files that are declared in manifest/main.ml.

Future Work

We plan to use the generator to generate variants of .opam files for releases. The work is actually already done in a branch: romain-manifest-release-packages

Some libraries and executables are not yet declared in manifest/main.ml. The plan (which was approved during an Octez Merge Team meeting) is to do it later.

  • None of the files under src/proto_* are manifest-generated. One reason is that protocols already have a generation mechanism (dune.inc). The goal would be to port the dune.inc generator to manifest/manifest.ml, but this would in itself already be quite a large MR to review. Similarly, for the daemons (baker, endorser, accuser) we may want to share code, so the port to the manifest should not be as mechanical as for other libraries.

  • lib_time_measurement/ppx is not manifest-generated. This is because its dune files are very different from those of other libraries. In fact, they are in part generated. So once again, we would probably want to port their generator to the manifest, but this would be a significant MR to review. Also, since it is only used for tests, this may not be needed for the original goal, which is to automatically publish opam packages for end-users.

  • Tezt is not manifest-generated. It should not be hard to do, but it only contains tests, which are not even run by opam. So, again, this is not needed for the automatic publishing of opam releases.

Edited by Romain
