Things to think about when redesigning YAML

Background

From a performance and memory-usage perspective, the YAML-loading parts of the codebase could use some improvements, especially to speed up CLI startup for commands like bst show.

I am sure this has been discussed and attempted before. The most recent discussion is here: https://mail.gnome.org/archives/buildstream-list/2019-March/msg00017.html. With the metrics/benchmarking efforts currently ongoing, now seems like a good time to tackle it again. I created this ticket to capture decisions, findings made along the way and, hopefully, the finalized roadmap.

Things we have noticed so far that complicate a possible redesign:

  • Provenance-related logic in general
    • We mostly use provenance when we hit an error and want to print something useful. Most of the time, we do not need it.
    • In addition, provenance objects are also used for other purposes, roughly 90% in source.py and 10% in element.py, which complicates things.
    • See git grep -n "provenance\." and git grep node_get_provenance .
  • The family of mutable composite functions in _yaml.py, and the places where they are used in the rest of the codebase.
    • See git grep -n -p "composite(".
    • _project.py, element.py and a few other small parts seem to use them to squash configuration dictionaries together. These operations take up around 350 lines of _yaml.py.
  • Include functionality and variable substitution
  • Parts of the codebase that do isinstance checks on raw values and call .get on the node object instead of using node_get, sometimes to allow Union[dict, str] types in the YAML files. Without a clear, restricted API, other parts of the code are free to grow more complex.
    • See git grep -n isinstance | grep -v _frontend | grep -v "_yaml\.py" | grep -v "tests"
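On the provenance point above: because provenance is mostly needed only when reporting an error, one option is for each node to carry just a compact (file index, line, column) triple and to build the human-readable description on demand. The sketch below illustrates that idea under stated assumptions. ProvenanceTable, Node and describe() are hypothetical names, not the current _yaml.py API. The format of the description string is made up, and line/column are assumed to be stored 0-based at load time.

```python
class ProvenanceTable:
    """Hypothetical shared table: each filename is stored once, nodes keep only an int index."""

    def __init__(self):
        self.filenames = []

    def add_file(self, filename):
        self.filenames.append(filename)
        return len(self.filenames) - 1


class Node:
    """Hypothetical node: the value plus three small ints instead of a provenance object."""

    # __slots__ avoids a per-instance __dict__, keeping many small nodes cheap in memory.
    __slots__ = ("value", "file_index", "line", "column")

    def __init__(self, value, file_index, line, column):
        self.value = value
        self.file_index = file_index
        self.line = line      # assumed 0-based, as recorded at load time
        self.column = column  # assumed 0-based, as recorded at load time

    def describe(self, table):
        # Only called on the error path, so this string is never built in the common case.
        return "{} [line {} column {}]".format(
            table.filenames[self.file_index], self.line + 1, self.column + 1)


table = ProvenanceTable()
index = table.add_file("elements/hello.bst")
node = Node("autotools", index, 0, 6)
print(node.describe(table))  # elements/hello.bst [line 1 column 7]
```

The trade-off is that callers which currently hold on to provenance objects for non-error purposes (the source.py and element.py cases above) would need to call describe() explicitly, or keep their own reference to the table.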
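On the composite point above: "compositing" means recursively squashing one configuration dictionary onto another, and the existing family of functions does this by mutating its target. The following is a minimal, non-mutating sketch on plain Python dicts, for discussion only. composite_dicts is a hypothetical name. It ignores provenance and BuildStream's list-composition directives such as (>) and (<). The example data is invented.

```python
from typing import Any, Dict


def composite_dicts(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with overlay squashed on top of base.

    Neither input is mutated; sub-dicts that are not overridden are shared, not copied.
    """
    result = dict(base)
    for key, value in overlay.items():
        existing = result.get(key)
        if isinstance(existing, dict) and isinstance(value, dict):
            # Both sides are mappings: merge them recursively.
            result[key] = composite_dicts(existing, value)
        else:
            # Scalars and lists from the overlay replace the base value outright.
            result[key] = value
    return result


defaults = {"variables": {"prefix": "/usr", "libdir": "lib"}, "config": {"build-commands": ["make"]}}
project = {"variables": {"prefix": "/opt"}}
merged = composite_dicts(defaults, project)
print(merged["variables"])              # {'prefix': '/opt', 'libdir': 'lib'}
print(defaults["variables"]["prefix"])  # /usr
```

A non-mutating variant like this could make it easier to cache intermediate layers (defaults, project, element) and reuse them across elements, at the cost of some extra dict allocations. Whether that trade-off pays off would need to be confirmed by the benchmarks.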
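On the isinstance point above: one way to keep type checks from spreading is to route every lookup through a small typed accessor. A Union[dict, str] shorthand would then be expanded in exactly one place. The sketch below is only an illustration. get_typed, get_mapping and LoadError are hypothetical stand-ins, not the actual node_get signature. The "source"/"url" keys in the example are invented.

```python
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")
_MISSING = object()


class LoadError(Exception):
    """Hypothetical error raised when a YAML value is missing or has the wrong type."""


def get_typed(node: Dict[str, Any], key: str, expected_type: Type[T], default: Any = _MISSING) -> T:
    """Hypothetical accessor: the single place that performs the isinstance check."""
    value = node.get(key, _MISSING)
    if value is _MISSING:
        if default is _MISSING:
            raise LoadError("missing required key '{}'".format(key))
        return default
    if not isinstance(value, expected_type):
        raise LoadError("key '{}': expected {}, got {}".format(
            key, expected_type.__name__, type(value).__name__))
    return value


def get_mapping(node: Dict[str, Any], key: str, shorthand_field: str) -> Dict[str, Any]:
    """Accept a mapping or a string shorthand, but always hand callers a mapping."""
    value = node.get(key)
    if isinstance(value, str):
        # Expand the shorthand once, here, so callers never see Union[dict, str].
        return {shorthand_field: value}
    if isinstance(value, dict):
        return value
    raise LoadError("key '{}': expected a mapping or a string".format(key))


element = {"kind": "autotools", "source": "hello.tar.gz"}
print(get_typed(element, "kind", str))        # autotools
print(get_mapping(element, "source", "url"))  # {'url': 'hello.tar.gz'}
```

The point of the design is that the rest of the code only ever receives one shape per key. That would make a later change to the node representation (for example the provenance layout) local to these accessors.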

I think the points above need to be discussed thoroughly, since a redesign/refactor would probably touch all of them. It is also possible that these points cover only 20% of the cases, while the speedup we are after concerns the other 80%. In that case, the redesign may turn out to be simpler than we imagined if we change (fix?) those things first.

Task description

Acceptance Criteria