Rollup node: optimize RISC-V refutation games and tick state caching

What

Reduce memory usage and improve performance of RISC-V refutation games in the smart rollup node, and add metrics-based monitoring of live PVM states during refutation game tests.

Closes TZX-77. Related to TZX-100.

Why

Unlike WASM, which uses Irmin for its context (where checkouts are cheap copy-on-write operations and states are shared naturally through the content-addressed store), RISC-V PVM states live entirely in memory. Context checkouts are therefore expensive, and each copy of a state consumes additional memory.

  • During refutation games, the node can keep the current state and continue running the PVM instead of checking out a new state between blocks.
  • The mutable state implementation introduced a performance issue for RISC-V: the node created redundant copies of cached states.

We also had no runtime visibility into how many PVM states stay alive during a refutation game, making it difficult to catch regressions.

The improvement shows in the number of live PVM states (note that the accounting is sampling-based and not very precise):

Stage                  Immutable  Mutable
before first commit    131        4
after first commit     133        3
after second commit    67         4

How

When computing dissection states for RISC-V, the node now reuses the in-memory PVM state across block boundaries instead of checking out from disk at each block. In other words, the node prefers to keep running the PVM to the end of the block rather than perform an extra checkout, a trade-off that pays off for RISC-V because checkouts are expensive.
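To make the trade-off concrete, here is a toy sketch, assuming a deterministic PVM whose committed states sit at block boundaries and where a checkout means cloning a whole in-memory state. `PvmState`, `Context`, `build_context`, and `dissection_hashes` are hypothetical names for illustration only, not the rollup node's API.

```rust
// Toy model: all names are hypothetical and only illustrate the strategy.
#[derive(Clone)]
struct PvmState {
    tick: u64,
    value: u64, // stands in for a large in-memory machine state
}

impl PvmState {
    /// Execute one deterministic tick.
    fn step(&mut self) {
        self.tick += 1;
        self.value = self.value.wrapping_mul(31).wrapping_add(self.tick);
    }
}

/// Committed states at the start of each block; a checkout clones one.
struct Context {
    block_starts: Vec<PvmState>, // sorted by tick
    checkouts: usize,
}

impl Context {
    /// Index of the block containing `tick`.
    fn block_of(&self, tick: u64) -> usize {
        self.block_starts
            .iter()
            .rposition(|s| s.tick <= tick)
            .expect("tick precedes the first block")
    }

    /// Copy the committed state at the start of the block containing `tick`.
    fn checkout_before(&mut self, tick: u64) -> PvmState {
        let i = self.block_of(tick);
        self.checkouts += 1;
        self.block_starts[i].clone()
    }
}

/// Run from genesis and record the state at each block start.
fn build_context(block_len: u64, n_blocks: u64) -> Context {
    let mut s = PvmState { tick: 0, value: 1 };
    let mut block_starts = Vec::new();
    for _ in 0..n_blocks {
        block_starts.push(s.clone());
        for _ in 0..block_len {
            s.step();
        }
    }
    Context { block_starts, checkouts: 0 }
}

/// Return a state "hash" for each dissection tick (ticks must be sorted).
/// With `reuse_across_blocks`, keep running the in-memory state past block
/// boundaries; otherwise check out again whenever the target is in another block.
fn dissection_hashes(ctx: &mut Context, ticks: &[u64], reuse_across_blocks: bool) -> Vec<u64> {
    let mut current: Option<PvmState> = None;
    let mut hashes = Vec::with_capacity(ticks.len());
    for &t in ticks {
        let mut state = match current.take() {
            Some(s) if s.tick <= t
                && (reuse_across_blocks || ctx.block_of(s.tick) == ctx.block_of(t)) => s,
            _ => ctx.checkout_before(t),
        };
        // Advance the PVM up to the requested tick.
        while state.tick < t {
            state.step();
        }
        hashes.push(state.value);
        current = Some(state);
    }
    hashes
}
```

Because the PVM is deterministic, both strategies yield the same hashes. With blocks of 10 ticks and dissection ticks `[3, 7, 12, 18, 25, 33, 41, 47]`, the per-block strategy checks out 5 times while the reusing one checks out once, paying instead with the ticks it steps through; that is the right trade when every checkout is a full in-memory copy.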

The tick state caches (the local per-game cache and the global shared cache) now share a single immutable copy of each state instead of each creating its own. The RISC-V refutation test polls the metrics endpoint throughout the game and reports the peak live state counts at the end.
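The sharing change can be sketched with reference counting. In this hypothetical model (`TickState`, `Caches`, and the `LIVE`/`PEAK` counters are illustrative, not the node's code or its metrics), the per-game and global caches hold `Rc` handles to the same immutable state, and a counter incremented on construction and decremented on `Drop` plays the role of a live-state metric.

```rust
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counters standing in for a "live PVM states" metric.
static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Toy state; `data` stands in for a large in-memory PVM state.
struct TickState {
    tick: u64,
    data: Vec<u8>,
}

impl TickState {
    fn with_data(tick: u64, data: Vec<u8>) -> Self {
        // Count every materialized state and remember the maximum seen.
        let live = LIVE.fetch_add(1, Ordering::SeqCst) + 1;
        PEAK.fetch_max(live, Ordering::SeqCst);
        TickState { tick, data }
    }

    fn new(tick: u64) -> Self {
        Self::with_data(tick, vec![0; 1024])
    }
}

impl Clone for TickState {
    // A clone is a full, separately counted copy.
    fn clone(&self) -> Self {
        Self::with_data(self.tick, self.data.clone())
    }
}

impl Drop for TickState {
    fn drop(&mut self) {
        LIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Per-game and global caches keyed by tick, holding shared immutable states.
#[derive(Default)]
struct Caches {
    local: HashMap<u64, Rc<TickState>>,
    global: HashMap<u64, Rc<TickState>>,
}

impl Caches {
    /// Both caches point at one allocation; no copy is made.
    fn insert_shared(&mut self, state: TickState) {
        let shared = Rc::new(state);
        self.global.insert(shared.tick, Rc::clone(&shared));
        self.local.insert(shared.tick, shared);
    }

    /// For comparison: each cache gets its own copy of the state.
    fn insert_copies(&mut self, state: TickState) {
        self.global.insert(state.tick, Rc::new(state.clone()));
        self.local.insert(state.tick, Rc::new(state));
    }
}
```

With `insert_shared`, ten cached ticks keep ten states alive; with `insert_copies`, the same ten ticks keep twenty alive, and `PEAK` records that maximum. Unlike the sampling-based accounting mentioned above, this toy counter is exact.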

MR Stack

  1. !21016 (merged) — Smart rollup node: add state_hash and pvm_status to store
  2. !20996 (merged) — Rollup node: optimize RISC-V refutation games and tick state caching 👈 you are here
  3. !19902 (merged) — Rollup node: Control frequency of context commits to disk
  4. !21156 (merged) — Rollup node/RISC-V: snapshot states to disk in caches for refutation games
Edited by Alain Mebsout
