Closed
Milestone
Data-Availability Layer (DAL): Harden tezt-cloud deployment, introduce an alert system, improve Ghostnet network monitoring
Description
This milestone covers the work done in Q3 2025 to enhance tezt-cloud's deployment infrastructure.
Major improvements included introducing OpenTelemetry compatibility, refining metrics collection and reporting, and establishing an alert manager system with Slack notifications for monitoring DAL performance, then integrating it with Ghostnet network monitoring.
Documentation was strengthened by adding a CONTRIBUTING guidelines file and by starting a cookbook and an internal deployment journal.
A significant effort was also made to improve macOS compatibility.
Work Breakdown
Metrics
- !16022 (merged): Tezt/DAL: metrics published/attested_commitments_per_slot
- !15820 (merged): Tezt/Cloud: Make OpenTelemetry compatible with the Proxy mode
- !15761 (merged): Tezt/Cloud: Update bakers info dynamically
- !15760 (merged): Tezt/Cloud: Metrics compute a moving average
- !15663 (merged): Tezt/Cloud: Better metrics
- !15650 (merged): Tezt/Cloud: Enable to measure baker performance for DAL on ghostnet
- !15403 (merged): Tezt-cloud: no open-telemetry unless asked for
- !15146 (merged): DAL/Node: Export metrics from the KVS
- !14890 (merged): Tezt/Cloud: Add the OpenTelemetry stack
Alert system
- !16001 (merged): Tezt/Cloud: Update alarm data
- !15903 (merged): Tezt/Cloud: extend prometheus alert jingoo object with a 'for_' field
- !16000 (merged): Tezt/Cloud: Implement a manual-confirmation option
- !15804 (merged): Tezt/Cloud: add hackish alert manager slack webhook receiver and adapt the LowDALAttestedCommitmentsRatio alert
- !15796 (merged): Tezt/Cloud: Refactoring of alert_manager
- !15786 (merged): Tezt/Cloud: add alert_manager.yml.jingoo template
- !15693 (merged): Tezt/Cloud: introduce Alert_manager module and use it for the DAL scenario
- !15674 (merged): Tezt/Cloud: Alerting when DAL on ghostnet is not working as expected
- !15642 (merged): Tezt/Cloud: Enable alert manager
Deployment
- !15999 (merged): Tezt/Cloud: Update some octez binary files in proxy mode if necessary
- !15998 (merged): Tezt/Cloud: Short some of the terraform name variables
- !15907 (merged): Tezt/Cloud: Add a --no-docker-push CLI option
- !15811 (merged): Tezt/DAL: allow the user to provide a node and a DAL node identity when launching a bootstrap DAL node
Tezt-cloud improvements
- !16110 (merged): Tezt/Cloud: Use an Os module
- !16108 (merged): Octez/P2P: Shutdown the socket gracefully
- !16106 (merged): Tezt/Cloud: Rename '--vms' into 'vms-limit'
- !16097 (merged): Tezt/Cloud/DAL: Use the external RPC server of the L1 node
- !16096 (merged): Tezt/Cloud: Fix dream error when there are no metrics
- !16080 (merged): Tezt/Cloud: Binaries-path can be specified on CLI
- !15974 (merged): Tezt/Cloud: Use dream instead of python http server
- !16079 (merged): Tezt/Cloud: Do not run alert manager stuff when there is no alert
- !16062 (merged): Tezt/Cloud: make the Metadata_size_limit option disableable
- !16059 (merged): Tezt/Cloud: fix website CSS
- !16028 (merged): Tezt/cloud: Put binaries as proxy files of the DAL scenario behind a CLI option
- !16003 (merged): Tezt/Cloud: Do not fetch large manager metadata anymore
- !15800 (merged): Tezt/Tezos: Fix the rpc endpoint of a node with a remote runner
- !15795 (merged): Tezt/Cloud: Making DAL optional
- !15791 (merged): Tezt/Cloud: Get the publisher of a commitment too
- !15784 (merged): Tezt/Cloud: Comments expliciting we are using exponential moving average
- !15738 (merged): Tezt-cloud: Fix network issue for macos users of docker run
- !15722 (merged): Tezt/Cloud: Add a CONTRIBUTING guidelines file
- !15719 (merged): Tezt/Cloud: Enable DAL "observers" on bakers' topics
- !15718 (merged): Tezt/Cloud: Restore public API of Tezt-Cloud
- !15708 (merged): Tezt/Cloud: Better website
- !15698 (merged): Tezt/Cloud: Hide network fundraiser key
- !15697 (merged): Tezt/Cloud: Start to implement compatibility with Mac OS/X
- !15696 (merged): Tezt/Cloud: Initiate the cookbook
- !15678 (merged): Tezt/Cloud: fix setup etherlink configuration parameter
- !15671 (merged): Tezt/Cloud: remove inappropriate unexpected error message
- !15669 (merged): Tezt/Cloud: add missing index.md.jingoo file
- !15649 (merged): Tezt/Cloud: GC for old levels
- !15643 (merged): Tezt/Cloud: Use jingoo to generate the website index
- !15641 (merged): Tezt/Cloud: Better DNS handling
- !15541 (merged): DAL/TeztCloud: add mainnet network to the list of available networks
- !15472 (merged): tezt cloud simplify dns ux
- !15401 (merged): Tezt/Cloud: Fix dockerfile UX
- !15398 (merged): Tezt/Cloud: pin grafana version
- !15346 (merged): Tezt/Cloud: Add main dns entry
- !15252 (merged): Tezt/Cloud/DAL: Enable to specify slot indices for producers
- !15069 (merged): Tezt/Cloud/DAL: update dockerfile and fix bootstrap node
- !15018 (merged): Tezt-cloud: use L1 node from scenario
- !15013 (merged): Tezt cloud/DAL+Etherlink: measure costs
- !15011 (merged): Tezt cloud/DAL: handle commitment publications inside batches
- !14998 (merged): Dal+etherlink/Tezt cloud: dedicated VM for Etherlink dal node
- !14915 (merged): Tezt/Cloud: allow to specify the octez release at the CLI
- !14882 (merged): DAL/Tests/Cloud: fix teztale-server's address
- !14722 (merged): Tezt/Cloud fixes dal node initialisation
Goals and deliverables
- Make the infrastructure and processes to deploy on Ghostnet more robust
- Bootstrap and producer nodes running on Ghostnet
- Bootstrap node running on Mainnet