FEATURE: Validator Startup and Churn with no staking
This issue records the discussion Johnny & JP had about the desired churn AFTER a 2-node network is live.

## Context

Two Service Nodes have synced a genesis file via git, and each has successfully started a live chain with:

- Observer
- Validator
- Signer (and TSS)

The genesis file imported the key-sets for both nodes. The statechain accepts transactions only from whitelisted `bep` addresses.

Each key-set:

- Operator `bep` addr
- Admin `bnb` addr
- Observer `bep` addr
- Validator `bepv` addr
- Signer `bnb` addr

### StateChain Modes

**Add-Node**: The statechain enters a mode that increments the number of Service Nodes as soon as new ones come online (`2 of 2` -> `2 of 3` -> `3 of 4`), ensuring `t of n` where `t >= 2/3 * n` (roughly `0.67 * n`).

**Churn Mode**: The statechain does not add validators, but churns in new nodes at regular intervals as they come online.

### Service Node Status

There are six Service Node statuses:

* `whitelisted` means they are whitelisted and can make statechain txs.
* `standby` means they have sent in their full key-set.
* `nominated` means they have been nominated to be the next to join.
* `ready` means they have passed all liveness tests and are about to be added.
* `active` means they are actively in consensus, observing and signing.
* `queued` means they have been queued to be churned out.

## ADD-NODE MODE

Existing operators can put the statechain into this mode, or it can start automatically.

We should have *multiple* Nodes queued up to join for redundancy, so 1-3 new Service Node (SN) Operators should apply.

### Step 1) Apply for Whitelisting via BNB Stake Tx

A new SN candidate stakes Xm RUNE via a staking command from their Admin `bnb` addr.

> Xm = 10m Rune for now.

`APPLY:bepxxxoperator` with Xm RUNE

* Rune is sent to the pool
* Observers report the tx
* Statechain:
  - if < Xm RUNE, refund
  - if >= Xm RUNE, whitelist the `bep` address

The new Operator is now `whitelisted` and can make statechain txs. Repeat for all new Operators.
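A minimal Python sketch of the Step 1 whitelisting rule and the `t of n` threshold. It assumes integer RUNE amounts and reads the `0.67 * n` target as the smallest `t` with `t >= 2n/3`, which reproduces the `2 of 2` -> `2 of 3` -> `3 of 4` progression. `NodeStatus`, `StateChain`, `handle_apply`, `consensus_threshold`, and `MIN_STAKE_RUNE` are hypothetical names, not part of any existing statechain code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

# Hypothetical constant: "Xm = 10m Rune for now".
MIN_STAKE_RUNE = 10_000_000
APPLY_PREFIX = "APPLY:"


class NodeStatus(Enum):
    WHITELISTED = "whitelisted"  # may send statechain txs
    STANDBY = "standby"          # has sent its full key-set
    NOMINATED = "nominated"      # next node to join
    READY = "ready"              # passed all liveness tests
    ACTIVE = "active"            # in consensus, observing and signing
    QUEUED = "queued"            # scheduled to be churned out


def consensus_threshold(n: int) -> int:
    """Smallest t with t >= 2n/3, i.e. ceil(2n/3), using integer math only."""
    return (2 * n + 2) // 3


@dataclass
class StateChain:
    statuses: Dict[str, NodeStatus] = field(default_factory=dict)
    refunds: List[Tuple[str, int]] = field(default_factory=list)

    def handle_apply(self, admin_bnb: str, memo: str, amount: int) -> bool:
        """Process an observed stake tx carrying an `APPLY:<operator bep>` memo."""
        operator_bep = memo[len(APPLY_PREFIX):] if memo.startswith(APPLY_PREFIX) else ""
        # Assumption: a malformed memo is refunded just like an undersized stake.
        if not operator_bep or amount < MIN_STAKE_RUNE:
            self.refunds.append((admin_bnb, amount))
            return False
        self.statuses[operator_bep] = NodeStatus.WHITELISTED
        return True


if __name__ == "__main__":
    print([consensus_threshold(n) for n in (2, 3, 4)])  # [2, 2, 3]
```

Using integer arithmetic for the threshold avoids float rounding: `0.67 * 3` is `2.01`, which would wrongly reject the `2 of 3` step listed above.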
### Step 2) New Operators send key-set

The new Operator is now whitelisted and can send statechain txs.

* Download `sscli`
* Use `sscli` to send the Node Operator key-set with:
  - Observer `bep` addr
  - Validator `bepv` addr
  - Signer `bnb` addr

The new Operator is now `standby`.

The Statechain canonically orders the new Operators by the mathematical ordering of their pub keys, so if there are multiple new nodes, everyone knows who will join first. Nodes are ordered as soon as they add their keys.

### Step 3) Schedule Add-Node Height

Since the Statechain is in `ADD-NODE` mode, it designates a block height `h`, 100 blocks ahead, at which it will add the new node if all validation passes.

At this point, we nominate the highest-ordered node, so everyone knows who is joining even if more nodes apply.

Highest-ordered node: `nominated`

### Step 4) TSS Signer MPC, Pool Tx, Observe

As soon as Step (3) is done, the existing signers enter MPC key-gen and whitelist the nominated signer based on the signing address it applied with (Step 2).

The new service node operator should run `tssd` AND `observerd`, which will:

- Join MPC and output a new pool address.
- All signers broadcast this new pool address into the Pool: `POOL:bnbpooladdrnew`
- All observers observe this pool transaction and put it into the statechain
- If successful, upgrade the new Operator to `ready` status

This passes all liveness tests:

1) Validate TSS key-gen passed
2) Validate the new Signer is online (it made a signer tx)
3) Validate the new Observer is online (it observed the tx)

Failure Case:

* If the new service node operator is not in `ready` status prior to block height `h`, then:
  1) Kick the failed operator to the bottom of the list
  2) Reset `h` to 100 blocks ahead
  3) Nominate the next Node in the list for MPC

Repeat until:

1) We have added the new Node!

> We don't need to stop trying, since the statechain will keep going just fine.
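The ordering, nomination, and failure rules of Steps 2 to 4 can be sketched as a small scheduler. This is a minimal Python sketch, assuming pub keys are equal-length hex strings compared as raw bytes, that "highest ordered" means the largest key, and that nodes kicked to the bottom stay behind every candidate that has not yet failed. `AddNodeScheduler`, `canonical_order`, `ADD_NODE_DELAY`, and `on_block` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

ADD_NODE_DELAY = 100  # blocks between nomination and the add-node height h


def canonical_order(pubkeys: List[str]) -> List[str]:
    """Order hex-encoded pub keys from highest to lowest (assumed encoding)."""
    return sorted(pubkeys, key=bytes.fromhex, reverse=True)


@dataclass
class AddNodeScheduler:
    fresh: List[str] = field(default_factory=list)   # standby keys that never failed
    failed: List[str] = field(default_factory=list)  # kicked to the bottom, in failure order
    nominee: Optional[str] = None
    h: Optional[int] = None

    def queue(self) -> List[str]:
        """Full candidate list: fresh nodes by key order, then failed nodes."""
        return canonical_order(self.fresh) + self.failed

    def nominate(self, height: int) -> str:
        """Step 3: nominate the highest-ordered candidate and set h.

        Raises IndexError if there are no standby candidates at all.
        """
        self.nominee = self.queue()[0]
        self.h = height + ADD_NODE_DELAY
        return self.nominee

    def _remove(self, pubkey: str) -> None:
        if pubkey in self.fresh:
            self.fresh.remove(pubkey)
        else:
            self.failed.remove(pubkey)

    def on_block(self, height: int, ready: Set[str]) -> Optional[str]:
        """At height h, add a ready nominee or kick it and nominate the next node.

        Returns the key of the node added at this height, if any.
        """
        if self.h is None or height < self.h:
            return None
        nominee = self.nominee
        self._remove(nominee)
        if nominee in ready:
            # Step 5: the caller grows n and t and marks the node `active`.
            self.nominee, self.h = None, None
            return nominee
        self.failed.append(nominee)  # kick to the bottom of the list
        self.nominate(height)        # h moves 100 blocks ahead again
        return None
```

Keeping failed nodes in a separate list lets newly arriving standby nodes slot in by key order without undoing the "kick to the bottom" rule.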
### Step 5) Add Node

Assuming Step (4) passed, then at block height `h`:

- Increase the validator set `n`
- Increase `t` for consensus in the Observer
- Upgrade the new Operator status to `active`

## CHURN NODE MODE

The Statechain stores a list of `whitelisted` Nodes and looks to churn in a new node on a pre-defined block height schedule.

**Ordering Validators**

We need to canonically order validators so everyone knows who is churning. For now: First-In-First-Out, with mathematical ordering of keys; the highest-ordered (or oldest) active Node is `queued`. Later we will add reputation to bump bad validators to the top of the churn priority.

Prior to starting Churn, we need 1-3 Service Nodes to come online and be available to churn in. Each time a new Service Node joins, re-order the Service Node list.

### Step 1) Apply for Whitelisting via BNB Stake Tx

As above.

### Step 2) New Operator can send key-set

As above.

### Step 3) Schedule Churn-Node Height

Since the Statechain is in `CHURN-NODE` mode, it designates a block interval (1000 blocks) at which it will churn in a new node if all validation passes.

> For mainnet we will use a weekly to monthly churn interval (TBD).

The KVStore is used to store:

- block height `c` at which we churn
- block height `c-100` at which we freeze the Service Node list, `nominate` a new node, and `queue` an old node.

### Step 4) TSS Signer MPC, Pool Tx, Observe

At block height `c-100` we nominate and queue the respective nodes. The existing signers enter MPC key-gen with the nominated signer, and blacklist the `queued` signer (to prevent it from joining the new committee).

Key-gen, pool tx, observe, and `ready` proceed as above.

Failure Case:

* If the new service node operator is not in `ready` status prior to block height `c`, then:
  1) Put the failed node at the bottom of the list
  2) Reset `c` to 100 blocks ahead
  3) Keep the `queued` node, and nominate the next node

Repeat until:

1) We have a new Node!
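Step 3 and the churn failure case can be sketched with a plain dict standing in for the KVStore. This minimal Python sketch assumes the outgoing node is the oldest active node by join height, with ties broken by the highest pub key, and that a retry sets `c` to 100 blocks past the missed churn height, which makes the new freeze height the current block. The keys `churn_height` and `freeze_height`, and all class and function names, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CHURN_INTERVAL = 1000  # testnet interval from the issue; mainnet value is TBD
FREEZE_LEAD = 100      # the Service Node list is frozen at c - 100
RETRY_DELAY = 100      # a failed churn moves c 100 blocks ahead


@dataclass
class ActiveNode:
    pubkey: str     # hex-encoded
    joined_at: int  # block height at which the node became `active`


def pick_outgoing(active: List[ActiveNode]) -> str:
    """Queue the oldest active node; ties go to the highest pub key."""
    return max(active, key=lambda n: (-n.joined_at, bytes.fromhex(n.pubkey))).pubkey


@dataclass
class ChurnPlanner:
    kv: Dict[str, int] = field(default_factory=dict)  # stand-in for the KVStore
    nominee: Optional[str] = None
    outgoing: Optional[str] = None

    def schedule(self, height: int, delay: int = CHURN_INTERVAL) -> None:
        """Store the churn height c and the freeze height c - 100."""
        c = height + delay
        self.kv["churn_height"] = c
        self.kv["freeze_height"] = c - FREEZE_LEAD

    def freeze(self, candidates: List[str], active: List[ActiveNode]) -> None:
        """At c - 100: nominate the first candidate and queue an old node."""
        self.nominee = candidates[0]
        if self.outgoing is None:  # a retry keeps the node that is already queued
            self.outgoing = pick_outgoing(active)

    def on_failure(self, height: int, candidates: List[str],
                   active: List[ActiveNode]) -> List[str]:
        """The nominee missed c: move it to the bottom and retry 100 blocks later."""
        reordered = [pk for pk in candidates if pk != self.nominee]
        reordered.append(self.nominee)
        self.schedule(height, RETRY_DELAY)
        self.freeze(reordered, active)
        return reordered
```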
### Step 5) Churn Node

At block height `c`:

- Upgrade the new Operator status to `active`
- Downgrade the `queued` Operator status to `standby`
- Remove the old Validator from consensus
- Remove the old Observer from observing
- Transfer all assets to the new pool address
- Refund all incoming asset transfers to the old pool address
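The Step 5 transitions and the refund rule for the retired pool address can be sketched as follows. This minimal Python sketch models balances as a dict per asset and returns the assets to move instead of building a real BNB transfer; `ChainState`, `churn_node`, and `handle_inbound` are hypothetical names, and ignoring transfers to unrelated addresses is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ChainState:
    statuses: Dict[str, str]       # operator key -> status string
    validators: Set[str]
    observers: Set[str]
    pool_address: str
    pool_balances: Dict[str, int]  # asset -> amount held by the current pool
    retired_pools: Set[str] = field(default_factory=set)


def churn_node(state: ChainState, nominee: str, outgoing: str,
               new_pool: str) -> Dict[str, int]:
    """Apply Step 5 at height c and return the assets to send to new_pool."""
    state.statuses[nominee] = "active"
    state.statuses[outgoing] = "standby"
    # The old node leaves consensus and observing; the new node takes its place.
    state.validators.discard(outgoing)
    state.observers.discard(outgoing)
    state.validators.add(nominee)
    state.observers.add(nominee)
    # Retire the old pool address; from now on, deposits to it are refunded.
    to_transfer = dict(state.pool_balances)
    state.retired_pools.add(state.pool_address)
    state.pool_address = new_pool
    return to_transfer


def handle_inbound(state: ChainState, to_addr: str, asset: str, amount: int) -> str:
    """Decide what to do with an observed inbound transfer."""
    if to_addr in state.retired_pools:
        return "refund"  # sent to an old pool address
    if to_addr == state.pool_address:
        state.pool_balances[asset] = state.pool_balances.get(asset, 0) + amount
        return "accept"
    return "ignore"      # assumption: unrelated addresses are not tracked
```

Keeping every retired address in a set, rather than only the most recent one, means deposits still get refunded after several churns.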