Geo Sites - Representing Replicables in their Actual States
Links
For an overload of context please see: #363637 (closed)
Most relevant thread: #363637 (comment 969894457)
Related issues: #218329 (closed), #363669 (closed)
What / Why
The Big Why: Geo currently represents data using a Queued state that doesn't actually exist in the Geo Replicable State Machine. This makes it confusing to work out what a Queued item actually is.
We have a State Machine that manages every Replicable Registry's state but we don't currently use it anywhere in the Geo UI.
The proposal here is to expose every available state in the UI, so that the Geo Sites overview page (/admin/geo) gives a holistic view of what state everything is in.
The State Machine
Important: We have two activities we need to look at, Sync and Verification.
Sync States | Verification States |
---|---|
pending: 0 | verification_pending: 0 |
started: 1 | verification_started: 1 |
synced: 2 | verification_succeeded: 2 |
failed: 3 | verification_failed: 3 |
 | verification_disabled: 4 |
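As a minimal sketch, the two sets of states could be mirrored on the frontend as plain constant maps. The names `SYNC_STATES`, `VERIFICATION_STATES`, and `stateName` are hypothetical and exist only for illustration; only the state names and integer values come from the table above.

```typescript
// Hypothetical frontend mirror of the sync states (names and values from the table above).
const SYNC_STATES = {
  pending: 0,
  started: 1,
  synced: 2,
  failed: 3,
} as const;

// Hypothetical frontend mirror of the verification states.
const VERIFICATION_STATES = {
  verification_pending: 0,
  verification_started: 1,
  verification_succeeded: 2,
  verification_failed: 3,
  verification_disabled: 4,
} as const;

// Look up a state name from its integer value.
// Returns undefined when the value does not match any known state.
function stateName(
  states: Readonly<Record<string, number>>,
  value: number,
): string | undefined {
  for (const name of Object.keys(states)) {
    if (states[name] === value) {
      return name;
    }
  }
  return undefined;
}
```

Note that neither map contains a Queued entry, which is the mismatch this issue is about.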
The State Machine Diagram
Replication states
Some allowed transitions are omitted for clarity.
stateDiagram-v2
Pending --> Started
Started --> Synced
Started --> Failed
Synced --> Pending: Mark for resync
Failed --> Pending: Mark for resync
Failed --> Started: Retry
Verification states
Some allowed transitions are omitted for clarity.
stateDiagram-v2
Pending --> Started
Pending --> Disabled: No primary checksum
Disabled --> Started: Primary checksum succeeded
Started --> Succeeded
Started --> Failed
Succeeded --> Pending: Mark for reverify
Failed --> Pending: Mark for reverify
Failed --> Started: Retry
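The two diagrams above can also be read as lookup tables. The following is a sketch only: `SYNC_TRANSITIONS`, `VERIFICATION_TRANSITIONS`, and `canTransition` are hypothetical names, and because some allowed transitions are omitted from the diagrams, these maps are not an exhaustive description of the real state machine.

```typescript
// Each key is a source state; the array lists the target states drawn in the diagrams.
type TransitionMap = Record<string, string[]>;

const SYNC_TRANSITIONS: TransitionMap = {
  pending: ["started"],
  started: ["synced", "failed"],
  synced: ["pending"], // mark for resync
  failed: ["pending", "started"], // mark for resync, retry
};

const VERIFICATION_TRANSITIONS: TransitionMap = {
  pending: ["started", "disabled"], // disabled: no primary checksum
  disabled: ["started"], // primary checksum succeeded
  started: ["succeeded", "failed"],
  succeeded: ["pending"], // mark for reverify
  failed: ["pending", "started"], // mark for reverify, retry
};

// True when the diagram draws an arrow from `from` to `to`.
// Unknown source states simply yield false.
function canTransition(map: TransitionMap, from: string, to: string): boolean {
  const targets = map[from];
  return targets !== undefined && targets.indexOf(to) !== -1;
}
```

For example, `canTransition(SYNC_TRANSITIONS, "failed", "started")` is true (a retry), while `canTransition(SYNC_TRANSITIONS, "synced", "started")` is false because a synced item has to go back through pending first.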
How the Frontend Works
Currently the frontend consumes the counts from the Geo Nodes API and generates the count information:
- Fetch Geo Status Info: https://docs.gitlab.com/ee/api/geo_nodes.html#retrieve-status-about-all-geo-nodes
- Loop through each replicable type and gather the counts
  - For LFS Objects (example)
    - Get Sync Counts
      - Synced: lfs_objects_synced_count
      - Failed: lfs_objects_failed_count
      - Queued: lfs_objects_count - Synced - Failed
    - Get Verification Counts
      - Verified: lfs_objects_verified_count
      - Failed: lfs_objects_verification_failed_count
      - Queued: lfs_objects_count - Verified - Failed
- Represent the data in the Progress bars on the Geo Sites UI
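The steps above can be sketched roughly as follows for the LFS Objects example. This is an illustration, not the actual frontend code: the `lfs_objects_*` field names are the ones listed above, while `LfsStatus`, `Counts`, `deriveCounts`, `lfsSyncCounts`, and `lfsVerificationCounts` are hypothetical names.

```typescript
// Subset of the Geo Nodes status payload used in the LFS Objects example.
interface LfsStatus {
  lfs_objects_count: number;
  lfs_objects_synced_count: number;
  lfs_objects_failed_count: number;
  lfs_objects_verified_count: number;
  lfs_objects_verification_failed_count: number;
}

interface Counts {
  total: number;
  success: number;
  failed: number;
  queued: number;
}

// Queued is not reported by the API. It is whatever remains after
// subtracting the succeeded and failed counts from the total.
function deriveCounts(total: number, success: number, failed: number): Counts {
  return { total, success, failed, queued: total - success - failed };
}

function lfsSyncCounts(status: LfsStatus): Counts {
  return deriveCounts(
    status.lfs_objects_count,
    status.lfs_objects_synced_count,
    status.lfs_objects_failed_count,
  );
}

function lfsVerificationCounts(status: LfsStatus): Counts {
  return deriveCounts(
    status.lfs_objects_count,
    status.lfs_objects_verified_count,
    status.lfs_objects_verification_failed_count,
  );
}
```

Because Queued is computed as a remainder, it lumps together everything that is neither succeeded nor failed, so it cannot be mapped back to a single state in the state machine.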
Proposal
For a first iteration, the proposal is to expose every state from the state machine in the API, similar to how failed and synced are exposed today. Then we can consume the counts in the same fashion and represent them in the progress bar. The UI is written in a generalized way so that it can handle this additional information with very little code change.
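To illustrate the idea, here is a sketch of how per-state counts could be turned into progress bar segments. The proposal does not name the new API fields, so the shape of `StateCounts` is an assumption, and `Segment` and `toSegments` are hypothetical names.

```typescript
// ASSUMPTION: the API would return one count per state for each replicable type.
// The keys are state names from the state machine; the real field names are not defined yet.
type StateCounts = Record<string, number>;

interface Segment {
  state: string;
  count: number;
  percent: number; // share of the total, 0 to 100
}

// Build one progress bar segment per state, with no derived "Queued" bucket.
function toSegments(counts: StateCounts): Segment[] {
  const states = Object.keys(counts);

  let total = 0;
  for (const state of states) {
    total += counts[state] ?? 0;
  }

  const segments: Segment[] = [];
  for (const state of states) {
    const count = counts[state] ?? 0;
    // Guard against division by zero when nothing has been counted yet.
    const percent = total === 0 ? 0 : (count * 100) / total;
    segments.push({ state, count, percent });
  }
  return segments;
}
```

With this shape, adding a state to the API response adds a segment without any change to the function, which matches the generalized UI described above.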
POC
A POC MR was created to showcase how this would work: !89174 (closed)
This MR hints at a potential issue: this many states in the progress bars may be a bit overwhelming. That suggests we may have outgrown this UI component, and in a later iteration we may want to represent this data differently.
Challenges
- @brodock brought up some concerns about the reliability of PostgreSQL for this sort of "live" data: #363637 (comment 1065102798)
- We have some legacy Geo replicables that are not in SSF yet. The solution for those is described in the "Expanding on our Existing UI (Geo Sites)" section of this thread: #363637 (comment 969894457)
- As noted above, the progress bars may not be the best UI component for this amount of data
Special thank you to @mkozono for the work on exposing the Geo SSF State Machine