Geo POC - What is running on my Geo Site? (Geo Site Activity)
Current Iteration
This issue is currently in flux and has undergone another iteration! For the current conversation, please see: #363637 (comment 969894457)
############################
############################
############################
Everything in the description below this point may be outdated!
############################
############################
############################
Prelude
Have you ever looked at the Geo Admin landing page and wondered... What are my sites doing right now?
Geo recently onboarded some new team members, and I've been hearing questions like this one, as well as:
- What does the health status mean?
- How do I know if a site is properly replicating?
- My site is unhealthy and a few different replication bars have errors. What happened?
This raised more questions of my own about what is actually happening on my Geo sites.
The UI
Our UI currently provides users with:
- A high-level overview of the site's health at a glance (admin/geo/sites)
- A very granular, line-by-line view of the actual replication results (admin/geo/sites/${id}/replication/${replicable})
We are missing a very important part of this experience: how does replication happen, when does it happen, and where does it happen?
Introducing Geo Activity (POC)
You can try this live (with mock data) by checking out this MR: !88701 (closed)
Geo Activity is a Proof of Concept (POC) I have been exploring these past few weeks, through conversations with team members and by studying other parts of the GitLab UI.
From this research, I feel that what happens in Geo is, at a high level, very similar to what happens in our CI/CD:
- A job is triggered
- A job runs
- Things happen
- A result is reached
This is why I started looking at our Pipeline components and tried to visualize how we could use this as a way to expose Geo's "behind the scenes" activity to the UI.
[Screenshots: Activity and Activity (Job expanded) views for the Primary Site and the Secondary Site]
The idea is to show a view, filtered by date range, of the activity happening on a particular site.
The table has four sections:
- Status
  - An overview of the status of the entire activity (e.g. Checksumming Projects)
- Action
  - A description of the action and the site it is running against (e.g. Primary checksumming Projects to Secondary)
- Stages
  - A view into the order of operations that need to happen for this particular action (e.g. prepare, checksum, cleanup)
  - Ideally, these jobs will link to some sort of bash log dump (similar to CI/CD) where you can see, line by line, what is happening.
- Operations
  - This is where you can cancel, retry, or fire manual actions
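To make the four sections concrete, here is a minimal sketch of how one row of the table could be modelled. Everything in it is an assumption for illustration: the `Status`, `Stage`, and `Activity` names, the set of status values, and the rules for deriving the overall status and the available operations are hypothetical and do not reflect how Geo stores this data today. Manual actions are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    # Hypothetical status values, loosely modelled on CI/CD job states.
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELED = "canceled"


@dataclass
class Stage:
    name: str  # e.g. "prepare", "checksum", "cleanup"
    status: Status = Status.PENDING


@dataclass
class Activity:
    action: str  # e.g. "Primary checksumming Projects to Secondary"
    stages: List[Stage] = field(default_factory=list)

    @property
    def status(self) -> Status:
        """Derive the overall status of the activity from its stages."""
        statuses = [stage.status for stage in self.stages]
        if Status.FAILED in statuses:
            return Status.FAILED
        if Status.CANCELED in statuses:
            return Status.CANCELED
        if statuses and all(s is Status.SUCCESS for s in statuses):
            return Status.SUCCESS
        if any(s in (Status.RUNNING, Status.SUCCESS) for s in statuses):
            return Status.RUNNING
        return Status.PENDING

    def operations(self) -> List[str]:
        """Return the operations the UI could offer for the current status."""
        status = self.status
        if status in (Status.PENDING, Status.RUNNING):
            return ["cancel"]
        if status in (Status.FAILED, Status.CANCELED):
            return ["retry"]
        return []
```

For example, an activity whose `prepare` stage has succeeded and whose `checksum` stage is still running would report an overall status of `RUNNING` and offer only `cancel`; if `checksum` then fails, the status becomes `FAILED` and `retry` is offered instead.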
Disclaimer
I am a novice when it comes to the inner workings of Geo's replication on the backend. This is how I perceive things to be happening, and I hope I am not too far off the mark. If I am, I can go back to the drawing board with the designs, but I believe this information could be very useful to a user trying to better understand what "Geo is doing".
Questions
- Does this sort of representation even make sense for this data?
- Do we have the ability to store and track Geo replication and verification in a similar fashion to CI/CD?
- What sort of changes would need to happen for us to start tracking and organizing this data as jobs are fired?
- Does filtering these jobs by a date range (say, no more than 31 days) give us enough without overwhelming us with info (i.e. hopefully no more than 100 or so jobs in a month)?
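As a rough illustration of the last question, the date range filter could look something like the sketch below. The 31-day window and the cap of 100 come from the question above; the function name, the `started_at` key, and the choice to reject (rather than clamp) an oversized range are assumptions made only for this example.

```python
from datetime import datetime, timedelta

MAX_RANGE_DAYS = 31  # widest date range a user may request
MAX_RESULTS = 100    # cap on how many activities are returned


def filter_by_date_range(activities, start, end, limit=MAX_RESULTS):
    """Return the newest activities that started within [start, end].

    `activities` is a list of dicts with a hypothetical `started_at`
    datetime key.
    """
    if end < start:
        raise ValueError("end must not be before start")
    if end - start > timedelta(days=MAX_RANGE_DAYS):
        raise ValueError(f"date range must not exceed {MAX_RANGE_DAYS} days")

    in_range = [a for a in activities if start <= a["started_at"] <= end]
    # Newest first, so the cap drops the oldest activities.
    in_range.sort(key=lambda a: a["started_at"], reverse=True)
    return in_range[:limit]
```

Capping the result as well as the range means a busy site cannot flood the view even if far more than 100 jobs fire within a month.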
Next Steps
I'd like to gather feedback from @geo-team on this.
Feedback on:
- The idea of exposing this data
- The feasibility of exposing this data
- The design of this data presentation
- Anything else
😇
Also, a special thank you to @juan-silva for pushing me to take initiative on things I believe would provide value to Geo, as well as to @sranasinghe and @sunjungp for letting me bounce this idea off them during our 1:1s.