Requesting volunteers to own feature areas to validate on staging after GCP migration failover
From the last discussion in #330 (comment 72461420) with @grzesiek and @andrewn
We will need help from the rest of engineering to test the following features. Currently, we have volunteers from the production and geo teams, but this is not enough (see #330, comment 71582338). Not all of the engineers involved in the GCP migration have an adequate setup to run all these tests. It is more effective to delegate these areas to the teams who originally wrote the features, since they should already have the necessary setup on staging and production as part of their development workflow.
@dhavens @tommy.morgan can each of you volunteer a member of your team to help us out with this, please?
This directly helps with our GCP migration efforts. Please understand that this is one of the most important initiatives for the company and we appreciate all the help from everyone.
README
If you are volunteering, please make yourself available to join the GCP Migration Rehearsal call. Ping @andrewn or @meks for an invite. The next one is on Friday, May 25, 6:00am – 8:00am Pacific Time.
- Zoom link: https://gitlab.zoom.us/j/859814316
- Slack channel: #gcp_migration
The staging migration rehearsal steps can be found here https://gitlab.com/gitlab-com/migration/blob/master/.gitlab/issue_templates/failover.md
- In this failover plan there is a section for QA during Blackout which is when the tests in the test plan will take place.
- Link to the section for QA during Blackout
- Link to the test plan
Note: Any setup for validating the features has to be completed BEFORE the failover, so that the state gets carried over from staging to the new environment in GCP. We want to shorten the Blackout period as much as we can.
Test plan
We aim to keep the test plan as lightweight as possible while staying effective. Manual click-through validations are kept free-form: a description with just enough detail to tell people what to test lets each tester vary the workflow, which makes us more likely to catch critical bugs. This is a page taken from the book How Google Tests Software. For tests that run commands in steps, please feel free to add more information as sub-bullet items in your respective areas.
Note: As of 2018-05-25, the team decided to move to Google Sheets to avoid mid-air update collisions during the critical phase of the test run.
During the Blackout
Tests that run during the Blackout are high priority. These are the areas that need to be validated before we cross the point of no return (after which the migration can no longer be rolled back).
- Repository Operations - @mkozono
- pushing to protected branch (provide access when it should and protect when it should)
- ssh
- forking
- Access existing LFS object
- Create new LFS object
- Uploads - @reprazent
- Access existing upload
- Create new upload
- Artifacts - @mkozono
- Access existing artifacts
- Create new artifacts
- Basic API tests - @rymai
- Using access tokens to perform API requests:
- Generate a new API token if needed at https://staging.gitlab.com/profile/personal_access_tokens and save it at ~/.gitlab-token-qa
- export PROJECT_PATH=$(date "+qa-api-tests-%Y-%m-%d-%H-%M-%S")
- POST: curl --request POST --header "PRIVATE-TOKEN: $(cat ~/.gitlab-token-qa)" --data "name=$PROJECT_PATH" --data "path=$PROJECT_PATH" https://staging.gitlab.com/api/v4/projects
- POST: curl --request POST --header "PRIVATE-TOKEN: $(cat ~/.gitlab-token-qa)" --data "branch=master" --data "content=Hello world" --data "commit_message=Add README.md" https://staging.gitlab.com/api/v4/projects/gitlab-qa%2F$PROJECT_PATH/repository/files/README%2Emd
- GET: curl --header "PRIVATE-TOKEN: $(cat ~/.gitlab-token-qa)" https://staging.gitlab.com/api/v4/projects/gitlab-qa%2F$PROJECT_PATH/repository/files/README%2Emd\?ref\=master
- DELETE: curl --request DELETE --header "PRIVATE-TOKEN: $(cat ~/.gitlab-token-qa)" --data "branch=master" --data "commit_message=Remove README.md" https://staging.gitlab.com/api/v4/projects/gitlab-qa%2F$PROJECT_PATH/repository/files/README%2Emd
- GET (expected response: []): curl --header "PRIVATE-TOKEN: $(cat ~/.gitlab-token-qa)" https://staging.gitlab.com/api/v4/projects/gitlab-qa%2F$PROJECT_PATH/repository/tree
- DELETE (expected response: {"message":"202 Accepted"}): curl --request DELETE --header "PRIVATE-TOKEN: $(cat ~/.gitlab-token-qa)" https://staging.gitlab.com/api/v4/projects/gitlab-qa%2F$PROJECT_PATH
- Using session cookie to access API: once logged-in, visit https://staging.gitlab.com/api/v4/user
- Some basic validation for Webhooks - @fjsanpedro
- add new webhooks
- Trigger webhooks
- Issue boards - @felipe_artur
- create a list on group issue board
- move issue between lists on group issue board
- scope a group issue board
- create a list on project issue board
- move issue between lists on project issue board
- scope a project issue board
- CI/CD Workflow (Basic flow) - @bikebilly @DylanGriffith
- Coverage: project templates / GKE integration / k8s integration / ingress and prometheus integration / Auto DevOps basic flow
- Setup:
- create a new project based on the Rails template
- create a new k8s cluster on GKE using the GitLab interface
- install helm, ingress, prometheus
- enable Auto DevOps and configure the domain
- At this point, a pipeline will be executed automatically.
- Checks:
- all the jobs in the pipeline succeeded
- a deployment to production happened
- default page is visible at the given url for production
- performance metrics are shown
- GitLab Service Desk - @stanhu (PRODUCTION ONLY)
- incoming emails
- AlertManager Alerts - @dawsmith on behalf of Production Team
- AlertManager
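The Basic API tests above can be wrapped in a small script so that each step reports pass or fail instead of relying on reading raw curl output during the Blackout. What follows is only a minimal sketch: the helper names (project_url, api_status, expect_status, run_api_checks), the BASE_URL, NAMESPACE and TOKEN_FILE variables, and the expected status codes other than the 202 shown in the plan are assumptions for illustration and should be confirmed against staging before the rehearsal.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the Basic API test steps with HTTP status checks.
# BASE_URL, NAMESPACE, TOKEN_FILE and all function names are assumptions.
BASE_URL="${BASE_URL:-https://staging.gitlab.com/api/v4}"
NAMESPACE="${NAMESPACE:-gitlab-qa}"
TOKEN_FILE="${TOKEN_FILE:-$HOME/.gitlab-token-qa}"

# Print the API URL of a project, with the namespace/path separator
# URL-encoded as %2F (same form as the curl commands in the plan).
project_url() {
  printf '%s/projects/%s%%2F%s' "$BASE_URL" "$NAMESPACE" "$1"
}

# Run curl with the private token and print only the HTTP status code.
api_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --header "PRIVATE-TOKEN: $(cat "$TOKEN_FILE")" "$@"
}

# Compare an actual status code with the expected one and report it.
expect_status() {
  local label="$1" expected="$2" actual="$3"
  if [ "$actual" = "$expected" ]; then
    echo "PASS $label ($actual)"
  else
    echo "FAIL $label (expected $expected, got $actual)"
    return 1
  fi
}

# Walk through the same steps as the manual plan, stopping at the first failure.
# Expected codes are assumptions, except 202 for project deletion (from the plan).
run_api_checks() {
  local project file
  project="$(date "+qa-api-tests-%Y-%m-%d-%H-%M-%S")"
  file="$(project_url "$project")/repository/files/README%2Emd"
  expect_status "create project" 201 "$(api_status --request POST \
    --data "name=$project" --data "path=$project" "$BASE_URL/projects")" &&
  expect_status "create file" 201 "$(api_status --request POST \
    --data "branch=master" --data "content=Hello world" \
    --data "commit_message=Add README.md" "$file")" &&
  expect_status "read file" 200 "$(api_status "$file?ref=master")" &&
  expect_status "delete file" 204 "$(api_status --request DELETE \
    --data "branch=master" --data "commit_message=Remove README.md" "$file")" &&
  expect_status "delete project" 202 "$(api_status --request DELETE \
    "$(project_url "$project")")"
}
```

To use it, source the file and call run_api_checks once the token is saved. Nothing touches the network until that function is called, so the helpers can be checked locally before the failover.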
After the Blackout
Tests that run after the Blackout cover functionality that can be addressed after we are on GCP.
- CI/CD Workflow (Advanced flow) - @bikebilly @DylanGriffith
- Coverage: runner integration / staging deployments / incremental rollout deployments / multi-pod deployments / deploy boards / web terminals
- Setup:
- install runner on the cluster
- disable shared runners in project settings
- run a new manual pipeline adding the following variables:
- STAGING_ENABLED -> 1
- PRODUCTION_REPLICAS -> 10
- INCREMENTAL_ROLLOUT_ENABLED -> 1
- after the staging job ended, trigger the rollout 10% manual action
- after the rollout 10% job ended, trigger the rollout 100% manual action
- click on the web terminal button and execute id on the remote shell
- Checks:
- all the jobs in the pipeline succeeded
- a deployment to staging happened
- default page is visible at the given url for staging
- manual actions are available to do incremental rollout (10%, 25%, 50%, 100%)
- deploy boards show 1 pod for staging
- deploy boards show 1 new pod and 9 old pods after rollout 10% job ended
- deploy boards show 10 pods after rollout 100% job ended
- check for the output of the id command in the web terminal
- Email (Production only for outgoing email)
- Do outgoing notifications continue to work from the new sending hosts (DKIM, SPF)? (PRODUCTION ONLY) - @toon
- Do incoming emails update notes - @toon
- Create an issue from email @jprovaznik
- Wiki Operations - @digitalmoksha
- Access and modify an existing wiki
- Create new wiki
- Clone wiki repository
- Pages - @toon (PRODUCTION ONLY)
- Access existing page
- Create new page
- Access from Azure
- Access from GCP
- Mirror - @vsizov
- Push
- Pull
- Backups - @dawsmith on behalf of Production Team
- Redis
- Postgres
- Git Data
- Kubernetes monitoring - @bjk-gitlab
- TBD
- File locking @rdavila
- Lock/Unlock files through the UI
- Lock/Unlock files through git lfs lock
- List locked files through git lfs locks
- CI/CD for external repo - @jamedjo
- TBD
- CI/CD for GitHub - @jamedjo
- TBD
- Multi-project pipelines Follow up issue: #516 (closed)
- TBD
- AutoDevOps workflow Follow up issue: #516 (closed)
- Canary deployments (needs owner; outline test areas)
- TBD
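The command-line part of the File locking checks in the list above could be scripted roughly as follows. This is a sketch under stated assumptions: it is meant to run inside a clone of a staging project that already has LFS enabled, the function names lock_listed and check_file_locks are hypothetical, and the match on the git lfs locks listing is a plain substring match on the path, so a path that is a substring of another locked path would also match.

```shell
#!/usr/bin/env bash
# Sketch only: CLI file-locking checks using git lfs lock / locks / unlock.
# Function names are hypothetical; run inside a clone with LFS enabled.

# Succeed when the given path appears in a `git lfs locks` listing on stdin.
# Note: this is a fixed-string substring match, not an exact path match.
lock_listed() {
  grep -qF -- "$1"
}

# Lock a file, confirm it is listed, unlock it, confirm it is gone.
check_file_locks() {
  local path="$1"
  git lfs lock "$path" || return 1
  if git lfs locks | lock_listed "$path"; then
    echo "PASS $path is listed as locked"
  else
    echo "FAIL $path missing from git lfs locks"
    return 1
  fi
  git lfs unlock "$path" || return 1
  if git lfs locks | lock_listed "$path"; then
    echo "FAIL $path still listed after unlock"
    return 1
  fi
  echo "PASS $path unlocked"
}
```

For example, check_file_locks README.md would exercise lock, list, and unlock in one pass. The UI checks in the same area still need to be done by hand.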
Once we have owners for these points, I will update the main test plan template, which can be found here: https://gitlab.com/gitlab-com/migration/blob/master/.gitlab/issue_templates/test_plan.md