This issue is for investigating these transient failures. Additionally, we can explore adding deployment troubleshooting docs to Staging Ref and setting up a process for Delivery engineers to create issues when a deployment fails.
Asked the Authentication & Authorization group in Slack (internal only) for help with the intermittent "403 Forbidden - Your account has been blocked." errors. From the docs and the GitLab code, it's not clear why a user would be blocked and then, on the next retry, the same requests would pass without any unblocking action from our side.
Also noticed that GitLab QA had a lot of similar errors in #351617 (closed) today. Wondering if it's an application issue.
@rspeicher thanks for the suggestion! Could you please clarify how to get these response headers? Ansible currently shows the response headers for these failed requests (for example x-request-id), but they don't include any rate limit information. Based on the docs, shouldn't rate limit data be returned automatically?
TASK [post_configure : Get License Plan] ***************************************
fatal: [localhost]: FAILED! => changed=false
  attempts: 3
  cache_control: no-cache
  connection: close
  content: '{"message":"403 Forbidden - Your account has been blocked."}'
  content_length: '60'
  content_type: application/json
  date: Tue, 01 Feb 2022 01:48:17 GMT
  elapsed: 0
  json:
    message: 403 Forbidden - Your account has been blocked.
  msg: 'Status code was 403 and not [200]: HTTP Error 403: Forbidden'
  redirected: false
  referrer_policy: strict-origin-when-cross-origin
  status: 403
  strict_transport_security: max-age=63072000
  url: https://staging-ref.gitlab.com/api/v4/license
  vary: Origin
  x_content_type_options: nosniff
  x_frame_options: SAMEORIGIN
  x_request_id: 01FTSFFRRX4Z385WD5QDNQY415
  x_runtime: '0.019757'
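To check for rate limit headers outside of Ansible, a single request can be replayed and only the rate limit related headers kept. This is a minimal sketch, assuming that GitLab reports throttling through RateLimit-* and Retry-After headers and that these may only be sent on responses that were actually throttled; probe and extract_rate_limit_headers are hypothetical helper names, and the token is passed via the PRIVATE-TOKEN header.

```python
import urllib.error
import urllib.request
from typing import Mapping

# Assumed header names for throttled responses (RateLimit-*, Retry-After).
# They may only be present when a request was actually rate limited.
RATE_LIMIT_PREFIXES = ("ratelimit-", "retry-after")


def extract_rate_limit_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Keep only rate limit related headers, matching names case-insensitively."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower().startswith(RATE_LIMIT_PREFIXES)
    }


def probe(url: str, token: str) -> tuple[int, dict[str, str]]:
    """Send one GET request and return (status code, rate limit headers)."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, extract_rate_limit_headers(response.headers)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise HTTPError, but their headers are still readable.
        return error.code, extract_rate_limit_headers(error.headers)
```

For example, probe("https://staging-ref.gitlab.com/api/v4/license", token) returning (403, {}) would mean the failure carried no rate limit headers at all, which matches what Ansible shows for these requests.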
Have you seen a 403 blocked user error caused by a rate limit before? I thought it would return a 429 error when a limit is hit; at least, that's what we saw a lot of in QA pipelines before.
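When triaging pipeline failures, the two cases in question can be kept apart explicitly. This is a minimal sketch, assuming that throttling surfaces as HTTP 429 (as seen in QA pipelines) and that a blocked account surfaces as HTTP 403 with a message containing "blocked"; FailureKind and classify_failure are hypothetical names, not part of GitLab or Ansible.

```python
import json
from enum import Enum


class FailureKind(Enum):
    RATE_LIMITED = "rate_limited"
    ACCOUNT_BLOCKED = "account_blocked"
    OTHER = "other"


def classify_failure(status: int, body: str) -> FailureKind:
    """Classify a failed API response by status code and error message.

    Assumption: rate limiting returns 429, while a blocked user returns 403
    with a message such as "403 Forbidden - Your account has been blocked."
    """
    if status == 429:
        return FailureKind.RATE_LIMITED
    if status == 403:
        try:
            # GitLab API errors are usually JSON objects with a "message" key.
            message = json.loads(body).get("message", "")
        except (ValueError, AttributeError):
            # Body is not JSON, or is JSON but not an object: search the raw text.
            message = body
        if "blocked" in str(message).lower():
            return FailureKind.ACCOUNT_BLOCKED
    return FailureKind.OTHER
```

Keeping the two outcomes separate makes the odd case easy to spot: a request classified as ACCOUNT_BLOCKED that succeeds on the next retry, with no 429 seen beforehand.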
It's possible staging-ref doesn't even have those API limits enabled and this is a pointless investigation.
That's interesting, thanks for this link! It says it only applies to Git and the container registry, so it's indeed not clear why the license API would fail.
Over the quarter we've made several improvements to Staging Ref deployment stability. With today's GET 2.2.0 release, deployments should become even more stable now that post-configure tasks are done via the Rails console. Additionally, Staging Ref has helped catch several issues in GET; a recent example is "Deployment fails due to NFS mount issue on Redis Cache". It's great that by dogfooding the dev GET image we increase coverage for the Toolkit as well.
Will continue to review pipeline stability and work on resolving any issues in the Staging Ref issue tracker.