2024-02-21: elevated error rates for gprd-cny
Customer Impact
Current Status
More information will be added as we investigate the issue. For customers believed to be affected by this incident, please subscribe to this issue or monitor our status page for further updates.

References and helpful links

Recent Events (available internally only):
- Feature Flag Log - Chatops to toggle Feature Flags Documentation
- Infrastructure Configurations
- GCP Events (e.g. host failure)
Deployment Guidance
- Deployments Log | Gitlab.com Latest Updates
- Reach out to Release Managers for S1/S2 incidents to discuss Rollbacks, Hot Patching or speeding up deployments. | Rollback Runbook | Hot Patch Runbook
Use the following links to create related issues to this incident if additional work needs to be completed after it is resolved:
- Corrective action ❙ Infradev
- Incident Review ❙ Infra investigation followup
- Confidential Support contact ❙ QA investigation
Note: In some cases we need to redact information from public view. We only do this in a limited number of documented cases. This might include the summary, timeline or any other bits of information, laid out in our handbook page. Any of this confidential data will be in a linked issue, only visible internally. By default, all information we can share will be public, in accordance with our transparency value.
Security Note: If anything abnormal is found during the course of your investigation, please do not hesitate to contact security.
Activity
- ops-gitlab-net added an incident timeline event
- ops-gitlab-net assigned to @ayeung
- ops-gitlab-net changed the description
- ops-gitlab-net changed the severity to Medium - S3
- ops-gitlab-net added a resource link
- Owner
@schin1 reported elevated error rates for gprd-cny in Slack: https://gitlab.slack.com/archives/C8PKBH3M5/p1708474749851039?thread_ts=1708470230.490199&cid=C8PKBH3M5
Info so far:
- The deployment of 16.10.202402202010-9c91ec7b214.407c35432a8 in gprd-cny started the climb in error rate on gprd-cny; web service error rates have been increasing steadily since that deployment on cny (dashboard).
- Sentry shows Encoding::CompatibilityError (https://new-sentry.gitlab.net/organizations/gitlab/releases/9c91ec7b214/?project=3) for that release, affecting a number of controllers.
- Logs show an increase in 5xx errors too, especially for Projects::RawController#show.
I haven't been able to reproduce this with either an Omnibus install or a Cloud Native GitLab install running that version.
I even issued
curl -H "Cookie: _gitlab_session=<my cookie>" http://localhost:8080/-/asdf
on a cny pod running a779caa4045, and the 404 was returned fine. I'm wondering if only spiders are able to trigger this with some input.
I noticed that Rails was loading errors.css as ASCII-8BIT:
[ gprd ] production> Rails.application.assets_manifest.find_sources('errors.css').first.to_s.html_safe.encoding
=> #<Encoding:ASCII-8BIT>
I copied good.css from a v16.8.0 errors*.css, and bad.css from registry.gitlab.com/gitlab-org/security/charts/components/images/gitlab-webservice-ee:16-10-202402202010-9c91ec7b214:
docker pull registry.gitlab.com/gitlab-org/security/charts/components/images/gitlab-webservice-ee:16-10-202402202010-9c91ec7b214
docker run --name bad -it registry.gitlab.com/gitlab-org/security/charts/components/images/gitlab-webservice-ee:16-10-202402202010-9c91ec7b214 bash
In another window:
docker cp bad:/srv/gitlab/public/assets/errors-03910abb66ddd056528b660304c864789459749f77453589d51036fe83101f91.css /tmp/bad.css
I noticed that bad.css contains a non-ASCII (UTF-8 encoded) character, unlike the previous good.css:
$ file bad.css
bad.css: Unicode text, UTF-8 text, with very long lines (55179)
$ file good.css
good.css: ASCII text, with very long lines (55432)
Where is this UTF-8 character?
$ fold -w 80 bad.css > bad-formatted.css
$ grep -P "[^\x00-\x7F]" bad-formatted.css
before{content:"·";display:inline-block;padding:0 1em}}.tanuki-logo{width:210px
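The same search can be done from Ruby, which is handy inside a Rails console where grep is not at hand. This is only a sketch: non_ascii_chars is a hypothetical helper, not GitLab code, and the sample string is a shortened stand-in for the fragment of bad.css quoted above.

```ruby
# Hypothetical helper (not part of GitLab): return every non-ASCII character
# in a UTF-8 string together with its character offset.
def non_ascii_chars(text)
  text.each_char.with_index.reject { |char, _offset| char.ascii_only? }
end

# Shortened stand-in for the offending fragment of bad.css.
css = 'before{content:"·";display:inline-block}'

non_ascii_chars(css).each do |char, offset|
  puts format('offset %d: %p (U+%04X)', offset, char, char.ord)
end
```

For a real file, the input would be read as text first, for example with File.read(path, encoding: 'UTF-8').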
Notice the presence of the · character. I could see an issue if Rails is loading errors.css incorrectly as ASCII-8BIT and then tries to combine it with UTF-8 data.
On the Omnibus instance with this version:
irb(main):001:0> Rails.application.assets_manifest.class
=> Sprockets::Manifest
irb(main):002:0> error = Rails.application.assets_manifest.find_sources('errors.css').first.to_s.html_safe
=> "*,*::before,*::after{box-sizing:border-box}html{font-family:sans-serif;line-height:1.15;-webkit-text-size-adjust:100%;-webkit-tap-hig...
irb(main):003:0> "#{error} 中國"
(irb):3:in `<main>': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
Sprockets::Manifest loads the source via File.binread in https://github.com/rails/sprockets/blob/d1dcf7075c468522e1cb6f93ae547d8d7fdcfbcb/lib/sprockets/asset.rb#L99, so the CSS will always be loaded as ASCII-8BIT. When ASCII-8BIT combines with UTF-8 characters,
- Developer
I have also managed to reproduce it locally using GDK:
- Add config.assets.compile = false to config/environments/development.rb
- Precompile the assets by running bin/rake gitlab:assets:compile
- Visit a page that 404s (e.g. http://localhost:3000/gitlab-org/gitlab/-/issues/undefined)
These steps are required because Sprockets does not call File.binread if compilation is enabled.
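Both comments above hinge on what File.binread returns. The sketch below writes a small stand-in CSS file to a temporary directory and reads it back both ways. read_both_ways is a hypothetical helper. The force_encoding step at the end is one possible way to make such content safe to embed, shown as an assumption and not as a description of what any GitLab merge request does.

```ruby
require 'tmpdir'

# Hypothetical helper: write the given CSS to a temp file, then read it back
# once as raw bytes and once as UTF-8 text.
def read_both_ways(css)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'errors.css')
    File.binwrite(path, css)
    [File.binread(path), File.read(path, encoding: 'UTF-8')]
  end
end

binary, text = read_both_ways('a:before{content:"·"}')

puts binary.encoding # File.binread always tags the result as binary
puts text.encoding   # File.read with an explicit encoding tags it as UTF-8

# Retagging changes no bytes; it only tells Ruby how to interpret them.
retagged = binary.dup.force_encoding(Encoding::UTF_8)
puts retagged.valid_encoding?
puts "#{retagged} 中國".encoding
```

Retagging is only safe when the bytes really are valid UTF-8, which is why the sketch checks valid_encoding? before interpolating.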
- Owner
Draining canary now.
- Owner
cny is drained: https://gitlab.slack.com/archives/C101F3796/p1708475535443309
- Resolved by Malcolm Locke
Full stack trace from correlation_id 01HQ4H0FD1CHJ2GVF76V4KCVBZ:
json.exception.cause_class: ExtractsRef::RefExtractor::InvalidPathError
json.exception.class: Encoding::CompatibilityError
app/views/layouts/errors.html.haml:11
app/controllers/application_controller.rb:132:in `render'
app/controllers/application_controller.rb:248:in `block (2 levels) in render_404'
app/controllers/application_controller.rb:247:in `render_404'
lib/extracts_path.rb:48:in `rescue in assign_ref_vars'
lib/extracts_path.rb:36:in `assign_ref_vars'
ee/lib/gitlab/ip_address_state.rb:10:in `with'
ee/app/controllers/ee/application_controller.rb:45:in `set_current_ip_address'
app/controllers/application_controller.rb:468:in `set_current_admin'
lib/gitlab/session.rb:11:in `with_session'
app/controllers/application_controller.rb:459:in `set_session_storage'
lib/gitlab/i18n.rb:114:in `with_locale'
app/controllers/application_controller.rb:452:in `set_locale'
app/controllers/application_controller.rb:443:in `set_current_context'
ee/lib/omni_auth/strategies/group_saml.rb:41:in `other_phase'
lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'
lib/gitlab/middleware/memory_report.rb:13:in `call'
lib/gitlab/middleware/speedscope.rb:13:in `call'
lib/gitlab/database/load_balancing/rack_middleware.rb:23:in `call'
lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'
lib/gitlab/etag_caching/middleware.rb:21:in `call'
lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'
lib/gitlab/metrics/web_transaction.rb:46:in `run'
lib/gitlab/metrics/rack_middleware.rb:16:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/middleware/query_analyzer.rb:11:in `block in call'
lib/gitlab/database/query_analyzer.rb:40:in `within'
lib/gitlab/middleware/query_analyzer.rb:11:in `call'
lib/gitlab/middleware/organizations/current.rb:24:in `call'
lib/gitlab/middleware/multipart.rb:173:in `call'
lib/gitlab/middleware/read_only/controller.rb:50:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/unauthenticated_session_expiry.rb:18:in `call'
lib/gitlab/middleware/same_site_cookies.rb:27:in `call'
lib/gitlab/middleware/path_traversal_check.rb:35:in `call'
lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:15:in `call'
lib/gitlab/middleware/webhook_recursion_detection.rb:15:in `call'
config/initializers/fix_local_cache_middleware.rb:11:in `call'
lib/gitlab/middleware/compressed_json.rb:44:in `call'
lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'
lib/gitlab/middleware/sidekiq_web_static.rb:20:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:79:in `call'
lib/gitlab/middleware/release_env.rb:13:in `call'
- 🤖 GitLab Bot 🤖 added RootCauseNeeded label
- 🤖 GitLab Bot 🤖 added ServiceNeeded label
- Sylvester Chin added ServiceWeb label and removed ServiceNeeded label
- Thong Kuah mentioned in merge request gitlab-org/gitlab!145355 (closed)
- Stan Hu mentioned in merge request gitlab-org/gitlab!145363 (merged)
- Stan Hu mentioned in commit gitlab-org/gitlab@b936a1d3
- Developer
gitlab-org/gitlab!145363 (merged) should fix this problem.
- Developer
Corrective action: gitlab-org/gitlab!145386 (merged) adds a spec to prevent future encoding failures.
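A guard of the kind the corrective action describes could look roughly like the following. This is a hypothetical plain-Ruby sketch, not the spec added in gitlab-org/gitlab!145386: the helper name embeddable_in_utf8? and the sample CSS strings are made up for illustration.

```ruby
# Hypothetical check: true when binary asset content can be embedded in a
# UTF-8 template and the combined string is still valid UTF-8.
def embeddable_in_utf8?(binary_content)
  "#{binary_content} 中國".valid_encoding?
rescue Encoding::CompatibilityError
  false
end

puts embeddable_in_utf8?('a{color:red}'.b)          # ASCII-only asset
puts embeddable_in_utf8?('a:before{content:"·"}'.b) # asset with a non-ASCII byte sequence
```

A test built on this idea would run the check against the precompiled error page assets, so a newly introduced non-ASCII character fails in CI instead of on canary.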
- ops-gitlab-net mentioned in issue on-call-handovers#4697 (closed)
- Jay McCure mentioned in issue gitlab-org/quality/pipeline-triage#240 (closed)
- Rehab mentioned in incident #17622 (closed)
- Peter Leitzen mentioned in merge request gitlab-org/gitlab!145386 (merged)
- John Skarbek mentioned in issue gitlab-org/release/tasks#8747 (closed)
- John Skarbek mentioned in issue gitlab-org/release/tasks#8750 (closed)
- ops-gitlab-net mentioned in issue on-call-handovers#4698 (closed)
- Dat Tang added Deploys-blocked-gprd7hr Deploys-blocked-gstg7hr labels
- Dat Tang added release-blocker label
- Dat Tang added RootCauseSoftware-Change label and removed RootCauseNeeded label
- John Skarbek mentioned in issue gitlab-org/release/tasks#8764 (closed)
- Owner
The MR we wanted is deployed to Production and validated to be working. We can resolve this incident. Corrective Actions are already noted.
- John Skarbek closed
- John Skarbek changed the incident status to Resolved by closing the incident
- John Skarbek added IncidentResolved label and removed IncidentMitigated label
- John Skarbek changed the incident status to Resolved
- Chloe Liu mentioned in issue gitlab-org/gitlab#442619 (closed)
- Chloe Liu mentioned in issue gitlab-org/gitlab#442611 (closed)
- Paul Gascou-Vaillancourt mentioned in merge request gitlab-org/gitlab-ui!3987 (merged)