Product Analytics Backend events lead to API timeouts
Summary
`Rack::Timeout::RequestTimeoutException` errors were reported in Sentry, with about 314k occurrences counted. (https://new-sentry.gitlab.net/organizations/gitlab/issues/173/?project=3)
After the feature flag `internal_events_for_product_analytics` was disabled, the endpoint started returning successful `204` responses and the errors stopped.
Based on our investigation of the log data, we suspect the timeouts were caused by the feature flag `internal_events_for_product_analytics`, which, when enabled, starts sending requests to external endpoints.
Details
- The feature flag `internal_events_for_product_analytics` was enabled to 100% on Dec 15th and disabled on Dec 21st 13:30, 2023. (feature-flag-log)
- The endpoint started returning `204` responses two hours after the feature flag was turned off. (Log)
- External HTTP duration was more than 30 seconds, with about 600 to 1,100 external HTTP calls per single API request. Redis duration was about 3 seconds, so Redis does not look like the problem. (Log)
- External HTTP count is zero when the feature flag is disabled.
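To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It only uses the figures from the logs (30 seconds of external HTTP time, 600 to 1,100 calls per request) and assumes the calls run one after another within the request, which has not been confirmed. The constant and method names are made up for illustration.

```ruby
# Figures taken from the log data above. 30 seconds is a lower bound,
# since the logs show "more than 30 seconds" of external HTTP time.
EXTERNAL_HTTP_SECONDS = 30.0
CALLS_PER_REQUEST = [600, 1_100].freeze

# Average time per external call implied by a total duration and call count.
# Assumes the calls are made sequentially inside the API request.
def avg_ms_per_call(total_seconds, calls)
  (total_seconds / calls * 1000).round(1)
end

CALLS_PER_REQUEST.each do |calls|
  ms = avg_ms_per_call(EXTERNAL_HTTP_SECONDS, calls)
  puts "#{calls} calls -> at least #{ms} ms per call on average"
end
```

If that assumption holds, each individual call only needs to take roughly 27 to 50 ms for the request to spend 30 seconds on external HTTP, so the call count per request looks like the thing to reduce rather than the latency of any single call.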
Looking at the p95 of external HTTP count over 1 day, broken down by status type (https://log.gprd.gitlab.net/app/r/s/0vboA), the external HTTP count went to zero after the feature flag was disabled. Do we instrument our Snowplow HTTP requests? (That sounds quite specific, so I'm guessing not.) Edit: Snowplow emitters can be instrumented with our own logger. Right now we don't pass them one, so they use the default stderr logger (unless we patched in some instrumentation, which wouldn't surprise me).
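As a rough illustration of the kind of instrumentation discussed above, the sketch below wraps an arbitrary sender in a small class that counts outgoing calls and logs each call's duration. It deliberately does not use the Snowplow tracker API: `InstrumentedSender`, the `external_http` log format, and the fake sender are all hypothetical, and only Ruby's standard `Logger` is used.

```ruby
require 'logger'

# Hypothetical wrapper, not part of the Snowplow tracker or GitLab.
# Counts outgoing calls and logs how long each one took.
class InstrumentedSender
  attr_reader :call_count

  # sender: any object responding to #call(payload)
  # logger: defaults to stderr, mirroring the default mentioned above
  def initialize(sender, logger: Logger.new($stderr))
    @sender = sender
    @logger = logger
    @call_count = 0
  end

  def call(payload)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @sender.call(payload)
  ensure
    # Runs whether the send succeeded or raised, so failures are counted too.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @call_count += 1
    @logger.info("external_http call=#{@call_count} duration_ms=#{(elapsed * 1000).round(1)}")
  end
end

# Usage with a stand-in for the real HTTP call:
fake_sender = ->(payload) { payload.size }
sender = InstrumentedSender.new(fake_sender, logger: Logger.new($stdout))
3.times { |i| sender.call("event-#{i}") }
puts "calls made: #{sender.call_count}"
```

Something along these lines would let us see the per-request call count and duration directly in our own logs instead of inferring them from the aggregated external HTTP fields.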
Thanks to @schin1 for the investigation and @pskorupa for disabling the feature flag!
Steps to reproduce
Example Project
What is the current bug behavior?
The errors have stopped for now because the feature flag is disabled.
What is the expected correct behavior?
The expected behavior is to return a `204` response.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)