Unknown user callouts lead to a 400 error, breaking mixed-code environments
Summary
Twice now, when introducing a new user callout, we've experienced problems on GitLab.com. We load a frontend from canary that includes the new user callout (in this case "our privacy policy has changed"), but when we dismiss the callout, the POST /-/user_callouts request sometimes goes to a non-canary host, which rejects it with a 400 error.
Steps to reproduce
- Get a mixed code deployment like GitLab.com (canary, gprd)
- Get a new user callout served from canary
- Dismiss the callout and have the dismissal request routed to a non-canary host
What is the current bug behavior?
The old code doesn't recognize the user callout being dismissed, so the request is rejected with a 400 response.
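A minimal sketch of the failure mode, in plain Ruby rather than the actual controller code. The feature names, the two hashes, and the `dismiss_status` helper are all hypothetical and only illustrate how a node that lacks the new enum entry ends up answering 400:

```ruby
# Hypothetical enum maps. The non-canary node runs older code and does not
# yet know about the callout that canary introduced.
OLD_NODE_FEATURES = { "existing_callout" => 1 }.freeze
CANARY_NODE_FEATURES = OLD_NODE_FEATURES.merge("privacy_policy_changed" => 2).freeze

# Returns the HTTP status a node would answer with for a dismissal request,
# assuming it validates the feature name against its own enum map.
def dismiss_status(known_features, feature_name)
  known_features.key?(feature_name) ? 200 : 400
end

dismiss_status(CANARY_NODE_FEATURES, "privacy_policy_changed") # accepted
dismiss_status(OLD_NODE_FEATURES, "privacy_policy_changed")    # rejected
```

The same request succeeds or fails depending only on which fleet happens to serve it.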
What is the expected correct behavior?
I think it's more reasonable for us to just accept the POST and set the unknown value in the database. What does the validation get us that's worth suffering this pain for?
Output of checks
This bug happens on GitLab.com
Possible fixes
The error occurs here: https://gitlab.com/gitlab-org/gitlab/blob/master/app/controllers/user_callouts_controller.rb#L20
Things are complicated slightly because we currently send the feature name in the HTTP request, but need an id (technically an enum) to persist in the database. So we'd need to change the callouts to send the id, or both the id and the name, to make this work.
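One way the "send both id and name" idea could look, as a rough sketch in plain Ruby. The `feature_id` parameter, the `KNOWN_FEATURES` map, and the `resolve_feature_id` helper are assumptions made for illustration and do not exist in the current code:

```ruby
# Hypothetical enum map known to the (possibly older) node handling the request.
KNOWN_FEATURES = { "existing_callout" => 1 }.freeze

# Resolves the value to persist for a dismissal.
# Prefers the feature name when this node knows it; otherwise falls back to
# the numeric id sent by the newer frontend. Returns nil when neither the
# name nor the id is usable, in which case the caller can still answer 400.
def resolve_feature_id(params, known_features = KNOWN_FEATURES)
  name = params["feature_name"]
  return known_features[name] if known_features.key?(name)

  # Integer(..., exception: false) returns nil for nil or non-numeric input.
  id = Integer(params["feature_id"], exception: false)
  id if id && id.positive?
end

resolve_feature_id("feature_name" => "existing_callout")
resolve_feature_id("feature_name" => "privacy_policy_changed", "feature_id" => "2")
```

With this shape, an older node can persist a dismissal for a callout it has never heard of, while a request carrying neither a known name nor a valid id is still rejected.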
An alternative approach might be to ensure that either all or no traffic goes to canary for a given user (sticky sessions at the load balancer?), but it's difficult to ensure all our users are set up like this, and it would be better to be resilient to the issue instead.