A user reported to us that if you go to the GitLab.com page but don't log in, and then try to log in a couple of hours later, you receive a 422 error message. If you then press "Go back", it just ends in an infinite redirect loop of "Confirm form submission". The user states that this could be improved so that GitLab offers a better user experience.
Intended users
Anyone that uses GitLab.com
User experience goal
Redirect to the login page
Proposal
Instead of returning a 422 error, just redirect back to the login page with a banner that says something like "You have been redirected to the login screen because your cookie has expired".
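For illustration, here is a minimal sketch of how such a redirect could be wired up in a Rails application. This assumes the 422 comes from CSRF verification failing on an expired session (as discussed further down in this thread) and uses `new_user_session_path`, the standard Devise route helper; it is not the actual GitLab implementation.

```ruby
# app/controllers/application_controller.rb
# Sketch only: instead of surfacing the 422 that Rails returns when the CSRF
# check fails, send the user back to the sign-in page with an explanatory banner.
class ApplicationController < ActionController::Base
  protect_from_forgery with: :exception

  rescue_from ActionController::InvalidAuthenticityToken do
    redirect_to new_user_session_path,
                alert: "You have been redirected to the login screen because your session expired."
  end
end
```

A redirect also leaves the browser on a plain GET request, which would avoid the "Confirm form submission" loop reported when pressing "Go back".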
Thanks for the ping @dmoraBerlin. I'm not really clear on what's happening and how it was attributed to expiring cookies. Regardless, I definitely would not say this is an expected or desired outcome for users. I'll change it to a bug.
@jcolyer I see you commented on the ticket about the expiring cookies. Can you share more details about how you came to the conclusion cookies is the issue?
@mmora In the ticket it looks like you were able to reproduce the issue. On GitLab.com or another instance/GDK? Can you please add detailed reproduction steps to the description to help us further investigate?
Oh, I hit that problem from time to time when using Brave and I forget to log in after a lengthy bit of time. I've also seen this on other apps that do browser checking, especially when said browser checking is done via Cloudflare. I'm not even 100% sure this is a "GitLab" problem per se. This could be the browser expiring the cookie on my end when I did my testing to confirm he was seeing the same thing I was.
I'm not 100% sure this is "GitLab" doing it per se. I can tell you that if I use Tor, I have to solve a captcha, which I normally see on Cloudflare sites. Admittedly, I'm game to help test if needed, but I might need guidance on what you'd want in order to pinpoint where the issue is coming from (and again, I tested via Brave, a Chromium clone, not pure Chrome).
I'll see if I can replicate once more on gitlab.com. I got past the browser check, so I'll wait, let's say 4 hours, before trying to log in. Initial thing to report is that there are currently 2 cookies:
These cookies are used by Cloudflare for the execution of JavaScript or Captcha challenges. They are not used for tracking or beyond the scope of the challenge. They can be deleted if seen.
cf-chl-XXXX
This cookie is used to check whether the Cloudflare Edge server supports cookies. It can be deleted if seen.
@dblessing 100% replicated. After the browser check, I let the browser sit for 4 hours, then tried to log in. It tried to do another browser check and then produced a 422 error. The username I used was reyloc.
I did notice a super quick 503 error before it tried the browser check. Then it directed me to a 422 page at this URL.
I noticed the cookies are different, and that might be the issue:
POST https://gitlab.com/users/sign_in?__cf_chl_jschl_tk__=89da3af5de90265a08e0d8e922df9ee20ba0f299-1612988920-0-AZSLcklj240LrOkSDQaZF3lRRx_o01nFnmK1HNzkBgcXRi5Dfe_UDJf2Me8z6vRtLck9eLAlKEohaKGJtHatvdlCT5zCxn2l5p5KiTi3WdcXPsOw-lWKfksn0AzemUIY0BAlFeGcTJvjw8gD1baIvG9SBuNbcwjHjNil5SkSbtJp6jHBRONwrtvSX1bUO1XUEi-ej87aA3iKtzUm9kwBdoDO6ZXlKsSKKj9mJBoSkM0EFn00f80Ym3NeTkpBajGUxxBuXlGTZeaZ9TuXapT0V6sun45eTpOk-kg4IpP_CDM3Br36Vq-vFpfrocP4mHWLJm-ZGfYa-NPCiZKEU7JwZnE7GI7SNpXkcMq9F8qVSn6NtRxkYm3rteI9h8rwvWslMw returned 422.
document.cookie: "cf_chl_2=a8fc5226d93c57d; cf_chl_prog=x15"
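As a side note, the CSRF angle can be checked independently of the Cloudflare challenge: posting the sign-in form with a missing or stale authenticity_token should already trigger the 422. Below is a rough sketch only; the field names follow the standard Devise sign-in form, the values are placeholders, and Cloudflare may answer with a challenge page instead.

```ruby
# repro_422.rb -- rough reproduction sketch, values below are placeholders.
require "net/http"
require "uri"

params = {
  "user[login]"        => "some-user",
  "user[password]"     => "not-a-real-password",
  "authenticity_token" => "stale-or-missing-token"
}

res = Net::HTTP.post_form(URI("https://gitlab.com/users/sign_in"), params)
puts res.code # expected: 422 when Rails cannot verify the CSRF token
```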
@dblessing I'm not familiar with Cloudflare's cookies. Maybe @T4cC0re has some insights here?
> A user reported to us that if you go to the GitLab.com page but don't log in, and then try to log in a couple of hours later, you receive a 422 error message. If you then press "Go back", it just ends in an infinite redirect loop of "Confirm form submission". The user states that this could be improved so that GitLab offers a better user experience.
I think the 422 issue seems to be due to a stale CSRF token. Maybe we should just re-render the sign-in page with an error instead of a 422? Why do we think the Cloudflare tokens may be causing this issue?
> Why do we think the Cloudflare tokens may be causing this issue?
I guess we don't, necessarily. It's one noticeable difference on GitLab.com. However, the issue seems to be related to the POST to sign_in. In the absence of CF, this request wouldn't be a POST but a GET. Something about that POST is causing the 422.
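For reference, the re-render variant suggested above could look roughly like the following sketch, assuming a Devise-style SessionsController (again, not the actual GitLab code):

```ruby
# app/controllers/sessions_controller.rb
# Sketch only: catch the stale-token failure in the sessions controller and
# re-render the sign-in form with an error instead of returning a bare 422 page.
class SessionsController < Devise::SessionsController
  rescue_from ActionController::InvalidAuthenticityToken do
    self.resource = resource_class.new
    flash.now[:alert] = "Your session expired while the page was open. Please sign in again."
    render :new, status: :unprocessable_entity
  end
end
```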
One thing that we have in place on the sign_in page is a Cloudflare challenge. So if the page is loaded and left stale for a while, and the login form is only submitted after the challenge was passed, this might cause an issue. Going back might also cause this challenge to fail; however, that would be displayed as a 503 from Cloudflare.
But the screenshot looks like a GitLab one, not a stylized Cloudflare Error page, so I am not sure what is happening here.
@dblessing Sorry, I did not see this earlier.
We never see the challenge, this is all wrapped away in Cloudflare. If we suspect an issue here, I would suggest reaching out to Cloudflare directly.
It might still be related but something within GitLab is happening here. Otherwise we'd get a Cloudflare error page, as you mentioned. I think we need to keep digging but I'm not sure where to go.
This ~"group::authentication and authorization" bug has at most 25% of the SLO duration remaining and is ~"approaching-SLO" breach. Please consider taking action before this becomes a ~"missed-SLO" in 7 days (2022-01-20).
@hsutor @dennis @lmcandrew This S1 bug is a year and a half old compared to our S1 SLO of 30 days. Gently requesting prioritization into %14.8 if possible, or a review of the severity to see if it can be reduced. Thank you kindly!
At first glance this doesn't seem like a priority 1 or severity 1 issue. There is no justification for why it was given a severity 1 label in the first place.
I think we can safely downgrade this a couple of levels but I'll leave it to @hsutor or @lmcandrew to adjust accordingly.
@dennis @tpazitny Thank you both. I've downgraded this since it doesn't seem to have a whole lot of traction. It also looks like we explored this in detail 10 months ago and couldn't find exactly what's wrong or how to fix it, which makes this a tough sell from a Product perspective when comparing the level of effort required and the severity of the issue.
The issue is still reproducible on GitLab 17.6, after waiting 4 hours.
In reference to this comment, Cloudflare now uses a cf_clearance cookie to store proof of a successfully passed challenge, so this is probably a GitLab issue.