The client cluster connection is failing with the error GitLab Agent Server: Unauthorized on staging. The same setup works as expected on production. The request to the proxy is sent (for example, https://kas.staging.gitlab.com/k8s-proxy/api/v1/services), and the KAS cookie and CSRF token are added to the request header together with the related agent ID. The response:
All the client - cluster requests are failing with an Unauthorized error.
What is the expected correct behavior?
The client should be able to request information from the cluster.
This endpoint will return a "reason" for the failure in the body, which is logged as a debug log from here (that is why it makes sense to enable debug logging in KAS, see #418998 (comment 1475817563)).
@anna_vovchenko I actually had a trailing / in my project path in the agent configuration, and that's why it didn't work. After correcting that, it works for the project we created yesterday on staging:
@anna_vovchenko yes, I saw that - just wanted to note that it doesn't seem to be a "general" problem. Do you mind granting me Maintainer permissions to that project?
Seems to be either a problem with your user / session (though I remember that we logged out and back in again yesterday) ... can you try clearing the browser cookies?
@timofurrer, I cleared cookies/everything, set up a proper 2FA, and still can't make it work. The odd thing I'm noticing is it seems to send 2 different _gitlab_kas cookies within each request 🤯
@anna_vovchenko I can reproduce that ... dang it. That's an issue
IIRC the cookie is sent to all subdomains. That's why it works even though the target host is kas.gitlab.com (or kas.staging.gitlab.com) while the cookie was set from gitlab.com (or staging.gitlab.com). However, this is now an issue, because the cookie set by gitlab.com is also sent to kas.staging.gitlab.com.
@anna_vovchenko I don't think that this is easy to solve - I need to dig into cookies again.
@hfyngvason @ash2k @shinya.maeda @ameyadarshan do any of you know a way of preventing a cookie set on gitlab.com with domain=.gitlab.com from being sent to kas.staging.gitlab.com, while still allowing it to be sent to kas.gitlab.com (basically restricting it to direct subdomains only)?
According to the domain-matching rules, that's not possible. SameSite configurations don't help here either.
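To illustrate why the domain-matching rules rule this out, here is a minimal Go sketch of the string comparison described in RFC 6265, section 5.1.3. The `domainMatches` helper is hypothetical (it is not part of KAS or the Go standard library) and ignores IP-address hosts for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// domainMatches reports whether host domain-matches a cookie's Domain
// attribute, following the string rules of RFC 6265 section 5.1.3.
// Hypothetical helper for illustration; IP-address hosts are not handled.
func domainMatches(host, cookieDomain string) bool {
	host = strings.ToLower(host)
	// A leading dot in the Domain attribute is ignored.
	cookieDomain = strings.ToLower(strings.TrimPrefix(cookieDomain, "."))
	if host == cookieDomain {
		return true
	}
	// Any host that ends with "."+cookieDomain matches, no matter how many
	// labels sit in front of it.
	return strings.HasSuffix(host, "."+cookieDomain)
}

func main() {
	hosts := []string{"kas.gitlab.com", "kas.staging.gitlab.com"}
	domains := []string{".gitlab.com", ".staging.gitlab.com"}
	for _, h := range hosts {
		for _, d := range domains {
			fmt.Printf("host=%s domain=%s match=%v\n", h, d, domainMatches(h, d))
		}
	}
}
```

Because the rule is a plain suffix match, a cookie scoped to .gitlab.com reaches every host below it, including kas.staging.gitlab.com, and there is no attribute that limits the match to direct subdomains.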
@anna_vovchenko you can unblock yourself temporarily (until the cookie is set again by gitlab.com) by deleting the domain=.gitlab.com cookie manually (under Application -> Cookies).
We want to keep using cookies because they are more secure than a secret embedded into the page. A cookie with the httponly flag cannot be stolen by rogue JS on the page or via CSRF.
I wonder what kas sees when browser sends both (?) cookies. I see the following approaches to fix this:
Encode domain in the cookie value, as mentioned above.
Somehow remove the unwanted cookie value somewhere before kas receives it. This is obviously a hack and it may not even be possible if there is no way to tell the domain of the cookie.
Actually, the browser is very likely sending only one of the cookies. They have the same name so it probably cannot send both. If it does, then there are two with the same name? That'd be weird and Go's stdlib does not handle that. In that case we might need to use different cookie names, not different values. We can make the name configurable and change it for staging and pre, leaving the current _gitlab_kas for production.
I think the above is the best way forward. WDYT?
When someone picks this up, please add some info on this problem to the design doc. Also, please mention httpOnly flag more prominently.
I'm not sure if we should keep using cookies. This is already a problem for on-premises instances: when they have different subdomains, they can't use this feature at all.
Are there any approaches that won't be affected by domain names?
The tradeoff here is good security (cookies) but requirement for kas to be on the same domain or sub-domain vs worse security (embedded token) but ability to use any domain. There may be some alternative approach with redirects to set the cookie on the kas domain only, but I haven't looked into that personally.
I don't think it would be a good decision to make security worse for everyone because a few customers (only one so far?) want to have kas on a different domain (why?). However, we can see if there is yet another approach to setting the cookie.
> Actually, the browser is very likely sending only one of the cookies.
The browser is sending both cookies:
... I haven't checked what KAS receives, but it appears to be "random", because currently the authentication on staging works for me even though two cookies are sent.
The RFC only mentions that the cookie list "should" be sorted and only that the "path" may matter, but not the domain ...
> That'd be weird and Go's stdlib does not handle that.
> We might need to use different cookie names, not different values in that case. We can make the name configurable and change it for staging and pre, leaving the current _gitlab_kas for production.
Yes, I think that would work, but we could also omit the configuration in KAS by just encoding the allowed origin name in the cookie name, e.g. _gitlab_kas_gitlab_com and _gitlab_kas_staging_gitlab_com (I think we could also use dots, e.g. _gitlab_kas_gitlab.com).
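As a rough sketch of the name-encoding idea, the helper below derives a cookie name from a host. The function name `cookieNameForHost` and the choice to replace dots and colons with underscores are assumptions for illustration only; the thread does not settle on an exact scheme.

```go
package main

import (
	"fmt"
	"strings"
)

// cookiePrefix is the existing production cookie name plus a separator.
const cookiePrefix = "_gitlab_kas_"

// cookieNameForHost builds a per-origin cookie name from a host such as
// "staging.gitlab.com" or "gdk.test:3000". Hypothetical helper: dots and
// colons are replaced with underscores (a colon is not allowed in a cookie
// name, a dot would be).
func cookieNameForHost(host string) string {
	replacer := strings.NewReplacer(".", "_", ":", "_")
	return cookiePrefix + replacer.Replace(strings.ToLower(host))
}

func main() {
	for _, host := range []string{"gitlab.com", "staging.gitlab.com", "gdk.test:3000"} {
		fmt.Println(host, "->", cookieNameForHost(host))
	}
}
```

With distinct names per origin, the browser may still send both cookies to kas.staging.gitlab.com, but KAS can look up only the name that belongs to its own GitLab instance.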
Another option would be to manually read the cookies from the Cookie request header, but that would require cloning and owning quite a few stdlib functions ...
EDIT: there is r.Cookies() which just returns a slice of cookies, so we may use that ...
> I'm not sure if we should keep using cookies.
I think we should until there is a very compelling case for using KAS on a completely different domain than GitLab itself.
> When someone picks this up, please add some info on this problem to the design doc. Also, please mention httpOnly flag more prominently.
> There may be some alternative approach with redirects to set the cookie on the kas domain only, but I haven't looked into that personally.
I've been thinking about this, too. The "problem" here is that Rails must eventually decrypt the cookie. That means KAS and Rails must either share the encryption key, or Rails would redirect to KAS with an already encrypted cookie, which KAS would then set. I wonder what the security implications of that are, because we would effectively bypass the server-browser cookie API by sending that cookie in a non-cookie header in a redirect request to KAS.
Anyway, even if it were okay security-wise, it seems like a much more complex solution than just encoding the domain in the cookie name or even the value.
Using request.Cookies() in the interim seems like a good first step, as long as the performance is acceptable. We should pick the cookie with
the most specific matching domain,
and the least specific matching path on that domain
in this case, we may want to actually return a Bad Request. It does not seem like a valid configuration.
Making the cookie name customizable might also be useful, as there can conceivably be multiple instances running on the same domain under different ports. For example, multiple GDKs.
Just re-read this thread. I think the root cause of this issue is that environments are hosted on overlapping DNS zones. staging.gitlab.com is staging but is under the production zone of gitlab.com. This is not great. We also have pre.gitlab.com that has the same problem. The layout should be similar to this: gitlab.staging.development-of-gitlab.com and gitlab.pre.development-of-gitlab.com.
I'm not sure we should build any complex workarounds for this bad layout. Don't do it, it doesn't work
Seriously though, we could make the cookie name configurable, but I doubt this work would be prioritized. We should probably document that when installing a self-managed GitLab, users shouldn't use overlapping DNS zones for different environments to avoid cookie clashes.
I contributed request.CookiesNamed a while back and it landed in Go 1.23 (see here). We can use it to get the correct cookie without changing the cookie names, looking only at the values.
And yes, environments are hosted on overlapping DNS zones. That is the "root cause".
FYI: I've created a Go API proposal for a new ReadCookies(name) method on http.Request to be able to conveniently retrieve a slice of cookies with the given name.