404 errors on application start
In incident gitlab-com/gl-infra/production#8329 (closed), GitLab Pages served 404 errors for the docs.gitlab.com site for approximately 40 minutes.
We have done an initial investigation into this and so far have not been able to identify a root cause.
What we know
- The `404` errors started happening after cycling Pages pods as part of a normal deployment https://log.gprd.gitlab.net/goto/67904e40-a166-11ed-9f43-e3784d7fe3ca (there were no code changes).
- We haven't seen this before (previous deployments, for example), and we only saw this for docs.gitlab.com, which is very strange.
- We don't see any other log messages accompanying the `404` errors. There are not many places in the code where we issue an `httperrors.Serve404`, so maybe the Pages team could help us narrow this down (or we can add more logging for the next time this happens).
- I don't see any errors returned from `/api/v4/internal/pages` for docs.gitlab.com, so I think we can rule out the internal API.
- All errors were generated from new pods as they were coming up, as shown here https://log.gprd.gitlab.net/goto/d19d0c60-a166-11ed-9f43-e3784d7fe3ca and here gitlab-com/gl-infra/production#8329 (comment 1259019692). The errors stopped suddenly after 40 minutes and we don't know why.
- I thought initially that this might be related to a pages deploy pipeline, but we don't see any correlation between that and the start of this incident.
- Based on an audit of access logs, we don't believe this was due to anyone making a Pages settings update for the project.
- There was no DNS update or change that caused the incident. While we had the issue, `docs.gitlab.com` was returning a DNS A record of 35.185.44.232.
Questions
- Is it possible that Pages would send `404` errors on startup for a single domain?
- Is there something else we should be looking for in the logs to understand the reason for the `404` errors better?
Edited by John Jarvis