Job with partial traces
Overview
Some jobs are being marked as Success
but end up with only a partial trace, for example:
- https://gitlab.com/gitlab-org/gitlabktl/-/jobs/243077107
- https://gitlab.com/axil/playground/-/jobs/243105572
- https://gitlab.com/steveazz/playground/-/jobs/238951924
Partial trace, taken from https://gitlab.com/gitlab-org/gitlabktl/-/jobs/243077107/raw:
[0KRunning with gitlab-runner 12.0.0-rc1 (58d8360f)
[0;m[0K on prm-com-gitlab-org ae3bfce2
[0;msection_start:1561984106:prepare_executor
[0K[0KUsing Docker executor with image ruby:2.6 ...
[0;m[0KPulling docker image ruby:2.6 ...
[0;m[0KUsing docker image sha256:f1c13927d193a35037d7776a0aeae96c30ba7aae5a22f7f1d424b992da1a0b00 for ruby:2.6 ...
[0;msection_end:1561984109:prepare_executor
[0Ksection_start:1561984109:prepare_script
[0KRunning on runner-ae3bfce2-project-11080193-concurrent-0 via runner-ae3bfce2-prm-1561977982-46bc9386...
section_end:1561984111:prepare_script
[0Ksection_start:1561984111:get_sources
[0K[32;1mFetching changes...[0;m
Reinitialized existing Git repository in /builds/gitlab-org/gitlabktl/.git/
[32;1mChecking out 3b134af3 as feature/gb/serverless-application-deploy...[0;m
Removing specs/fixtures/app/uuid.txt
[32;1mSkipping Git submodules setup[0;m
section_end:1561984129:get_sources
[0Ksection_start:1561984129:restore_cache
[0Ksection_end:1561984131:restore_cache
[0Ksection_start:1561984131:download_artifacts
[0Ksection_end:1561984132:download_artifacts
[0Ksection_start:1561984132:build_script
[0K[32;1m$ cd specs/[0;m
[32;1m$ bundle install[0;m
Fetching gem metadata from https://rubygems.org/... <------------------ There should be more here
Investigation
All these jobs have happened on GitLab.com using the shared Runner fleet, and there are no similar reports from self-hosted users.
It seems that when GitLab Runner sends a request to /jobs/:id/trace,
it gets a 403 Forbidden error, so the trace is never updated.
Below are all the /trace requests sent from GitLab Runner to Rails for two of these jobs, newest first:
https://gitlab.com/gitlab-org/gitlabktl/-/jobs/243077107
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutg-LPqojRxKGhUnc-",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:30:29.745Z",
"publish_time": "2019-07-01T12:30:29.715Z",
"message": "{\"remote\":\"10.216.1.20\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/243077107/trace\",\"code\":\"403\",\"size\":\"49\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-18-sv-gprd\",\"fqdn\":\"api-18-sv-gprd.c.gitlab-production.internal\"}",
"json": {
"method": "PATCH",
"hostname": "api-18-sv-gprd",
"host": "-",
"size": "49",
"referer": "",
"tag": "nginx.access",
"fqdn": "api-18-sv-gprd.c.gitlab-production.internal",
"user": "-",
"path": "/api/v4/jobs/243077107/trace",
"environment": "gprd",
"remote": "10.216.1.20",
"code": "403",
"agent": "gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)"
},
"type": "pubsub-nginx-inf-gprd",
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
},
"message_id": "655603644950789"
},
"fields": {
"@timestamp": [
1561984229745
],
"publish_time": [
1561984229715
]
},
"highlight": {
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.20\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@\",\"code\":\"403\",\"size\":\"49\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-18-sv-gprd\",\"fqdn\":\"api-18-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984229745
]
}
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutgx0GhKvGNclJBCo4",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:29:38.805Z",
"message_id": "655619063491817",
"publish_time": "2019-07-01T12:29:38.782Z",
"message": "{\"remote\":\"10.216.1.20\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/243077107/trace\",\"code\":\"202\",\"size\":\"8\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-10-sv-gprd\",\"fqdn\":\"api-10-sv-gprd.c.gitlab-production.internal\"}",
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
},
"json": {
"remote": "10.216.1.20",
"code": "202",
"tag": "nginx.access",
"environment": "gprd",
"fqdn": "api-10-sv-gprd.c.gitlab-production.internal",
"host": "-",
"referer": "",
"hostname": "api-10-sv-gprd",
"path": "/api/v4/jobs/243077107/trace",
"agent": "gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)",
"method": "PATCH",
"size": "8",
"user": "-"
},
"type": "pubsub-nginx-inf-gprd"
},
"fields": {
"@timestamp": [
1561984178805
],
"publish_time": [
1561984178782
]
},
"highlight": {
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.20\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@\",\"code\":\"202\",\"size\":\"8\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-10-sv-gprd\",\"fqdn\":\"api-10-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984178805
]
}
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutgvcrkpt83rwzO1V2",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:29:29.391Z",
"json": {
"method": "PATCH",
"host": "-",
"path": "/api/v4/jobs/243077107/trace",
"code": "202",
"tag": "nginx.access",
"fqdn": "api-23-sv-gprd.c.gitlab-production.internal",
"remote": "10.216.1.25",
"environment": "gprd",
"hostname": "api-23-sv-gprd",
"user": "-",
"size": "7",
"referer": "",
"agent": "gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)"
},
"type": "pubsub-nginx-inf-gprd",
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
},
"message_id": "655619146983430",
"publish_time": "2019-07-01T12:29:29.371Z",
"message": "{\"remote\":\"10.216.1.25\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/243077107/trace\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-23-sv-gprd\",\"fqdn\":\"api-23-sv-gprd.c.gitlab-production.internal\"}"
},
"fields": {
"@timestamp": [
1561984169391
],
"publish_time": [
1561984169371
]
},
"highlight": {
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.25\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-23-sv-gprd\",\"fqdn\":\"api-23-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984169391
]
}
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutgs2En8X3P-EKwoWw",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:29:18.723Z",
"type": "pubsub-nginx-inf-gprd",
"message_id": "655619288448242",
"publish_time": "2019-07-01T12:29:18.692Z",
"message": "{\"remote\":\"10.216.1.20\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/243077107/trace\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-08-sv-gprd\",\"fqdn\":\"api-08-sv-gprd.c.gitlab-production.internal\"}",
"json": {
"size": "7",
"environment": "gprd",
"hostname": "api-08-sv-gprd",
"fqdn": "api-08-sv-gprd.c.gitlab-production.internal",
"host": "-",
"referer": "",
"path": "/api/v4/jobs/243077107/trace",
"agent": "gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)",
"tag": "nginx.access",
"remote": "10.216.1.20",
"user": "-",
"method": "PATCH",
"code": "202"
},
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
}
},
"fields": {
"@timestamp": [
1561984158723
],
"publish_time": [
1561984158692
]
},
"highlight": {
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.20\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243077107@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0-rc1 (; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-08-sv-gprd\",\"fqdn\":\"api-08-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984158723
]
}
https://gitlab.com/axil/playground/-/jobs/243105572
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutiXYQN8vxRlUBPaQW",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:36:35.012Z",
"message_id": "655603512201180",
"publish_time": "2019-07-01T12:36:34.980Z",
"message": "{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/243105572/trace\",\"code\":\"403\",\"size\":\"49\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-17-sv-gprd\",\"fqdn\":\"api-17-sv-gprd.c.gitlab-production.internal\"}",
"json": {
"hostname": "api-17-sv-gprd",
"fqdn": "api-17-sv-gprd.c.gitlab-production.internal",
"path": "/api/v4/jobs/243105572/trace",
"tag": "nginx.access",
"size": "49",
"user": "-",
"method": "PATCH",
"agent": "gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)",
"remote": "10.216.1.32",
"code": "403",
"referer": "",
"environment": "gprd",
"host": "-"
},
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
},
"type": "pubsub-nginx-inf-gprd"
},
"fields": {
"@timestamp": [
1561984595012
],
"publish_time": [
1561984594980
]
},
"highlight": {
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@\",\"code\":\"403\",\"size\":\"49\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-17-sv-gprd\",\"fqdn\":\"api-17-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984595012
]
}
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutiREJn8X3P-EKytdz",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:36:09.128Z",
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
},
"type": "pubsub-nginx-inf-gprd",
"message_id": "655620085423712",
"publish_time": "2019-07-01T12:36:09.102Z",
"message": "{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/243105572/trace\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-16-sv-gprd\",\"fqdn\":\"api-16-sv-gprd.c.gitlab-production.internal\"}",
"json": {
"user": "-",
"code": "202",
"tag": "nginx.access",
"hostname": "api-16-sv-gprd",
"path": "/api/v4/jobs/243105572/trace",
"referer": "",
"fqdn": "api-16-sv-gprd.c.gitlab-production.internal",
"environment": "gprd",
"remote": "10.216.1.32",
"host": "-",
"method": "PATCH",
"size": "7",
"agent": "gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)"
}
},
"fields": {
"@timestamp": [
1561984569128
],
"publish_time": [
1561984569102
]
},
"highlight": {
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-16-sv-gprd\",\"fqdn\":\"api-16-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984569128
]
}
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutiIiCN8vxRlUBPAC7",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:35:34.359Z",
"message": "{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/243105572/trace\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-15-sv-gprd\",\"fqdn\":\"api-15-sv-gprd.c.gitlab-production.internal\"}",
"json": {
"user": "-",
"method": "PATCH",
"path": "/api/v4/jobs/243105572/trace",
"agent": "gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)",
"tag": "nginx.access",
"fqdn": "api-15-sv-gprd.c.gitlab-production.internal",
"size": "7",
"host": "-",
"code": "202",
"remote": "10.216.1.32",
"environment": "gprd",
"referer": "",
"hostname": "api-15-sv-gprd"
},
"type": "pubsub-nginx-inf-gprd",
"message_id": "655619156752863",
"publish_time": "2019-07-01T12:35:34.310Z",
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
}
},
"fields": {
"@timestamp": [
1561984534359
],
"publish_time": [
1561984534310
]
},
"highlight": {
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PATCH\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@/@kibana-highlighted-field@trace@/kibana-highlighted-field@\",\"code\":\"202\",\"size\":\"7\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-15-sv-gprd\",\"fqdn\":\"api-15-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984534359
]
}
Notice that the most recent request for each job is a 403: GitLab Runner tried to send a trace update, but Rails rejected it. The same happens for the PUT request that GitLab Runner sends at the end of the job.
PUT request from Runner
{
"_index": "pubsub-nginx-inf-gprd-2019.07.01",
"_type": "doc",
"_id": "AWutiWHihKvGNclJEffB",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2019-07-01T12:36:29.949Z",
"type": "pubsub-nginx-inf-gprd",
"message_id": "655603040320935",
"publish_time": "2019-07-01T12:36:29.928Z",
"message": "{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PUT\",\"path\":\"/api/v4/jobs/243105572\",\"code\":\"403\",\"size\":\"49\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-18-sv-gprd\",\"fqdn\":\"api-18-sv-gprd.c.gitlab-production.internal\"}",
"json": {
"environment": "gprd",
"remote": "10.216.1.32",
"host": "-",
"code": "403",
"hostname": "api-18-sv-gprd",
"agent": "gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)",
"tag": "nginx.access",
"user": "-",
"fqdn": "api-18-sv-gprd.c.gitlab-production.internal",
"method": "PUT",
"path": "/api/v4/jobs/243105572",
"size": "49",
"referer": ""
},
"beat": {
"name": "pubsub-nginx-inf-gprd",
"hostname": "pubsub-nginx-inf-gprd",
"version": "6.2.2"
}
},
"fields": {
"@timestamp": [
1561984589949
],
"publish_time": [
1561984589928
]
},
"highlight": {
"json.method": [
"@kibana-highlighted-field@PUT@/kibana-highlighted-field@"
],
"json.path": [
"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@"
],
"message": [
"{\"remote\":\"10.216.1.32\",\"host\":\"-\",\"user\":\"-\",\"method\":\"PUT\",\"path\":\"/api/v4/jobs/@kibana-highlighted-field@243105572@/kibana-highlighted-field@\",\"code\":\"403\",\"size\":\"49\",\"referer\":\"\",\"agent\":\"gitlab-runner 12.0.0 (12-0-stable; go1.8.7; linux/amd64)\",\"tag\":\"nginx.access\",\"environment\":\"gprd\",\"hostname\":\"api-18-sv-gprd\",\"fqdn\":\"api-18-sv-gprd.c.gitlab-production.internal\"}"
]
},
"sort": [
1561984589949
]
}
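The status-code pattern in the entries above can also be pulled out programmatically. The following is a minimal sketch, assuming the Kibana hits have been exported as Python dictionaries; the helper names (`request_timeline`, `first_rejection`, `make_hit`) are hypothetical, and the sample hits are trimmed down to the few fields the sketch reads, with values copied from the entries for job 243077107.

```python
import json


def request_timeline(hits):
    """Return (timestamp, method, path, status) tuples, oldest first."""
    rows = []
    for hit in hits:
        source = hit["_source"]
        # The nginx access line is stored as a JSON string in "message".
        access = json.loads(source["message"])
        rows.append((source["@timestamp"], access["method"],
                     access["path"], int(access["code"])))
    # ISO 8601 UTC timestamps in one format sort correctly as strings.
    return sorted(rows)


def first_rejection(timeline):
    """Return the earliest 403 entry, or None if every request was accepted."""
    return next((row for row in timeline if row[3] == 403), None)


def make_hit(timestamp, method, path, code):
    """Build a trimmed-down hit holding only the fields used above."""
    message = json.dumps({"method": method, "path": path, "code": code})
    return {"_source": {"@timestamp": timestamp, "message": message}}


# Values copied from the entries for job 243077107, newest first as exported.
hits = [
    make_hit("2019-07-01T12:30:29.745Z", "PATCH", "/api/v4/jobs/243077107/trace", "403"),
    make_hit("2019-07-01T12:29:38.805Z", "PATCH", "/api/v4/jobs/243077107/trace", "202"),
    make_hit("2019-07-01T12:29:29.391Z", "PATCH", "/api/v4/jobs/243077107/trace", "202"),
    make_hit("2019-07-01T12:29:18.723Z", "PATCH", "/api/v4/jobs/243077107/trace", "202"),
]

timeline = request_timeline(hits)
for row in timeline:
    print(*row)
```

Sorting oldest first makes the transition easy to spot: a run of 202 responses followed by a 403, after which no more of the trace is accepted.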
We saw a spike of 403s for the /trace
request during the production incident tracked in gitlab-com/gl-infra/production#928 (closed); the timing of the incident seems to correlate with the number of 403s we had.
The errors above might be just a symptom of the performance degradation, but we seem to have a steady stream of 403s even when GitLab.com is operating normally.
The code responsible for sending the 403 can be found here. It covers two cases: the token is not valid, or the job is no longer running. Right now nothing in the logs tells us which of the two is happening. So either the Runner is sending the wrong token, or the state of the job is being updated before the job actually finishes.
This doesn't seem to be limited to GitLab.com shared Runners; private runners are affected as well: https://gitlab.com/gitlab-org/gitlab-ce/issues/63972#note_187354251
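To illustrate why the two 403 cases described above cannot be told apart from the access logs, here is a minimal sketch. It is not the Rails code: the function name, the `Rejection` enum, and the order of the checks are all assumptions made for illustration. The only details taken from the logs are the 403 for rejected requests and the 202 for accepted trace patches.

```python
from enum import Enum


class Rejection(Enum):
    INVALID_TOKEN = "invalid token"
    JOB_NOT_RUNNING = "job not running"


def check_job_request(job_token, job_status, request_token):
    """Return (http_status, reason); reason is None when the request is accepted.

    Hypothetical model: both failure cases map to the same HTTP status.
    """
    if request_token != job_token:
        return 403, Rejection.INVALID_TOKEN
    if job_status != "running":
        return 403, Rejection.JOB_NOT_RUNNING
    return 202, None


# Both rejections look identical to a client that only sees the status code.
wrong_token = check_job_request("secret", "running", "other")
finished_job = check_job_request("secret", "success", "secret")
print(wrong_token[0] == finished_job[0])  # same status code
print(wrong_token[1], finished_job[1])    # different reasons
```

Logging the reason next to the status code, or returning a distinct status code per case, would remove the ambiguity.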
Root cause
GitLab Runner sends an incremental update to the Coordinator (GitLab Rails) every 3 seconds, consisting of a trace patch and a ping saying that the job is running. At the end of the job, it sends the final trace and the final job status. In the incremental update, we used to send the job status only while the job was running. In 11.10, gitlab-runner!1292 (merged) introduced a regression where we always send the job status update, no matter the status. This became a problem when PatchTrace took a long time to respond and the job finished while that request was in flight: the status was updated before the finish request was sent, so the finish request, which carries the rest of the trace, was rejected with a 403. This became very apparent during the performance degradation on GitLab.com.
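The sequence described above can be reproduced with a toy model. This is a sketch under stated assumptions, not Runner or Rails code: the `Coordinator` class, `run_job`, and the `always_send_status` flag are hypothetical names, the slow PatchTrace is simulated by changing the local status right after the patch call, and the 200 returned for an accepted job update is an assumption (the 202 and 403 codes match the logs above).

```python
class Coordinator:
    """Toy stand-in for GitLab Rails that tracks a single job."""

    def __init__(self):
        self.status = "running"
        self.trace = ""

    def patch_trace(self, chunk):
        # Requests for a job that is no longer running are rejected.
        if self.status != "running":
            return 403
        self.trace += chunk
        return 202

    def update_job(self, status, trace=None):
        if self.status != "running":
            return 403
        if trace is not None:
            self.trace = trace
        self.status = status
        return 200  # assumed success code


def run_job(always_send_status):
    coordinator = Coordinator()
    local_status = "running"
    full_trace = "line 1\nline 2\nline 3\n"

    # Incremental update: the first chunk of the trace is patched...
    coordinator.patch_trace("line 1\n")
    # ...and the job finishes locally while that slow request is in flight.
    local_status = "success"
    # Ping half of the incremental update. With the regression, the status
    # is sent even though the job is no longer running locally.
    if always_send_status or local_status == "running":
        coordinator.update_job(local_status)

    # Finish request: full trace plus final status.
    final_code = coordinator.update_job(local_status, trace=full_trace)
    return final_code, coordinator.status, coordinator.trace


print(run_job(always_send_status=True))   # regression behaviour
print(run_job(always_send_status=False))  # previous behaviour
```

With `always_send_status=True`, the ping marks the job as finished first, so the finish request is rejected and the coordinator keeps only the first chunk: a successful job with a partial trace. With `always_send_status=False`, the ping is skipped and the finish request delivers the full trace.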
Follow up issues to create
- Send an HTTP status code other than 403 when the job is not running and a trace or job state update request is sent. https://gitlab.com/gitlab-org/gitlab-ce/issues/64362
- In the GitLab Runner logs, print the internal state of the job and the job status returned from the coordinator. This will help us figure out whether we have a split-brain situation. gitlab-runner#4454
- On the jobs/:id request, when there is an update, send a response header to show the Runner that the status was updated. gitlab-runner#4461
- Show correlation ID. gitlab-runner!1423 (closed)