Raw artifacts download fails with "error code: 1010" when the user agent starts with "Python-urllib"
Summary
When retrieving build artifacts via the /-/jobs/:jobid/artifacts/raw endpoint, the request fails when the User-Agent header starts with Python-urllib.
Other user agents, such as curl's default or anything else I tried that does not start with Python-urllib, do not trigger the issue.
I assume this is some kind of "anti-scraping mechanism" by Cloudflare. It must have been introduced in the last few months; before that, this worked fine.
Steps to reproduce
- curl https://gitlab.com/s3lph/spaceapi-server/-/jobs/603973027/artifacts/raw/dist/SHA256SUMS → Works
- curl -H 'User-Agent: Python-urllib/3.8' https://gitlab.com/s3lph/spaceapi-server/-/jobs/603973027/artifacts/raw/dist/SHA256SUMS → Doesn't work!
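The failing user agent is the one Python's urllib sends when no User-Agent is set, so the same reproduction can be sketched from Python. This is only a sketch and not part of the original report: `default_user_agent` and `fetch_status` are hypothetical helper names, and the actual request is left commented out because it needs network access and depends on the server's current behavior.

```python
import urllib.error
import urllib.request

URL = ("https://gitlab.com/s3lph/spaceapi-server/-/jobs/603973027"
       "/artifacts/raw/dist/SHA256SUMS")


def default_user_agent():
    """Return the User-Agent that urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # OpenerDirector.addheaders holds the default headers as (name, value) pairs.
    return dict(opener.addheaders)["User-agent"]


def fetch_status(url):
    """Hypothetical helper: return the HTTP status code of a plain urlopen()."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is available on the exception.
        return err.code


print(default_user_agent())   # Python-urllib/<major>.<minor>
# print(fetch_status(URL))    # real request; returned 403 at the time of this report
```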
Example Project
See the URL in Steps to reproduce.
What is the current bug behavior?
> GET /s3lph/spaceapi-server/-/jobs/603973027/artifacts/raw/dist/SHA256SUMS HTTP/2
> Host: gitlab.com
> accept: */*
> user-agent: Python-urllib/3.8
>
< HTTP/2 403
< date: Sat, 20 Jun 2020 03:33:31 GMT
< content-type: text/plain; charset=UTF-8
< content-length: 16
< x-frame-options: SAMEORIGIN
< cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< expires: Thu, 01 Jan 1970 00:00:01 GMT
< set-cookie: __cfduid=df9e8d9415f387d96ab20260a806bb76e1592624011; expires=Mon, 20-Jul-20 03:33:31 GMT; path=/; domain=.gitlab.com; HttpOnly; SameSite=Lax; Secure
< cf-request-id: 037162f2cf0000cc6284204200000001
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< server: cloudflare
< cf-ray: 5a626dcaef7fcc62-ZRH
<
< error code: 1010
What is the expected correct behavior?
> GET /s3lph/spaceapi-server/-/jobs/603973027/artifacts/raw/dist/SHA256SUMS HTTP/2
> Host: gitlab.com
> user-agent: curl/7.70.0
> accept: */*
>
< HTTP/2 200
< date: Sat, 20 Jun 2020 03:38:29 GMT
< content-type: application/octet-stream
< content-length: 105
< set-cookie: __cfduid=db548765326ba4cce59cc69befca799ae1592624308; expires=Mon, 20-Jul-20 03:38:28 GMT; path=/; domain=.gitlab.com; HttpOnly; SameSite=Lax; Secure
< cache-control: no-cache
< content-disposition: attachment; filename="SHA256SUMS"
< content-security-policy: connect-src 'self' https://assets.gitlab-static.net https://gl-canary.freetls.fastly.net wss://gitlab.com https://sentry.gitlab.net https://customers.gitlab.com https://snowplow.trx.gitlab.net https://sourcegraph.com https://ec2.ap-east-1.amazonaws.com https://ec2.ap-northeast-1.amazonaws.com https://ec2.ap-northeast-2.amazonaws.com https://ec2.ap-northeast-3.amazonaws.com https://ec2.ap-south-1.amazonaws.com https://ec2.ap-southeast-1.amazonaws.com https://ec2.ap-southeast-2.amazonaws.com https://ec2.ca-central-1.amazonaws.com https://ec2.eu-central-1.amazonaws.com https://ec2.eu-north-1.amazonaws.com https://ec2.eu-west-1.amazonaws.com https://ec2.eu-west-2.amazonaws.com https://ec2.eu-west-3.amazonaws.com https://ec2.me-south-1.amazonaws.com https://ec2.sa-east-1.amazonaws.com https://ec2.us-east-1.amazonaws.com https://ec2.us-east-2.amazonaws.com https://ec2.us-west-1.amazonaws.com https://ec2.us-west-2.amazonaws.com https://iam.amazonaws.com; frame-ancestors 'self'; frame-src 'self' https://www.google.com/recaptcha/ https://www.recaptcha.net/ https://content.googleapis.com https://content-cloudresourcemanager.googleapis.com https://content-compute.googleapis.com https://content-cloudbilling.googleapis.com https://*.codesandbox.io; img-src * data: blob:; object-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://assets.gitlab-static.net https://gl-canary.freetls.fastly.net https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://www.recaptcha.net/ https://apis.google.com 'nonce-G/fZT8DbhMgRei831w/4NA=='; style-src 'self' 'unsafe-inline' https://assets.gitlab-static.net https://gl-canary.freetls.fastly.net; worker-src https://assets.gitlab-static.net https://gl-canary.freetls.fastly.net https://gitlab.com blob:
< referrer-policy: strict-origin-when-cross-origin
< set-cookie: experimentation_subject_id=eyJfcmFpbHMiOnsibWVzc2FnZSI6Iklqa3dOREJrTXpkbExUQTJaV0V0TkRjM1ppMDVOek5pTFROa1l6ZzBNbVU1T1RCalppST0iLCJleHAiOm51bGwsInB1ciI6ImNvb2tpZS5leHBlcmltZW50YXRpb25fc3ViamVjdF9pZCJ9fQ%3D%3D--c89389351940ab5d541c382879bc719012f4309a; path=/; expires=Wed, 20 Jun 2040 03:38:29 -0000; secure; HttpOnly; SameSite=None
< x-content-type-options: nosniff
< x-download-options: noopen
< x-frame-options: DENY
< x-permitted-cross-domain-policies: none
< x-request-id: qpsWuWdPlE3
< x-runtime: 0.036225
< x-ua-compatible: IE=edge
< x-xss-protection: 1; mode=block
< strict-transport-security: max-age=31536000
< referrer-policy: strict-origin-when-cross-origin
< gitlab-lb: fe-01-lb-gprd
< gitlab-sv: web-04-sv-gprd
< cf-cache-status: MISS
< accept-ranges: bytes
< cf-request-id: 0371677a9a0000cc52f631f200000001
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< server: cloudflare
< cf-ray: 5a62750a9da8cc52-ZRH
<
< 2545c15e02962580609aa668e480a0ddf10ff8b352b65dddb52e410fe3020927 spaceapi_server-0.3.1-py3-none-any.whl
Relevant logs and/or screenshots
See above.
Output of checks
This bug happens on GitLab.com
Possible fixes
As a workaround, I can of course just override urllib's user agent...
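A minimal sketch of that workaround, assuming (as observed above) that any User-Agent not starting with Python-urllib is accepted. The agent string `spaceapi-ci/1.0` and the helper name `build_request` are made up for illustration.

```python
import urllib.request

URL = ("https://gitlab.com/s3lph/spaceapi-server/-/jobs/603973027"
       "/artifacts/raw/dist/SHA256SUMS")

# Assumption: an arbitrary agent string, chosen only because it does not
# start with "Python-urllib".
CUSTOM_AGENT = "spaceapi-ci/1.0"


def build_request(url, user_agent=CUSTOM_AGENT):
    """Hypothetical helper: build a Request that carries an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request(URL)
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))

# Sending the request requires network access:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```

A header set on the Request takes precedence over the opener's default, so the Python-urllib string is not sent.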