Fix 404 errors and encoding issues in failure analyzer
What does this MR do?
Fixes two issues in the failure category analyzer that caused errors when processing job traces:
- 404 errors for nested project paths: The project path extraction regex captured only 2 segments (e.g., `gitlab-org/quality`) instead of all segments (e.g., `gitlab-org/quality/observer`). These malformed URLs might also be responsible for the `No trace found` errors that we see in the Google console.
- UTF-8 encoding errors: Job traces containing non-ASCII characters (emojis, special symbols) caused `"\xE2" from ASCII-8BIT to UTF-8` conversion errors.
Changes
- Updated project path regex from `([-\w]+/[-\w]+)` to `([-\w/]+)/-/` to capture all namespace segments
- Added UTF-8 encoding conversion with `.dup.force_encoding('UTF-8').scrub('?')` in both API fetch and cache read paths
- Added test coverage for deeply nested project paths and binary-encoded traces
Before
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12236314689: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12236314689/trace # the observer segment is dropped here because the extracted project path was limited to 2 segments
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12234384432: "\xE2" from ASCII-8BIT to UTF-8
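Both errors can be reproduced in isolation. The sketch below is an illustration, not the analyzer's code: it assumes the old pattern had the two-segment shape `([-\w]+/[-\w]+)` anchored on the host, that the project path is URL-encoded into the API path as the Request URI above suggests, and that the encoding error comes from transcoding a binary string to UTF-8. The helper names `truncated_trace_path` and `transcoding_error` are hypothetical.

```ruby
require 'erb'

# Assumed shape of the old pattern: exactly two path segments after the host.
TWO_SEGMENT_PATH = %r{gitlab\.com/([-\w]+/[-\w]+)}

# Hypothetical helper: builds the trace API path from a job URL using the
# two-segment pattern, so any deeper namespace segment is lost.
def truncated_trace_path(job_url)
  project = job_url[TWO_SEGMENT_PATH, 1]
  job_id = job_url[%r{/jobs/(\d+)}, 1]
  "/api/v4/projects/#{ERB::Util.url_encode(project)}/jobs/#{job_id}/trace"
end

# Hypothetical helper: returns the exception raised when a string is
# transcoded to UTF-8, or nil when the conversion succeeds.
def transcoding_error(raw)
  raw.encode('UTF-8')
  nil
rescue EncodingError => e
  e
end

puts truncated_trace_path('https://gitlab.com/gitlab-org/quality/observer/-/jobs/12236314689')
# prints "/api/v4/projects/gitlab-org%2Fquality/jobs/12236314689/trace"

# A check mark is the bytes E2 9C 93; tagged as binary, the first byte
# cannot be converted to UTF-8.
error = transcoding_error("RuboCop \u2713".b)
puts "#{error.class}: #{error.message}"
```

The first call yields the same project segment as the failing Request URI, and the second raises an `Encoding::UndefinedConversionError` on the `\xE2` byte.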
Complete log of analysis
bin/failure_category_analyzer --csv tmp/input/rubocop-jobs-ch.csv --output-csv tmp/output/rubocop_ch.csv --threads 12 --no-cache
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12236314689: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12236314689/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12234384432: "\xE2" from ASCII-8BIT to UTF-8 | 7% (30/386) Time: 00:00:04 ETA: 00:00:58
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12246519000: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12246519000/trace
Error fetching job trace for https://gitlab.com/gitlab-org/gitlab/-/jobs/12220959023: end of file reached | 15% (60/386) Time: 00:00:09 ETA: 00:00:52
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12250916754: "\xE2" from ASCII-8BIT to UTF-8 | 20% (78/386) Time: 00:00:13 ETA: 00:00:52
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12253169763: "\xE2" from ASCII-8BIT to UTF-8 | 21% (84/386) Time: 00:00:13 ETA: 00:00:49
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12262021002: "\xE2" from ASCII-8BIT to UTF-8 | 32% (127/386) Time: 00:00:21 ETA: 00:00:43
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12262195809: "\xE2" from ASCII-8BIT to UTF-8 | 33% (128/386) Time: 00:00:21 ETA: 00:00:43
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12262115661: "\xE2" from ASCII-8BIT to UTF-8 | 34% (132/386) Time: 00:00:21 ETA: 00:00:42
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12280062790: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12280062790/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12276869738: "\xE2" from ASCII-8BIT to UTF-8 | 41% (162/386) Time: 00:00:25 ETA: 00:00:36
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12289144094: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12289144094/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12287360715: "\xE2" from ASCII-8BIT to UTF-8 | 47% (183/386) Time: 00:00:30 ETA: 00:00:33
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12289415698: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12289415698/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12285431901: "\xE2" from ASCII-8BIT to UTF-8 | 48% (189/386) Time: 00:00:31 ETA: 00:00:33
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12288229283: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12288229283/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12295125475: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12295125475/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12292244796: "\xE2" from ASCII-8BIT to UTF-8 | 51% (200/386) Time: 00:00:33 ETA: 00:00:31
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12295572698: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12295572698/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12291763032: "\xE2" from ASCII-8BIT to UTF-8 | 53% (205/386) Time: 00:00:34 ETA: 00:00:30
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12300593890: "\xE2" from ASCII-8BIT to UTF-8 | 57% (221/386) Time: 00:00:36 ETA: 00:00:27
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314084024: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314084024/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314190999: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314190999/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314731313: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314731313/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314827030: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314827030/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12315126748: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12315126748/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12316203933: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12316203933/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12315180247: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12315180247/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321113397: "\xE2" from ASCII-8BIT to UTF-8 | 67% (262/386) Time: 00:00:43 ETA: 00:00:21
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12320821005: "\xE2" from ASCII-8BIT to UTF-8 | 71% (276/386) Time: 00:00:46 ETA: 00:00:18
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321919515: "\xE2" from ASCII-8BIT to UTF-8 | 72% (278/386) Time: 00:00:46 ETA: 00:00:18
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321329776: "\xE2" from ASCII-8BIT to UTF-8 | 72% (280/386) Time: 00:00:46 ETA: 00:00:18
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321337840: "\xE2" from ASCII-8BIT to UTF-8 | 81% (314/386) Time: 00:00:51 ETA: 00:00:12
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12335671223: "\xE2" from ASCII-8BIT to UTF-8=============== | 98% (380/386) Time: 00:01:02 ETA: 00:00:01
Analyzing jobs: |============================================================================================================| 100% (386/386) Time: 00:01:04 Time: 00:01:04
============================================================
FAILURE CATEGORY SUMMARY
============================================================
rubocop : 352
ERROR: "\xE2" from ASCII-8BIT to UTF-8: 18
no_trace_found : 16
-------------------------------------------
Total : 386
============================================================
Caching was disabled (--no-cache)
Results written to: tmp/output/rubocop_ch.csv
After
Both issues are resolved: the analyzer processes all 386 jobs successfully, with correct project paths and encoding handling.