Fix 404 errors and encoding issues in failure analyzer

What does this MR do?

Fixes two issues in the failure category analyzer that caused errors when processing job traces:

  1. 404 errors for nested project paths: the project path extraction regex captured only the first two segments (e.g., gitlab-org/quality) instead of the full path (e.g., gitlab-org/quality/observer), so the API request targeted a project that does not exist. These malformed URLs might also be responsible for the "No trace found" errors that we see in the Google console.
  2. UTF-8 encoding errors: Job traces containing non-ASCII characters (emojis, special symbols) caused "\xE2" from ASCII-8BIT to UTF-8 conversion errors

Changes

  • Updated project path regex from ([-\w]+/[-\w]+) to ([-\w/]+)/-/ to capture all namespace segments
  • Added UTF-8 encoding conversion with .dup.force_encoding('UTF-8').scrub('?') in both API fetch and cache read paths
  • Added test coverage for deeply nested project paths and binary-encoded traces
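
To illustrate the regex change, here is a minimal sketch that applies both patterns to one of the failing job URLs. The two patterns are the ones listed above. The `encoded_project_path` helper, the way the URL is split, and the use of `URI.encode_www_form_component` are assumptions made for illustration and are not taken from the analyzer's code.

```ruby
require 'uri'

# Patterns as quoted in the Changes list.
OLD_PROJECT_PATH = %r{([-\w]+/[-\w]+)}
NEW_PROJECT_PATH = %r{([-\w/]+)/-/}

# Hypothetical helper: extract the project path from a job URL and
# URL-encode it so it can be used as the :id in /api/v4/projects/:id.
def encoded_project_path(job_url, pattern)
  path = URI(job_url).path.delete_prefix('/')
  match = path.match(pattern)
  match && URI.encode_www_form_component(match[1])
end

url = 'https://gitlab.com/gitlab-org/quality/observer/-/jobs/12236314689'

puts encoded_project_path(url, OLD_PROJECT_PATH) # stops after two segments
puts encoded_project_path(url, NEW_PROJECT_PATH) # keeps every segment before /-/
```

Anchoring the capture on the `/-/` separator means the pattern does not need to know how deeply the project is nested.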

Before

Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12236314689: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12236314689/trace
Note: the observer segment is missing from the request URI because the extracted project path was limited to two segments.

Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12234384432: "\xE2" from ASCII-8BIT to UTF-8
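
The encoding error above can be reproduced in isolation. The sketch below assumes the trace arrives as a binary-tagged (ASCII-8BIT) string and is transcoded to UTF-8 somewhere later in the analysis; where exactly that happens in the analyzer is not shown here. The `normalize_trace` name and the sample trace are hypothetical. The call chain is the one this MR adds.

```ruby
# Hypothetical helper name; the call chain matches the one added in this MR.
# dup leaves the caller's string untouched, force_encoding relabels the bytes
# as UTF-8 without converting them, and scrub replaces invalid sequences.
def normalize_trace(raw)
  raw.dup.force_encoding('UTF-8').scrub('?')
end

# Sample trace tagged as binary: a valid three-byte check mark,
# followed by a stray \xE2 byte that is not valid UTF-8 on its own.
raw = "rubocop \xE2\x9C\x93 passed, then a stray byte \xE2".b

begin
  raw.encode('UTF-8') # transcoding from ASCII-8BIT fails on the first high byte
rescue Encoding::UndefinedConversionError => e
  warn "transcoding failed: #{e.message}"
end

trace = normalize_trace(raw)
puts trace.encoding        # UTF-8
puts trace.valid_encoding? # true
```

Relabeling with `force_encoding` keeps valid multi-byte characters such as emojis intact, and `scrub('?')` only touches bytes that cannot be decoded.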
Complete log of analysis
bin/failure_category_analyzer --csv tmp/input/rubocop-jobs-ch.csv --output-csv tmp/output/rubocop_ch.csv --threads 12 --no-cache

Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12236314689: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12236314689/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12234384432: "\xE2" from ASCII-8BIT to UTF-8                    | 7% (30/386) Time: 00:00:04  ETA: 00:00:58
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12246519000: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12246519000/trace
Error fetching job trace for https://gitlab.com/gitlab-org/gitlab/-/jobs/12220959023: end of file reached                      | 15% (60/386) Time: 00:00:09  ETA: 00:00:52
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12250916754: "\xE2" from ASCII-8BIT to UTF-8                   | 20% (78/386) Time: 00:00:13  ETA: 00:00:52
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12253169763: "\xE2" from ASCII-8BIT to UTF-8                   | 21% (84/386) Time: 00:00:13  ETA: 00:00:49
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12262021002: "\xE2" from ASCII-8BIT to UTF-8                  | 32% (127/386) Time: 00:00:21  ETA: 00:00:43
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12262195809: "\xE2" from ASCII-8BIT to UTF-8                  | 33% (128/386) Time: 00:00:21  ETA: 00:00:43
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12262115661: "\xE2" from ASCII-8BIT to UTF-8                  | 34% (132/386) Time: 00:00:21  ETA: 00:00:42
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12280062790: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12280062790/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12276869738: "\xE2" from ASCII-8BIT to UTF-8                  | 41% (162/386) Time: 00:00:25  ETA: 00:00:36
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12289144094: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12289144094/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12287360715: "\xE2" from ASCII-8BIT to UTF-8                  | 47% (183/386) Time: 00:00:30  ETA: 00:00:33
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12289415698: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12289415698/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12285431901: "\xE2" from ASCII-8BIT to UTF-8                  | 48% (189/386) Time: 00:00:31  ETA: 00:00:33
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12288229283: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12288229283/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12295125475: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12295125475/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12292244796: "\xE2" from ASCII-8BIT to UTF-8                  | 51% (200/386) Time: 00:00:33  ETA: 00:00:31
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12295572698: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12295572698/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12291763032: "\xE2" from ASCII-8BIT to UTF-8                  | 53% (205/386) Time: 00:00:34  ETA: 00:00:30
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12300593890: "\xE2" from ASCII-8BIT to UTF-8                  | 57% (221/386) Time: 00:00:36  ETA: 00:00:27
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314084024: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314084024/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314190999: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314190999/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314731313: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314731313/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12314827030: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12314827030/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12315126748: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12315126748/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12316203933: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12316203933/trace
Error fetching job trace for https://gitlab.com/gitlab-org/quality/observer/-/jobs/12315180247: Server responded with code 404, message: {"message" => "404 Project Not Found"}. Request URI: https://gitlab.com/api/v4/projects/gitlab-org%2Fquality/jobs/12315180247/trace
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321113397: "\xE2" from ASCII-8BIT to UTF-8                  | 67% (262/386) Time: 00:00:43  ETA: 00:00:21
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12320821005: "\xE2" from ASCII-8BIT to UTF-8                  | 71% (276/386) Time: 00:00:46  ETA: 00:00:18
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321919515: "\xE2" from ASCII-8BIT to UTF-8                  | 72% (278/386) Time: 00:00:46  ETA: 00:00:18
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321329776: "\xE2" from ASCII-8BIT to UTF-8                  | 72% (280/386) Time: 00:00:46  ETA: 00:00:18
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12321337840: "\xE2" from ASCII-8BIT to UTF-8                  | 81% (314/386) Time: 00:00:51  ETA: 00:00:12
Error analyzing job https://gitlab.com/gitlab-org/gitlab/-/jobs/12335671223: "\xE2" from ASCII-8BIT to UTF-8===============   | 98% (380/386) Time: 00:01:02  ETA: 00:00:01
Analyzing jobs: |============================================================================================================| 100% (386/386) Time: 00:01:04 Time: 00:01:04

============================================================
FAILURE CATEGORY SUMMARY
============================================================
rubocop                               : 352
ERROR: "\xE2" from ASCII-8BIT to UTF-8:  18
no_trace_found                        :  16
-------------------------------------------
Total                                 : 386
============================================================

Caching was disabled (--no-cache)

Results written to: tmp/output/rubocop_ch.csv

After

Both issues are resolved: the analyzer processes all 386 jobs successfully, with correct project paths and encoding handling.
