GraphQL::StringEncodingError on merge request diffstats paths
Summary
When running the extraction job for the project https://gitlab.com/owarezaka/owarezaka, it fails with a 500 error returned by GitLab. This job calls the merge request diffStats GraphQL field. The logs on the GitLab side suggest that the failure is caused by the encoding of a file path in one of the MRs.
Steps to reproduce
- Navigate to the GraphiQL Explorer
- Run the following query
{ project(fullPath: "owarezaka/owarezaka") { mergeRequests( createdAfter: "1970-01-01T14:58:50+00:00", createdBefore: "2022-09-27T14:58:50+00:00", first: 25, sort: CREATED_ASC, after: "eyJjcmVhdGVkX2F0IjoiMjAyMi0wNC0xMiAxNDoxNDowNi43NTY0NjEwMDAgKzAwMDAiLCJpZCI6IjE1MDAxNDU1OCJ9" ) { nodes { iid diffStats { path } } } } }
- Observe the internal server error
Example Project
owarezaka/owarezaka and the problematic MR diffs.
We suspect that this problem affects any repository whose file paths contain non-ASCII (multi-byte UTF-8) characters.
What is the current bug behavior?
The GraphQL query returns a 500 Internal Server Error.
What is the expected correct behavior?
The GraphQL query should return paths with correct encoding and no errors.
Relevant logs and/or screenshots
String "spec/__mocks__/Group\xC4\xB1tem.js" was encoded as ASCII-8BIT @ project.mergeRequests.nodes.24.diffStats.13.path (DiffStats.path). GraphQL requires an encoding compatible with UTF-8.
Reference to the full log can be found here.
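To make the log message easier to interpret, here is a small plain-Ruby sketch that inspects the bytes quoted in it. It only uses core `String` methods and does not touch any GitLab code. The variable names are illustrative, and tagging the literal with `.b` is an assumption about how the string arrives (bytes that are valid UTF-8 but labelled as binary).

```ruby
# Bytes copied from the path in the log message. `.b` returns a copy tagged
# as ASCII-8BIT (binary), mirroring the encoding reported in the error.
raw = "spec/__mocks__/Group\xC4\xB1tem.js".b

# Reinterpret the same bytes as UTF-8 without changing them.
reinterpreted = raw.dup.force_encoding(Encoding::UTF_8)

# Find the first non-ASCII character in the path.
non_ascii = reinterpreted.chars.find { |c| !c.ascii_only? }

puts raw.encoding
puts reinterpreted.valid_encoding?
puts format("U+%04X", non_ascii.ord)
```

In this sketch the two bytes `\xC4\xB1` decode to a single character, U+0131 (Latin small letter dotless i), so the path itself appears to be valid UTF-8 and only the encoding label on the string is wrong.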
Output of checks
This bug happens on GitLab.com
Possible fixes
diff_stats comes from the Gitaly client. I wonder if we can encode the path to apply the appropriate encoding, using the EncodingHelper.encode! helper.
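The idea above could look roughly like the following plain-Ruby sketch. It is not the implementation of `EncodingHelper.encode!`, whose behaviour is not shown in this issue; `utf8_path` is a hypothetical stand-in built only on core `String` methods, and the fallback of replacing invalid bytes is an assumption about what would be acceptable for a path field.

```ruby
# Hypothetical helper (not GitLab's EncodingHelper): returns a UTF-8 copy of
# a path that may have arrived tagged as binary (ASCII-8BIT).
def utf8_path(path)
  # Relabel a copy of the bytes as UTF-8; the caller's string is untouched.
  candidate = path.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # Assumed fallback: replace any invalid byte sequences with U+FFFD so the
  # result is always valid UTF-8.
  candidate.scrub
end

binary_path = "spec/__mocks__/Group\xC4\xB1tem.js".b
fixed = utf8_path(binary_path)

puts fixed.encoding
puts fixed.valid_encoding?
```

Relabelling rather than transcoding is chosen here because the bytes in the reported path are already valid UTF-8; whether the real fix should also handle paths in other encodings is left open.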