Windows shared runners fail when processing commit messages containing non-ASCII characters
Summary
Pushing a commit whose message contains non-ASCII UTF-8 characters to a repository that runs CI on Windows shared runners sometimes causes those CI jobs to fail. The cause appears to be an attempt to parse the commit message using a shell script.
Steps to reproduce
I am not sure precisely what the necessary conditions are (sometimes this causes a failure, sometimes it doesn't), but see the example project section below.
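To help narrow down which commits might be affected, a check along these lines could flag commit messages that contain non-ASCII characters. This is only a sketch: the function name `non_ascii_chars` and the sample messages are made up for illustration and are not taken from the failing pipeline, and a non-ASCII character may well not be sufficient on its own to trigger the failure.

```python
def non_ascii_chars(message: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(message) if ord(ch) > 127]


# Hypothetical commit messages, not the ones from the Graphviz MR.
plain = "fix typo in docs"
accented = "fix typo in café docs"

print(non_ascii_chars(plain))     # no non-ASCII characters found
print(non_ascii_chars(accented))  # reports the position of 'é'
```

Running this over the messages of passing and failing pipelines might show whether particular characters correlate with the failures.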
Example Project
Graphviz MR graphviz/graphviz!2479 (closed). The pipeline for this failed rather messily on all Windows shared runners: https://gitlab.com/graphviz/graphviz/-/pipelines/480139907. Yet a pipeline for a commit with the same diff but a different commit message passed: https://gitlab.com/graphviz/graphviz/-/pipelines/480126679.
What is the current bug behavior?
Hopefully it's obvious from the above, but the Windows runners sometimes choke on non-ASCII characters in commit messages.
What is the expected correct behavior?
CI should be agnostic to commit message content.
Relevant logs and/or screenshots
Here's a log in case the above links don't work: commit-message-parse-fail.log
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
N/A
Results of GitLab application Check
N/A
Possible fixes
Ideally just stop parsing commit messages within the CI job itself. If there's some reason this needs to be done, please consider using a language other than shell/PowerShell for this.
I tried searching for duplicates of this issue and couldn't find any, but I may have missed something, in which case apologies for that. It's also possible that we (Graphviz) screwed up something on our side and this is the result of our actions, in which case I'd appreciate any tips on what we're doing wrong.