pwsh shell on windows - unicode characters are corrupted by 13.9 STDIN change

Summary

A customer raised a ticket (🎫 link for GitLab team members) relating to the PowerShell Core (pwsh) variation of the shell executor.

Since 13.9, when the runner started passing the PowerShell script to pwsh via STDIN, they found that a Unicode character they had defined was being corrupted in the process.

Detail on their use case is below, but they use a specific character as a placeholder value, and have PowerShell module(s) with that character hard-coded into them.

They then need to be able to put the same value in their environment variables and CI.

However, when the value is passed via STDIN it is modified to another character or characters and so no longer matches. This breaks their code.

Steps to reproduce

See project: https://gitlab.com/bprescott-support/testing/zd206695-win_ps_chars

Or, the code is in this tarball:

27842-code.tgz

Actual behavior

The PowerShell Format-Hex function prints the characters as text as well as in hex. GitLab misbehaved when attempting to render the text column, so I've edited it out of the output here.

  • .gitlab-ci.yml defines GL_Test1 with the unicode character - this value is fixed for all the jobs
  • three different approaches are defined for providing the reference character to powershell.
  • the value which powershell finds is output in hex and returned as an artifact.
  • the correct return is:
          Offset Bytes                                          
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ -----------------------------------------------
0000000000000000 E2 88 85                                       
  • all three approaches are illustrated using:
    • the 13.8 runner, which didn't use STDIN
    • the 13.9 runner, which introduced STDIN, to demonstrate when the regression occurred
    • the 13.12 runner, to confirm current state.
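
For reference, the expected bytes are the UTF-8 encoding of the customer's placeholder character "∅" (U+2205, see the use case detail further down). A minimal Python sketch to confirm this, independent of the runner and of PowerShell:

```python
import unicodedata

placeholder = "\u2205"  # "∅", the placeholder character from the customer use case

# Format the UTF-8 bytes the same way the Format-Hex byte column shows them.
utf8_hex = " ".join(f"{byte:02X}" for byte in placeholder.encode("utf-8"))

print(unicodedata.name(placeholder), utf8_hex)  # EMPTY SET E2 88 85
```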

There are three illustrations of how this variable is handled.

[1] The inyaml jobs (27842-psm_in_yaml stage) have the PowerShell code inside the YAML, including the hard-coded reference value.

  • as the test variable and the reference value are both defined in yaml, and both passed through STDIN, the comparison works
  • the comparison works because both strings are corrupted, so powershell is comparing like with like
  • compare the hex output for 13.8 and 13.9+ to observe the character being corrupted
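
Why the comparison still passes can be sketched in Python. The `mangle` function is hypothetical: it assumes the corruption is equivalent to decoding the UTF-8 bytes as code page 850, which is consistent with the hex output reported below for the inpsm jobs but is not confirmed as what the runner or pwsh actually does.

```python
def mangle(text: str) -> str:
    """Hypothetical stand-in for the STDIN corruption (assumed: UTF-8 read as cp850)."""
    return text.encode("utf-8").decode("cp850")


ci_variable = "\u2205"       # GL_Test1 as written in .gitlab-ci.yml
inline_reference = "\u2205"  # reference value hard-coded in the same YAML

# Both values travel through STDIN, so both are mangled identically.
both_via_stdin_match = mangle(ci_variable) == mangle(inline_reference)

# Neither value is the original character any more.
still_original = mangle(ci_variable) == "\u2205"

print(both_via_stdin_match, still_original)  # True False
```

Any deterministic corruption applied to both sides would give the same result, which is why the inyaml jobs pass even though the hex output has changed.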

[2] The inargs jobs (27842-unicode_argument stage) have the PowerShell code in a module, but pass the reference value in as a parameter.

  • similarly, as the test variable and the reference value are both coming from the CI, the comparison works
  • the change in behaviour from 13.8 to 13.9 can be observed in the hex output.

[3] The inpsm jobs (27842-unicode_in_psm) have the PowerShell code in a module, and the reference value is in the module.

  • the module is imported into PowerShell directly, so the reference character is not modified - the hex output shows the correct bytes
  • the CI variables continue to be modified from 13.9 onwards. In 13.8 the comparison works; from 13.9 the comparison fails.
  • these are the main jobs to look at. They illustrate most clearly that the same Unicode string gets into PowerShell intact via the .psm1 file in the test repository, but the one in .gitlab-ci.yml does not. The 13.8 job is the 'control' and shows what used to happen; the 13.9 and 13.12 jobs show the corrupted output:
> unicode from .gitlab-ci.yml:


   Label: String (System.String) <6FEC9241>

          Offset Bytes                                          
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ -----------------------------------------------
0000000000000000 C3 94 C3 AA C3 A0                               

> unicode being used for comparison:


   Label: String (System.String) <6895EFAF>

          Offset Bytes                                          
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ -----------------------------------------------
0000000000000000 E2 88 85                                       
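
The corrupted bytes are consistent with one specific kind of mis-decoding, sketched here in Python. This is a hypothesis rather than a confirmed root cause: if the UTF-8 bytes of "∅" are read using code page 850 (an OEM code page used on some Western European Windows installs; whether the test machine uses it is an assumption) and the result is written out as UTF-8 again, the output matches the bytes reported for the value from .gitlab-ci.yml.

```python
def hex_bytes(text: str) -> str:
    """Render the UTF-8 bytes of text like the Format-Hex byte column."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


reference = "\u2205"  # "∅" as stored in the .psm1 module

# Assumed mis-decoding: UTF-8 bytes interpreted as code page 850.
from_ci = reference.encode("utf-8").decode("cp850")

print(hex_bytes(reference))  # E2 88 85
print(hex_bytes(from_ci))    # C3 94 C3 AA C3 A0
print(from_ci == reference)  # False, as in the 13.9 and 13.12 inpsm jobs
```

If this hypothesis holds, the three bytes of the original character are each being treated as a separate single-byte character somewhere between the runner and pwsh, which would explain why one character becomes three.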

example pipeline

https://gitlab.com/bprescott-support/testing/zd206695-win_ps_chars/-/pipelines/298008377

  • Hex output from Job #1239385175 13.12_inpsm

hex_compare_13_12.txt

  • Hex output from Job #1239385171 13.8_inpsm

hex_compare_13_8.txt

The jobs in stages 27830-erroractionpreferenc and 27830-error2 relate to #27830 (closed)

Expected behavior

Characters set in the CI configuration and in CI variables should be passed into PowerShell uncorrupted.

Relevant logs and/or screenshots

Environment description

I have a Windows 10 laptop with multiple runners set up as services. Reproduced using Powershell 7.0.6 (See job output for version)

All are configured with the defaults except for check_interval

concurrent = 1
check_interval = 13

[session_server]
  session_timeout = 1800

[[runners]]
  name = "foo"
  url = "https://gitlab.com"
  token = "bar"
  executor = "shell"
  shell = "pwsh"

customer use case (detail)

The customer previously used another CI solution that supported a hierarchy of variables, including variables defined in scopes that GitLab does not support, such as individual branches.

When migrating to GitLab, they maintained this way of working by:

  • using parent/child pipelines
  • constructing the equivalent variable hierarchy in the parent pipeline in powershell
  • determining what values have "won" for that particular pipeline, and writing them out to their CI
  • executing that CI as a dynamic child pipeline

They have in excess of 40 repositories working this way.

Their code requires a placeholder value to represent an empty set. To avoid a collision with other values, they use the Unicode character "∅".

Used GitLab Runner version

  • 13.8: does not display the issue
  • 13.9: displays the issue
  • 13.12 beta: still displays the issue

Possible fixes

Related to change to use STDIN for pwsh - !2715 (merged)

Edited by Ben Prescott (ex-GitLab)