Do not mask migration output during reconfigure
A recent change via !5971 (merged) has marked migration output to stdout/stderr as sensitive.
Impact:
25+ customer tickets as of 2022-05-14. Related support tickets this change has been generating can be tracked through this search: https://gitlab.zendesk.com/agent/search/1?type=ticket&q=%22Command%20execution%20failed.%20STDOUT%2FSTDERR%20suppressed%20for%20sensitive%20resource%22%20order_by%3Acreated_at%20sort%3Aasc
Related documentation update to help stem these ticket creations from masked migration failures: gitlab!87437 (merged)
Change that occurred:
Before (sample):
Recipe: gitlab::database_migrations
* ruby_block[check remote PG version] action nothing (skipped due to action :nothing)
* rails_migration[gitlab-rails] action run
* bash[migrate gitlab-rails database] action run
================================================================================
Error executing action `run` on resource 'bash[migrate gitlab-rails database]'
================================================================================
Mixlib::ShellOut::CommandTimeout
--------------------------------
Command timed out after 3600s:
Command exceeded allowed execution time, process terminated
---- Begin output of "bash" "/tmp/chef-abcd" ----
STDOUT:
STDERR:
---- End output of "bash" "/tmp/chef-abcd" ----
After:
Recipe: gitlab::database_migrations
* ruby_block[check remote PG version] action nothing (skipped due to action :nothing)
* rails_migration[gitlab-rails] action run
* bash[migrate gitlab-rails database] action run
================================================================================
Error executing action `run` on resource 'bash[migrate gitlab-rails database]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Command execution failed. STDOUT/STDERR suppressed for sensitive resource
Reason it should be reverted, at least for the migration step
There are multiple challenges this causes to users and the GitLab support team members:
- Upgrades auto-performed during package installations often fail, especially since 14.0, due to batched background migrations not having completed on the previous version (and many more reasons in associated epic)
- The errors in this scenario from the migration utility were designed to instruct the user the commands to run to be able to complete their upgrade successfully
- These errors and self-resolve instructions would appear on the upgrade attempt output prior to the change above
- With the masking this is entirely hidden away, leaving the user with no actionable insight
- Docker container upgrades rely exclusively on the stdout/stderr from the containers to analyze the migration failure
- Often the logs directory isn't mounted to a host path externally
- With the sensitive masking of migrate output such users are now left with zero clue on why the container refuses to upgrade, and no way to extract any information because the container enters a crash-loop due to the migration error
In terms of support tickets during self-managed customer emergencies, customers often share a copy-paste output they receive on the screen when things fail. This allows us to answer quick and with precision because we can perform issue lookups with the data we receive from migration failures during reconfigures.
With this change we have introduced an avoidable round-trip where we now have to (1) educate the customer about discovering and picking out the right timestamped migration/reconfigure log files (2) require the customer to always perform a file upload to us because the output is useless (3) deal with many more customer emergencies than before because they cannot self-diagnose issues the product may have already added helpful text for.
Please consider reverting this behavior and adopt a different approach for protecting the initial migration output that formed the reason for this change.