user_diff_report is bottlenecking chef-repo pipeline
Background
As part of our chef-repo pipeline on ops, we run a job called user_diff_report (example pipeline).
This job is slow: it takes almost 5 minutes to run the following command:
$ userDiff=$(for f in data_bags/users/*.json; do knife diff --name-status --chef-repo-path "$CI_PROJECT_DIR" "$f"; done)
Problem
This job is in the critical path for rolling out Chef changes. During an incident, we want to be able to respond quickly.
Moreover, the job is not very effective at present. Because failures are allowed on this job and it runs on ops, nobody actively checks its output. I happened to look at it a few days ago and saw that quite a lot of drift had accumulated.
Proposal
I think we should remove this job from the chef-repo pipeline and instead run it as a periodic report that either posts a message in Slack (similar to the Terraform drift detector) or opens an issue for triaging.
This would likely bring the chef-repo merge pipeline to under 1 minute (down from almost 6 minutes).
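As a rough illustration of the Slack variant of the periodic report, here is a minimal sketch. It reuses the knife command from the current job and assumes the scheduled job has a Slack incoming-webhook URL available in a variable called SLACK_WEBHOOK_URL. That variable and all function names are hypothetical, and the message wording is only a placeholder.

```shell
#!/bin/sh
# Sketch of a scheduled drift report. SLACK_WEBHOOK_URL and the function
# names are hypothetical; the knife command is the one from the current job.

# Run the existing per-file diff and print the combined output.
collect_drift() {
  for f in data_bags/users/*.json; do
    knife diff --name-status --chef-repo-path "$CI_PROJECT_DIR" "$f"
  done
}

# Count the non-empty lines of a diff report passed as the first argument.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_changes() {
  printf '%s\n' "$1" | grep -c . || true
}

# Build a Slack incoming-webhook payload. Only the count is interpolated,
# so no JSON escaping of file names is needed.
build_payload() {
  printf '{"text": "user_diff_report: %d change(s) reported by knife diff"}' "$1"
}

# Post to Slack only when the report is non-empty.
report_drift() {
  drift=$(collect_drift)
  count=$(count_changes "$drift")
  if [ "$count" -gt 0 ]; then
    curl -sS -X POST -H 'Content-type: application/json' \
      --data "$(build_payload "$count")" "$SLACK_WEBHOOK_URL"
  fi
}
```

A scheduled pipeline would source this file and call report_drift. Posting only the count keeps the payload simple; the full list of changed items would still be available in the job log.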