Tests are not split optimally by Knapsack
Copied over from &25 (comment 188572759):
I've observed that sometimes there can be large differences in runtime between jobs split by Knapsack. I'm not sure whether this is just variance between test runs, but there are some other possible causes as well:
- The Knapsack docs say:

  > You'll want to regenerate your execution report whenever you remove or add a test file with a long execution time that would affect one of the CI nodes.

  Looking at our `scripts/merge-reports`, it looks like we only ever add or update entries in the JSON report. This means deleted spec files' runtimes are still considered by Knapsack when splitting the tests. Removing a whole spec file is rare, but considering we've been merging reports for a long time now, we might have a few cases of this. We also removed a lot of old migration specs recently, although most if not all of those are pretty fast, so they shouldn't affect the split that much.
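As a minimal sketch (this is not our actual `scripts/merge-reports`, and the report format is assumed to be a flat JSON hash of spec path to runtime in seconds), merging could prune stale entries by dropping any path that no longer exists in the repository:

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch: merge per-node Knapsack JSON reports, then drop entries
# for spec files that have been deleted, so they stop influencing the split.
require 'json'

# Merge report hashes (spec path => runtime in seconds); later reports win on conflicts.
def merge_reports(reports)
  reports.reduce({}) { |merged, report| merged.merge(report) }
end

# Keep only entries whose spec file still exists under the given root.
def prune_deleted(report, root: Dir.pwd)
  report.select { |path, _seconds| File.exist?(File.join(root, path)) }
end

if $PROGRAM_NAME == __FILE__
  reports = ARGV.map { |file| JSON.parse(File.read(file)) }
  puts JSON.pretty_generate(prune_deleted(merge_reports(reports)))
end
```

Something along these lines would address the stale-runtime problem without needing to regenerate the whole report from scratch.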
- I suspect that Knapsack doesn't account for the tag filters we pass to RSpec. We would need to pass `KNAPSACK_TEST_FILE_PATTERN` so that Knapsack narrows the split down to only those tests, which is what we do with our test level filter. But in EE we also pass `--tag ~geo`. There aren't that many Geo specs, so maybe this doesn't have a big enough impact. It's also difficult to incorporate into `KNAPSACK_TEST_FILE_PATTERN` because Geo specs aren't all in one directory. We could consider moving them, though.
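To illustrate the file pattern idea: `KNAPSACK_TEST_FILE_PATTERN` limits the split to files matching a glob, so the runtimes of filtered-out specs no longer skew node assignment. A rough sketch of that kind of filtering (the paths and pattern below are made up for illustration, and this only expresses "exclude Geo" if Geo specs share a path prefix):

```ruby
# Hypothetical sketch of the filtering that a file pattern performs: only spec
# files matching the glob are considered when splitting the report.
def filter_report(report, pattern)
  # Without File::FNM_PATHNAME, '*' in the pattern also matches '/'.
  report.select { |path, _seconds| File.fnmatch(pattern, path) }
end

report = {
  'spec/models/user_spec.rb'        => 5.0,
  'ee/spec/models/geo_node_spec.rb' => 3.0
}

filter_report(report, 'spec/**/*_spec.rb')
# keeps only spec/models/user_spec.rb
```

This is why scattered Geo specs are awkward: a single glob can't easily express "everything except these files in assorted directories", whereas a dedicated `ee/spec/geo/` style prefix could be excluded cleanly.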
Splitting the tests optimally would potentially save us some wall time, and perhaps allow us to lower Knapsack parallelization and save on overhead.