Distributed Reads no longer appear to work as expected for Cluster, significantly affecting performance
In our performance tests today we noticed numerous failures across 5 of our test environments. After inspection it was clear that Distributed Reads are no longer working as expected and this is dramatically degrading performance.
The behavior we're seeing today is hard to explain. Our performance test runs show that Distributed Reads are not working as expected except for one specific test, api_v4_projects_repository_files_file_raw, which does seem to trigger the Reads functionality, but in a very imbalanced fashion:
In addition to the above, we're seeing the Praefect Postgres instance maxing out its CPU again. This behavior was seen previously whenever Distributed Reads were on but the Praefect cache functionality wasn't working. Here we're seeing the CPU max out even though Reads aren't actually working:
As mentioned, this is happening consistently across 5 of our test environments today, after they were all updated to the latest Nightly Omnibus package. They were all working fine in the last test run on Friday.