Improve performance of get_tuids_containing
Explanation of changes
The method get_tuids_containing can be slow because it has to go through all stored experiments. A relatively expensive part of the method is the TUID.is_valid(x[:26]) check in the filter method. We can avoid that check in many cases by performing the (contains in x) check first. The latter check is much faster, and (assuming a user stores mostly quantify datasets in the quantify folder) the TUID.is_valid(x[:26]) check will almost always be true, while the (contains in x) check can be very selective.
On our system (more than 100,000 experiments), the run time of get_tuids_containing with the contains argument set drops from 4.1 seconds to 0.6 seconds.
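The reordering described above can be illustrated with a minimal, self-contained sketch. It is not the actual quantify code: is_valid_tuid is a hypothetical stand-in for TUID.is_valid (assumed here to be a regex match plus a timestamp parse), and the two filter functions and the example names are made up for illustration. The point is only that `and` short-circuits, so putting the cheap substring test first skips the expensive validation for every name that does not contain the search string.

```python
import re
from datetime import datetime

# Hypothetical stand-in for TUID.is_valid; the real check lives in quantify.
# Assumed TUID layout: YYYYmmDD-HHMMSS-sss-xxxxxx (26 characters).
_TUID_PATTERN = re.compile(r"^\d{8}-\d{6}-\d{3}-[0-9a-f]{6}$")


def is_valid_tuid(candidate: str) -> bool:
    """Comparatively expensive check: regex match plus timestamp parsing."""
    if not _TUID_PATTERN.match(candidate):
        return False
    try:
        datetime.strptime(candidate[:15], "%Y%m%d-%H%M%S")
    except ValueError:
        return False
    return True


def filter_validate_first(names, contains):
    """Old order: every name is validated before the substring test."""
    return [x for x in names if is_valid_tuid(x[:26]) and contains in x]


def filter_substring_first(names, contains):
    """New order: the cheap substring test runs first, so validation is
    only evaluated for names that already match."""
    return [x for x in names if contains in x and is_valid_tuid(x[:26])]


# Made-up directory names, for illustration only.
names = [
    "20200430-170837-001-315f36-Rabi",
    "20200430-171500-002-4a5b6c-T1",
    "20200430-172000-003-7d8e9f-Rabi_fine",
    "notes.txt",
]

matches = filter_substring_first(names, "Rabi")
```

Both orderings return the same names; only the number of validation calls differs, which is why the change does not affect results.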
Motivation of changes
This improvement has no impact on the public interface. Other ways to speed up the method exist, but they require more work.
Merge checklist
See also merge request guidelines
- Merge request has been reviewed (in-depth by a knowledgeable contributor), and is approved by a project maintainer.
- New code is covered by unit tests (or N/A).
- [N/A] New code is documented and docstrings use numpydoc format (or N/A).
- [N/A] New functionality: considered making private instead of extending public API (or N/A).
- [N/A] Public API changed: added @deprecated (or N/A).
- [N/A] Newly added/adjusted documentation and docstrings render properly (or N/A).
- [N/A] Pipeline fix or dependency update: post in #software-for-developers channel to merge main back in or update local packages (or N/A).
- [N/A] Tested on hardware (or N/A).
- [N/A] CHANGELOG.md and AUTHORS.md have been updated (or N/A).
- Windows tests in CI pipeline pass (manually triggered by maintainers before merging).
  - Maintainers do not hit Auto-merge, we need to actively check as manual tests do not block pipeline
For reference, the issues workflow is described in the contribution guidelines.