Some targets missed when processing at speed
We’re scaling up our testing of rugged. Each batch we send in contains the targets for a single project on Drupal.org. We rename the batch directory to tuf_ready_… and immediately start preparing the next batch, without waiting for rugged to finish processing the previous one. Over 50% of the batches are not being processed properly.
From the targets worker log:
[2023-06-15 21:43:09,592: INFO/ForkPoolWorker-2] Task add_targets_task[4531abb2-83d1-44a3-af5f-73d0144bda06] succeeded in 33.217705231159925s: (True, {'added_targets': ['drupal/views_data_export/1.9999999.9999999.9999999-dev', 'drupal/views_data_export/1.0.0.0-alpha2', 'drupal/views_data_export/1.0.0.0-alpha3', 'drupal/views_data_export/1.0.0.0-alpha4', 'drupal/views_data_export/1.0.0.0-beta1', 'drupal/views_data_export/1.0.0.0-beta3', 'drupal/views_data_export/1.0.0.0-beta4', 'drupal/views_data_export/1.0.0.0-RC1', 'drupal/views_data_export/1.0.0.0', 'drupal/views_data_export/1.1.0.0', 'drupal/views_data_export/1.2.0.0', 'drupal/views_data_export/1.3.0.0', 'drupal/views_data_export.json', 'drupal/views_data_export~dev.json', 'packages.json', 'tuf_processing_2023-06-15T21:41:47+00:00_28489/drupal/views_infinite_scroll/1.9999999.9999999.9999999-dev', 'tuf_processing_2023-06-15T21:41:47+00:00_28489/drupal/views_infinite_scroll/1.0.0.0-RC1', 'tuf_processing_2023-06-15T21:41:47+00:00_28489/drupal/views_infinite_scroll/1.0.0.0-RC2', 'tuf_processing_2023-06-15T21:41:47+00:00_28489/drupal/views_infinite_scroll/1.0.0.0-RC3', 'tuf_processing_2023-06-15T21:41:...', ...]})
[2023-06-15 21:43:09,594: WARNING/ForkPoolWorker-2] Received add-targets task.
[2023-06-15 21:43:15,865: WARNING/ForkPoolWorker-2] Moved inbound target 'drupal/viewsreference/1.0.0.0-alpha1' to targets directory.
[2023-06-15 21:43:15,868: WARNING/ForkPoolWorker-2] Added target 'drupal/viewsreference/1.0.0.0-alpha1' to 'bin_05a-05b' role.
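I suspect the files under tuf_processing_… directories should not be picked up along with the other inbound targets. The added_targets list in the log above includes paths that begin with tuf_processing_….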
Would something like this be a good start at a fix?
diff --git a/rugged/tuf/repo.py b/rugged/tuf/repo.py
index 2a807b1..5ecb20e 100644
--- a/rugged/tuf/repo.py
+++ b/rugged/tuf/repo.py
@@ -270,6 +270,9 @@ class RuggedRepository():
             if path.isdir(inbound_target):
                 # We only want files, not intermediate directories.
                 continue
+            if inbound_target.startswith('tuf_processing_'):
+                # Do not process targets that are still processing.
+                continue
             log.debug(f"Found target: {inbound_target}")
             inbound_targets.append(inbound_target)
         return inbound_targets
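
To make the idea easier to test outside rugged, here is a minimal standalone sketch of the same filter. It is not rugged's actual code: the function names find_inbound_targets and is_in_processing_dir are hypothetical, and it assumes targets are identified by their path relative to the inbound directory, which is what the log output above suggests. Instead of calling startswith() on the whole string, it checks each directory component, so it would still work if the scan returned paths with a leading directory.

```python
import os
from glob import glob
from os import path

# Prefix taken from the log output above; the rest of the directory name varies.
PROCESSING_PREFIX = 'tuf_processing_'


def is_in_processing_dir(relative_target):
    """Return True if any directory component marks a batch still being processed."""
    directories = relative_target.split(os.sep)[:-1]  # drop the file name itself
    return any(part.startswith(PROCESSING_PREFIX) for part in directories)


def find_inbound_targets(inbound_dir):
    """Hypothetical stand-in for the scan patched in the diff above."""
    inbound_targets = []
    for full_path in glob(path.join(inbound_dir, '**'), recursive=True):
        if path.isdir(full_path):
            # We only want files, not intermediate directories.
            continue
        relative_target = path.relpath(full_path, inbound_dir)
        if is_in_processing_dir(relative_target):
            # Another task has already claimed this batch.
            continue
        inbound_targets.append(relative_target)
    return sorted(inbound_targets)
```

If the skipped files really do belong to a batch that another task is handling, a filter like this should stop them from being added twice. It would not by itself explain targets that are never added at all, so that may need a separate look.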