ActiveContext: Consistent error handling for preprocessors

What does this MR do and why?

Makes error handling consistent across ActiveContext preprocessors.

Instead of always returning the refs, each preprocessor now returns { successful:, failed: }. We continue indexing only the successful refs and re-enqueue the failed refs for retry.

Only successful refs are passed from one preprocessor to the next. For example, if the first preprocessor fails for a ref, the second preprocessor is not run for that ref.
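The chaining described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: run_preprocessors and the two lambda preprocessors are hypothetical names, and plain integers stand in for refs.

```ruby
# Hypothetical sketch: each preprocessor is a callable that takes refs and
# returns { successful: [...], failed: [...] }.
def run_preprocessors(refs, preprocessors)
  failed = []

  # Feed only the successful refs of one step into the next step,
  # accumulating the failures from every step along the way.
  successful = preprocessors.reduce(refs) do |remaining, preprocessor|
    result = preprocessor.call(remaining)
    failed.concat(result[:failed])
    result[:successful]
  end

  { successful: successful, failed: failed }
end

# Stand-in preprocessors (assumptions for illustration only).
reject_odd = ->(refs) { { successful: refs.select(&:even?), failed: refs.select(&:odd?) } }
reject_big = ->(refs) { { successful: refs.select { |r| r < 4 }, failed: refs.reject { |r| r < 4 } } }

result = run_preprocessors([1, 2, 3, 4], [reject_odd, reject_big])
# result[:successful] holds the refs to index; result[:failed] holds the refs to re-enqueue.
```

In this sketch the second preprocessor never sees refs 1 and 3, because they already failed in the first one.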

We introduce two new methods that can be called with refs and a block:

  • with_per_ref_handling
  • with_batch_handling

with_per_ref_handling: executes the block on each ref individually. If a ref fails, it is added to failed and processing continues with the remaining refs. Useful for per-ref operations.

with_batch_handling: executes the block once on all the refs. If the block fails, all refs are marked as failed.
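A rough sketch of how the two helpers could behave is shown below. It assumes that a failure means the block raises a StandardError. The module name ErrorHandlingSketch is hypothetical, and the actual implementation in this MR may differ (for example in logging or in which exceptions it rescues).

```ruby
# Hypothetical module; only the two method names come from the MR description.
module ErrorHandlingSketch
  module_function

  # Runs the block once per ref. A ref whose block raises is collected in
  # :failed, and the remaining refs are still processed.
  def with_per_ref_handling(refs)
    successful = []
    failed = []

    refs.each do |ref|
      yield(ref)
      successful << ref
    rescue StandardError
      failed << ref
    end

    { successful: successful, failed: failed }
  end

  # Runs the block once with all refs. If the block raises, every ref is
  # reported as failed.
  def with_batch_handling(refs)
    yield(refs)
    { successful: refs, failed: [] }
  rescue StandardError
    { successful: [], failed: refs }
  end
end

per_ref = ErrorHandlingSketch.with_per_ref_handling(%w[a b c]) do |ref|
  raise ArgumentError, "bad ref" if ref == "b"
end

batch = ErrorHandlingSketch.with_batch_handling(%w[a b c]) do |_refs|
  raise "bulk failure"
end
```

Here per_ref reports only "b" as failed, while batch reports all three refs as failed.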

Preprocessors

  • chunking preprocessor:
    • fails all refs if the chunks method is not defined
    • fails an individual ref if chunking that ref raises an error
  • embedding preprocessor:
    • fails all refs if there's an error bulk generating embeddings
    • also refactors the preprocessor to generate embeddings in bulk even when a ref has only one document (previously we only did bulk generation when a ref contained multiple documents; now we collect all documents from all refs and process them in batches)
  • preload preprocessor:
    • fails all refs if the preload_indexing_data method is not defined
    • fails an individual ref if its corresponding database record can't be found
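To illustrate the embedding refactor, here is a minimal sketch of collecting documents across refs and generating embeddings in batches, with all refs failing if bulk generation raises. Everything in it is an assumption for illustration: the Ref struct, apply_embeddings, the embedder callable, and the batch size are hypothetical and are not the preprocessor's real interface.

```ruby
# Hypothetical ref shape: an id, a list of document strings, and the
# embeddings attached to it so far.
Ref = Struct.new(:id, :documents, :embeddings)

# embedder is any callable that takes an array of strings and returns one
# vector per string (a stand-in for a bulk embeddings API call).
def apply_embeddings(refs, embedder:, batch_size: 2)
  # Collect every document from every ref, remembering which ref owns it.
  pairs = refs.flat_map { |ref| ref.documents.map { |doc| [ref, doc] } }

  # Generate embeddings batch by batch, regardless of how many documents
  # each individual ref has.
  pairs.each_slice(batch_size) do |slice|
    vectors = embedder.call(slice.map(&:last))
    slice.zip(vectors).each do |(ref, _doc), vector|
      (ref.embeddings ||= []) << vector
    end
  end

  { successful: refs, failed: [] }
rescue StandardError
  # Any error during bulk generation fails all refs, as described above.
  { successful: [], failed: refs }
end

# Stand-in embedder: a one-dimensional "embedding" based on string length.
length_embedder = ->(contents) { contents.map { |c| [c.length.to_f] } }

refs = [Ref.new(1, ["aa"]), Ref.new(2, ["b", "cccc"])]
result = apply_embeddings(refs, embedder: length_embedder)
```

With a batch size of 2, the single document of the first ref is embedded in the same call as the first document of the second ref, which is the point of collecting documents across refs.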

References

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #536212 (closed)

Edited by Madelein van Niekerk
