Advanced Search: Add an option of using updated_at instead of callbacks
Background
Our current Advanced Search callback logic can sometimes be fragile.
Proposal
I'd like to propose an alternative way to queue documents of some types for indexing.
One option is a cron worker that scans the underlying table in increments, using updated_at as a cursor.
Let's say we switch projects to this new approach. The new worker/service picks an initial time, e.g. Time.at(0) (epoch 0), and on every iteration processes a batch of documents:
updated_at_cursor = <load from redis> # Time.at(0) on the first run
# Order by updated_at (id as tie-breaker) so the last record in the batch carries the newest timestamp
projects = Project.where('updated_at > ?', updated_at_cursor).order(updated_at: :asc, id: :asc).limit(PROCESSING_LIMIT).to_a
Elastic::ProcessBookkeepingService.track!(*projects) if projects.any?
# save the latest timestamp to redis (projects.last.updated_at), unless the batch is empty
# requeue the worker if needed, e.g. if the number of projects equals PROCESSING_LIMIT
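One iteration of the cursor loop can be sketched in plain Ruby, without Rails. Everything in it is hypothetical: `Record` stands in for a `Project` row, the `cursor_store` hash stands in for Redis, `process_batch` stands in for the worker's `perform`, and the `tracked` array collects what would be passed to `Elastic::ProcessBookkeepingService.track!`. It also swaps in a compound `[updated_at, id]` cursor, which is not part of the proposal: with a strict `updated_at > cursor` filter, rows sharing the boundary timestamp that didn't fit into the previous batch would be skipped. In PostgreSQL this would roughly correspond to a row comparison such as `WHERE (updated_at, id) > (?, ?)`.

```ruby
# Hypothetical stand-in for a Project row; only the fields the cursor needs.
Record = Struct.new(:id, :updated_at)

PROCESSING_LIMIT = 2

# Runs one iteration: loads the cursor, picks the next batch, tracks it,
# saves the new cursor, and returns true if the worker should be requeued.
def process_batch(records, cursor_store, tracked)
  # Assumption: compound [updated_at, id] cursor, starting at epoch 0 / id 0.
  cursor = cursor_store.fetch(:cursor, [Time.at(0), 0])

  batch = records
    .select { |r| ([r.updated_at, r.id] <=> cursor) == 1 }
    .sort_by { |r| [r.updated_at, r.id] }
    .first(PROCESSING_LIMIT)

  return false if batch.empty? # nothing left; keep the old cursor

  tracked.concat(batch)
  last = batch.last
  cursor_store[:cursor] = [last.updated_at, last.id]

  # A full batch means there may be more rows left to scan.
  batch.size == PROCESSING_LIMIT
end

# Usage: three rows share the same timestamp, and the limit is 2.
t = Time.at(1_000)
records = [Record.new(1, t), Record.new(2, t), Record.new(3, t), Record.new(4, t + 5)]
store = {}
tracked = []
requeue = true
requeue = process_batch(records, store, tracked) while requeue
tracked.map(&:id) # => [1, 2, 3, 4]
```

With a plain `updated_at` cursor, the first batch would end at timestamp `t` and record 3 would never be tracked; the `id` tie-breaker is what keeps it in the second batch.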
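The main benefit is that this approach should be much more robust: indexing no longer depends on callbacks firing on every code path, and because the cursor is persisted in Redis, an interrupted run resumes from the last saved timestamp.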
If we decide to do this, I think we should use an approach similar to Search::Zoekt::SchedulingService, where an initial task schedules subtasks. In this case we'd run it for every supported document type (i.e. the document types that opted in to this new indexing logic).
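The fan-out could look roughly like the sketch below. It does not reproduce the actual Search::Zoekt::SchedulingService API; `SchedulingTask`, `OPTED_IN_TYPES`, the Redis key format and the `enqueue` lambda (standing in for scheduling a Sidekiq job) are all hypothetical. The parent task only enqueues one subtask per opted-in document type, and each subtask gets its own cursor key, so one slow type doesn't hold back the others.

```ruby
# Hypothetical list of document types that opted in to updated_at-based indexing.
OPTED_IN_TYPES = %w[Project Issue MergeRequest].freeze

# Parent task: schedules one subtask per opted-in document type.
class SchedulingTask
  # `enqueue` is any callable; in a real worker it would schedule a job.
  def initialize(enqueue:)
    @enqueue = enqueue
  end

  def execute
    OPTED_IN_TYPES.each do |type|
      # Hypothetical per-type cursor key, so cursors never interfere.
      @enqueue.call(type: type, cursor_key: "elastic:updated_at_cursor:#{type}")
    end
  end
end

# Usage: collect the scheduled subtasks instead of enqueuing real jobs.
scheduled = []
SchedulingTask.new(enqueue: ->(**job) { scheduled << job }).execute
scheduled.map { |job| job[:type] } # => ["Project", "Issue", "MergeRequest"]
```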