[ActiveContext] Skip previous model embeddings during model switch

The following discussion from !222416 (merged) should be addressed:

@maddievn started a discussion:

comment: I'm still on the fence about using current + next for generating index embeddings. When doing a backfill we'd be generating embeddings for all the old fields which is wasteful. But not including current means incremental updates or new projects would be incomplete for searches until the search switches. Maybe we should make backfill use a dedicated queue and that only uses the next model 🤔

Anyway, having this method is good and we can remove current.


@partiaga: Yeah, I opted to remove the current_ prefix because the method returns both the current and next embedding models.

When we integrate this into the ActiveContext processing (!222417 (merged)), my plan is to have it follow the current approach where it processes both current + next, as I don't want to depart from the existing logic in what's essentially a refactor/object model redesign.

Thinking about this further, maybe it's best to name it indexing_embedding_models to indicate it's the models used during the indexing process.
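To make the naming concrete, here is a minimal sketch of what a method like indexing_embedding_models could return. The `SketchEmbeddingModel` struct and `SketchCollection` class are hypothetical stand-ins invented for illustration, not the actual ActiveContext object model; the sketch only assumes that a collection has one current model and, during a switch, one next model.

```ruby
# Hypothetical stand-in for an embedding model definition.
SketchEmbeddingModel = Struct.new(:name, :field)

# Hypothetical stand-in for a collection that may be mid model switch.
class SketchCollection
  attr_reader :current_model, :next_model

  def initialize(current_model:, next_model: nil)
    @current_model = current_model
    @next_model = next_model
  end

  # Models that the indexing process writes embeddings for:
  # the current model, plus the next model while a switch is underway.
  def indexing_embedding_models
    [current_model, next_model].compact.uniq
  end
end

v1 = SketchEmbeddingModel.new("model_v1", "embeddings_v1")
v2 = SketchEmbeddingModel.new("model_v2", "embeddings_v2")

steady    = SketchCollection.new(current_model: v1)
switching = SketchCollection.new(current_model: v1, next_model: v2)

puts steady.indexing_embedding_models.map(&:name).inspect
puts switching.indexing_embedding_models.map(&:name).inspect
```

Outside a model switch the list holds only the current model, so callers would not need a separate code path for the steady state.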


Talking about changing the logic though:

"Maybe we should make backfill use a dedicated queue and that only uses the next model"

This sounds good! This would also make it easier to track the backfill separately from the current embeddings processing. I'm guessing we'd have to pass the choice of model (current vs next) from the bulk processor classes -> Reference -> apply_embeddings preprocessor? And I assume we'd need a separate dead queue for it? Anyway, something for a separate issue 🙂
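A rough sketch of how the model choice might flow from the queue through the reference to the preprocessor. Everything here is an assumption made for illustration: `QUEUE_MODEL_SCOPES`, `SketchRef`, `FakeEmbedder`, and `build_refs` are hypothetical names, and this `apply_embeddings` is a simplified stand-in rather than the real preprocessor. The dead queue question is left out.

```ruby
# Hypothetical model names for the current and next embedding models.
SKETCH_MODELS = { current: "model_v1", next: "model_v2" }.freeze

# Hypothetical mapping: each queue declares which models its refs are embedded with.
QUEUE_MODEL_SCOPES = {
  incremental: %i[current next], # new/updated docs stay searchable and are ready for the switch
  backfill: %i[next]             # old docs already have embeddings for the current model
}.freeze

# Hypothetical reference that carries the model scopes chosen by its queue.
SketchRef = Struct.new(:id, :content, :model_scopes, :embeddings)

# Stand-in for the embedding service; counts calls per model.
class FakeEmbedder
  attr_reader :calls

  def initialize
    @calls = Hash.new(0)
  end

  def embed(model, content)
    @calls[model] += 1
    [content.length.to_f] # placeholder vector
  end
end

# Bulk processor step: build refs tagged with the queue's model scopes.
def build_refs(queue, docs)
  scopes = QUEUE_MODEL_SCOPES.fetch(queue)
  docs.map { |id, content| SketchRef.new(id, content, scopes, {}) }
end

# Simplified preprocessor: generate embeddings only for the scopes on each ref.
def apply_embeddings(refs, embedder)
  refs.each do |ref|
    ref.model_scopes.each do |scope|
      model = SKETCH_MODELS.fetch(scope)
      ref.embeddings[model] = embedder.embed(model, ref.content)
    end
  end
end

embedder = FakeEmbedder.new
apply_embeddings(build_refs(:backfill, { 1 => "old doc" }), embedder)
apply_embeddings(build_refs(:incremental, { 2 => "new doc" }), embedder)
puts embedder.calls.inspect
```

In this sketch the backfill ref triggers a single call for the next model, while the incremental ref triggers one call per model, which is the saving the dedicated queue is meant to provide.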


JTBD

  • Figure out how to avoid generating embeddings that will not be needed (for example, re-generating previous-model embeddings during backfill) during a model switch
  • Implement the solution