Use data_consistency :sticky because it's the best trade-off. With data_consistency :delayed, the job could be rescheduled if the replica hasn't caught up, and since the worker is set to run with urgency :high, we want to avoid that. The :sticky mode still lets us use the replica as much as possible.
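For illustration, the worker declaration described above might look roughly like the sketch below. The WorkerDsl module is a hypothetical stand-in for the worker attribute DSL so the snippet runs on its own; ExampleWorker and the :example_worker_load_balancing flag name are placeholders, not names taken from this issue.

```ruby
# Hypothetical stand-in for the worker attribute DSL, so this runs standalone.
module WorkerDsl
  VALID_CONSISTENCY = %i[always sticky delayed].freeze

  def self.extended(base)
    base.instance_variable_set(:@worker_attributes, {})
  end

  def worker_attributes
    @worker_attributes
  end

  # Records the consistency mode and the optional feature flag guarding it.
  def data_consistency(mode, feature_flag: nil)
    raise ArgumentError, "unknown data_consistency: #{mode}" unless VALID_CONSISTENCY.include?(mode)

    worker_attributes[:data_consistency] = mode
    worker_attributes[:feature_flag] = feature_flag
  end

  def urgency(level)
    worker_attributes[:urgency] = level
  end
end

# Placeholder worker: sticky reads, high urgency, rolled out behind a flag.
class ExampleWorker
  extend WorkerDsl

  data_consistency :sticky, feature_flag: :example_worker_load_balancing
  urgency :high
end
```

Declaring the flag next to the consistency mode keeps the rollout switch and the behavior it guards in one place, which is why removing the flag later is a one-line change.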
@jreporter @cheryl.li This change is trivial, but I'm setting weight: 2 because we want to roll it out behind a feature flag, which adds some extra overhead.
After investigating with Nikola, we found that the persistent multiple primary calls were the result of calling Sticking.stick in a loop deep down in the method execution chain.
All the primary calls except the first one are cached, so this shouldn't block us from fully rolling out the worker, which I am doing now and monitoring. The typical "primary vs replica" graph won't be as pretty as in the usual worker rollout case, but the fix to Sticking.stick can be delivered in parallel.
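A toy sketch of the Sticking.stick-in-a-loop pattern mentioned above, only to show why the call count grows with the loop. FakeSticking, process_in_loop and process_hoisted are hypothetical names, the "one primary round trip per stick call" cost is an assumption, and this is not the actual follow-up fix.

```ruby
# Toy stand-in that counts how often the primary would be contacted.
class FakeSticking
  attr_reader :primary_calls

  def initialize
    @primary_calls = 0
  end

  # Assumption: each stick call costs one round trip to the primary.
  def stick(namespace, id)
    @primary_calls += 1
    [namespace, id]
  end
end

# Pattern described above: stick is reached on every loop iteration.
def process_in_loop(sticking, user_id, items)
  items.each do |item|
    sticking.stick(:user, user_id)
    item # placeholder for per-item work
  end
end

# One possible shape of a fix: stick once, before iterating.
def process_hoisted(sticking, user_id, items)
  sticking.stick(:user, user_id)
  items.each { |item| item } # placeholder for per-item work
end

looped = FakeSticking.new
process_in_loop(looped, 1, %w[a b c])

hoisted = FakeSticking.new
process_hoisted(hoisted, 1, %w[a b c])

puts "loop: #{looped.primary_calls}, hoisted: #{hoisted.primary_calls}"
```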
Primary calls won't disappear fully because of what I've explained above ^ (until the follow-up fix is merged), but we can confirm that all primary calls except the first one are cached:
I added another graph to my dashboard copy, the non-cached primary calls count, as it shows more clearly how enabling LB affected the pressure on the DB primary.
@changzhengliu FYI: I don't think we'll meet the exact due date set in this issue (end of today, I guess?), as I plan to monitor this under the feature flag today and remove the feature flag from the codebase (thus, closing this issue) tomorrow: !64180 (merged)
@alipniagov, regarding the due date, I think it's tomorrow, 2021-06-17. I think it's fine to take extra time to sweep up the follow-ups. The most important part is that it's working in production right now. Thanks for helping with this effort!!