
Allow Adapter#select_all to be performed asynchronously from a background thread pool

Carla Drago requested to merge github/fork/Shopify/ar-adapter-async-query into main

Created by: casperisfine

Context

Sometimes a controller or a job has to perform multiple independent queries, e.g.:

def index
  @posts = Post.published
  @categories = Category.active
end

Since these two queries are totally independent, ideally you could execute them in parallel, so that, assuming each takes 50ms, the total query time would be 50ms rather than 100ms.

A very naive way to do this is to simply call Relation#to_a in a background thread. The problem is that most Rails applications, and even Rails itself, rely on thread-local state (PerThreadRegistry, CurrentAttributes, etc.). So executing such a high-level interface from another thread is likely to lead to many context-loss problems or even thread-safety issues.

What we can do instead is schedule a much lower-level operation (Adapter#select_all) in a thread pool and return a future/promise. This way we keep most of the risky code on the main thread but perform the slow IO in the background, with very little chance of executing code that relies on state stored in thread-local storage.
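
To make the idea concrete, here is a minimal sketch using only the Ruby standard library. It is not the code in this PR: `FutureResult`, `TinyPool`, `FakeAdapter` and `select_all_async` are hypothetical names, and `sleep` stands in for the blocking driver call. The point it tries to show is that only an already-built SQL string crosses the thread boundary, and the caller blocks only when it asks for the result.

```ruby
# Hypothetical sketch; none of these classes exist in Rails.
class FutureResult
  def initialize
    @mailbox = Queue.new # the worker pushes exactly one outcome here
    @outcome = nil
  end

  # Called from the worker thread.
  def resolve(status, value)
    @mailbox << [status, value]
  end

  # Called from the main thread; blocks until the worker is done,
  # then memoizes so later calls return immediately.
  def result
    @outcome ||= @mailbox.pop
    status, value = @outcome
    raise value if status == :error
    value
  end
end

class TinyPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop)
          job.call
        end
      end
    end
  end

  # Schedules the block on a worker and returns a future right away.
  def post(&block)
    future = FutureResult.new
    @jobs << lambda do
      begin
        future.resolve(:ok, block.call)
      rescue StandardError => e
        future.resolve(:error, e) # re-raised on the caller's thread
      end
    end
    future
  end
end

class FakeAdapter
  def initialize(pool, latency: 0.05)
    @pool = pool
    @latency = latency
  end

  # Stand-in for the blocking driver call; sleep simulates network IO.
  def select_all(sql)
    sleep @latency
    [{ "sql" => sql }]
  end

  # The SQL is already a plain string, so the worker needs no
  # thread-local state from the caller.
  def select_all_async(sql)
    @pool.post { select_all(sql) }
  end
end

adapter = FakeAdapter.new(TinyPool.new(2))
posts = adapter.select_all_async("SELECT * FROM posts")
categories = adapter.select_all_async("SELECT * FROM categories")
rows = [posts.result, categories.result] # the two waits overlap
```

Errors raised by the simulated query are captured and re-raised from `result`, so in this sketch they surface on the thread that consumes the rows rather than being lost in the worker.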

Also, since most users are on MRI, only the IO can really be parallelized, so scheduling more code to be executed in the background wouldn't lead to better performance.

For more context, I experimented with a quick proof of concept in this gist: https://gist.github.com/casperisfine/0ccd24dc209665c46e83bcc2920dd7dc

PR Scope

To make the feature easier to review, here I'm only adding the async interface to AbstractAdapter. If that's OK, I'd rather keep the changes to Relation for a follow-up.

Implementation concerns

I'll be adding review comments for some specific points, but in general:

  • The async query interface should be as close to the database driver as possible, so as to avoid thread context issues. E.g. the SQL query and binds must already be evaluated, etc.
  • If the request / job is completed without accessing a query result, we should make sure to cancel any in-flight async query. That is what AsynchronousQueriesTracker is for. I'm just unsure of the best way to wrap it around user code. I feel like we could piggyback on the query cache enable/disable?
  • I'm quite unsure what to do with ActiveSupport::Notifications.instrument. Ideally I'd like to find a way to call the notifier on the main thread, so as to avoid any thread state issues with user callbacks.
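
As a rough illustration of the cancellation point above, the sketch below wraps a unit of work and cancels whatever has not started by the time it finishes. The interface of AsynchronousQueriesTracker is not shown in this description, so `CancellableFuture`, `QueryTracker`, `track` and `wrap` are hypothetical names. The sketch only cancels work that is still pending; interrupting a query the driver is already running would need driver support and is out of scope here.

```ruby
# Hypothetical sketch; not the AsynchronousQueriesTracker from this PR.
class CancellableFuture
  def initialize(&work)
    @work = work
    @mutex = Mutex.new
    @state = :pending # :pending, :running, :finished or :canceled
    @value = nil
  end

  # Called by a worker thread. Does nothing unless still pending.
  def execute
    @mutex.synchronize do
      return unless @state == :pending
      @state = :running
    end
    value = @work.call
    @mutex.synchronize do
      @value = value
      @state = :finished
    end
  end

  # Only pending work can be canceled in this sketch.
  def cancel
    @mutex.synchronize { @state = :canceled if @state == :pending }
  end

  def canceled?
    @mutex.synchronize { @state == :canceled }
  end
end

class QueryTracker
  def initialize
    @futures = []
  end

  def track(future)
    @futures << future
    future
  end

  # Wraps a request or job. Anything still pending when the block
  # exits is canceled, even if the block raised.
  def wrap
    yield self
  ensure
    @futures.each(&:cancel)
    @futures.clear
  end
end

ran = []
used = nil
unused = nil
QueryTracker.new.wrap do |tracker|
  used = tracker.track(CancellableFuture.new { ran << :posts })
  unused = tracker.track(CancellableFuture.new { ran << :categories })
  used.execute # simulates a worker picking up the first query
end
unused.execute # a late worker finds the future canceled and skips it
```

Putting the cleanup in `ensure` is what would let a hook like the query cache enable/disable pair carry it: the wrapper already brackets the request or job, so the cancellation runs on every exit path.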

cc @rafaelfranca @Edouard-chin @tenderlove @matthewd because we talked about this last week. cc @kamipo because Active Record. cc @eileencodes because connection pools etc.
