Define abstraction levels and the boundaries that separate them

Over the past 3 years or so, I routinely ran into (performance) problems caused by how we define and (re)use abstractions. To illustrate, a very common problem is this:

Finder A builds a sub-query using Finder B, which uses a bunch of random ActiveRecord methods. Finder B does things in a way that don't work very well for Finder A, but because Finders offer a very high level API, there's no way around this. This often translates to the queries produced by Finder A performing badly, often as a result of repeating the same WHERE conditions in ways PostgreSQL can't optimise.

When dealing with these problems, there often is no solution as any change made is likely to touch a lot of code. This can result in it taking weeks to solve the problem. Because our tests often test the internals of methods, it's not unlikely you also end up breaking a lot of tests even if the public API/behaviour remains the same.

Boundaries

A first step towards solving these problems is to define abstraction levels, assign certain abstractions to those levels, then set up boundaries/rules on how to use and separate them. I consider there to be four abstraction levels:

High level
Medium level
Low level
Micro level

High level

High level abstractions are very simple: controllers, views, serializers, and presenters. Basically everything that's directly used to display data to the user.

Medium level

Medium level abstractions are Finders, Service classes, and similar classes. These typically take a "current user" object, a scope (project, group, etc), and a bunch of URL parameters. These typically only expose a few methods, such as an "execute" method.

Low level

Low level abstractions are:

Custom model class methods and scopes
Simple methods for finding rows, such as find_by_id or find_by_email. If it takes more than two arguments it's probably a micro level abstraction.
Model validations

Typically an object will offer quite a lot of these. Chaining them together is fine.

Micro level

Micro level abstractions would be the ActiveRecord querying API (where, pluck, etc), Gitlab::Git, etc. Basically everything that provides a lot of public methods, often requiring a lot of glue to be made useful.

Boundaries

Simply identifying these levels is not very helpful, instead we must also establish boundaries. The idea is very simple: an abstraction can only use abstractions that are exactly one level below it in the hierarchy. This means a Finder (medium level) can not reuse another Finder (medium level), but can use a class method defined on a model (low level).

Abstractions can also not go up the chain. This means an ActiveRecord instance method can't suddenly call back into a Finder.

Micro level abstractions can reuse each other, but only if they are defined in the same class/module. This prevents a developer from having to dig through 15 files just to figure out how something is built.

Goal and benefits

The end goal is to clearly separate abstractions, making it easier to use, debug, and test them. We will also enable for better refactoring, performance optimisations, and reuse. For example, you won't run into the issue of "I want to reuse X, but all I have is a super high level interface that doesn't quite do what I need it to", because instead you can just build your own abstraction reusing the same low level abstractions. This means you don't duplicate code, instead you reuse the same code as an existing abstraction, with some slight changes, without having to change the entire abstraction that already exists.

Examples

Without examples this might be a bit hard to wrap your head around, so let's show some cases where this can be helpful. Let's take a look at AutocompleteController, starting with the #projects method:

def projects
  project = Project.find_by_id(params[:project_id])
  projects = projects_finder.execute(project, search: params[:search], offset_id: params[:offset_id])
                                                                                                      
  render json: projects.to_json(only: [:id, :name_with_namespace], methods: :name_with_namespace)
end

This method lives in a high level abstraction (a controller), and uses:

The instance method projects_finder which returns MoveToProjectFinder.new(current_user), a medium level API
to_json, a micro level API provided by ActiveRecord
Project.find_by_id: a micro level abstraction, since it's part of the AR querying API Let's call this a violation of abstraction boundaries: a high level API should only use a medium level API, but instead it's all over the place.

Using the boundaries system, we'd instead change this to something along the lines of the following:

def projects
  projects = Autocomplete::MoveToProjectFinder.new(current_user, params).execute
  serializer = Autocomplete::ProjectsSerializer.new(projects)
                                                                    
  render json: serializer.to_json
end

Here we don't violate the boundaries, because we only use medium level APIs in our high level projects method.

To make this particular case work we'd either adjust MoveToProjectFinder to take params, or we introduce a new finder that reuses the same underlying logic. The serializer class just replaces the to_json call from before. The MoveToProjectFinder current looks like this:

class MoveToProjectFinder
  PAGE_SIZE = 50

  def initialize(user)
    @user = user
  end

  def execute(from_project, search: nil, offset_id: nil)
    projects = @user.projects_where_can_admin_issues
    projects = projects.search(search) if search.present?
    projects = projects.excluding_project(from_project)
    projects = projects.order_id_desc

    # infinite scroll using offset
    projects = projects.where('projects.id < ?', offset_id) if offset_id.present?
    projects = projects.limit(PAGE_SIZE)

    # to ask for Project#name_with_namespace
    projects.includes(namespace: :owner)
  end
end

This medium level abstraction uses various micro level abstractions, as well as a low level abstraction (projects_where_can_admin_issues, search, order_id_desc, and excluding_project). To clean this up, we'd change things into the following:

class MoveToProjectFinder
  PAGE_SIZE = 50

  def initialize(user)
    @user = user
  end

  def execute(from_project, search: nil, offset_id: nil)
    @user.projects_where_can_admin_issues
      .search(search) # "search()" handles the argument being nil
      .excluding_project(from_project)
      .order_id_desc
      .paginate(offset_id, PAGE_SIZE) # paginate, or whatever we call it, handles nil arguments itself
      .eager_load_namespace_and_owner # replaces the includes(), allowing it to be reused
  end
end

Here all the if statements have been removed in favour of the methods handling nil arguments themselves. Eager loading is moved to a dedicated method (instead of a "one size never fits all" solution such as inc_associations), and pagination is moved to a dedicated method (which applies both the WHERE and LIMIT).

In this new setup, our medium level finder is only using low level abstractions, because all the methods we use and chain are custom methods.

All of this is rather straightforward, and I think most would argue that this is "just" good clean code. The important part is that we actually define these rules, and enforce them, instead of merely thinking about them.

One really nice aspect is that when you make a change at a lower abstraction, you only have to worry about one level up (assuming of course the abstractions are well written). This means you should be able to change a micro level abstraction, without having to also change every high/medium/low level abstraction that uses this (which right now can be the case depending on the change you make). Because micro level abstractions only have limited room to reuse other micro abstractions, the likelihood of cascading problems is dramatically reduced.

One important thing to keep in mind is that we have to start making changes like this. Our merge requests get increasingly more complex, features add more and more, and problems become more difficult to debug on a daily basis. There are many changes we need to make to help solve all that, reorganising our code (mostly by splitting things up) is definitely one of those changes.

Closing thoughts

The boundaries are rather loosely defined, I suspect we'll refine them once we gather some more examples. This is also something where we really have to experiment in a merge request to get a better understanding of things, change the rules a bit, etc.

I deliberately steered clear of Trailblazer since I think it's too invasive, but I can see us applying many techniques from it that I haven't covered above, such as separate validation classes (which we already use in a few places). We also want to move non DB logic out of AR models, put hooks some place else, etc. All of that I think is a separate discussion, based on the boundaries we define for abstractions.

@smcgivern @DouweM @rymai @dzaporozhets: feedback welcome. The idea is still a bit rough on the edges, and I probably need to toy around with the code a bit to come up with something more refined.

Edited Mar 02, 2023 by Mohamed Hamda