Define abstraction levels and the boundaries that separate them
Over the past three years or so, I have routinely run into (performance) problems caused by how we define and (re)use abstractions. A very common example is this:
Finder A builds a sub-query using Finder B, which uses a bunch of random ActiveRecord methods. Finder B does things in a way that doesn't work very well for Finder A, but because Finders offer a very high level API, there's no way around this. The result is often that the queries produced by Finder A perform badly, typically because they repeat the same WHERE conditions in ways PostgreSQL can't optimise.
When dealing with these problems, there is often no easy solution, as any change is likely to touch a lot of code. As a result, it can take weeks to solve the problem. Because our tests often test the internals of methods, you are also likely to break a lot of tests even if the public API/behaviour remains the same.
A first step towards solving these problems is to define abstraction levels, assign certain abstractions to those levels, then set up boundaries/rules on how to use and separate them. I consider there to be four abstraction levels:
- High level
- Medium level
- Low level
- Micro level
High level abstractions are very simple: controllers, views, serializers, and presenters. Basically everything that's directly used to display data to the user.
Medium level abstractions are Finders, Service classes, and similar classes. These typically take a "current user" object, a scope (project, group, etc), and a bunch of URL parameters, and only expose a few methods, such as an "execute" method.
Low level abstractions are:
- Custom model class methods and scopes
- Simple methods for finding rows, such as find_by_email. If it takes more than two arguments it's probably a micro level abstraction.
- Model validations
Typically an object will offer quite a lot of these. Chaining them together is fine.
Micro level abstractions would be the ActiveRecord querying API (pluck, etc), Gitlab::Git, etc. Basically everything that provides a lot of public methods, often requiring a lot of glue to be made useful.
Simply identifying these levels is not very helpful; we must also establish boundaries. The idea is very simple: an abstraction can only use abstractions that are exactly one level below it in the hierarchy. This means a Finder (medium level) cannot reuse another Finder (medium level), but can use a class method defined on a model (low level).
Abstractions also cannot go up the chain. This means an ActiveRecord instance method can't suddenly call back into a Finder.
Micro level abstractions can reuse each other, but only if they are defined in the same class/module. This prevents a developer from having to dig through 15 files just to figure out how something is built.
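To make the rules above concrete, here is a minimal sketch of how they could be written down as a check. The four level names come from the text; the module name, the method name, and the same_module flag are made up for illustration and are not an existing API.

```ruby
# Hypothetical sketch of the boundary rules; not an existing GitLab API.
module AbstractionBoundaries
  # A higher number means a higher level of abstraction.
  LEVELS = { micro: 0, low: 1, medium: 2, high: 3 }.freeze

  # Returns true when an abstraction at caller_level may use one at
  # callee_level.
  #
  # same_module only matters for micro level abstractions, which may reuse
  # each other when they are defined in the same class/module.
  def self.allowed?(caller_level, callee_level, same_module: false)
    from = LEVELS.fetch(caller_level)
    to = LEVELS.fetch(callee_level)

    return same_module if from.zero? && to.zero?

    # Everything else may only reach exactly one level down.
    from - to == 1
  end
end
```

With this sketch, allowed?(:medium, :low) is true, while allowed?(:medium, :medium), allowed?(:high, :micro), and allowed?(:low, :medium) are all false. How levels would actually be assigned to classes is left open here.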
Goal and benefits
The end goal is to clearly separate abstractions, making it easier to use, debug, and test them. This also enables better refactoring, performance optimisations, and reuse. For example, you won't run into the issue of "I want to reuse X, but all I have is a super high level interface that doesn't quite do what I need it to", because you can just build your own abstraction on top of the same low level abstractions. This means you don't duplicate code: you reuse the same code as an existing abstraction, with some slight changes, without having to change the entire abstraction that already exists.
Without examples this might be a bit hard to wrap your head around, so let's show some cases where this can be helpful. Let's take a look at AutocompleteController, starting with the projects method:
    def projects
      project = Project.find_by_id(params[:project_id])
      projects = projects_finder.execute(project, search: params[:search], offset_id: params[:offset_id])

      render json: projects.to_json(only: [:id, :name_with_namespace], methods: :name_with_namespace)
    end
This method lives in a high level abstraction (a controller), and uses:
- Project.find_by_id: a micro level abstraction, since it's part of the AR querying API
- The instance method projects_finder, i.e. MoveToProjectFinder.new(current_user): a medium level API
- to_json: a micro level API provided by ActiveRecord
Let's call this a violation of abstraction boundaries: a high level API should only use a medium level API, but instead it's all over the place.
Using the boundaries system, we'd instead change this to something along the lines of the following:
    def projects
      projects = Autocomplete::MoveToProjectFinder.new(current_user, params).execute
      serializer = Autocomplete::ProjectsSerializer.new(projects)

      render json: serializer.to_json
    end
Here we don't violate the boundaries, because we only use medium level APIs in our high level abstraction.
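As a rough idea of what such a serializer could look like, here is a minimal sketch in plain Ruby. Autocomplete::ProjectsSerializer does not exist yet, so its interface is an assumption, and DemoProject is only a stand-in for the Project model so the example runs without Rails.

```ruby
require 'json'

# Stand-in for the Project model; only here so the sketch runs without Rails.
DemoProject = Struct.new(:id, :name_with_namespace)

module Autocomplete
  # Hypothetical serializer: exposes only the fields the old to_json call
  # selected (id and name_with_namespace).
  class ProjectsSerializer
    def initialize(projects)
      @projects = projects
    end

    # Builds an array of plain hashes, one per project.
    def as_json
      @projects.map do |project|
        { id: project.id, name_with_namespace: project.name_with_namespace }
      end
    end

    def to_json(*args)
      as_json.to_json(*args)
    end
  end
end
```

The point of the sketch is that the decision about which fields are exposed lives in one named class, rather than in the arguments of a to_json call inside the controller.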
To make this particular case work we'd either adjust MoveToProjectFinder to take params, or introduce a new finder that reuses the same underlying logic. The serializer class just replaces the to_json call from before.
MoveToProjectFinder currently looks like this:
    class MoveToProjectFinder
      PAGE_SIZE = 50

      def initialize(user)
        @user = user
      end

      def execute(from_project, search: nil, offset_id: nil)
        projects = @user.projects_where_can_admin_issues
        projects = projects.search(search) if search.present?
        projects = projects.excluding_project(from_project)
        projects = projects.order_id_desc

        # infinite scroll using offset
        projects = projects.where('projects.id < ?', offset_id) if offset_id.present?
        projects = projects.limit(PAGE_SIZE)

        # to ask for Project#name_with_namespace
        projects.includes(namespace: :owner)
      end
    end
This medium level abstraction uses various micro level abstractions, as well as a low level abstraction (excluding_project). To clean this up, we'd change things into the following:
    class MoveToProjectFinder
      PAGE_SIZE = 50

      def initialize(user)
        @user = user
      end

      def execute(from_project, search: nil, offset_id: nil)
        @user.projects_where_can_admin_issues
          .search(search) # "search()" handles the argument being nil
          .excluding_project(from_project)
          .order_id_desc
          .paginate(offset_id, PAGE_SIZE) # paginate, or whatever we call it, handles nil arguments itself
          .eager_load_namespace_and_owner # replaces the includes(), allowing it to be reused
      end
    end
Here all the if statements have been removed in favour of the methods handling nil arguments themselves. Eager loading is moved to a dedicated method (instead of a "one size never fits all" solution such as inc_associations), and pagination is moved to a dedicated method (which applies both the WHERE and LIMIT).
In this new setup, our medium level finder is only using low level abstractions, because all the methods we use and chain are custom methods.
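To illustrate what "handling nil arguments themselves" could mean for those low level methods, here is a small in-memory sketch. FakeRelation and FakeProject are stand-ins invented for this example so it runs without Rails; the real methods would be model scopes or class methods returning an ActiveRecord relation, and the exact semantics of paginate are an assumption.

```ruby
# Stand-in for Project rows; not ActiveRecord.
FakeProject = Struct.new(:id, :name)

# Hypothetical in-memory imitation of a relation. Every method returns a
# FakeRelation, so the chain stays intact even when an argument is nil.
class FakeRelation
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Returns the relation unchanged when no query is given.
  def search(query)
    return self if query.nil? || query.empty?

    FakeRelation.new(rows.select { |row| row.name.include?(query) })
  end

  # Returns the relation unchanged when there is no project to exclude.
  def excluding_project(project)
    return self if project.nil?

    FakeRelation.new(rows.reject { |row| row.id == project.id })
  end

  def order_id_desc
    FakeRelation.new(rows.sort_by { |row| -row.id })
  end

  # Applies both the "id < offset_id" filter and the limit; a nil offset_id
  # skips the filter but still applies the limit.
  def paginate(offset_id, limit)
    filtered = offset_id.nil? ? rows : rows.select { |row| row.id < offset_id }

    FakeRelation.new(filtered.first(limit))
  end
end

relation = FakeRelation.new([
  FakeProject.new(1, 'alpha'),
  FakeProject.new(2, 'beta'),
  FakeProject.new(3, 'alphabet')
])

# No conditionals needed at the call site, even though both arguments are nil.
ids = relation.search(nil).order_id_desc.paginate(nil, 50).rows.map(&:id)
# ids is [3, 2, 1]
```

In real Rails code there is one detail to watch for: a scope whose body returns nil falls back to returning all records, whereas a class method that returns nil breaks the chain, so class methods need to return the relation explicitly.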
All of this is rather straightforward, and I think most would argue that this is "just" good clean code. The important part is that we actually define these rules, and enforce them, instead of merely thinking about them.
One really nice aspect is that when you make a change at a lower abstraction, you only have to worry about one level up (assuming of course the abstractions are well written). This means you should be able to change a micro level abstraction, without having to also change every high/medium/low level abstraction that uses this (which right now can be the case depending on the change you make). Because micro level abstractions only have limited room to reuse other micro abstractions, the likelihood of cascading problems is dramatically reduced.
One important thing to keep in mind is that we have to start making changes like this. Our merge requests grow increasingly complex, features keep adding more code, and problems become more difficult to debug by the day. There are many changes we need to make to help solve all that; reorganising our code (mostly by splitting things up) is definitely one of them.
The boundaries are rather loosely defined; I suspect we'll refine them once we gather some more examples. This is also something where we really have to experiment in a merge request to get a better understanding of things, change the rules a bit, etc.
I deliberately steered clear of Trailblazer since I think it's too invasive, but I can see us applying many techniques from it that I haven't covered above, such as separate validation classes (which we already use in a few places). We also want to move non-DB logic out of AR models, put hooks somewhere else, etc. All of that, I think, is a separate discussion that builds on the boundaries we define for abstractions.