Data access layer interface for Error Tracking
Idea
To roll out the new error tracking backend, we implement an interface that lets us switch between the old backend (PG) and the new backend (ClickHouse).
Why: ActiveRecord exposes a wide interface for querying data, which we won't be able to fully implement for our ClickHouse models. By creating our own interface, we ensure that the set of data access methods is limited and known to us.
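One way to make "limited and known" enforceable is a contract module that declares every supported data access method and raises NotImplementedError until a backend overrides it. This is a minimal sketch, not the proposed implementation: `ErrorStore`, `PgErrorStore`, and `ClickHouseErrorStore` are hypothetical names, the method list mirrors the class methods of the proposed interface, and the stub bodies stand in for real queries.

```ruby
# Hypothetical contract listing the only data access methods a backend may expose.
module ErrorStore
  METHODS = %i[report_event list find_by_unique_identifier].freeze

  # Each contract method raises until the including backend overrides it.
  METHODS.each do |name|
    define_method(name) do |*_args, **_kwargs|
      raise NotImplementedError, "#{self.class} must implement ##{name}"
    end
  end

  # True when klass defines every contract method itself.
  def self.implemented_by?(klass)
    METHODS.all? { |name| klass.instance_method(name).owner == klass }
  end
end

class PgErrorStore
  include ErrorStore

  def report_event(params)
    :upserted # stand-in for upserting Error + inserting Event
  end

  def list(filters:, sort: :last_seen_at, direction: :desc, cursor: {}, limit: 20)
    [] # stand-in for the keyset-paginated query
  end

  def find_by_unique_identifier(value)
    nil # stand-in for a primary key lookup
  end
end

# Incomplete backend: list and find_by_unique_identifier still raise.
class ClickHouseErrorStore
  include ErrorStore

  def report_event(params)
    :inserted # stand-in for inserting an Event
  end
end
```

Because nothing outside METHODS exists on a store, callers cannot reach for an arbitrary ActiveRecord query, and an incomplete backend fails loudly instead of silently.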
Rollout plan
- Add the interface and implement it for the PG backend.
- Implement the interface for the ClickHouse backend.
- Add a switch (feature flag) that toggles between the PG and ClickHouse backends.
Note: We currently don't plan to maintain the data in both PG and ClickHouse. We assume that we'll migrate the existing PG data to ClickHouse as a one-off task.
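The feature flag switch from the rollout plan could be a single factory that picks a backend per project. A minimal sketch: the flag name `use_clickhouse_error_tracking`, the `PgBackend`/`ClickHouseBackend` structs, and `ErrorTrackingBackend.backend_for` are hypothetical, and the flag check is passed in as a lambda so the example runs outside GitLab (inside GitLab it would presumably call `Feature.enabled?`).

```ruby
# Hypothetical stand-ins for the two backend implementations.
PgBackend = Struct.new(:project_id)
ClickHouseBackend = Struct.new(:project_id)

module ErrorTrackingBackend
  FLAG = :use_clickhouse_error_tracking # hypothetical feature flag name

  # Picks the backend for a project. flag_enabled is any callable taking
  # (flag_name, project_id) and returning true or false.
  def self.backend_for(project_id, flag_enabled:)
    if flag_enabled.call(FLAG, project_id)
      ClickHouseBackend.new(project_id)
    else
      PgBackend.new(project_id)
    end
  end
end

# Usage: enable ClickHouse for project 42 only.
enabled = ->(flag, project_id) { flag == ErrorTrackingBackend::FLAG && project_id == 42 }
ErrorTrackingBackend.backend_for(42, flag_enabled: enabled) # a ClickHouseBackend
ErrorTrackingBackend.backend_for(7, flag_enabled: enabled)  # a PgBackend
```

Since the data is not kept in both stores, flipping the flag for a project only makes sense once that project's PG data has been migrated.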
Main features
The Error Tracking feature can be split into three parts:
- Data ingestion: inserting event records.
- Querying data: showing errors and the last event.
- Updating status: marking an error as resolved.
Relation: an Error has many Events.
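A minimal in-memory model of the relation and the three parts (ingesting an event, querying the last event, resolving the error). It is illustrative only: `ErrorRecord`, `EventRecord`, `occurred_at`, and the `:unresolved`/`:resolved` status values are assumptions, not the actual model attributes.

```ruby
# Hypothetical stand-in for the Event model.
EventRecord = Struct.new(:occurred_at, :payload, keyword_init: true)

# Hypothetical stand-in for the Error model: an Error has many Events.
class ErrorRecord
  attr_reader :name, :events, :status

  def initialize(name)
    @name = name
    @events = []
    @status = :unresolved # assumed initial status
  end

  # Data ingestion: attach one more event to this error.
  def add_event(occurred_at, payload = {})
    @events << EventRecord.new(occurred_at: occurred_at, payload: payload)
    self
  end

  # Querying data: the most recent event, or nil if there are none.
  def last_event
    @events.max_by(&:occurred_at)
  end

  # Updating status: mark the error as resolved.
  def resolve!
    @status = :resolved
  end
end

error = ErrorRecord.new("NoMethodError")
error.add_event(Time.utc(2021, 6, 1)).add_event(Time.utc(2021, 6, 2))
error.last_event.occurred_at # the 2021-06-02 event
error.resolve!
error.status # :resolved
```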
Querying data
The PG version uses keyset pagination to list the ordered Error records on the page. Based on the findings here, we can use the same keyset pagination queries with ClickHouse.
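Keyset pagination filters on the sort key of the last row already shown instead of skipping rows with OFFSET. This sketch assumes the default ordering (`last_seen_at` descending) with the record id as a tiebreaker so the order is total; the cursor keys `last_seen_at` and `id`, `ErrorRow`, and the `keyset_page` helper are hypothetical, not the format used by the PG implementation. In SQL the filter would be a row comparison such as `(last_seen_at, id) < (:last_seen_at, :id)` combined with `ORDER BY last_seen_at DESC, id DESC LIMIT :limit`.

```ruby
# Hypothetical minimal row: only the fields the ordering needs.
ErrorRow = Struct.new(:id, :last_seen_at)

# Returns [page, next_cursor] for rows ordered by last_seen_at DESC, id DESC.
# cursor is {} for the first page, otherwise the { last_seen_at:, id: } of the
# last row on the previous page. next_cursor is nil when the page is not full.
def keyset_page(rows, cursor: {}, limit: 20)
  ordered = rows.sort_by { |r| [r.last_seen_at, r.id] }.reverse

  unless cursor.empty?
    key = [cursor[:last_seen_at], cursor[:id]]
    # Equivalent of WHERE (last_seen_at, id) < (cursor values).
    ordered = ordered.select { |r| ([r.last_seen_at, r.id] <=> key) == -1 }
  end

  page = ordered.first(limit)
  next_cursor =
    if page.any? && page.size == limit
      { last_seen_at: page.last.last_seen_at, id: page.last.id }
    end
  [page, next_cursor]
end
```

Unlike OFFSET, the cost of fetching a page does not grow with how deep the user has paged, as long as the sort columns are indexed.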
Proposed interface
module ErrorInterface # TODO: find a better name
  # Concern lets the instance methods below call self.class.list
  extend ActiveSupport::Concern

  class_methods do
    # params contains the following keys:
    # - project_id
    # - name
    # - description
    # - actor
    # - platform
    # - last_seen_at
    # - environment
    # - level
    # - payload
    #
    # Note: we might need an extra method that generates the queries, or just the payload,
    # in case we need bulk insert functionality.
    def report_event(params)
      # PG: upsert the Error model + insert the Event model
      # ClickHouse: insert the Event model; ClickHouse updates the Error model automatically
    end

    # filters might contain the following values:
    # - project_id (always present)
    # - status
    # - query (free-text search, not supported in PG)
    #
    # sorting options (asc, desc):
    # - last_seen_at
    # - first_seen_at
    # - frequency (event count)
    #
    # cursor: a hash used for keyset pagination to load the next rows
    def list(filters:, sort: :last_seen_at, direction: :desc, cursor: {}, limit: 20)
      # PG: the filtering logic is implemented in the Error and ErrorsFinder classes.
    end

    # Not sure about the method name. In PG we have a primary key.
    # In ClickHouse we'll probably have a unique fingerprint value.
    def find_by_unique_identifier(value)
    end
  end

  # We show the first and last events in the UI; this data can be queried
  # with the list method.
  def last_event
    self.class.list(filters: {}, sort: :last_seen_at, direction: :desc, cursor: {}, limit: 1).first
  end

  def first_event
    self.class.list(filters: {}, sort: :last_seen_at, direction: :asc, cursor: {}, limit: 1).first
  end

  def update_status(new_status)
    # PG: update the status column
  end
end
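The two report_event strategies should yield the same Error view: PG upserts an aggregate row on every event, while ClickHouse only inserts events and the Error data is derived from them. A minimal in-memory sketch of that equivalence, assuming errors are grouped by a hypothetical fingerprint of `project_id` and `name`, and that each event's `last_seen_at` param is its timestamp; `fingerprint`, `UpsertStore`, and `AppendOnlyStore` are illustrative names.

```ruby
# Hypothetical key used to group events into one error.
def fingerprint(event)
  [event[:project_id], event[:name]]
end

# PG-style: keep the event and upsert an aggregate row per error.
class UpsertStore
  attr_reader :errors, :events

  def initialize
    @errors = {}
    @events = []
  end

  def report_event(event)
    @events << event
    seen_at = event[:last_seen_at]
    row = @errors[fingerprint(event)]
    if row
      row[:frequency] += 1
      row[:first_seen_at] = [row[:first_seen_at], seen_at].min
      row[:last_seen_at] = [row[:last_seen_at], seen_at].max
    else
      @errors[fingerprint(event)] = { frequency: 1, first_seen_at: seen_at, last_seen_at: seen_at }
    end
  end
end

# ClickHouse-style: append-only events, Error aggregates derived on read.
class AppendOnlyStore
  def initialize
    @events = []
  end

  def report_event(event)
    @events << event
  end

  def errors
    @events.group_by { |e| fingerprint(e) }.transform_values do |group|
      times = group.map { |e| e[:last_seen_at] }
      { frequency: group.size, first_seen_at: times.min, last_seen_at: times.max }
    end
  end
end
```

Deriving the aggregates on read is only a stand-in here for however ClickHouse ends up maintaining the Error model; the point is that both paths must agree on frequency, first_seen_at, and last_seen_at, which the list sort options depend on.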