Skip to content

Split audit_events table

Background 🏙

audit_events table is used to store all audit information of user-driven actions on GitLab application. "audit_events is the 4th largest table on GitLab.com" source. To provide more comprehensive audit events for compliance-minded organisations, we need to progressive extend our audit coverage while still maintaining reasonably performant and capable search functionalities on large audit events data.

Demands 📈

There is a growing needs to provide complex views around audit event level

  • Expose sub-groups and projects events under Group level view (#39139 (closed))

As well as more filtering/searching capabilities on the number of audit events

State of the world 🌏

They are visible under 3 levels: Project, Group and Instance as depicted below

We currently provide very limited filtering capabilities on these vast number of audit events, namely:

  • created_at date range
  • entity_type and entity_id

The table contains a polymorphic type - Entity (entity_id and entity_type) which can be either User, Project or Group.

The audit_events table schema currently look like the following:

Column Type Nullable Default Storage
id integer not null nextval('audit_events_id_seq'::regclass) plain
author_id integer not null plain
type character varying not null extended
entity_id integer not null plain
entity_type character varying not null extended
details text extended
created_at timestamp without time zone plain
updated_at timestamp without time zone plain
Index
"audit_events_pkey" PRIMARY KEY, btree (id)
"analytics_index_audit_events_on_created_at_and_author_id" btree (created_at, author_id)
"index_audit_events_on_entity_id_and_entity_type_and_id_desc" btree (entity_id, entity_type, id DESC)

Challenges 🚂

The current table schema poses a number of challenges:

  • Slow query time
  • Difficult to query information in details serialized hash
  • Inefficient utilization of indexes (i.e. analytics_index_audit_events_on_created_at_and_author_id is taking up spaces and not used by any queries)

"My main issue with our current schema is that we use a lot of serialized properties in the details section. This makes it very hard to query for certain types of events and index on them. My suggestion to break the tables out is so we can add specific fields for certain operations (e.g. user changed this setting from X to Y)." source

Proposal

Extending on the original @stanhu's thoughts (#7865 (comment 110902945)), we would like to distinguish two types of audit events:

  • Settings changes (very much like Change-Data Capture record with author)
  • Activities (that not always result in changes of our persisted stores)
instance_setting_audit_events
Column Type Nullable Default Storage
id integer not null nextval('instance_setting_audit_events_id_seq'::regclass) plain
author_id integer not null plain
attribute_name character varying(255) not null extended
old_value character varying(255) not null extended
new_value character varying(255) not null extended
ip_address character varying(255) not null extended
created_at timestamp without time zone not null plain
group_setting_audit_events
Column Type Nullable Default Storage
id integer not null nextval('group_setting_audit_events_id_seq'::regclass) plain
author_id integer not null plain
group_id integer not null plain
attribute_name character varying(255) not null extended
old_value character varying(255) not null extended
new_value character varying(255) not null extended
created_at timestamp without time zone not null plain
group_activity_audit_events
Column Type Nullable Default Storage
id integer not null nextval('group_activity_audit_events_id_seq'::regclass) plain
author_id integer not null plain
group_id integer not null plain
created_at timestamp without time zone not null plain
project_setting_audit_events
Column Type Nullable Default Storage
id integer not null nextval('project_setting_audit_events_id_seq'::regclass) plain
author_id integer not null plain
project_id integer not null plain
attribute_name character varying(255) not null extended
old_value character varying(255) not null extended
new_value character varying(255) not null extended
created_at timestamp without time zone not null plain
project_activity_audit_events
Column Type Nullable Default Storage
id integer not null nextval('project_activity_audit_events_id_seq'::regclass) plain
author_id integer not null plain
project_id integer not null plain
created_at timestamp without time zone not null plain
user_activity_audit_events
Column Type Nullable Default Storage
id integer not null nextval('user_activity_audit_events_id_seq'::regclass) plain
author_id integer not null plain
user_id integer not null plain
created_at timestamp without time zone not null plain

Migration plan-of-attack

  • How to migrate from existing audit_events table to new schema

Tasks to Evaluate

  • Determine feasibility of the feature
  • Create issue for implementation or update existing implementation issue description with implementation proposal

Risks and Implementation Considerations

  1. ip_address is currently available on admin_audit_log feature only. Is there any privacy concern to always capture this piece of information?
  2. Generate aggregated project level and group level view across new set of tables might require further optimization (i.e. maybe leveraging ElasticSearch?)
  3. Transactional grouping of audit events, esp. around settings changes. For example, an update on a project settings page could change both the name and description and end up with 2 separate audit events. Should the schema on the table reflect that they are part of a same action?
  4. Might require frontend changes to show a view for each table

Team

Related issues

List of audit events

References

Scope Name
GroupDestroyed Group
GroupMarkedForDeletion Group
GroupSettingsUpdated Group
GroupRestored Groups
ExportFileDownloadStarted Project
Impersonation Project
KeysCreated Project
MemberAccessApproved Project
MemberCreated Project
MemberDestroyed Project
MemberUpdated Project
ProjectArchived Project
ProjectCreated Project
ProjectDestroyed Project
ProjectFeaturesUpdated Project
ProjectGroupLinksCreated Project
ProjectGroupLinksUpdated Project
ProjectSettingsUpdated Project
ProjectUnarchived Project
ProtectedBranch Project
ReleaseArtifactsDownloaded Project
ReleaseAssociateMilestoneUpdated Project
ReleaseCreated Project
RepositoryDownloadStarted Project
RepositoryPush Project
DeployKeyCreated User
DeployKeyDisabled User
DeployKeyEnabled User
EmailUpdated User
LoginFailed User
LoginSuccessful User
OauthAccessInitiated User
SmartcardLoginSuccessful User
UserBlocked User
UserCreated User
UserDestroyed User
UserUpdated User
Edited by Dan Jensen