Refactor AuditEvent
## Problem to solve The `AuditEvent` model is heavily used. [For example](https://gitlab.com/gitlab-org/gitlab/-/issues/7865#note_292580533), "`audit_events` is the 4th largest table on GitLab.com" with [over 100m records](https://gitlab.com/gitlab-org/gitlab/-/issues/7865#note_109418643). However, the `AuditEvent` model is also largely unstructured. It stores most of its data in a Hash-serialized `text` column called `details`. That data is not validated. This lack of structure has multiple harmful consequences. <details> - Data integrity suffers because it's hard to notice bugs ([example](https://gitlab.com/gitlab-org/gitlab/-/issues/216577)). For example, instead of relying on validation rules and test coverage, we must rely on developers to somehow know the rule that the "action" must be the first item in the hash. - Data is not easily queryable - we don't want to query a `text` column across 100m records! - Classes that depend on `AuditEvent` must be much more complicated than we want them to be, like [`AuditEventService`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/audit_event_service.rb) and [`Audit::Details`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/audit/details.rb). - [Adding new AuditEvents](https://gitlab.com/groups/gitlab-org/-/epics/736) is more complicated, and therefore slower, than it should be. </details> ## Proposal - [x] [Disintegration of commonly used `details` attributes into individual columns in AuditEvents table](https://gitlab.com/gitlab-org/gitlab/-/issues/220116) ("[parallel persistence](https://gitlab.com/groups/gitlab-org/-/epics/2765#note_346882371)") - [x] [ip_address](https://gitlab.com/gitlab-org/gitlab/-/issues/217442) - [x] [target_id and target_type](https://gitlab.com/gitlab-org/gitlab/-/issues/220322) (related to https://gitlab.com/gitlab-org/gitlab/-/issues/13506 and https://gitlab.com/gitlab-org/gitlab/-/issues/12599) - [x] etc - [ ] Refactor `type` as a granular type instead of always being `SecurityEvent`, per https://gitlab.com/gitlab-org/gitlab/-/issues/216845#note_341664163 - [ ] Validate that new persistence is working as expected over the course of at least 1 release. - [ ] Introduce JSONB column (needs issue) - [ ] Backfill new persistence columns with historical data (deserialize `details` this one time) and [move remaining values to JSONB column](https://gitlab.com/gitlab-org/gitlab/-/issues/224213) - [ ] Validate that historical and new AuditEvents are working as expected over the course of at least 1 release. - [ ] Delete `details` column ## State of the world ### Record Audit Event #### Service interface 1. There are multiple patterns on how to record audit events * Via `Audit::Changes` mixin * Via Auditor classes in `ee/lib/ee/audit/` * Directly invoke `AuditEventService` via Controller or Service class 1. `AuditEventService` is overloadded with responsibilities 1. Usages of mutable internal states and method missing * Mutate internal state (i.e. `details`) * Method missing (i.e. `for_xxx`) 1. Dual writes of states and events potentially cause data inconsistency 1. Write latency <details><summary>Details</summary> **What if the service fails to store the event?** * Two sources of truth * States and events are stored in same DB. For auditing purpose, it would be better to keep them separated. ![test](/uploads/270c31a496697a018a20d5bd900a106a/test.png) As illustrated above, event can be failed to be logged while the user is already successfully blocked. If we wrap both commands (in blue) in a transaction, does the event log failures significant enough to warrant the user block rollback? **`AuditEventService` class takes on too many responsibilities** * Create event hash * Store event in DB * Write event in file * Check audit eligibility (via `License` features) **`AuditEventService` interface is confusing** * Builder pattern (i.e. return self) * Unclear about `entity` in constructor and method argument (ie. `for_member(...)`). Which one is the change target? * `AuditEventService` has intimate knowledge of the entity it create event for. **Inserting audit events at the time the action occurs introduces extra latency.** The more events we are recorded, the more insert statements we issue to the DB. For example, in the case of user update, as we track every field changes, there are potential up to N number of inserts if the users change all audited fields. </details> ### Query Audit Event 1. Structure of the data is used to determine presentation (via `details` hash and `Audit::Details` class) 1. Database schema does not allow efficient reads on large dataset * Query by scope (instance, group and project-level) * Query by information in `details` hash <details> <summary>Original proposal</summary> From my limited exposure to the code base, the current Audit Event model feels similar to the Event in Event Sourcing pattern. While Event Sourcing persists events and sources states by replaying those events, Audit Event is generated after the states persist. While Audit Event works well with the current state-persistence paradigm, we can still pick up some good practices from Event Sourcing literature - Event is append-only (i.e. should be include only created_at/timestamp, no updated_at) - Event details should be by value, not by reference (i.e. snapshot of the object at the point the event occurred) - Event should be versioned - Event should be traceable to an aggregated root (i.e. the action requested) For more short-medium terms, I suggest we look into some of the backend improvements below: - Introduce individual audit event class (e.g. `UserBlockedEvent`, `UserCreatedEvent`). This can be simple Value objects with some validation logic. - Revisit `AuditEvent` model - Add new fields - `aggregate_id`: tie to user action - `id`: unique identifier for event - `metadata`: version? - `data`: serialized hash - `timestamp`: unix time when the event occurred - Drop `updated_at` - Extract audit eligibility check to a separate class - Investigate more standardise usages of `AuditEvents` </details> <details><summary>Potential interfaces</summary> ```ruby # Option 1 class Users::DestroyService def execute do_something if successful? log_audit_event success else failure end end private def log_audit_event audit_events = [ AuditEvents::UserMigratedToGhostEvent.new(data: {...}), AuditEvents::UserDestroyEvent.new(data: {...}) ] AuditEvents::Repository.save(audit_events) end end ``` ```ruby # Option 2 class Users::DestroyService def execute with_audit_event_logged { do_something } end private def with_audit_event_logged result = yield if result[:success] audit_events = [ AuditEvents::UserMigratedToGhostEvent.new(data: {...}), AuditEvents::UserDestroyEvent.new(data: {...}) ] AuditEvents::Repository.save(audit_events) end result end end ``` </details> ## Related information ### Open discussions <details> I have been pondering on the list of problems and our definition of what success look like (https://about.gitlab.com/direction/manage/audit-events/#problems-to-solve), trying to relate to what technical items we might need to consider. > What success looks like: GitLab should log 100% of user-driven events. These records should be optionally retained for > a long period of time (years), and should be easily queryable in GitLab and in 3rd party tools. - Do we need an intuitive way to document that in code so we can tell apart which ones are user-driven, which ones are not? - A separate data store for audit events as we need different retention period. - Maybe a GraphQL interface or ability to export to other formats? > What success looks like: an administrator should be able to explore a logged event and see related changes and users. > Clicking on the recorded event for a comment in a merge request, for instance, should show more contextual data around the > merge request itself (when it was opened, merged, and who interacted with it). This should also be possible to do via the API. - We need to a away to correlate events related to an action (i.e. soft delete action emits more than one audit events: delete user, migrate contributions to ghost). > What success looks like: an instance should be able to modify and flag user behavior based on audit events. > These restrictions should be highly configurable and range from the simple to the complex. - Rollback action and/or staging action - Hook to a separate service to detect suspicious activities in real time </details> ### Alternative architectures #### Event streaming and CQRS <details> The utopia of Event Driven and Event Sourcing system. This requires a total overhaul and rather is experimented in a smaller, less risky part of GitLab application. This means Audit Event is just a 'user-driven' filter on top of all the events emitted from GitLab. Or we expand the definition of Audit Event to include all events. </details> #### Capture committed changes and stream <details> Capture committed changes from data store and push to audit store/or Kafka stream. - Delta sync (https://netflixtechblog.com/delta-a-data-synchronization-and-enrichment-platform-e82c36a79aee) - Change-Data-Capture (CDC) such as DBLog (https://netflixtechblog.com/dblog-a-generic-change-data-capture-framework-69351fb9099b) - Debezium Connector for PostgreSQL https://debezium.io/documentation/reference/1.0/connectors/postgresql.html </details> ### History The evolution of AuditEvent over time might help explain why it is (or isn't) built certain ways: <details> - 2014-11-19 b722baeec19a: Audit and Security event models. - Commit: https://gitlab.com/gitlab-org/gitlab-foss/commit/b722baeec19a - Notes: Introduced AuditEvent and SecurityEvent - 2014-11-21 b68e1123fb9fb27b63f991670a48a251d5e78e7d: Move event creation to service. - Notes: AuditEventService introduced. - 2015-03-23 08586a616909c7f9efe2210c2b74fd3422d4eb62: Audit log for deploy keys - 2015-07-03 411829fdb5f24f97ce17e05f5fd018d47075b216: Audit log for user authentication - Commit: https://gitlab.com/gitlab-org/gitlab/commit/411829fdb5f2 - MR: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/931 (References non-existant issue: https://dev.gitlab.org/gitlab/gitlabhq/issues/2318) - Notes: This replaced profile "history" page, which displayed all events, with an "audit_log" page. __Introduces AuditEvent and SecurityEvent... again?!?__ - 2016-02-04 a443b021e39762eb557fb41ccba0e3fd978477c4: Keep track of audit event's author even when he was deleted - 2016-03-19 ed82b8048e49ed8b6570ac6e0a7f5c1238c09d40: Gracefully handle deleted user in audit event - 2017-07-12 d73d4a91f72aa86ae40dc0111c70a94572f6ce91: Only generate Audit Events when licensed - 2017-08-23 8fa87ea3fb862bdae624aec360c80b12cda3905c: Add logging for all web authentication events - Commit: https://gitlab.com/gitlab-org/gitlab-foss/commit/8fa87ea3fb862bdae624aec360c80b12cda3905c - MR: https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/9196 - Issue: none? - Notes: Adds logging alongside AuditEvents </details> ### Subclasses of AuditEventService <details> - ee/app/services/ee/audit_events/protected_branch_audit_event_service.rb: class ProtectedBranchAuditEventService < ::AuditEventService - ee/app/services/ee/audit_events/custom_audit_event_service.rb: class CustomAuditEventService < ::AuditEventService - ee/app/services/ee/audit_events/repository_download_started_audit_event_service.rb: class RepositoryDownloadStartedAuditEventService < CustomAuditEventService - ee/app/services/ee/audit_events/release_created_audit_event_service.rb: class ReleaseCreatedAuditEventService < ReleaseAuditEventService - ee/app/services/ee/audit_events/release_artifacts_downloaded_audit_event_service.rb: class ReleaseArtifactsDownloadedAuditEventService < ReleaseAuditEventService - ee/app/services/ee/audit_events/repository_push_audit_event_service.rb: class RepositoryPushAuditEventService < ::AuditEventService - ee/app/services/ee/audit_events/release_audit_event_service.rb: class ReleaseAuditEventService < ::AuditEventService - ee/app/services/ee/audit_events/release_updated_audit_event_service.rb: class ReleaseUpdatedAuditEventService < ReleaseAuditEventService - ee/app/services/ee/audit_events/impersonation_audit_event_service.rb: class ImpersonationAuditEventService < ::AuditEventService - ee/app/services/ee/audit_events/release_associate_milestone_audit_event_service.rb: class ReleaseAssociateMilestoneAuditEventService < ReleaseAuditEventService </details> ### Usage of AuditEventService <details> AuditEventService - app/controllers/omniauth_callbacks_controller.rb: AuditEventService.new(user, user, options) - app/controllers/sessions_controller.rb: AuditEventService.new(user, user, options) - ee/app/controllers/ee/omniauth_callbacks_controller.rb: ::AuditEventService.new(author, - ee/app/controllers/ee/passwords_controller.rb: ::AuditEventService.new(current_user, - ee/app/controllers/ee/sessions_controller.rb: audit_event_service = ::AuditEventService.new(login, nil, ip_address: request.remote_ip) - ee/app/controllers/smartcard_controller.rb: AuditEventService.new(user, user, options).for_authentication.security_event - ee/app/controllers/groups/omniauth_callbacks_controller.rb: AuditEventService.new(user, @unauthenticated_group, options) - ee/app/services/feature_flags/base_service.rb: ::AuditEventService.new( - ee/app/services/ee/projects/group_links/destroy_service.rb: ::AuditEventService.new( - ee/app/services/ee/projects/group_links/create_service.rb: ::AuditEventService.new( - ee/app/services/ee/projects/destroy_service.rb: ::AuditEventService.new( - ee/app/services/ee/projects/enable_deploy_key_service.rb: AuditEventService.new(current_user, project, options) - ee/app/services/ee/projects/create_service.rb: ::AuditEventService.new( - ee/app/services/ee/projects/disable_deploy_key_service.rb: AuditEventService.new(current_user, project, options) - ee/app/services/ee/emails/base_service.rb: ::AuditEventService.new(@current_user, @user, options) - ee/app/services/ee/deploy_keys/create_service.rb: ::AuditEventService.new(user, project, options) - ee/app/services/ee/groups/destroy_service.rb: ::AuditEventService.new( - ee/app/services/ee/groups/create_service.rb: ::AuditEventService.new( - ee/app/services/ee/users/block_service.rb: ::AuditEventService.new( - ee/app/services/ee/users/destroy_service.rb: ::AuditEventService.new( - ee/app/services/ee/users/create_service.rb: ::AuditEventService.new( - ee/app/services/ee/applications/create_service.rb: ::AuditEventService.new(current_user, - ee/app/services/ee/keys/create_service.rb: ::AuditEventService.new(user, - ee/app/services/ee/members/destroy_service.rb: ::AuditEventService.new( - ee/app/services/ee/members/update_service.rb: ::AuditEventService.new( - ee/app/services/ee/members/create_service.rb: ::AuditEventService.new( - ee/app/services/ee/members/approve_access_request_service.rb: ::AuditEventService.new( - ee/app/services/projects/restore_service.rb: ::AuditEventService.new( - ee/app/services/projects/mark_for_deletion_service.rb: ::AuditEventService.new( - ee/lib/audit/changes.rb: ::AuditEventService.new(@current_user, entity, options) - ee/lib/ee/api/helpers/members_helpers.rb: ::AuditEventService.new( CustomAuditEventService - ee/app/controllers/ee/projects_controller.rb: AuditEvents::CustomAuditEventService.new( - ee/app/services/groups/mark_for_deletion_service.rb: EE::AuditEvents::CustomAuditEventService.new( - ee/app/services/groups/restore_service.rb: EE::AuditEvents::CustomAuditEventService.new( ImpersonationAuditEventService - ee/app/controllers/ee/admin/users_controller.rb: AuditEvents::ImpersonationAuditEventService.new(current_user, request.remote_ip, 'Started Impersonation').for_user(user.username).security_event - ee/app/controllers/ee/application_controller.rb: AuditEvents::ImpersonationAuditEventService.new(impersonator, request.remote_ip, 'Stopped Impersonation').for_user(impersonated_user.username).security_event ProtectedBranchAuditEventService - ee/app/services/ee/protected_branches/loggable.rb: ::EE::AuditEvents::ProtectedBranchAuditEventService.new( Release AuditEventServices - ee/lib/ee/api/releases.rb: EE::AuditEvents::ReleaseCreatedAuditEventService.new( - ee/lib/ee/api/releases.rb: EE::AuditEvents::ReleaseUpdatedAuditEventService.new( - ee/lib/ee/api/releases.rb: EE::AuditEvents::ReleaseAssociateMilestoneAuditEventService.new( Repository AuditEventServices - ee/app/controllers/ee/projects/repositories_controller.rb: AuditEvents::RepositoryDownloadStartedAuditEventService.new( - ee/app/workers/repository_push_audit_event_worker.rb: service = EE::AuditEvents::RepositoryPushAuditEventService - ee/lib/ee/api/helpers.rb: EE::AuditEvents::RepositoryDownloadStartedAuditEventService.new( </details> Analysis of AuditEvents that persist `ip_address`: https://gitlab.com/gitlab-org/gitlab/-/issues/36231#note_334732876
epic