Implement package tracking subsystem
With the need to track more metrics for packages (#205578 (closed)), we need a way to track individual events so that we can aggregate them and report the results back to usage ping.
It seems like the existing Event table can't be used, both because of the way it's currently used for aggregating data (#205578 (comment 399789668)) and because it is used for tracking user events for the gitlab.com/:user page.
I've refactored how package events are published (!41709 (merged)) in preparation for this proposal.
Proposal
Introduce a new package_events table to track events specific to packages.
Table structure:
Column | Type | Purpose |
---|---|---|
event_type | enum | The type of event that occurred (ex: package_published/package_pulled/package_deleted) |
originator_id | integer | The id of who originated the action (ex: user_id or deploy_token_id) |
event_scope | integer/enum | The scope of the event: either a package type (composer, nuget, etc.) or something else such as container or tag |
originator_type | integer/enum | The originator_id type (ex: user=0, deploy_token=1) |
package_id | integer | The package_id the event refers to; it might come in handy for relating the event back to the package |
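
To make the structure concrete, here is a minimal sketch of the table using Python's built-in sqlite3 as a stand-in database. It is not the actual migration: the `created_at` column, the index, the integer values for event types and scopes, and the `track_event` helper are all assumptions added for illustration (only user=0 and deploy_token=1 come from the table above).

```python
import sqlite3
from enum import IntEnum


class EventType(IntEnum):
    # Integer values are assumed; only the names come from the proposal.
    PACKAGE_PUBLISHED = 0
    PACKAGE_PULLED = 1
    PACKAGE_DELETED = 2


class EventScope(IntEnum):
    # Assumed values for a few of the scopes mentioned in the proposal.
    COMPOSER = 0
    NUGET = 1
    CONTAINER = 2
    TAG = 3


class OriginatorType(IntEnum):
    # These two values are the ones given in the proposal.
    USER = 0
    DEPLOY_TOKEN = 1


SCHEMA = """
CREATE TABLE package_events (
    id              INTEGER PRIMARY KEY,
    event_type      INTEGER NOT NULL,
    event_scope     INTEGER NOT NULL,
    originator_type INTEGER NOT NULL,
    originator_id   INTEGER NOT NULL,
    package_id      INTEGER,
    -- Assumed column: needed if old rows are to be cleaned up later.
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Assumed index to support counting by type, scope and originator type.
CREATE INDEX index_package_events_on_type_scope_originator
    ON package_events (event_type, event_scope, originator_type);
"""


def track_event(conn, event_type, event_scope, originator_type,
                originator_id, package_id=None):
    """Hypothetical helper: insert one event row and return its id."""
    cur = conn.execute(
        "INSERT INTO package_events "
        "(event_type, event_scope, originator_type, originator_id, package_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (int(event_type), int(event_scope), int(originator_type),
         originator_id, package_id),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
event_id = track_event(conn, EventType.PACKAGE_PULLED, EventScope.NUGET,
                       OriginatorType.DEPLOY_TOKEN, originator_id=7,
                       package_id=42)
stored = conn.execute(
    "SELECT event_type, event_scope, originator_type, originator_id, package_id "
    "FROM package_events WHERE id = ?", (event_id,)
).fetchone()
```

Storing the enums as small integers keeps rows compact, at the cost of needing the mapping in application code to read them back.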
Caveats
Although we technically don't use polymorphic associations (https://docs.gitlab.com/ee/development/polymorphic_associations.html), here we don't really need to access the originator record itself; we only need it for distinct counts per originator type (user vs. deploy token). So I believe keeping both originator types in the same table is simpler than having two separate tables.
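
The distinct counts mentioned above could be computed with a single grouped query. This is only a sketch using sqlite3 with a reduced, assumed version of the table; the `distinct_originators` helper, the sample rows, and the event_type value 1 for "pulled" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table with only the columns the query needs (assumed layout).
conn.execute(
    "CREATE TABLE package_events ("
    "event_type INTEGER, originator_type INTEGER, originator_id INTEGER)"
)

# (event_type, originator_type, originator_id)
# originator_type: user=0, deploy_token=1; event_type 1 = pulled (assumed).
sample_rows = [
    (1, 0, 10),  # user 10 pulls
    (1, 0, 10),  # user 10 pulls again (should not be counted twice)
    (1, 0, 11),  # user 11 pulls
    (1, 1, 10),  # deploy token 10 pulls (different originator than user 10)
    (0, 0, 12),  # user 12 publishes (different event type)
]
conn.executemany("INSERT INTO package_events VALUES (?, ?, ?)", sample_rows)


def distinct_originators(conn, event_type):
    """Return {originator_type: distinct originator count} for one event type."""
    query = (
        "SELECT originator_type, COUNT(DISTINCT originator_id) "
        "FROM package_events WHERE event_type = ? "
        "GROUP BY originator_type"
    )
    return dict(conn.execute(query, (event_type,)).fetchall())


pulled_counts = distinct_originators(conn, 1)
```

Because the count is grouped by originator_type, user 10 and deploy token 10 are counted separately even though they share the same id, and the originator records themselves are never loaded.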
Questions
- How long do we need to keep this data? I believe we don't want this table to grow indefinitely. What about a job that cleans up data older than X days (for example, 30 days)?
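
As a rough illustration of what such a clean up job might do, here is a sketch using sqlite3. It assumes a `created_at` column (not part of the table structure above) stored as a UTC ISO 8601 string; `RETENTION_DAYS` and `delete_expired_events` are hypothetical names, and 30 days is just the example value from the question.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # example value only; the real retention period is undecided


def iso(ts):
    """Format a datetime as a fixed-width ISO string so text comparison works."""
    return ts.isoformat(timespec="seconds")


def delete_expired_events(conn, now, retention_days=RETENTION_DAYS):
    """Delete events created before the retention cutoff; return rows removed."""
    cutoff = iso(now - timedelta(days=retention_days))
    cur = conn.execute(
        "DELETE FROM package_events WHERE created_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package_events (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)"
)

now = datetime(2020, 9, 30, tzinfo=timezone.utc)
for age_in_days in (45, 31, 1):
    conn.execute(
        "INSERT INTO package_events (created_at) VALUES (?)",
        (iso(now - timedelta(days=age_in_days)),),
    )
conn.commit()

deleted = delete_expired_events(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM package_events").fetchone()[0]
```

On a large table, a real job would probably delete in batches rather than in one statement, and whether aggregated numbers need to be stored before the raw rows are removed depends on the answer to the retention question.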
TODOS
- Create the table and indexes
- Add clean up job
- Add tracking code
- Aggregate data and return to usage ping