Database Group Triage for week ending 2025-10-24
About
This issue is used by ~"group::database frameworks" to triage issues and make sure they get properly assigned and prioritized. Each week, a bot looks up the previous issue, picks the next assignee in the list, and submits a new issue listing any issues that may need attention from the team.
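The rotation step described above ("pick the next assignee in the list") can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function name, the `unavailable` parameter, and the placeholder usernames are all assumptions made for the example.

```python
from typing import Collection, Sequence


def next_assignee(
    rotation: Sequence[str],
    previous: str,
    unavailable: Collection[str] = (),
) -> str:
    """Return the person after `previous` in `rotation`, skipping unavailable people.

    Hypothetical helper: the real bot's rotation logic may differ.
    """
    if not rotation:
        raise ValueError("rotation is empty")
    # Start from the previous assignee's slot; if they are no longer in the
    # rotation, start just before the first entry.
    start = rotation.index(previous) if previous in rotation else -1
    for offset in range(1, len(rotation) + 1):
        candidate = rotation[(start + offset) % len(rotation)]
        if candidate not in unavailable:
            return candidate
    raise RuntimeError("nobody in the rotation is available")


# Placeholder usernames, not real team members.
rotation = ["alice", "bob", "carol"]
print(next_assignee(rotation, previous="carol"))
print(next_assignee(rotation, previous="alice", unavailable={"bob"}))
```

Wrapping around with a modulo keeps the rotation fair when the last person in the list was the previous assignee, and the availability check mirrors the manual reassignment request at the bottom of this issue.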
Process
- Review any issues identified for triage
- Post any questions or pressing issues to the database group meeting doc
Issues needing triage (labeled ~"database::triage")
For each issue below:
- If the issue needs further investigation, spend up to 1 hour investigating or fixing the issue.
- If the issue is a ~"type::bug", assign it one of ~"severity::1", ~"severity::2", ~"severity::3", or ~"severity::4"
- Document any findings you make in a comment on the issue, and if the issue still needs additional work or refinement, consider looping in @alexives and @ to help with scheduling and priority.
- If the issue is incomplete, labeled ~"workflow::scheduled", and will take more than an hour to fix, remove ~"database::triage"
Bugs needing Severity
For each issue below:
- For each ~"type::bug", spend up to 1 hour investigating or fixing the issue.
- Assign it one of ~"severity::1", ~"severity::2", ~"severity::3", or ~"severity::4"
- Document any findings you make in a comment on the issue, and if the issue still needs additional work or refinement, consider looping in @alexives and @ to help with scheduling and priority.
- Skip upgrade from 18.3.5 to 18.5.1 does not create the work_item_descriptions tables
- Migration: PG::UndefinedTable: ERROR: relation "bigint_idx_9e3cffd9404ea9edfaac" does not exist
Recent issues labeled database
For each issue below:
- If the issue has no group label, consider whether it should be addressed by ~"group::database frameworks" and, if so, label it.
- If the issue has a group, and you think they may need assistance from us:
  - If the issue needs further investigation, add ~"database::triage" and spend up to 1 hour investigating the issue.
  - Document any findings you make in a comment on the issue, and if the issue still needs additional work or refinement, consider looping in @alexives and/or @
- ~"group::ci platform" Implement partition pruning for all CI sql queries with over 1000 calls/sec per replica
- ~"group::agent foundations" Collect all codebase locations that need to consider DAP role-based permissions
- ~"group::authentication" Step-up auth: Enforce visibility constraints - only allow for private groups/namespaces
- ~"group::organizations" Optimization: Use left join instead of exists when filtering for authorized projects
- ~"group::project management" Old Upload links break when migrating to object storage with rake task
- ~"group::security infrastructure" Feature Request: Treat vulnerabilities as issues with editable fields
- ~"group::organizations" Fully shard protected_branch_push_access_levels
- ~"group::design system" Remove user_preferences.early_access_studio_participant column after studio rollout
- ~"group::not_owned" reusing branch name in merge requests increases instance load
- ~"group::not_owned" Migration from 18.4 to 18.5
- ~"group::organizations" Fully shard identities
- ~"group::not_owned" [kg] Handle the "diff" of data between the most recent snapshot and the KG
- ~"group::build" Mattermost restore fails for Gitlab 18.5.0-ce.0 Omnibus
- ~"group::security infrastructure" Patch Dependency Export to be resillient against invalid record matches
- ~"group::security policies" Policy change history outside of git
- ~"group::operate" Run Registry Schema Migrations in Prefer Mode
- ~"group::container registry" Run Registry Schema Migrations in Prefer Mode
- ~"group::container registry" Drop old smallint columns
- ~"group::container registry" Using Lock retry mechanism rename media_types.id_convert_to_bigint to media_types.id
- ~"group::not_owned" [db] KG Database Selection - AWS Neptune
Recent mentions of @gitlab-org/database-team
For each mention below:
- If the item already has an adequate response, move on to the next
- If the mention is from someone looking to provide feedback on database review, and nobody has responded yet, set up a coffee chat with them and make a note in the thread that you did. Record feedback in the feedback issue.
- If you know how to respond to the comment, post a response
- If you don't know how, redirect with a specific mention to someone who may be able to respond
- @alexives mentioned triage on Update templates for database help
Customer Issue Hand-offs
For each issue below:
- If the item already has a back and forth, check in with @morefice to see what needs to be handed off
- Consider if it's right for our team and if not, ask the support rep to follow up with the correct team.
- If the item is right for our team, spend some time assessing it and trying to assist.
- If you need help, and the request seems pressing, ask in the team channel if there's someone who can help dig into it.
- Customer identified query performance that can be improve - looking to work with the team to improve this on the application ~"Help group::Database Frameworks", ~"RFH-Lifecycle::Last comment from Development", ~"devops::deploy", ~"group::environments", ~"section::cd", ~"severity::3"
- Some backups fail to restore with cannot attach index errors ~"Help group::Database Frameworks", ~"RFH-Lifecycle::Last comment from Support", ~"severity::1"
- Request for Help - Traffic not balanced correctly to R/O DB replicas ~"Help group::Database Frameworks", ~"RFH-Lifecycle::Pending Closure", SupportAssigned Support Engineer, ~"severity::2"
Review Current Saturation Report
DRAFT: Database Capacity Report for week ending 2025-10-26
Please update this week's database capacity report! Remember:
- You can always reach out to the team for support
- Consider reaching out in #g_database_operations for feedback too
Process:
- Review and update Summary section.
- Review and update [Concerns section](https://gitlab.com/gitlab-org/database-team/capacity-projections/-/issues/34#concerns).
- Remove DRAFT from title.
- Ping @alexives in a comment saying the update is complete.
- Close past report issues.
Review Top Queries for Changes
There were some new anomalous queries to review on the main database, broken down by metric type:
| Metric | # of queries |
|---|---|
| by_total_time | 0 |
| by_time_avg | 0 |
Please find the detailed query report here (Ops access required.)
For each query listed:
- Spend up to 30 minutes trying to understand its source.
- If needed, determine the team that owns the query and file an issue with them. A good place to start would be to check the relevant table's owner under db/docs.
For additional context around how these queries were determined to be anomalous as well as historical rankings for known queries, all of the data we have collected lives in a database dump stored in artifacts in the query-stats-reporting repo on ops. Specifically check query-stats.yml for an example of how you can use this dump to locally rehydrate this database on your own Postgres instance and analyze the collected statistics. For this iteration of this report, queries are grouped by fingerprint.
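As a rough illustration of what grouping by fingerprint means for the two metrics in the table above, the sketch below sums per-query statistics by fingerprint and derives an average time. The row layout (`fingerprint`, `calls`, `total_time_ms`) and the function name are assumptions for the example; the actual schema of the dump in the query-stats-reporting repo may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (fingerprint, calls, total_time_ms).
Row = Tuple[str, int, float]


def aggregate_by_fingerprint(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Sum calls and total time per fingerprint and derive the average time per call."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "total_time_ms": 0.0}
    )
    for fingerprint, calls, total_time_ms in rows:
        totals[fingerprint]["calls"] += calls
        totals[fingerprint]["total_time_ms"] += total_time_ms

    report = {}
    for fingerprint, stats in totals.items():
        calls = stats["calls"]
        report[fingerprint] = {
            "calls": calls,
            "total_time_ms": stats["total_time_ms"],
            # Avoid dividing by zero for fingerprints with no recorded calls.
            "time_avg_ms": stats["total_time_ms"] / calls if calls else 0.0,
        }
    return report


sample = [("fp_a", 2, 10.0), ("fp_a", 2, 30.0), ("fp_b", 1, 5.0)]
for fingerprint, stats in aggregate_by_fingerprint(sample).items():
    print(fingerprint, stats)
```

Sorting the result by `total_time_ms` or `time_avg_ms` would give rankings comparable in spirit to by_total_time and by_time_avg, though how the report flags a query as anomalous is not shown here.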
Legacy query analysis
It is not necessary to look at this unless we need to check the new anomalous query report against the top query report on the main database.
For each database:
- Are there new queries in the top queries (See: K003 Top-15 Queries by total_time) compared to a previous report?
- If there are new queries, spend up to 30 minutes trying to understand their sources
- If there are new queries, file an issue and assign to the team that owns the query, or if unable to source, then the team that owns the table
- If there are no new queries, review the top 5 queries for each to see if there are already investigations, or file issues to investigate them
- Review recent Primary Checkup for new Top Queries (K003)
- Review recent CI Checkup for new Top Queries (K003)
Review Dashboards on queries with large In-Lists
For both Sidekiq and Rails:
- If the report is empty, no action is needed
- For each item on the report:
  - Determine what feature the query belongs to
  - Create an issue for the team owning the feature category asking them to limit the maximum number of items in the query's IN list
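When filing such an issue, it can help to point at a concrete pattern. One common way to cap the size of an IN list is to split the values into fixed-size batches and run one query per batch. The sketch below is a generic illustration: the helper name and the batch size of 1000 are assumptions, not a limit prescribed by this report.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def in_batches(items: Sequence[T], max_items: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `max_items` values."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    for start in range(0, len(items), max_items):
        yield items[start:start + max_items]


# Hypothetical usage: each batch would back one `WHERE id IN (...)` query.
project_ids = list(range(1, 2501))
for batch in in_batches(project_ids, max_items=1000):
    print(len(batch))
```

Batching trades one very large query for several bounded ones, which keeps planning and execution cost per query predictable.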
Int4 Saturation Review
For issue(s) linked below:
- Ensure each table referenced in the report has an associated issue with ~"Gitlab.com Resource Saturation" and infradev attached
- Make sure the issue is assigned to a team.
- All tables approaching saturation in Capacity warning for patroni: pg_int4_id.column_name="merge_requests.latest_merge_request_diff_id" have an issue assigned to a team.
- All tables approaching saturation in Capacity warning for patroni: pg_int4_id.column_name="merge_request_diff_files.merge_request_diff_id" have an issue assigned to a team.
- All tables approaching saturation in Capacity warning for patroni: pg_int4_id.column_name="deployments.id" have an issue assigned to a team.
- All tables approaching saturation in Capacity warning for patroni: pg_int4_id.column_name="merge_request_diff_commits.merge_request_diff_id" have an issue assigned to a team.
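For context on what saturation means here: a PostgreSQL `integer` (int4) column is a signed 32-bit value, so it cannot hold values above 2,147,483,647. The sketch below shows the underlying arithmetic. The function names and the linear-growth projection are assumptions for illustration; the capacity warnings linked above may use a different forecasting model.

```python
INT4_MAX = 2**31 - 1  # largest value of a signed 32-bit PostgreSQL integer


def int4_saturation(current_max_id: int) -> float:
    """Return the fraction of the int4 range already used by a column."""
    if current_max_id < 0:
        raise ValueError("current_max_id must be non-negative")
    return current_max_id / INT4_MAX


def days_until_full(current_max_id: int, ids_per_day: float) -> float:
    """Estimate the days left before the column overflows, assuming linear growth."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = max(INT4_MAX - current_max_id, 0)
    return remaining / ids_per_day


# Made-up numbers, not taken from any of the columns listed above.
print(f"{int4_saturation(1_600_000_000):.1%} used")
print(f"{days_until_full(1_600_000_000, 2_000_000):.0f} days left at the assumed rate")
```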
@alexives if @praba.m7n isn't available this week, please reassign to @stomlinson
This is generated by this project.