Improve query generated for counts.projects_jira_issuelist_active Service ping metric
Summary
Improve the metric query for counts.projects_jira_issuelist_active
This metric generates the query with more columns selected than we need.
Why?
More context https://gitlab.slack.com/archives/CL3A7GFPF/p1635510724114100
These queries are used by the data team to generate service ping in DWH
Details
This is happening only when we generate the queries using UsageDataQueries
"projects_jira_issuelist_active": "SELECT COUNT(\"integrations\".\"id\"), \"integrations\".\"id\" AS t0_r0, \"jira_tracker_data\".\"id\" AS t1_r0, \"jira_tracker_data\".\"service_id\" AS t1_r1, \"jira_tracker_data\".\"created_at\" AS t1_r2, \"jira_tracker_data\".\"updated_at\" AS t1_r3, \"jira_tracker_data\".\"encrypted_url\" AS t1_r4, \"jira_tracker_data\".\"encrypted_url_iv\" AS t1_r5, \"jira_tracker_data\".\"encrypted_api_url\" AS t1_r6, \"jira_tracker_data\".\"encrypted_api_url_iv\" AS t1_r7, \"jira_tracker_data\".\"encrypted_username\" AS t1_r8, \"jira_tracker_data\".\"encrypted_username_iv\" AS t1_r9, \"jira_tracker_data\".\"encrypted_password\" AS t1_r10, \"jira_tracker_data\".\"encrypted_password_iv\" AS t1_r11, \"jira_tracker_data\".\"jira_issue_transition_id\" AS t1_r12, \"jira_tracker_data\".\"project_key\" AS t1_r13, \"jira_tracker_data\".\"issues_enabled\" AS t1_r14, \"jira_tracker_data\".\"deployment_type\" AS t1_r15, \"jira_tracker_data\".\"vulnerabilities_issuetype\" AS t1_r16, \"jira_tracker_data\".\"vulnerabilities_enabled\" AS t1_r17, \"jira_tracker_data\".\"jira_issue_transition_automatic\" AS t1_r18 FROM \"integrations\" LEFT OUTER JOIN \"jira_tracker_data\" ON \"jira_tracker_data\".\"service_id\" = \"integrations\".\"id\" WHERE \"integrations\".\"type\" = 'JiraService' AND \"integrations\".\"active\" = TRUE AND \"jira_tracker_data\".\"issues_enabled\" = TRUE",
We select more columns that are needed for this metric when we generated the metrics queries.
Implementation
count(::Integrations::Jira.active.includes(:jira_tracker_data).where(jira_tracker_data: { issues_enabled: true }), start: min_id, finish: max_id)
When we include jira_tracker_data
we select all columns in the related table.
Proposed solutions
- We should select only the columns which are used and for this, we could use
joins
instead ofinclude
-
includes
is generating a left outer join -
joins
is generating an inner join
-
Overwrite the
projects_jira_issuelist_active
inUsageDataQueries
with a simplified query -
Add the related instrumentation class for this metric and overwrite
to_sql
method with the simplified query
Make sure the results are not affected