Fix avg_cycle_analytics giving an uncaught error
I've identified (possibly the only) root cause of the issue & it's the cycle analytics counter which fails. I've opened an issue to fix it.
- Lack of index
- Lack of error catching
👎 avg_cycle_analytics
counter fails.
Step A: First attempt to calculate and send! UTC+3 11:07:50 UTC 11 hours 18 minutes
--- Gitlab::UsageData.to_json(force_refresh: true)
--- 'ea8bf810-1d6f-4a6a-b4fd-93e8cbd8b57f'
--SubmitUsagePingService.new.execute
[ gprd ] production> Time.now
=> 2020-02-28 14:07:55 +0000
[ gprd ] production> SubmitUsagePingService.new.execute
Time.now
unknown OID 28: failed to recognize type of 'relfrozenxid'. It will be treated as String.
unknown OID 1034: failed to recognize type of 'relacl'. It will be treated as String.
Traceback (most recent call last):
16: from app/services/submit_usage_ping_service.rb:22:in `execute'
15: from lib/gitlab/usage_data.rb:24:in `to_json'
14: from lib/gitlab/usage_data.rb:10:in `data'
13: from lib/gitlab/usage_data.rb:11:in `block in data'
12: from ee/lib/ee/gitlab/usage_data.rb:26:in `uncached_data'
11: from lib/gitlab/usage_data.rb:20:in `uncached_data'
10: from lib/gitlab/usage_data.rb:124:in `cycle_analytics_usage_data'
9: from lib/gitlab/cycle_analytics/usage_data.rb:19:in `to_json'
8: from lib/gitlab/cycle_analytics/usage_data.rb:34:in `medians_per_stage'
7: from lib/gitlab/cycle_analytics/usage_data.rb:34:in `each_with_object'
6: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:41:in `select_all'
5: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:62:in `read_using_load_balancer'
4: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:83:in `read_write'
3: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:134:in `retry_with_backoff'
2: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:84:in `block in read_write'
1: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:63:in `block in read_using_load_balancer'
ActiveRecord::QueryCanceled (PG::QueryCanceled: ERROR: canceling statement due to statement timeout)
[ gprd ] production>
[ gprd ] production> Time.now
=> 2020-02-28 22:26:16 +0000
Step 2 Detailed analysis
Gitlab::CycleAnalytics::UsageData.new.to_json
=> {:avg_cycle_analytics=>
{:issue=>{:average=>nil, :sd=>0, :missing=>10},
:plan=>{:average=>nil, :sd=>0, :missing=>10},
:code=>{:average=>nil, :sd=>0, :missing=>10},
:test=>{:average=>nil, :sd=>0, :missing=>10},
:review=>{:average=>nil, :sd=>0, :missing=>10},
:staging=>{:average=>nil, :sd=>0, :missing=>10},
:production=>{:average=>nil, :sd=>0, :missing=>10},
:total=>70}}
SELECT "projects".*
FROM "projects"
ORDER BY GREATEST(COALESCE(last_activity_at, '1970-01-01'), COALESCE(last_repository_updated_at, '1970-01-01')) DESC
LIMIT 10
Limit (cost=2186519.46..2186519.49 rows=10 width=756) (actual time=86629.616..86629.619 rows=10 loops=1)
Buffers: shared hit=3 read=441744 dirtied=13277 written=12305
I/O Timings: read=53713.707 write=451.542
-> Sort (cost=2186519.46..2217269.09 rows=12299850 width=756) (actual time=86629.615..86629.617 rows=10 loops=1)
Sort Key: (GREATEST(COALESCE(projects.last_activity_at, '1970-01-01 00:00:00+00'::timestamp with time zone), (COALESCE(projects.last_repository_updated_at, '1970-01-01 00:00:00'::timestamp without time zone))::timestamp with time zone)) DESC
Sort Method: top-N heapsort Memory: 29kB
Buffers: shared hit=3 read=441744 dirtied=13277 written=12305
I/O Timings: read=53713.707 write=451.542
-> Seq Scan on public.projects (cost=0.00..1920724.12 rows=12299850 width=756) (actual time=4.082..70259.121 rows=12432632 loops=1)
Buffers: shared read=441744 dirtied=13277 written=12305
I/O Timings: read=53713.707 write=451.542
Time: 1.444 min
- planning: 0.467 ms
- execution: 1.444 min
- I/O read: 53.714 s
- I/O write: 451.542 ms
Shared buffers:
- hits: 3 (~24.00 KiB) from the buffer pool
- reads: 441744 (~3.40 GiB) from the OS file cache, including disk I/O
- dirtied: 13277 (~103.70 MiB)
- writes: 12305 (~96.10 MiB)
Edited by Alper Akgun