Skip to content

Fix avg_cycle_analytics giving an uncaught error

I've identified (possibly the only) root cause of the issue & it's the cycle analytics counter which fails. I've opened an issue to fix it.

  1. Lack of index
  2. Lack of error catching

Step A: First attempt to calculate and send! UTC+3 11:07:50 UTC 👎 avg_cycle_analytics counter fails.

11 hours 18 minutes

--- Gitlab::UsageData.to_json(force_refresh: true)
--- 'ea8bf810-1d6f-4a6a-b4fd-93e8cbd8b57f'
--SubmitUsagePingService.new.execute

[ gprd ] production> Time.now
=> 2020-02-28 14:07:55 +0000
[ gprd ] production> SubmitUsagePingService.new.execute

Time.now
unknown OID 28: failed to recognize type of 'relfrozenxid'. It will be treated as String.
unknown OID 1034: failed to recognize type of 'relacl'. It will be treated as String.
Traceback (most recent call last):
       16: from app/services/submit_usage_ping_service.rb:22:in `execute'
       15: from lib/gitlab/usage_data.rb:24:in `to_json'
       14: from lib/gitlab/usage_data.rb:10:in `data'
       13: from lib/gitlab/usage_data.rb:11:in `block in data'
       12: from ee/lib/ee/gitlab/usage_data.rb:26:in `uncached_data'
       11: from lib/gitlab/usage_data.rb:20:in `uncached_data'
       10: from lib/gitlab/usage_data.rb:124:in `cycle_analytics_usage_data'
        9: from lib/gitlab/cycle_analytics/usage_data.rb:19:in `to_json'
        8: from lib/gitlab/cycle_analytics/usage_data.rb:34:in `medians_per_stage'
        7: from lib/gitlab/cycle_analytics/usage_data.rb:34:in `each_with_object'
        6: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:41:in `select_all'
        5: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:62:in `read_using_load_balancer'
        4: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:83:in `read_write'
        3: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:134:in `retry_with_backoff'
        2: from ee/lib/gitlab/database/load_balancing/load_balancer.rb:84:in `block in read_write'
        1: from ee/lib/gitlab/database/load_balancing/connection_proxy.rb:63:in `block in read_using_load_balancer'
ActiveRecord::QueryCanceled (PG::QueryCanceled: ERROR:  canceling statement due to statement timeout)
[ gprd ] production>
[ gprd ] production> Time.now
=> 2020-02-28 22:26:16 +0000

Step 2 Detailed analysis

 Gitlab::CycleAnalytics::UsageData.new.to_json

=> {:avg_cycle_analytics=>
  {:issue=>{:average=>nil, :sd=>0, :missing=>10},
   :plan=>{:average=>nil, :sd=>0, :missing=>10},
   :code=>{:average=>nil, :sd=>0, :missing=>10},
   :test=>{:average=>nil, :sd=>0, :missing=>10},
   :review=>{:average=>nil, :sd=>0, :missing=>10},
   :staging=>{:average=>nil, :sd=>0, :missing=>10},
   :production=>{:average=>nil, :sd=>0, :missing=>10},
   :total=>70}}




SELECT "projects".*
FROM "projects"
ORDER BY GREATEST(COALESCE(last_activity_at, '1970-01-01'), COALESCE(last_repository_updated_at, '1970-01-01')) DESC
LIMIT 10


 Limit  (cost=2186519.46..2186519.49 rows=10 width=756) (actual time=86629.616..86629.619 rows=10 loops=1)
   Buffers: shared hit=3 read=441744 dirtied=13277 written=12305
   I/O Timings: read=53713.707 write=451.542
   ->  Sort  (cost=2186519.46..2217269.09 rows=12299850 width=756) (actual time=86629.615..86629.617 rows=10 loops=1)
         Sort Key: (GREATEST(COALESCE(projects.last_activity_at, '1970-01-01 00:00:00+00'::timestamp with time zone), (COALESCE(projects.last_repository_updated_at, '1970-01-01 00:00:00'::timestamp without time zone))::timestamp with time zone)) DESC
         Sort Method: top-N heapsort  Memory: 29kB
         Buffers: shared hit=3 read=441744 dirtied=13277 written=12305
         I/O Timings: read=53713.707 write=451.542
         ->  Seq Scan on public.projects  (cost=0.00..1920724.12 rows=12299850 width=756) (actual time=4.082..70259.121 rows=12432632 loops=1)
               Buffers: shared read=441744 dirtied=13277 written=12305
               I/O Timings: read=53713.707 write=451.542

Time: 1.444 min
  - planning: 0.467 ms
  - execution: 1.444 min
    - I/O read: 53.714 s
    - I/O write: 451.542 ms

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 441744 (~3.40 GiB) from the OS file cache, including disk I/O
  - dirtied: 13277 (~103.70 MiB)
  - writes: 12305 (~96.10 MiB)

https://explain.depesz.com/s/p8yO

Edited by Alper Akgun