Statistics artifacts_size is broken
Recent changes in statistics have broken artifacts_size counters.
We have started seeing really huge and also negative numbers on our local instance. Example:
"statistics":{
"commit_count":256,
"storage_size":11204154550,
"repository_size":18800967,
"lfs_objects_size":0,
"job_artifacts_size":-115028208
}
The following gitlab-rails script scanned our instance and found 1097 non-matching counts and 28 negative numbers out of ~5.5k projects with builds, which means ~20% of them have wrong statistics.
If we counted only the projects that build really often, the rate would be much higher.
::Project.with_statistics().find_each do |project|
  size_real = Ci::JobArtifact.artifacts_size_for(project).to_i
  size_old = project.builds.sum(:artifacts_size).to_i # should be 0 after migrations
  size_now = project.statistics.build_artifacts_size
  puts "Project #{project.id}: #{project.path_with_namespace} -> real = #{size_real}; old = #{size_old}; now = #{size_now}"
  puts " ERROR - OLD SIZE" if size_old != 0
  puts " ERROR - SIZE MISMATCH" if size_real + size_old != size_now
  puts " ERROR - NEGATIVE VALUE" if size_now < 0
end
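The three checks in the script above can be pulled into a small pure-Ruby helper so that totals such as "non-matching counts" and "negative numbers" are tallied automatically. This is only a minimal sketch that runs outside Rails: `classify_sizes` and `tally` are hypothetical names, and the sample rows are made up for illustration, not taken from our instance.

```ruby
# Classify one project's sizes using the same three checks as the script above.
# size_real: bytes summed from the job artifact records
# size_old:  bytes still stored in the legacy builds column (expected to be 0)
# size_now:  value currently stored in the project statistics
def classify_sizes(size_real, size_old, size_now)
  errors = []
  errors << :old_size if size_old != 0
  errors << :size_mismatch if size_real + size_old != size_now
  errors << :negative_value if size_now < 0
  errors
end

# Count how many rows hit each error; rows are [real, old, now] triples.
def tally(rows)
  counts = Hash.new(0)
  rows.each do |real, old, now|
    classify_sizes(real, old, now).each { |error| counts[error] += 1 }
  end
  counts
end

# Made-up sample data, for illustration only.
sample_rows = [
  [1_000, 0, 1_000], # consistent
  [1_000, 0, 4_000], # counter too high
  [500, 0, -200],    # negative counter (also a mismatch)
  [300, 50, 350]     # legacy column not yet migrated
]

summary = tally(sample_rows)
puts summary.inspect
```

Note that a negative value is always reported as a mismatch as well, so the two totals overlap.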
During analysis of recent gitlab-ce code changes, I was only able to identify one problem:
- updating artifacts_size does not trigger a storage_size update, so storage_size will be lower than the sum of its components if CI is triggered -> MR is available: gitlab-ce!20697
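The stale-total problem can be reproduced in a few lines of plain Ruby. This is a sketch under assumptions, not the gitlab-ce code: `ProjectStats` and its methods are hypothetical, and it assumes storage_size is simply the sum of the three components shown in the statistics example.

```ruby
# Hypothetical in-memory stand-in for a project's statistics row.
ProjectStats = Struct.new(:repository_size, :lfs_objects_size,
                          :build_artifacts_size, :storage_size) do
  # Sum of the components that storage_size is assumed to cover.
  def components_total
    repository_size + lfs_objects_size + build_artifacts_size
  end

  # Problem case: only the artifacts counter moves, storage_size goes stale.
  def increment_artifacts_only(bytes)
    self.build_artifacts_size += bytes
  end

  # Fixed case: the total is refreshed whenever the artifacts counter changes.
  def increment_artifacts_and_refresh(bytes)
    self.build_artifacts_size += bytes
    self.storage_size = components_total
  end
end

stale = ProjectStats.new(100, 0, 0, 100)
stale.increment_artifacts_only(50)
puts "stale: storage_size=#{stale.storage_size}, components=#{stale.components_total}"

fixed = ProjectStats.new(100, 0, 0, 100)
fixed.increment_artifacts_and_refresh(50)
puts "fixed: storage_size=#{fixed.storage_size}, components=#{fixed.components_total}"
```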
One noticeable thing is that no project created after the migration 10.7.3 -> 10.8.2 shows the problem (yet?). So it is possible that the code is good now and the stats are broken only because of legacy data or bugs in previous releases; they cannot self-heal because the new code just adds/removes bytes and never recalculates the value.
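Why a delta-only counter cannot self-heal is easy to show in isolation. The sketch below is a hypothetical model (`DeltaCounter` is not a GitLab class, and the numbers are made up): once the baseline is wrong, every correct increment or decrement preserves exactly the same error, and only a full recalculation removes it.

```ruby
# Hypothetical counter that, like the behaviour described above, only applies deltas.
class DeltaCounter
  attr_reader :value

  def initialize(initial)
    @value = initial
  end

  def add(bytes)
    @value += bytes
  end

  def remove(bytes)
    @value -= bytes
  end

  # Full refresh from the real data; the only operation that fixes a bad baseline.
  def recalculate(sizes)
    @value = sizes.sum
  end
end

artifact_sizes = [400, 600]        # what is really stored (made-up numbers)
counter = DeltaCounter.new(-5_000) # baseline already corrupted
initial_error = counter.value - artifact_sizes.sum

# A new artifact is uploaded and an old one is deleted, both counted correctly.
artifact_sizes << 250
counter.add(250)
counter.remove(artifact_sizes.shift)

error_after_updates = counter.value - artifact_sizes.sum
puts "error before: #{initial_error}, after correct updates: #{error_after_updates}"

counter.recalculate(artifact_sizes)
puts "after recalculation: #{counter.value} (real: #{artifact_sizes.sum})"
```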
- fix wrong counting of job_artifacts_size -> assuming this is a legacy problem
- implement migration script to recalculate artifact sizes
- document how to forcefully recalculate these sizes if they get wrong again (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/20697#note_91526778)
The investigation took longer than I would have liked, partly due to the following issues:
- jobs API provides only artifacts_file.size, however job_artifacts_size counts all artifact types (artifact, trace, metadata) -> MR is available: gitlab-ce!20821
- GUI does not show (unless you open each single job) whether the trace is available and does not provide any way to delete it -> MR is available: gitlab-ce!18707
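The gap between what the jobs API exposes and what the statistics count can be illustrated with a per-type breakdown. This is a minimal sketch on made-up data: the `JobArtifact` struct here is a hypothetical stand-in, not the real `Ci::JobArtifact` model, and the type names simply follow the list above.

```ruby
# Hypothetical record: one stored file belonging to a job.
JobArtifact = Struct.new(:job_id, :file_type, :size)

# Made-up sample data, for illustration only.
artifacts = [
  JobArtifact.new(1, :artifact, 2_000),
  JobArtifact.new(1, :metadata, 100),
  JobArtifact.new(1, :trace, 9_000),
  JobArtifact.new(2, :trace, 4_000)
]

# Total bytes per file type (artifact, trace, metadata).
def size_by_type(artifacts)
  artifacts.group_by(&:file_type).transform_values { |rows| rows.sum(&:size) }
end

# Total bytes per job, across all file types.
def size_by_job(artifacts)
  artifacts.group_by(&:job_id).transform_values { |rows| rows.sum(&:size) }
end

by_type = size_by_type(artifacts)
by_job = size_by_job(artifacts)

visible_via_api = by_type.fetch(:artifact, 0) # only the artifact file size
counted_in_stats = by_type.values.sum         # all types together

puts "per type: #{by_type.inspect}"
puts "per job:  #{by_job.inspect}"
puts "visible via API: #{visible_via_api}, counted in statistics: #{counted_in_stats}"
```

In this made-up sample the traces dominate the total, which is exactly the part a user cannot see from the artifact file size alone.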
So the user cannot see which job is consuming the space, and even an admin has trouble finding that out. With 100k jobs in a single project, I have to do a hell of a lot of scripting to clean up while being blind at the same time (not knowing whether the job being processed actually has a trace or not).
And finally the reason why we started the investigations:
After the migration to 11.0.1, the storage consumption statistics of some projects sky-rocketed from <10G to >30G.
We were able to identify that this was due to legacy trace archiving
(manual, since the automatic migrations failed, see gitlab-com/infrastructure#4377).
Suddenly the traces count towards the storage consumption stats, and with >100k traces within a project that is a huge increase.
It is good to count these, however it just added too much in a single shot.
In most of our projects the traces are much larger than the artifacts:
- people love quick analysis of failed builds so maximum logging is usually enabled
- artifacts delete themselves after the configured time
- artifacts (usually binaries) are compressed while traces (clear text, the best format for compression) are not
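How well a verbose clear-text trace compresses can be checked with Ruby's standard `zlib` library. This is a rough sketch: the log lines are generated and far more repetitive than a real trace, and raw Deflate is used here only as a stand-in, since the issue does not say which format GitLab would use for traces.

```ruby
require "zlib"

# Made-up, repetitive log lines standing in for a verbose build trace.
trace = Array.new(2_000) do |i|
  "[#{format('%05d', i)}] Compiling module_#{i % 20}.c ... ok\n"
end.join

compressed = Zlib::Deflate.deflate(trace)
ratio = compressed.bytesize.to_f / trace.bytesize

puts "raw: #{trace.bytesize} bytes, compressed: #{compressed.bytesize} bytes"
puts format("compressed size is %.1f%% of the original", ratio * 100)

# Round trip to confirm nothing is lost; inflate returns a binary string,
# so its encoding is reset before comparing.
restored = Zlib::Inflate.inflate(compressed).force_encoding(trace.encoding)
puts "round trip ok: #{restored == trace}"
```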
We think that the following two points would make life easier for both admins and users:
- set expiration of traces -> trace management as mentioned above is enough
- compress traces the same way as artifacts -> extracted to gitlab-ce#50263
And finally - why are there still legacy artifacts and metadata? Traces were migrated in 11.0, shouldn't the other types be migrated now, too? These could contribute to the errors in counting...
- migrate legacy artifacts and metadata -> handled under gitlab-ce#46652/gitlab-ce!18615
Relevant MRs I have found for recent changes:
- gitlab-ce!14367 - add NEW traces and metadata to statistics, but do NOT show these anywhere
- gitlab-ce!16539 - bugfix for artifacts_size calculation (increment/decrement cannot fix these)
- gitlab-ce!17839 - artifacts size is not fully refreshed anymore, just incremented/decremented
Feel free to split this issue into smaller pieces, reporting/fixing each problem/proposal separately.
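Proposal
Make build_artifacts_size use efficient counters.
- Enable efficient_counter_attribute feature flag and monitor it
- Fix decrement of build_artifacts_size when artifacts are deleted: #224151 (closed)
- Write a script that can forcefully recalculate all artifacts size (e.g. a rake task that iterates over projects) and document how to use it.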
when artifacts are deleted: #224151 (closed) - Write a script that can forcefully recalculate all artifacts size (e.g. a rake task that iterates over projects) and document how to use it.