Return the last coverage in trace stream
What does this MR do?
This MR fixes the coverage issue raised at https://gitlab.com/gitlab-org/gitlab-ce/issues/31556. Currently, the first coverage appears in the trace will be considered as the coverage. Since general total scores will likely appear at the last part of logs, coverage extraction should be targeting the last matched part.
Are there points in the code the reviewer needs to double check?
Need to be checked by @godfat
Why was this MR needed?
Because #31556 (closed) labeled as regression
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary -
Documentation created/updated -
API support added - Tests
-
Added for this feature/bug -
All builds are passing
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Branch has no merge conflicts with master
(if it does - rebase it please) -
Squashed related commits together
What are the relevant issue numbers?
Closes #31556 (closed)
Merge request reports
Activity
@godfat Feel free to ignore, if you've already working on it
Reproduced at https://gitlab.com/dosuken123/gitlab-ce/builds/15703125 with https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/11128/diffs?diff_id=4242643
Failures: 1) Gitlab::Ci::Trace::Stream#extract_coverage multiple results in content & regex returns the last matched coverage Failure/Error: is_expected.to eq("98.29") expected: "98.29" got: "98.39" (compared using ==) # ./spec/lib/gitlab/ci/trace/stream_spec.rb:253:in `block (4 levels) in <top (required)>' Finished in 31 minutes 27 seconds (files took 32.94 seconds to load) 1358 examples, 1 failure Failed examples: rspec ./spec/lib/gitlab/ci/trace/stream_spec.rb:252 # Gitlab::Ci::Trace::Stream#extract_coverage multiple results in content & regex returns the last matched coverage
Performance improvement
reverse_line
[22] pry(main)> puts Benchmark.measure { stream.extract_coverage('bdajlshaksdhal') } 0.010000 0.010000 0.020000 ( 0.017088) => nil
stream.each_line
[2] pry(main)> puts Benchmark.measure { stream.extract_coverage('bdajlshaksdhal') } 0.170000 0.070000 0.240000 ( 0.244303) => nil
Filesize: 3.4M
I'm thinking to add more tests as https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10736/diffs and close the MR...
added 303 commits
-
d8c4a24c...6ad3814e - 300 commits from branch
gitlab-org:master
- 9ae36e9f - Reproduced
- 6abf2750 - Add reverse_line
- 7b421ca5 - Fix while true
Toggle commit list-
d8c4a24c...6ad3814e - 300 commits from branch
added 303 commits
-
d8c4a24c...6ad3814e - 300 commits from branch
gitlab-org:master
- 9ae36e9f - Reproduced
- 6abf2750 - Add reverse_line
- 7b421ca5 - Fix while true
Toggle commit list-
d8c4a24c...6ad3814e - 300 commits from branch
@godfat Reproduced
and fixed pipeline passedHowever, I'd like to ask you to advice for more improvements. I've implemented a new function
reverse_line
, though, the actual process is fetching chunks (Not line!) of codes byBUFFER_SIZE
and extracting coverage bymatches.flatten.last
. I think this would be faster than iterating each line, though, I'm unconfident whether this approach is the best.- Resolved by Shinya Maeda
changed milestone to %9.2
added 71 commits
-
7b421ca5...f59a44db - 67 commits from branch
gitlab-org:master
- 7551c572 - Reproduced
- 912b44c9 - Add reverse_line
- 90e49fad - Fix while true
- 33e597a7 - Fix reverse_line from chunk based
Toggle commit list-
7b421ca5...f59a44db - 67 commits from branch
marked the checklist item Changelog entry added, if necessary as completed
marked the checklist item Conform by the merge request performance guides as completed
marked the checklist item Conform by the style guides as completed
@ayufan Could you review again? Thank you!
- Resolved by Lin Jen-Shin
@dosuken123 Sorry for the delay. I was off for last week. I think we could somehow extend from
read_last_lines
instead of doing this over again. Something like:def read_last_lines(last_lines) to_enum(:reverse_line, last_lines).reverse_each.to_a.join end def reverse_line(last_lines) # ... some algorithm from previous `read_last_lines` end
Edited by Lin Jen-Shin@dosuken123 I tried it here: !11381 (closed) This is probably still a bit complicated though. /cc @ayufan
mentioned in merge request !11381 (closed)
I think we could somehow extend from
read_last_lines
instead of doing this over againYes, I was thinking the same thing, however, I hesitated because extracting coverage doesn't need
last_lines
. I think eliminating unnecessary comparisons would make it faster. If we're implementing general/utilityreverse_line
method, I 100% endorse your MR.My MR was considered as "complicated", but In my honest opinion, I think it's ok that the low-level implementation becomes a little bit tricky appearance
- Resolved by Shinya Maeda
@godfat Anyway, I found another bug.
stream.read(BUFFER_SIZE - (pos - max))
https://gitlab.com/gitlab-org/gitlab-ce/blob/master/lib/gitlab/ci/trace/stream.rb#L109
This should be
stream.read(BUFFER_SIZE + (pos - max))
. Because(pos - max)
is negative diff.P.S. I thought the line is a little bit complex so I re-organized in this MR.
added 456 commits
Toggle commit list- Resolved by Shinya Maeda
added 85 commits
-
0caf29a6...6a2bcb4b - 78 commits from branch
gitlab-org:master
- 0eb597f6 - Reproduced
- 95332b8f - Add reverse_line
- 06126de6 - Fix while true
- fd3bee87 - Fix reverse_line from chunk based
- fd9ea4e9 - Add changelog
- a6d9c5a7 - Use each_line. Avoid comparison of partial. Add UTF-8 spec.
- b38a550a - Use force_encoding when regex contains UTF-8 char
Toggle commit list-
0caf29a6...6a2bcb4b - 78 commits from branch
@godfat I think we've discussed enough the possibility of bugs. The last thing is the code's appearance.
We have two options.
- Combine
reverse_line
andread_last_lines
(Like your MR) - Optimize
reverse_line
for extracting coverage (Using each chunk instead of each line, change method name)
What's your vote?
- Combine
- Resolved by Shinya Maeda
- Resolved by Lin Jen-Shin
@dosuken123 Thank you so much for working on this. I think I would personally lean more toward to have one single algorithm (i.e. (1)), but it could just be me, so I might not really give my vote to that yet.
I don't really care too much about the performance here, because we're doing
each_line
before, which could be slow anyway. If we're not making it significant slower, it's probably fine. So just merge this as is without doing 1 or 2 is fine for me, too.I would be curious about the performance though, of course. More information is always better :P Just that it might not make an impact on production, yet.
Good point about that the string could be as long as
BUFFER_SIZE
*2, considering that there's an extremely huge line. It's probably already the case though. Matching a huge string with regexp won't be an issue though, as we should match bottom up with\z
. I am more worried about constructing a huge string in Ruby, which is memory consuming.
added 2 commits
- Resolved by Shinya Maeda
- Resolved by Lin Jen-Shin
@dosuken123 Awesome. So I think the only thing left would be trying to merge the two loops into one, and I think that would be the ideal. If not, I am also fine with that.
- Resolved by Shinya Maeda
- Resolved by Shinya Maeda
- Resolved by Lin Jen-Shin