Reduce false positives when cleaning model reflection
What does this merge request do and why?
This MR further improves the model reflection cleanup logic by additionally:
- increasing the minimum size of a candidate block to 5 consecutive lines
- checking the proportion of special characters in the candidate block (blocks with a value >= 0.25 are skipped)
- checking the lexical diversity of the candidate block (blocks with a value >= 0.35 are skipped)
The intuition behind this logic: the current logic trims the end of blocks in C-like languages, which is unnecessary.
Low thresholds for special characters and lexical diversity reduce the likelihood that a block gets deduplicated.
Special characters we consider with this MR:
_SPECIAL_CHARS = "()[];.,$%&^*@#!{}//"
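The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: the helper names (`special_char_ratio`, `lexical_diversity`, `is_dedup_candidate`) are hypothetical, and the metric definitions are assumptions (special characters counted against non-whitespace characters, lexical diversity taken as unique whitespace-separated tokens over all tokens). Only `_SPECIAL_CHARS` and the three thresholds come from the description.

```python
from typing import List

# Taken verbatim from the MR description.
_SPECIAL_CHARS = "()[];.,$%&^*@#!{}//"

# Thresholds from the MR description; the constant names are hypothetical.
MIN_BLOCK_LINES = 5
MAX_SPECIAL_CHAR_RATIO = 0.25
MAX_LEXICAL_DIVERSITY = 0.35


def special_char_ratio(block: str) -> float:
    """Share of non-whitespace characters that are special (assumed definition)."""
    chars = [c for c in block if not c.isspace()]
    if not chars:
        return 0.0
    special = set(_SPECIAL_CHARS)
    return sum(c in special for c in chars) / len(chars)


def lexical_diversity(block: str) -> float:
    """Unique tokens divided by all tokens, split on whitespace (assumed definition)."""
    tokens = block.split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def is_dedup_candidate(lines: List[str]) -> bool:
    """Return True only if the block passes all three checks."""
    if len(lines) < MIN_BLOCK_LINES:
        return False  # too short to be treated as a reflected block
    block = "\n".join(lines)
    if special_char_ratio(block) >= MAX_SPECIAL_CHAR_RATIO:
        return False  # mostly punctuation, e.g. closing braces in C-like code
    if lexical_diversity(block) >= MAX_LEXICAL_DIVERSITY:
        return False  # varied content, unlikely to be a repeated block
    return True
```

Under these assumptions, a block made up mostly of closing brackets and semicolons is skipped by the special-character check, so the trailing `}` in C-like code would be left alone.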
How to set up and validate locally
Example:
Input:
// This code has a filename of test.js and is written in JavaScript.
import testing;
const newFunctionForValidatingEmail = (email) => {
  return emailRegex.test(email);
}
// For the mask XYZ
const writeStringBackwards = (inpStr) => {
  let outStr = '';
  for (let i = inpStr.length - 1; i >= 0; i--) {
    outStr += inpStr[i];
  }
  return outSt
Output with the existing deployed logic:
r;
const maskXYZ = (inpStr) => {
  let outStr = '';
Correct completion improved by this MR:
r;
}
const maskXYZ = (inpStr) => {
  let outStr = '';
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Closes !395 (merged)
Edited by Alexander Chueshev