
Reduce false positives when cleaning model reflection

Alexander Chueshev requested to merge ac/enhance-model-reflection-cleanup into main

What does this merge request do and why?

This MR further improves the model reflection cleanup logic by additionally:

  • increasing the minimum size of a candidate block to 5 consecutive lines
  • checking the proportion of special characters in the candidate block (blocks with a value >= 0.25 are skipped)
  • checking the lexical diversity of the candidate block (blocks with a value >= 0.35 are skipped)

The intuition behind this logic: we should not trim the end of blocks in C-like languages, as the current logic does.

Keeping the thresholds for special characters and lexical diversity low means more blocks are skipped, which reduces the likelihood of a block being deduplicated.
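As a rough illustration of the lexical diversity check, here is a minimal sketch. The MR description does not define the metric, so this assumes a type-token ratio (unique tokens divided by total tokens) over `\w+` tokens. The names `lexical_diversity`, `skip_by_diversity`, and `LEXICAL_DIVERSITY_THRESHOLD` are hypothetical and not taken from the MR code; only the 0.35 threshold comes from the description.

```python
import re

# Threshold from the MR description; the constant name is hypothetical.
LEXICAL_DIVERSITY_THRESHOLD = 0.35


def lexical_diversity(block: str) -> float:
    """Return unique tokens / total tokens for a block (assumed definition).

    Tokens are runs of word characters; an empty block yields 0.0.
    """
    tokens = re.findall(r"\w+", block)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def skip_by_diversity(block: str) -> bool:
    """True if the block is diverse enough to be left out of the cleanup."""
    return lexical_diversity(block) >= LEXICAL_DIVERSITY_THRESHOLD
```

Under this assumed definition, a block that repeats the same few words scores low and remains a deduplication candidate, while a block made of mostly distinct tokens is skipped.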

Special characters we consider with this MR:

_SPECIAL_CHARS = "()[];.,$%&^*@#!{}//"
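The special-character check could look roughly like the sketch below. It assumes the ratio is computed over non-whitespace characters, which the description does not specify. The names `special_char_ratio`, `skip_by_special_chars`, and `SPECIAL_CHARS_THRESHOLD` are hypothetical; only `_SPECIAL_CHARS` and the 0.25 threshold come from this MR.

```python
# Character set as listed in the MR description.
_SPECIAL_CHARS = "()[];.,$%&^*@#!{}//"

# Threshold from the MR description; the constant name is hypothetical.
SPECIAL_CHARS_THRESHOLD = 0.25


def special_char_ratio(block: str) -> float:
    """Return the share of non-whitespace characters that are special.

    Ignoring whitespace is an assumption of this sketch; an empty or
    whitespace-only block yields 0.0.
    """
    chars = [c for c in block if not c.isspace()]
    if not chars:
        return 0.0
    special = set(_SPECIAL_CHARS)
    return sum(c in special for c in chars) / len(chars)


def skip_by_special_chars(block: str) -> bool:
    """True if the block is punctuation-heavy enough to be left out of the cleanup."""
    return special_char_ratio(block) >= SPECIAL_CHARS_THRESHOLD
```

With this definition, a line consisting mostly of brackets and semicolons, such as the closing lines of a function in a C-like language, exceeds the threshold and is skipped rather than trimmed.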

How to set up and validate locally

Example:

Input:

 // This code has a filename of test.js and is written in JavaScript.
import testing;

const newFunctionForValidatingEmail = (email) => {
  return emailRegex.test(email);
}

// For the mask XYZ
const writeStringBackwards = (inpStr) => {
  let outStr = '';
  for (let i = inpStr.length - 1; i >= 0; i--) {
    outStr += inpStr[i];
  }
  return outSt

Output with the existing deployed logic:

r;

const maskXYZ = (inpStr) => {
  let outStr = '';

Correct completion produced with this MR:

r;
}

const maskXYZ = (inpStr) => {
  let outStr = '';

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.

Closes !395 (merged)

Edited by Alexander Chueshev
