Reduce false positives when cleaning model reflection (!395) · Merge requests · GitLab.org / ModelOps / AI Assisted (formerly Applied ML) / Code Suggestions / AI Gateway

Alexander Chueshev requested to merge ac/enhance-model-reflection-cleanup into main Sep 19, 2023

What does this merge request do and why?

This MR further improves the model reflection cleanup logic by additionally:

increasing the minimum size of a candidate block to 5 consecutive lines
checking the ratio of special characters in the candidate block (blocks with a value >= 0.25 are skipped)
checking the lexical diversity of the candidate block (blocks with a value >= 0.35 are skipped)

The intuition behind this logic: we should not trim the end of blocks in C-like languages, as the current logic does.

Low thresholds for special characters and lexical diversity reduce the likelihood of block deduplication.
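
As a rough illustration of the lexical diversity check, here is a minimal sketch. It assumes lexical diversity means the type-token ratio (unique tokens divided by total tokens) and that tokens are runs of word characters; the MR may define and tokenize it differently. The names `lexical_diversity`, `is_too_diverse`, and `LEXICAL_DIVERSITY_THRESHOLD` are hypothetical; only the 0.35 value comes from this description.

```python
import re

# Hypothetical constant name; the 0.35 value is taken from the MR description.
LEXICAL_DIVERSITY_THRESHOLD = 0.35


def lexical_diversity(block: str) -> float:
    """Return unique tokens / total tokens (type-token ratio) for a block.

    Assumption: a token is a run of word characters. The real
    implementation may tokenize differently.
    """
    tokens = re.findall(r"\w+", block)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def is_too_diverse(block: str) -> bool:
    """True when the block meets the threshold and would be skipped."""
    return lexical_diversity(block) >= LEXICAL_DIVERSITY_THRESHOLD
```

Under these assumptions, a block that repeats the same few tokens scores low and remains a candidate, while a block made of mostly distinct tokens scores high and is skipped.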

Special characters we consider with this MR:

_SPECIAL_CHARS = "()[];.,$%&^*@#!{}//"
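
The size and special-character checks could be sketched as follows. This is not the code from the MR: the helper names `special_chars_ratio` and `is_dedup_candidate`, the constant names, and the choice to compute the ratio over non-whitespace characters are all assumptions made for illustration. Only `_SPECIAL_CHARS` and the values 5 and 0.25 come from this description.

```python
_SPECIAL_CHARS = "()[];.,$%&^*@#!{}//"  # copied from the MR description

# Hypothetical constant names; the values are taken from the MR description.
MIN_BLOCK_LINES = 5
SPECIAL_CHARS_THRESHOLD = 0.25


def special_chars_ratio(block: str) -> float:
    """Share of non-whitespace characters that appear in _SPECIAL_CHARS.

    Assumption: whitespace is excluded from the denominator.
    """
    chars = [ch for ch in block if not ch.isspace()]
    if not chars:
        return 0.0
    special = sum(1 for ch in chars if ch in _SPECIAL_CHARS)
    return special / len(chars)


def is_dedup_candidate(block_lines: list[str]) -> bool:
    """True when a block is long enough and not dominated by punctuation."""
    if len(block_lines) < MIN_BLOCK_LINES:
        return False  # too short to be treated as model reflection
    ratio = special_chars_ratio("\n".join(block_lines))
    return ratio < SPECIAL_CHARS_THRESHOLD
```

With this sketch, a run of lines such as `});` is never a candidate because its ratio is 1.0, which matches the stated goal of not trimming closing brackets in C-like languages.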

How to set up and validate locally

Example:

Input:

 // This code has a filename of test.js and is written in JavaScript.
import testing;

const newFunctionForValidatingEmail = (email) => {
  return emailRegex.test(email);
}

// For the mask XYZ
const writeStringBackwards = (inpStr) => {
  let outStr = '';
  for (let i = inpStr.length - 1; i >= 0; i--) {
    outStr += inpStr[i];
  }
  return outSt

Output with the existing deployed logic:

r;

const maskXYZ = (inpStr) => {
  let outStr = '';

Correct completion produced with this MR:

r;
}

const maskXYZ = (inpStr) => {
  let outStr = '';

Merge request checklist

Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.

Closes !395 (merged)

Edited Sep 21, 2023 by Alexander Chueshev
