Clean model reflection for code completion

Alexander Chueshev requested to merge ac/clean-model-reflection into main

This MR implements an algorithm that removes duplicate lines from the code-gecko model response, that is, lines the model repeats from the prompt (model reflection). Examples of model reflection: https://gitlab.com/-/snippets/2589052

The algorithm relies on a tabulated (bottom-up dynamic programming) implementation of the Longest Common Subsequence (LCS) problem.
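
For readers unfamiliar with the tabulated approach, here is a minimal sketch of LCS applied to lines rather than characters. It is not the code from this MR: the function name `lcs_line_pairs` and its signature are hypothetical, and it assumes that both the prompt prefix and the model response have already been split into lines.

```python
from typing import List, Tuple


def lcs_line_pairs(prefix: List[str], completion: List[str]) -> List[Tuple[int, int]]:
    """Return (prefix_index, completion_index) pairs for one LCS of lines.

    Hypothetical helper for illustration; lines are compared by exact match.
    """
    n, m = len(prefix), len(completion)
    # table[i][j] holds the LCS length of prefix[:i] and completion[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if prefix[i - 1] == completion[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the matched lines.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if prefix[i - 1] == completion[j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()  # restore the original top-to-bottom order
    return pairs
```

Because the pairs come out in increasing order of both indices, the matched lines keep the order they have in the prompt prefix. The table costs O(n·m) time and memory, which is usually acceptable for line counts of this size.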

Steps:

  • Collect all duplicate lines using the LCS implementation. This yields the matching lines (line sequences) in the same order as they appear in the prompt prefix.
  • Group the collected lines. Each group contains between 1 and n lines.
  • Remove from the model response every group that contains at least two lines.
  • Remove from the model response a group consisting of a single line only if that line looks like a comment.
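
The grouping and removal steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this MR: all names are hypothetical, a "group" is assumed to be a run of consecutive line positions in the model response, and the comment check is a simple prefix heuristic chosen for the example.

```python
from typing import List

# Hypothetical heuristic: prefixes that commonly start a comment line.
COMMENT_PREFIXES = ("#", "//", "/*", "--")


def looks_like_comment(line: str) -> bool:
    """Return True if the stripped line starts with a known comment prefix."""
    return line.strip().startswith(COMMENT_PREFIXES)


def group_consecutive(indices: List[int]) -> List[List[int]]:
    """Split line indices into runs of consecutive positions."""
    groups: List[List[int]] = []
    for idx in sorted(indices):
        if groups and idx == groups[-1][-1] + 1:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups


def clean_completion(completion: List[str], duplicate_indices: List[int]) -> List[str]:
    """Drop duplicate groups of two or more lines, and single duplicate comments.

    `duplicate_indices` are positions in `completion` that also occur in the
    prompt prefix (for example, as found by an LCS over lines).
    """
    to_remove = set()
    for group in group_consecutive(duplicate_indices):
        if len(group) >= 2 or looks_like_comment(completion[group[0]]):
            to_remove.update(group)
    return [line for i, line in enumerate(completion) if i not in to_remove]
```

For example, given the response lines `["import os", "import sys", "x = 1", "else:", "y = 2", "# helper"]` with duplicates at positions 0, 1, 3 and 5, the sketch removes the two imports and the comment but keeps the lone `else:`.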

To compare lines, we rely on exact matching. I will change the similarity metric in the next MR to support cases when the model counts. We do not clean individual lines that are not comments, since such lines can consist of keywords such as else, if, etc., which can legitimately repeat. By contrast, we can more confidently clean two or more consecutive lines.

Ref: #273 (closed)

Edited by Alexander Chueshev
