Post-process strategy: group snippets by file

What does this MR do and why?

Issue: #579852 (closed)

Reference: !219658 (merged), gitlab-org#19745

Feature flag rollout issue: #587194

Multiple snippets that belong to the same logical group (the same file) can be hard for the LLM to reason about. For example, 4 snippets from server.rb: the first spans lines 1-10, the second lines 11-20, the third lines 100-121, and the fourth lines 120-130.

These snippets share the same project_id, path, file_name, language, and blob_id. Repeating this information for every snippet might negatively affect the LLM, so we can group the snippets by file and merge them.
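As a rough sketch of the grouping step (not the actual implementation; the snippet hash shape and the group_snippets_by_file helper are made up for illustration), snippets can be grouped on their shared file identity so the file-level fields appear once and the per-snippet data becomes nested ranges:

# Hypothetical sketch: `snippets` is assumed to be an array of hashes shaped
# like the pre-grouping search results (one hash per snippet).
def group_snippets_by_file(snippets)
  snippets
    .group_by { |s| s.values_at(:project_id, :path, :blob_id) }
    .map do |_file_key, group|
      first = group.first
      {
        path: first[:path],
        project_id: first[:project_id],
        language: first[:language],
        blob_id: first[:blob_id],
        # Each original snippet becomes one range; the file-level metadata
        # above is no longer repeated per snippet.
        ranges: group.map { |s| s.slice(:start_line, :end_line, :content, :score) },
        # The file keeps the best score among its ranges.
        score: group.map { |s| s[:score] }.max
      }
    end
    .sort_by { |item| -item[:score] }
end

The structuredContent example further down has exactly this shape: one item per file, with the per-snippet lines, content, and score nested in a ranges array.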

After this change, the content looks like:


Confidence: MEDIUM

1. tests/api/v2/test_v2_code.py (score: 0.7998)
   Lines 75-77:
    @pytest.mark.parametrize("prompt_version", [1])
    def test_request_latency(
        self,
        prompt_version: int,
        mock_client: TestClient,
        mock_completions: Mock,
    ):
   Lines 141-142:
        if prompt_version == 2:
            data.update(
                {
                    "prompt": current_file["content_above_cursor"],
                }
            )
            
   Lines 170-170:
        def get_request_duration(cap_logs):
            event = 'testclient:50000 - "POST /completions HTTP/1.1" 200'
            entry = next(entry for entry in cap_logs if entry["event"] == event)

            return entry["duration_request"]


2.......
........
........

References

#587194

Screenshots or screen recordings


How to set up and validate locally

  1. Set up the MCP client: !205297 (comment 2756113040)

  2. Enable the feature flag (a console sketch for checking and reverting it follows the sample response below):

::Feature.enable(:post_process_semantic_code_search_group_by_file)

  3. Run a semantic code search through the MCP client and verify the grouped response (Screenshot_2026-01-21_at_15.59.38):
{
  "content": [
    {
      "type": "text",
      "text": "Confidence: MEDIUM\n\n1. tests/api/v2/test_v2_code.py (score: 0.7998)\n   Lines 141-142:\n                  ),\n            \n\n2. tests/duo_workflow_service/components/human_approval/test_tools_approval.py (score: 0.7981)\n   Lines 390-390:\n                  ]"
    }
  ],
  "structuredContent": {
    "items": [
      {
        "path": "tests/api/v2/test_v2_code.py",
        "project_id": 1000000,
        "language": "python",
        "blob_id": "f8be989993d07f0e7d472e32db1f6f7b0bc00981",
        "ranges": [
          {
            "start_line": 141,
            "end_line": 142,
            "content": "            ),\n            ",
            "score": 0.7998023
          }
        ],
        "score": 0.7998023
      },
      {
        "path": "tests/duo_workflow_service/components/human_approval/test_tools_approval.py",
        "project_id": 1000000,
        "language": "python",
        "blob_id": "96985e997aacef1469d89769173b55222fa10507",
        "ranges": [
          {
            "start_line": 390,
            "end_line": 390,
            "content": "            ]",
            "score": 0.7981448
          }
        ],
        "score": 0.7981448
      }
    ],
    "metadata": {
      "count": 2,
      "has_more": false,
      "confidence": "medium"
    }
  },
  "isError": false
}
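For local testing, the flag can be checked and reverted from a Rails console. This assumes the standard GitLab Feature helpers; if the flag is actor-scoped, pass the actor as a second argument:

Feature.enabled?(:post_process_semantic_code_search_group_by_file) # check current state
Feature.enable(:post_process_semantic_code_search_group_by_file)   # grouped output
Feature.disable(:post_process_semantic_code_search_group_by_file)  # back to ungrouped output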

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #579852 (closed)

[skip feature-flag]
