Post-process strategy: group snippets by file
What does this MR do and why?
issue: #579852 (closed)
reference: !219658 (merged) gitlab-org#19745
ff: #587194
Multiple snippets from the same file, returned as separate results, can be hard for an LLM to reason about. For example, take 4 snippets from server.rb: the first covers lines 1-10, the second lines 11-20, the third lines 100-121, and the fourth lines 120-130.
These snippets share the same project_id, path, file_name, language, and blob id. This information is repeated for every snippet, which might negatively affect the LLM. We can group the snippets by file and merge them into a single entry.
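As a rough illustration of the grouping step (not the code in this MR), here is a minimal Ruby sketch. The method name `group_snippets_by_file` and the flat input hashes are hypothetical; the output field names mirror the `structuredContent` shown later in this description. It assumes ranges are ordered by `start_line` and that a file's score is the highest score among its ranges. Overlapping ranges (such as 100-121 and 120-130) are left as separate ranges, since this description does not say whether they are fused.

```ruby
# Hypothetical helper: collapse flat snippet hits into one item per file.
def group_snippets_by_file(hits)
  hits
    .group_by { |hit| hit.values_at(:project_id, :path, :blob_id) }
    .map do |(project_id, path, blob_id), file_hits|
      # Keep only the per-range fields; shared file metadata is stored once.
      ranges = file_hits
        .sort_by { |hit| hit[:start_line] }
        .map { |hit| hit.slice(:start_line, :end_line, :content, :score) }

      {
        path: path,
        project_id: project_id,
        language: file_hits.first[:language],
        blob_id: blob_id,
        ranges: ranges,
        # Assumption: the file score is the best score among its ranges.
        score: ranges.map { |range| range[:score] }.max
      }
    end
    .sort_by { |item| -item[:score] }
end

# Usage with made-up hits from two files.
hits = [
  { project_id: 1, path: "server.rb", blob_id: "abc", language: "ruby",
    start_line: 100, end_line: 121, content: "...", score: 0.6 },
  { project_id: 1, path: "server.rb", blob_id: "abc", language: "ruby",
    start_line: 1, end_line: 10, content: "...", score: 0.8 },
  { project_id: 1, path: "client.rb", blob_id: "def", language: "ruby",
    start_line: 5, end_line: 9, content: "...", score: 0.9 }
]
grouped = group_snippets_by_file(hits)
puts grouped.map { |item| "#{item[:path]}: #{item[:ranges].size} range(s)" }
```

Grouping on `project_id`, `path`, and `blob_id` together is a conservative choice for the sketch: two results are treated as the same file only when all three match.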
Now the content looks like:
Confidence: MEDIUM
1. tests/api/v2/test_v2_code.py (score: 0.7998)
Lines 75-77:
@pytest.mark.parametrize("prompt_version", [1])
def test_request_latency(
self,
prompt_version: int,
mock_client: TestClient,
mock_completions: Mock,
):
Lines 141-142:
if prompt_version == 2:
data.update(
{
"prompt": current_file["content_above_cursor"],
}
)
Lines 170-170:
def get_request_duration(cap_logs):
event = 'testclient:50000 - "POST /completions HTTP/1.1" 200'
entry = next(entry for entry in cap_logs if entry["event"] == event)
return entry["duration_request"]
2.......
........
........
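The text layout above could be produced from grouped items along these lines. This is a sketch only: `render_grouped_results` is a hypothetical name, the exact indentation and blank-line spacing are assumptions (whitespace is not recoverable from this description), and the score is rounded to four decimals to match the sample.

```ruby
# Hypothetical renderer: one numbered entry per file, one "Lines a-b:" block per range.
def render_grouped_results(items, confidence:)
  sections = items.each_with_index.map do |item, index|
    header = format("%d. %s (score: %.4f)", index + 1, item[:path], item[:score])
    ranges = item[:ranges].map do |range|
      # Indent every content line so it sits under its "Lines" label.
      body = range[:content].lines.map { |line| "      #{line.chomp}" }.join("\n")
      "   Lines #{range[:start_line]}-#{range[:end_line]}:\n#{body}"
    end
    [header, *ranges].join("\n")
  end

  ["Confidence: #{confidence.to_s.upcase}", *sections].join("\n\n")
end

# Usage with a single made-up item.
items = [
  { path: "tests/api/v2/test_v2_code.py", score: 0.7998023,
    ranges: [{ start_line: 170, end_line: 170, content: "def get_request_duration(cap_logs):" }] }
]
puts render_grouped_results(items, confidence: "medium")
```

The path and score are printed once per file, and each range only adds its line numbers and content, which is where the reduction in repeated metadata comes from.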
References
How to set up and validate locally
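- Enable the feature flag in the Rails console:
::Feature.enable(:post_process_semantic_code_search_group_by_file)
- Set up the MCP client and run a semantic code search. The response should look like: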
{
"content": [
{
"type": "text",
"text": "Confidence: MEDIUM\n\n1. tests/api/v2/test_v2_code.py (score: 0.7998)\n Lines 141-142:\n ),\n \n\n2. tests/duo_workflow_service/components/human_approval/test_tools_approval.py (score: 0.7981)\n Lines 390-390:\n ]"
}
],
"structuredContent": {
"items": [
{
"path": "tests/api/v2/test_v2_code.py",
"project_id": 1000000,
"language": "python",
"blob_id": "f8be989993d07f0e7d472e32db1f6f7b0bc00981",
"ranges": [
{
"start_line": 141,
"end_line": 142,
"content": " ),\n ",
"score": 0.7998023
}
],
"score": 0.7998023
},
{
"path": "tests/duo_workflow_service/components/human_approval/test_tools_approval.py",
"project_id": 1000000,
"language": "python",
"blob_id": "96985e997aacef1469d89769173b55222fa10507",
"ranges": [
{
"start_line": 390,
"end_line": 390,
"content": " ]",
"score": 0.7981448
}
],
"score": 0.7981448
}
],
"metadata": {
"count": 2,
"has_more": false,
"confidence": "medium"
}
},
"isError": false
}
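When validating locally, a small script can check that a saved response is actually grouped by file. The sketch below is an illustration under assumptions, not part of this MR: `grouping_problems` is a hypothetical helper, and it assumes that each file should appear at most once in `items`, that ranges are ordered by `start_line`, and that `metadata.count` equals the number of items (as it does in the sample above).

```ruby
require "json"

# Hypothetical check: returns a list of problems; an empty list means the
# response looks grouped by file.
def grouping_problems(response_json)
  structured = JSON.parse(response_json).fetch("structuredContent")
  items = structured.fetch("items")
  problems = []

  # Each (project_id, path, blob_id) combination should appear only once.
  keys = items.map { |item| item.values_at("project_id", "path", "blob_id") }
  duplicated = keys.tally.select { |_key, occurrences| occurrences > 1 }.keys
  problems << "files appear more than once: #{duplicated.inspect}" unless duplicated.empty?

  # Assumption: ranges inside a file are sorted by start_line.
  items.each do |item|
    starts = item.fetch("ranges").map { |range| range.fetch("start_line") }
    problems << "#{item['path']}: ranges are not sorted by start_line" unless starts == starts.sort
  end

  # Assumption: metadata.count counts files, not individual ranges.
  count = structured.fetch("metadata").fetch("count")
  problems << "metadata.count is #{count}, expected #{items.size}" unless count == items.size

  problems
end

# Usage: save the MCP response to a file (the file name is arbitrary) and run
#   puts grouping_problems(File.read("response.json"))
```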
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #579852 (closed)
[skip feature-flag]
Edited by Tian Gao
