feat(mcp): add read_repository_files tool POC
A simple (and ugly) POC outlining proper tool schema design and potential implementation paths.
See !204566 (comment 2746436731) and !203763 (closed) for more details.
In the simple case of read_repository_file, a granular tool mapped 1:1 to the API endpoint, we run the risk of context window saturation.
Take a look at this API: https://docs.gitlab.com/api/repository_files/#get-file-from-repository
I can't see any parameters for requesting a line range, nor any way to "expand" a truncated file that exceeds a reasonable line limit.
So the current tool params look like:
Current 1:1 Mapped Tool Schema:
{
  "tool": "read_repository_file",
  "parameters": {
    "project_id": "string",
    "file_path": "string",
    "ref": "string"
  }
}
This returns the entire file content, base64-encoded, which could be massive and immediately blow through context limits.
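To make the cost concrete, here is a minimal sketch of what a 1:1 tool ends up handling. It assumes the response carries `encoding` and `content` fields as described in the linked API docs; `decode_file_content` and the sample payload are hypothetical and only stand in for a real response.

```python
import base64


def decode_file_content(payload: dict) -> str:
    """Decode the `content` field of a repository-file API response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    return base64.b64decode(payload["content"]).decode("utf-8")


# Hypothetical payload standing in for a real API response (3000 short lines).
source = "line\n" * 3000
payload = {
    "file_path": "app/models/user.rb",
    "encoding": "base64",
    "content": base64.b64encode(source.encode("utf-8")).decode("ascii"),
}

text = decode_file_content(payload)
decoded_chars = len(text)               # size of the file as text
encoded_chars = len(payload["content"])  # size of the base64 form, about 4/3 larger
```

Either form goes to the model in full: the endpoint gives the tool no way to ask for less, so any limiting has to happen in the tool itself.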
Better Tool Design with Context Engineering:
{
  "tool": "read_repository_files",
  "parameters": {
    "project_path": "string",
    "files": [
      {
        "path": "string",
        "ref": "string",
        "line_start": "integer (optional)",
        "line_end": "integer (optional)",
        "max_lines": "integer (optional, default: 100)"
      }
    ],
    "include_metadata": "boolean (default: true)"
  }
}
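One possible way to resolve `line_start`, `line_end`, and `max_lines` into a concrete window is sketched below. The schema above does not say how the three interact, so this sketch assumes 1-based inclusive line numbers, lets `max_lines` cap any requested range, and treats `truncated` as "fewer lines returned than the file has". `select_window` and `read_window` are hypothetical helper names, not part of the POC.

```python
from typing import Optional, Tuple


def select_window(total_lines: int, line_start: Optional[int] = None,
                  line_end: Optional[int] = None,
                  max_lines: int = 100) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) range to send back."""
    start = max(1, line_start or 1)
    end = total_lines if line_end is None else min(line_end, total_lines)
    # Assumption: max_lines always wins over an oversized explicit range.
    end = min(end, start + max_lines - 1)
    return start, end


def read_window(text: str, line_start: Optional[int] = None,
                line_end: Optional[int] = None, max_lines: int = 100) -> dict:
    """Slice decoded file text down to the requested window plus metadata."""
    lines = text.splitlines()
    start, end = select_window(len(lines), line_start, line_end, max_lines)
    selected = lines[start - 1:end]  # empty if start is past the end of the file
    return {
        "lines": selected,
        "start": start,
        "end": end,
        "total_lines": len(lines),
        "truncated": len(selected) < len(lines),
    }


# A 450-line file with no range given falls back to the first 100 lines.
sample = "\n".join(f"line {i}" for i in range(1, 451))
first = read_window(sample)
middle = read_window(sample, line_start=250, line_end=300)
```

With these assumptions, a call without any range never returns more than `max_lines` lines, which is the property that keeps a single call from saturating the context.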
Example Response with System Instructions:
<repository_files>
  <file path="app/models/user.rb" ref="main">
    <metadata>
      <total_lines>450</total_lines>
      <returned_lines start="1" end="100"/>
      <truncated>true</truncated>
      <size_bytes>15234</size_bytes>
    </metadata>
    <content>
      <!-- First 100 lines of content here -->
    </content>
    <system_instruction>
      File truncated. To view more content, use:
      - Lines 101-200: {"line_start": 101, "line_end": 200}
      - Lines around specific area: {"line_start": 250, "line_end": 300}
      - Remaining lines: 350 lines available
    </system_instruction>
  </file>
  <file path="config/routes.rb" ref="main">
    <metadata>
      <total_lines>75</total_lines>
      <returned_lines start="1" end="75"/>
      <truncated>false</truncated>
    </metadata>
    <content>
      <!-- Complete file content -->
    </content>
  </file>
</repository_files>
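A rough sketch of how one `<file>` element of this response could be rendered, including the navigation hint. `render_file` and its `chunk` parameter are hypothetical; the sketch leaves out `size_bytes` and the "lines around specific area" hint, and assumes "remaining lines" means the lines after the returned window.

```python
from xml.sax.saxutils import escape, quoteattr


def render_file(path: str, ref: str, lines: list, start: int,
                total_lines: int, chunk: int = 100) -> str:
    """Render one <file> element for a window of lines beginning at `start`."""
    end = start + len(lines) - 1
    truncated = len(lines) < total_lines
    parts = [
        f"<file path={quoteattr(path)} ref={quoteattr(ref)}>",
        "<metadata>",
        f"<total_lines>{total_lines}</total_lines>",
        f'<returned_lines start="{start}" end="{end}"/>',
        f"<truncated>{str(truncated).lower()}</truncated>",
        "</metadata>",
        "<content>",
        # Escape so file content containing e.g. "</content>" cannot break the wrapper.
        escape("\n".join(lines)),
        "</content>",
    ]
    if truncated and end < total_lines:
        next_end = min(end + chunk, total_lines)
        parts += [
            "<system_instruction>",
            "File truncated. To view more content, use:",
            f'- Lines {end + 1}-{next_end}: '
            f'{{"line_start": {end + 1}, "line_end": {next_end}}}',
            f"- Remaining lines: {total_lines - end} lines available",
            "</system_instruction>",
        ]
    parts.append("</file>")
    return "\n".join(parts)


# First 100 lines of a 450-line file, as in the example response above.
rendered = render_file("app/models/user.rb", "main",
                       [f"line {i}" for i in range(1, 101)], 1, 450)
```

The hint is emitted only when lines remain after the window, so a fully returned file such as `config/routes.rb` carries no `<system_instruction>` element.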
This approach:
- Prevents context saturation by defaulting to reasonable chunks
- Allows batch operations with read_repository_files (plural), reducing round trips
- Provides navigation hints so the model knows how to request more
- Uses XML structure for better model parsing (as per context engineering best practices)
- Includes metadata to inform decision-making without requiring full content
This is precisely why we need proper tool design patterns rather than just exposing raw API endpoints through MCP.