Improve read_file tool to support chunked reading and size limits
Problem
Currently, the read_file tool can attempt to read files of any size, which can lead to:
- Tool responses that exceed reasonable size limits and overflow context windows
- Inefficient token usage when only a portion of a large file is needed
- No mechanism for the LLM to read specific portions of large files
When an LLM attempts to read a large file (e.g., a large CSV, log file, or generated code), the entire content is returned, which may be truncated aggressively or cause context issues.
We have recently moved from a read_file (single file read) to a read_files tool to read multiple files at once.
This problem is likely also connected to Agentic Duo Chat unnecessarily runs `sed` or ot... (#557751), where the LLM might try to work around the limitations of our read file tool.
Desired Outcome
The read_file/read_files tool should:
- Support reading files in chunks by accepting `offset` and `limit` parameters
- Refuse to read files larger than a reasonable threshold (e.g., 2MB) without chunking
- Return a helpful error message when a file is too large, instructing the LLM to use chunked reading instead
- Allow the LLM to efficiently navigate large files by reading specific portions
This would enable the LLM to:
- Read the beginning of a file to understand its structure
- Navigate to specific sections of interest
- Handle large files without overwhelming the context window
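The behavior described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the issue does not say whether `offset` and `limit` count lines or bytes (lines are assumed here, with a 0-based offset), and the function name `read_file_chunk`, the constant `MAX_UNCHUNKED_BYTES`, and the error wording are all hypothetical.

```python
import os
from typing import Optional

# Hypothetical threshold; the issue only suggests "e.g., 2MB".
MAX_UNCHUNKED_BYTES = 2 * 1024 * 1024


def read_file_chunk(
    file_path: str,
    offset: Optional[int] = None,
    limit: Optional[int] = None,
) -> str:
    """Return file content, optionally restricted to `limit` lines
    starting at 0-based line `offset` (line-based units are an assumption)."""
    chunked = offset is not None or limit is not None
    size = os.path.getsize(file_path)

    if not chunked and size > MAX_UNCHUNKED_BYTES:
        # Refuse the full read and tell the LLM how to proceed instead.
        return (
            f"Error: {file_path} is {size} bytes, which exceeds the "
            f"{MAX_UNCHUNKED_BYTES}-byte limit for a full read. "
            "Retry with `offset` and `limit` to read it in chunks."
        )

    start = offset or 0
    lines = []
    # Stream line by line so lines past the requested chunk are never loaded.
    with open(file_path, "r", encoding="utf-8", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index < start:
                continue
            if limit is not None and len(lines) >= limit:
                break
            lines.append(line)
    return "".join(lines)
```

With this sketch, `read_file_chunk(path, offset=0, limit=50)` returns the first 50 lines so the LLM can inspect the file's structure, and later calls can move `offset` forward. Note that passing `offset` without `limit` still returns the rest of the file, so a real implementation would probably also cap `limit`.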
Proposal
Completely re-review both our read_file and read_files implementations against industry standards, and ask Anthropic for suggestions on an efficient read files tool.
Implement support for offset and limit, but only expose these parameters to the LLM if an appropriate executor is used. This can be decided, for example, based on the provided language server version.
Implementation Plan
- In language server, start reading the `offset` and `limit` in the tool. If they exist, apply the offset and limit. Otherwise ignore them.
- In DWS create ReadFileV2 tool. The tool should accept `limit` and `offset` along with `file_path`. When setting tools, conditionally add the new tool based on language server version.
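The version-gated tool selection in the plan could be sketched as follows. This is not the actual DWS code: the minimum version `MIN_CHUNKED_READ_VERSION`, the tool name string `read_file_v2`, and the helper names are hypothetical, and the sketch assumes the language server reports a dotted `major.minor.patch` version string.

```python
from typing import List, Tuple

# Hypothetical first language server version that honors offset/limit.
MIN_CHUNKED_READ_VERSION = (8, 0, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'major.minor.patch' into a tuple of ints.

    Pre-release and build suffixes (after '-' or '+') are dropped, so a
    pre-release is treated like its final release in this sketch.
    Raises ValueError if a component is not an integer.
    """
    core = version.split("-", 1)[0].split("+", 1)[0]
    parts = [int(part) for part in core.split(".")]
    # Pad short versions such as "8.0" so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def select_read_tools(language_server_version: str) -> List[str]:
    """Return the read tools to expose to the LLM for a given version."""
    tools = ["read_file", "read_files"]
    try:
        parsed = parse_version(language_server_version)
        supports_chunks = parsed >= MIN_CHUNKED_READ_VERSION
    except ValueError:
        # Unknown or malformed version: fall back to the existing tools only.
        supports_chunks = False
    if supports_chunks:
        tools.append("read_file_v2")
    return tools
```

Falling back to the existing tools when the version cannot be parsed keeps older executors from being offered parameters they would silently ignore.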