Improve read_file tool to support chunked reading and size limits
## Problem
Currently, the `read_file` tool can attempt to read files of any size, which can lead to:
- Tool responses that exceed reasonable size limits and overflow context windows
- Inefficient token usage when only a portion of a large file is needed
- No mechanism for the LLM to read specific portions of large files
When an LLM attempts to read a large file (e.g., a large CSV, log file, or generated code), the entire content is returned, which may be truncated aggressively or cause context issues.
We have recently moved from a `read_file` tool (single file read) to a `read_files` tool that reads multiple files at once.
This problem is likely also connected to "Agentic Duo Chat unnecessarily runs `sed` or ot..." (#557751), where the LLM might try to work around the limitations of our read file tool.
## Desired Outcome
The `read_file`/`read_files` tool should:
- Support reading files in chunks by accepting `offset` and `limit` parameters
- Refuse to read files larger than a reasonable threshold (e.g., 2MB) without chunking
- Return a helpful error message when a file is too large, instructing the LLM to use chunked reading instead
- Allow the LLM to efficiently navigate large files by reading specific portions
This would enable the LLM to:
- Read the beginning of a file to understand its structure
- Navigate to specific sections of interest
- Handle large files without overwhelming the context window
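As a rough illustration of this behaviour, here is a minimal sketch of a chunked read handler. The function name, the line-based semantics of `offset`/`limit`, and the 2MB constant are assumptions made for illustration only, not the actual tool implementation:

```python
import os

# Hypothetical threshold above which unchunked reads are refused (2MB).
MAX_UNCHUNKED_BYTES = 2 * 1024 * 1024


def read_file(path: str, offset: int = 0, limit: int | None = None) -> str:
    """Read up to `limit` lines starting at line `offset` (0-based).

    If the file exceeds MAX_UNCHUNKED_BYTES and no chunking parameters were
    given, return an instructive error instead of the full content.
    """
    size = os.path.getsize(path)
    if size > MAX_UNCHUNKED_BYTES and offset == 0 and limit is None:
        return (
            f"Error: {path} is {size} bytes, which exceeds the "
            f"{MAX_UNCHUNKED_BYTES}-byte limit for a single read. "
            f"Re-invoke the tool with `offset` and `limit` to read the file "
            f"in chunks, e.g. offset=0, limit=200 for the first 200 lines."
        )

    # Stream the file so only the requested window is held in memory.
    lines: list[str] = []
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for i, line in enumerate(f):
            if i < offset:
                continue
            if limit is not None and len(lines) >= limit:
                break
            lines.append(line)
    return "".join(lines)
```

Whether `offset`/`limit` should count lines or bytes, and how the error message should be phrased so the LLM reliably retries with chunked reads, are among the questions the review below should settle.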
## Proposal
Completely re-review both our `read_file` and `read_files` implementations against industry standards, and ask Anthropic for suggestions on an efficient file-reading tool.
TBD