
Improve read_file tool to support chunked reading and size limits

Problem

Currently, the read_file tool can attempt to read files of any size, which can lead to:

  1. Tool responses that exceed reasonable size limits and overflow context windows
  2. Inefficient token usage when only a portion of a large file is needed
  3. No mechanism for the LLM to read specific portions of large files

When an LLM attempts to read a large file (e.g., a large CSV, log file, or generated code), the entire content is returned, which may be truncated aggressively or cause context issues.

We have recently moved from a read_file tool (single file read) to a read_files tool that reads multiple files at once.

This problem is likely also connected to Agentic Duo Chat unnecessarily runs `sed` or ot... (#557751), where the LLM might try to work around the limitations of our read_file tool.

Desired Outcome

The read_file/read_files tool should:

  1. Support reading files in chunks by accepting offset and limit parameters (see the sketch after this section)
  2. Refuse to read files larger than a reasonable threshold (e.g., 2MB) without chunking
  3. Return a helpful error message when a file is too large, instructing the LLM to use chunked reading instead
  4. Allow the LLM to efficiently navigate large files by reading specific portions

This would enable the LLM to:

  • Read the beginning of a file to understand its structure
  • Navigate to specific sections of interest
  • Handle large files without overwhelming the context window
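To make the desired behaviour concrete, here is a minimal Python sketch of one possible shape for a chunked read, assuming the 2MB threshold mentioned above. The function name `read_file_chunk`, the `MAX_FILE_BYTES` constant, and the return shape are illustrative assumptions, not the existing tool's API or a proposed final design.

```python
# A minimal sketch of chunked reading with a size limit.
# All names here (MAX_FILE_BYTES, read_file_chunk) are hypothetical.
from pathlib import Path
from typing import Optional

MAX_FILE_BYTES = 2 * 1024 * 1024  # example threshold (2MB) from the outcome above


def read_file_chunk(path: str, offset: int = 0, limit: Optional[int] = None) -> dict:
    """Read a file, optionally as a chunk of `limit` lines starting at line `offset`.

    When no `limit` is given and the file exceeds MAX_FILE_BYTES, refuse the
    read and tell the caller (the LLM) how to retry with chunked parameters.
    """
    file = Path(path)
    size = file.stat().st_size

    if limit is None and size > MAX_FILE_BYTES:
        return {
            "error": (
                f"{path} is {size} bytes, which exceeds the {MAX_FILE_BYTES}-byte "
                "limit for unchunked reads. Retry with `offset` and `limit` "
                "parameters to read the file in smaller portions."
            )
        }

    # A real implementation would stream rather than load the whole file here.
    with file.open("r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()

    chunk = lines[offset : offset + limit] if limit is not None else lines
    return {
        "content": "".join(chunk),
        "offset": offset,
        "lines_returned": len(chunk),
        "total_lines": len(lines),  # lets the LLM decide whether more chunks are needed
    }
```

With a shape like this, the first unchunked call on a large file fails with an actionable error, and a follow-up call such as `read_file_chunk("big.csv", offset=0, limit=200)` returns only the header region, letting the LLM navigate the rest of the file incrementally.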

Proposal

Completely re-review both our read_file and read_files implementations against industry standards, and ask Anthropic for suggestions on an efficient file-reading tool.

TBD
