Region/Line-Type Filtering for Image-Level Actions (Transcribe, Train)

Description

Enable filtering by region type or line type when launching transcription-related operations (e.g., Recognize, Training). This filtering already exists for export but not for other tasks, which is inconsistent and limits workflows on complex documents.

Motivation

Many documents contain a mix of scripts, languages, or distinct editorial categories (e.g., main text vs. marginalia). When only part of a document is appropriate for a given model, users are currently forced to:

  • Run recognition on the entire page, then manually delete incorrect outputs
  • Build separate documents to isolate content

This is inefficient and error-prone. Consistent region/line filtering would support multilingual and multiscript projects, especially while fine-tuned models for such documents are not yet available.

Proposed Feature

  • Add UI and API options to restrict image-level transcription operations to:
    • Selected region types (e.g., MarginalZone only)
    • Selected line types (e.g., TitleLine + MainLine)
  • Improve on the current filtering UI of the Export modal.
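A minimal sketch of how a region/line-type filter could select the lines handed to a recognition or training job. The `Region`, `Line`, and `TypeFilter` classes and the `select_lines` helper are hypothetical, not an existing API. The sketch assumes each line belongs to at most one region, that types are plain strings, and that an empty selection means "no restriction" on that criterion; "MainZone" is an illustrative type name not taken from this issue.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical region model: an id and an optional type label."""
    id: int
    type: Optional[str]


@dataclass(frozen=True)
class Line:
    """Hypothetical line model; region_id is None for lines outside any region."""
    id: int
    type: Optional[str]
    region_id: Optional[int]


@dataclass(frozen=True)
class TypeFilter:
    """Hypothetical filter options; an empty set means 'no restriction'."""
    region_types: FrozenSet[str] = frozenset()
    line_types: FrozenSet[str] = frozenset()


def select_lines(regions: Iterable[Region], lines: Iterable[Line],
                 flt: TypeFilter) -> List[Line]:
    """Return the lines a recognition or training job would receive."""
    region_type = {r.id: r.type for r in regions}
    selected = []
    for line in lines:
        # A region filter only matches lines inside a region of a selected type;
        # lines with no region (or an untyped region) are excluded.
        if flt.region_types and region_type.get(line.region_id) not in flt.region_types:
            continue
        # A line-type filter matches on the line's own type.
        if flt.line_types and line.type not in flt.line_types:
            continue
        selected.append(line)
    return selected


if __name__ == "__main__":
    # Example type names, chosen for illustration only.
    regions = [Region(1, "MainZone"), Region(2, "MarginalZone")]
    lines = [
        Line(10, "MainLine", 1),
        Line(11, "TitleLine", 1),
        Line(12, "MainLine", 2),
        Line(13, None, None),
    ]
    margin_only = TypeFilter(region_types=frozenset({"MarginalZone"}))
    print([l.id for l in select_lines(regions, lines, margin_only)])  # [12]
    titles_and_main = TypeFilter(line_types=frozenset({"TitleLine", "MainLine"}))
    print([l.id for l in select_lines(regions, lines, titles_and_main)])  # [10, 11, 12]
```

Combining both criteria with AND (a line must satisfy every active filter) is only one possible choice; the issue does not specify how region and line filters should combine.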

Scope

  • Applicable operations:
    • Run Recognition (model inference)
    • Training data selection
  • Operates at the page/document level like other batch jobs.
  • Non-selected content remains untouched in the target transcription layer.
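The requirement that non-selected content stay untouched can be sketched as a merge step that writes predictions only for the filtered lines. This assumes, purely for illustration, that a transcription layer can be modeled as a mapping from line id to text; `apply_predictions` is a hypothetical helper, not an existing function.

```python
from typing import Dict, Iterable


def apply_predictions(layer: Dict[int, str], predictions: Dict[int, str],
                      selected_ids: Iterable[int]) -> Dict[int, str]:
    """Return a copy of a transcription layer (hypothetical line_id -> text map)
    in which only the selected lines are overwritten by new predictions."""
    selected = set(selected_ids)
    # Refuse output for lines outside the filter rather than silently writing it.
    unexpected = set(predictions) - selected
    if unexpected:
        raise ValueError(f"predictions for non-selected lines: {sorted(unexpected)}")
    updated = dict(layer)  # copy so the caller's layer is not mutated
    updated.update(predictions)  # selected lines without a prediction keep their text
    return updated


if __name__ == "__main__":
    layer = {10: "main text", 12: "old margin note"}
    result = apply_predictions(layer, {12: "new margin note"}, selected_ids=[12])
    print(result)  # {10: 'main text', 12: 'new margin note'}
```

Raising on out-of-filter predictions, instead of dropping them quietly, is a design choice that makes filtering bugs visible rather than letting them overwrite unrelated lines.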

UI Integration

  • Include a filtering section in the operation configuration panel (checklist of region/line types).
  • Optionally remember last-used filters on a per-document basis.

Future Enhancements (Separate Issues)

  • Combine this filtering with transcription status filtering (Final-only, etc.).
  • Provide a preview showing how many regions/lines will be processed before running.

Rationale

Brings region/line filtering parity across export, recognition, and training operations. Reduces manual cleanup and enables practical work on heterogeneous documents.