UTF-8 Buffered Reader

This crate provides functions to read UTF-8 text from any type implementing io::BufRead, through its own BufRead trait, without waiting for newline delimiters. These functions take advantage of buffering and return either a &str or a char. Each has an associated iterator, and some also have a map-style equivalent (similar to the Map iterator adapter) that avoids allocating and cloning.

Usage

Add this crate as a dependency in your Cargo.toml:

[dependencies]
utf8-bufread = "1.0.0"

The simplest way to read text with this crate looks something like the following:

use std::io::{Cursor, ErrorKind};
use utf8_bufread::BufRead;

// Reader may be any type implementing io::BufRead
// We'll just use a cursor wrapping a slice for this example
let mut reader = Cursor::new("Löwe 老虎 Léopard");
loop { // Loop until EOF
    match reader.read_str() {
        Ok(s) => {
            if s.is_empty() {
                break; // EOF
            }
            // Do something with `s` ...
            print!("{}", s);
        }
        Err(e) => {
            // We should try again if we get interrupted
            if e.kind() != ErrorKind::Interrupted {
                break;
            }
        }
    }
}

Reading arbitrary-length string slices

The read_str function returns a &str of arbitrary length (up to the reader's buffer capacity) read from the inner reader. It does not clone data unless a valid codepoint is cut at the end of the reader's buffer. Its associated iterator is obtained by calling str_iter; because that iterator clones the data on each iteration, str_map is also provided as a non-cloning alternative.
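
To illustrate the buffering behaviour described above, here is a minimal sketch using only the standard library. It is not the crate's implementation: with_next_chunk is a hypothetical helper that passes the longest valid UTF-8 prefix of the reader's buffer to a closure without copying, and only copies bytes when a codepoint is split across two buffer refills.

```rust
use std::io::{self, BufRead, ErrorKind};
use std::str;

/// Hypothetical helper, not part of utf8-bufread: hands the next run of valid
/// UTF-8 to `f` and returns its result. `f` receives "" at end of input.
fn with_next_chunk<R, F, T>(reader: &mut R, mut f: F) -> io::Result<T>
where
    R: BufRead,
    F: FnMut(&str) -> T,
{
    // Bytes of a codepoint that straddles two buffer refills.
    let mut pending: Vec<u8> = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            return if pending.is_empty() {
                Ok(f(""))
            } else {
                Err(io::Error::new(ErrorKind::UnexpectedEof, "truncated codepoint"))
            };
        }
        if pending.is_empty() {
            // Fast path: longest valid prefix, borrowed from the reader's buffer.
            let valid = match str::from_utf8(buf) {
                Ok(s) => s.len(),
                Err(e) => e.valid_up_to(),
            };
            if valid > 0 {
                let text = str::from_utf8(&buf[..valid]).expect("prefix already validated");
                let out = f(text);
                reader.consume(valid);
                return Ok(out);
            }
        }
        // Slow path: the buffer starts with an incomplete (or invalid) sequence,
        // so bytes are copied one at a time until they decode or fail.
        pending.push(buf[0]);
        reader.consume(1);
        match str::from_utf8(&pending) {
            Ok(s) => return Ok(f(s)),
            Err(e) if e.error_len().is_some() => {
                return Err(io::Error::new(ErrorKind::InvalidData, "invalid UTF-8"));
            }
            Err(_) => {} // still incomplete, keep reading
        }
    }
}
```

Calling this in a loop on a std::io::BufReader with a deliberately tiny capacity (say 4 bytes) and appending each chunk to a String should rebuild the original text, with the slow path only taken for codepoints that cross a buffer boundary.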

Reading codepoints

The read_char function returns a char read from the inner reader. Its associated iterator can be obtained by calling char_iter.
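
As a rough idea of what decoding a single codepoint from a reader involves, the sketch below uses only the standard library. It is not the crate's read_char: read_one_char is a hypothetical helper, and returning None at end of input is an assumption made for this example.

```rust
use std::io::{self, ErrorKind, Read};
use std::str;

/// Hypothetical helper, not the crate's `read_char`: decodes one codepoint,
/// returning `None` at end of input (an assumption made for this sketch).
fn read_one_char<R: Read>(reader: &mut R) -> io::Result<Option<char>> {
    let mut bytes = [0u8; 4];
    if reader.read(&mut bytes[..1])? == 0 {
        return Ok(None);
    }
    // The leading byte determines how many bytes the codepoint occupies.
    let width = match bytes[0] {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => return Err(io::Error::new(ErrorKind::InvalidData, "invalid leading byte")),
    };
    reader.read_exact(&mut bytes[1..width])?;
    // Full validation also rejects bad continuation bytes, overlong forms and surrogates.
    let text = str::from_utf8(&bytes[..width])
        .map_err(|e| io::Error::new(ErrorKind::InvalidData, e))?;
    Ok(text.chars().next())
}
```

Because this reads one byte at a time, it only makes sense on top of a buffered reader; on an unbuffered source each call could turn into several small reads.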

Iterator types

This crate provides several iterator structs, each offering a different way of iterating over the inner reader's data:

  • StrIter and CodepointIter clone the data on each iteration, but use an Rc to check whether the previously returned String buffer is still in use. If it is not, the buffer is reused to avoid reallocating.
let mut reader = Cursor::new("Löwe 老虎 Léopard");
for s in reader.str_iter().filter_map(|r| r.ok()) {
    // Do something with s ...
    print!("{}", s);
}
  • StrMap and CodepointMap give a closure access to the read data without allocating or copying; the borrowed data itself cannot be passed on to further iterator adapters, only the closure's result can.
let s = "Löwe 老虎 Léopard";
let mut reader = Cursor::new(s);
let count: usize = reader
    .str_map(|s| s.len())
    .filter_map(Result::ok)
    .sum();
println!("There are {} valid UTF-8 bytes in {}", count, s);
  • CharIter is similar to StrIter and the others, except that it relies on char implementing Copy and thus needs neither a buffer nor the "Rc trick".
let s = "Löwe 老虎 Léopard";
let mut reader = Cursor::new(s);
let count = reader
    .char_iter()
    .filter_map(Result::ok)
    .filter(|c| c.is_lowercase())
    .count();
assert_eq!(count, 9);
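
The "Rc trick" mentioned for StrIter and CodepointIter can be sketched with the standard library alone. The ReusingIter type below is a hypothetical stand-in fed from a fixed list of chunks, not the crate's actual iterator; it only shows how Rc::get_mut can tell whether the caller still holds the previously returned buffer.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a reader-backed iterator; not the crate's `StrIter`.
struct ReusingIter {
    chunks: std::vec::IntoIter<&'static str>,
    buf: Rc<String>,
    /// Number of times a fresh buffer had to be allocated.
    fresh_buffers: usize,
}

impl ReusingIter {
    fn new(chunks: Vec<&'static str>) -> Self {
        ReusingIter {
            chunks: chunks.into_iter(),
            buf: Rc::new(String::new()),
            fresh_buffers: 0,
        }
    }
}

impl Iterator for ReusingIter {
    type Item = Rc<String>;

    fn next(&mut self) -> Option<Rc<String>> {
        let chunk = self.chunks.next()?;
        match Rc::get_mut(&mut self.buf) {
            // We hold the only reference: the caller dropped the last item,
            // so its String can be overwritten in place.
            Some(s) => {
                s.clear();
                s.push_str(chunk);
            }
            // The caller still holds the last item: leave it alone and allocate.
            None => {
                self.buf = Rc::new(chunk.to_owned());
                self.fresh_buffers += 1;
            }
        }
        Some(Rc::clone(&self.buf))
    }
}
```

In a plain for loop each item is dropped before the next call, so the same buffer is overwritten every time; collecting the items into a Vec keeps them alive and forces a fresh buffer for every item after the first.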