Morgane Austreelis authored
Hurray for le bump !
UTF-8 Buffered Reader
This crate provides functions to read UTF-8 text from any type implementing io::BufRead through a trait, BufRead, without waiting for newline delimiters. These functions take advantage of buffering and return either &str or chars. Each has an associated iterator, and some also have an equivalent to a Map iterator that avoids allocation and cloning.
Usage
Add this crate as a dependency in your Cargo.toml:
[dependencies]
utf8-bufread = "1.0.0"
The simplest way to read text using this crate may look something like the following:
use std::io::{Cursor, ErrorKind};
use utf8_bufread::BufRead;

fn main() {
    // Reader may be any type implementing io::BufRead
    // We'll just use a cursor wrapping a slice for this example
    let mut reader = Cursor::new("Löwe 老虎 Léopard");
    loop { // Loop until EOF
        match reader.read_str() {
            Ok(s) => {
                if s.is_empty() {
                    break; // EOF
                }
                // Do something with `s` ...
                print!("{}", s);
            }
            Err(e) => {
                // Try again if we got interrupted, give up on any other error
                if e.kind() != ErrorKind::Interrupted {
                    break;
                }
            }
        }
    }
}
Reading arbitrary-length string slices
The read_str function returns a &str of arbitrary length (up to the reader's buffer capacity) read from the inner reader, without cloning data, unless a valid codepoint ends up cut at the end of the reader's buffer. Its associated iterator can be obtained by calling str_iter, and since that iterator clones the data at each iteration, str_map is also provided.
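To make the behaviour described above more concrete, here is a minimal sketch written against the standard library only; it is not utf8-bufread's implementation. The hypothetical map_next_str helper hands the valid UTF-8 prefix of the reader's buffer to a closure without copying it (in the spirit of str_map), and only copies bytes into a small local array when a codepoint is cut at the end of the buffer. It assumes invalid UTF-8 should surface as an ErrorKind::InvalidData error.

```rust
use std::io::{self, BufRead, BufReader, Cursor, ErrorKind, Read};
use std::str;

/// Hypothetical helper (not part of utf8-bufread): passes the next chunk of
/// valid UTF-8 read from `reader` to `f`, or returns `Ok(None)` at EOF.
fn map_next_str<R: BufRead, T>(
    reader: &mut R,
    f: impl FnOnce(&str) -> T,
) -> io::Result<Option<T>> {
    let invalid = || io::Error::new(ErrorKind::InvalidData, "invalid UTF-8");
    let buf = reader.fill_buf()?; // whatever is buffered, no newline needed
    if buf.is_empty() {
        return Ok(None); // EOF
    }
    let valid = match str::from_utf8(buf) {
        Ok(s) => s.len(),
        // Invalid bytes right at the start: nothing can be returned
        Err(e) if e.valid_up_to() == 0 && e.error_len().is_some() => return Err(invalid()),
        // Otherwise keep the valid prefix; the rest is handled on a later call
        Err(e) => e.valid_up_to(),
    };
    if valid > 0 {
        // Fast path: borrow the valid prefix straight from the buffer, no copy
        let s = str::from_utf8(&buf[..valid]).expect("prefix already validated");
        let out = f(s);
        reader.consume(valid);
        return Ok(Some(out));
    }
    // Slow path: the buffer only holds the start of a codepoint cut at its end.
    // Its leading byte gives the codepoint's width (2, 3 or 4 bytes).
    let width = match buf[0] {
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        _ => 4,
    };
    // Copy that single codepoint out; read_exact refills the buffer as needed
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes[..width])?;
    let s = str::from_utf8(&bytes[..width]).map_err(|_| invalid())?;
    Ok(Some(f(s)))
}

fn main() -> io::Result<()> {
    let text = "Löwe 老虎 Léopard";
    // A tiny 4-byte buffer forces some codepoints to be cut at buffer boundaries
    let mut reader = BufReader::with_capacity(4, Cursor::new(text));
    let mut chunks = Vec::new();
    while let Some(chunk) = map_next_str(&mut reader, |s| s.to_owned())? {
        chunks.push(chunk);
    }
    println!("{:?}", chunks);
    assert_eq!(chunks.concat(), text);
    Ok(())
}
```

Passing the chunk to a closure sidesteps a borrow-checker constraint in this sketch: with a generic io::BufRead, the slice returned by fill_buf must no longer be in use when consume is called, since both borrow the reader.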
Reading codepoints
The read_char function returns a char read from the inner reader. Its associated iterator can be obtained by calling char_iter.
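Reading a single codepoint can be sketched the same way, again with the standard library only and assuming invalid input is reported as ErrorKind::InvalidData. The hypothetical read_one_char helper peeks at the leading byte to find the codepoint's width, then lets read_exact pull the remaining bytes across a buffer refill if needed. This is an illustration of the idea, not the crate's read_char.

```rust
use std::io::{self, BufRead, Cursor, ErrorKind, Read};
use std::str;

/// Hypothetical helper (not part of utf8-bufread): reads a single `char`
/// from `reader`, returning `Ok(None)` at EOF.
fn read_one_char<R: BufRead>(reader: &mut R) -> io::Result<Option<char>> {
    let invalid = || io::Error::new(ErrorKind::InvalidData, "invalid UTF-8");
    // Peek at the leading byte without consuming it
    let lead = match reader.fill_buf()?.first() {
        Some(&b) => b,
        None => return Ok(None), // EOF
    };
    // The leading byte tells how many bytes the codepoint spans
    let width = match lead {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => return Err(invalid()),
    };
    // read_exact takes bytes from the buffer and refills it if the codepoint is cut
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes[..width])?;
    // from_utf8 also rejects bad continuation bytes, overlong forms and surrogates
    let s = str::from_utf8(&bytes[..width]).map_err(|_| invalid())?;
    Ok(s.chars().next())
}

fn main() -> io::Result<()> {
    let text = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(text);
    let mut chars = Vec::new();
    while let Some(c) = read_one_char(&mut reader)? {
        chars.push(c);
    }
    assert_eq!(chars.len(), 15);
    assert_eq!(chars.iter().collect::<String>(), text);
    Ok(())
}
```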
Iterator types
This crate provides several structs offering different ways of iterating over the inner reader's data:
- StrIter and CodepointIter clone the data on each iteration, but use an Rc to check whether the previously returned String buffer is still in use. If it is not, it is reused to avoid reallocating.
use std::io::Cursor;
use utf8_bufread::BufRead;

fn main() {
    let mut reader = Cursor::new("Löwe 老虎 Léopard");
    for s in reader.str_iter().filter_map(|r| r.ok()) {
        // Do something with s ...
        print!("{}", s);
    }
}
- StrMap and CodepointMap give access to the read data without allocating or copying it, but that borrowed data cannot be passed on to further iterator adapters; only the closure's return value can.
use std::io::Cursor;
use utf8_bufread::BufRead;

fn main() {
    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count: usize = reader
        .str_map(|s| s.len())
        .filter_map(Result::ok)
        .sum();
    println!("There are {} valid UTF-8 bytes in {}", count, s);
}
- CharIter is similar to StrIter and the others, except that it relies on chars implementing Copy and thus needs neither a buffer nor the "Rc trick".
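use std::io::Cursor;
use utf8_bufread::BufRead;

fn main() {
    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count = reader
        .char_iter()
        .filter_map(Result::ok)
        .filter(|c| c.is_lowercase())
        .count();
    // ö, w, e, é, o, p, a, r and d; 老 and 虎 have no case
    assert_eq!(count, 9);
}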
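The "Rc trick" used by StrIter and CodepointIter can be illustrated with a small std-only iterator. ReusingIter below is hypothetical and is not the crate's type; to keep the focus on buffer reuse, it copies chunks from any iterator of &str instead of from a reader. Rc::get_mut only succeeds when no other Rc (or Weak) points to the buffer, which is exactly the "caller has dropped the previous item" case.

```rust
use std::rc::Rc;

/// Hypothetical iterator (not utf8-bufread's StrIter) illustrating the
/// "Rc trick": each chunk is copied into a shared `Rc<String>`, and the
/// allocation is reused whenever the caller has dropped the previous item.
struct ReusingIter<I> {
    source: I,
    buf: Rc<String>,
    fresh_buffers: usize, // number of String buffers created so far
}

impl<I> ReusingIter<I> {
    fn new(source: I) -> Self {
        ReusingIter { source, buf: Rc::new(String::new()), fresh_buffers: 1 }
    }
}

impl<'a, I: Iterator<Item = &'a str>> Iterator for ReusingIter<I> {
    type Item = Rc<String>;

    fn next(&mut self) -> Option<Rc<String>> {
        let chunk = self.source.next()?;
        match Rc::get_mut(&mut self.buf) {
            // Nobody else holds the previous buffer: clear it, keep its allocation
            Some(s) => s.clear(),
            // The caller still holds it: switch to a brand new buffer
            None => {
                self.buf = Rc::new(String::new());
                self.fresh_buffers += 1;
            }
        }
        Rc::get_mut(&mut self.buf)
            .expect("buffer is uniquely owned at this point")
            .push_str(chunk);
        Some(Rc::clone(&self.buf))
    }
}

fn main() {
    let text = "Löwe 老虎 Léopard";

    // Each item is dropped before the next call, so one buffer is reused throughout
    let mut iter = ReusingIter::new(text.split(' '));
    for s in iter.by_ref() {
        println!("{}", s);
    }
    assert_eq!(iter.fresh_buffers, 1);

    // Keeping every item alive forces a fresh buffer for each later item
    let mut iter = ReusingIter::new(text.split(' '));
    let kept: Vec<Rc<String>> = iter.by_ref().collect();
    assert_eq!(kept.len(), 3);
    assert_eq!(iter.fresh_buffers, 3);
}
```

The second half of main shows the cost of holding on to items: every Rc<String> the caller keeps forces the iterator to allocate a new buffer for the next item.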