String.split should return an Iterator!(String) instead of an Array!(String)
Summary
String.split returns an Array!(String), which requires that we allocate memory for every sub-string and for the resulting Array. Instead, it should return an Iterator!(String).
Motivation
Splitting a String into an Array!(String) may produce a lot of garbage if we only care about a small number of the values. For example, let's say we want to split a string on a comma (',') but only care about the first value. Today you would write something like this:
'foo,bar,baz'.split(',')[0]
Despite only caring about the first value, we allocate memory for 3 strings, the surrounding array, and whatever intermediate allocations the string splitting logic requires.
Using an iterator allows us to write very similar code:
'foo,bar,baz'.split(',').next
If we just want all the values we can write this:
'foo,bar,baz'.split(',').to_array
The benefit is more control over how expensive splitting is going to be, based on your needs (e.g. whether you need just the first N values or all of them).
Implementation
The internals of String.split already use a Block.while_true loop to find all values, and that loop should be straightforward to turn into an external iterator. If we return a dedicated Iterator type (instead of an Enumerator) we can probably also turn std::string::extensions.split_at? into an instance method of this Iterator, instead of it being a module method.
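To illustrate what such a dedicated iterator type could look like, here is a minimal sketch written in Python rather than Inko. The class name SplitIterator and its fields are hypothetical and are not part of the Inko standard library. The sketch assumes a non-empty separator and assumes that a trailing separator yields a final empty string, which may differ from what String.split does today.

```python
class SplitIterator:
    """Hypothetical external iterator that yields sub-strings lazily."""

    def __init__(self, string, separator):
        if not separator:
            raise ValueError("separator must not be empty")

        self.string = string
        self.separator = separator
        self.position = 0   # index where the next sub-string starts
        self.done = False   # set once the final sub-string was returned

    def __iter__(self):
        return self

    def __next__(self):
        if self.done:
            raise StopIteration

        # Scan only as far as the next separator, not the whole string.
        index = self.string.find(self.separator, self.position)

        if index == -1:
            # No separator left: the rest of the string is the last value.
            self.done = True
            return self.string[self.position:]

        value = self.string[self.position:index]
        self.position = index + len(self.separator)
        return value


values = SplitIterator('foo,bar,baz', ',')
first = next(values)  # allocates a single sub-string
```

Asking for only the first value scans the input up to the first separator and allocates one sub-string. Collecting everything (list(SplitIterator(...)) in this sketch, to_array in the proposal) does the same work as the current array-returning version.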
Drawbacks
Returning an iterator is a little different from other languages (e.g. Ruby, where split returns an array), but I believe this to be a safer default.