Feedback on new files() and Traverseable API
I'm attempting to implement the new files()
and Traverseable
APIs in PyOxidizer and am struggling to figure it out.
First, the new classes in abc.py
don't have type annotations. I think I can work them out. But being explicit would be much appreciated. Having type annotations would also match the rest of the file.
Second, the overloading of Traverseable
to be both a resource loader and an actual resource is a bit confusing. Elsewhere in the importlib
world, there is a separation between the interface for loading things and the things returned by that interface. It is weird to me that a single Traverseable
both represents the returned value from files()
(a logical collection of resources or a mechanism to obtain resources) as well as a handle on an individual resource. I would like for these interfaces to be teased apart, if possible. More on this later.
Third - and this is related to the last point - I just don't understand how you are supposed to handle directories in all cases. Because Traversable
is both a loader and a handle on an individual resource, what are we supposed to do for operations like calling Traverseable.open()
on an object that represents a directory? Are we supposed to return None
? Raise an OsError
or IoError
or whatever the OS emits when you try to open a directory? Do we assume the caller calls is_dir()
/ is_file()
and isn't dumb enough to call open()
on a directory? But what if they do?
Fourth, I don't understand why there exists an open
, read_text
, and read_bytes
on Traverseable
. read_text
and read_bytes
can be implemented in terms of open()
. So why are they a) abstract b) part of the interface in the first place? If we're going to provide open()
, I think it makes sense to expose helper functions for read_text
and read_bytes
that are implemented in terms of open()
.
Fifth, to be honest, I'm not a huge fan of open()
because it puts a lot of burden on custom implementations. What subset of Path.open()
needs to be implemented? Path.open()
is supposed to accept mode
, buffering
, encoding
, errors
, and newline
arguments. Since these are all *args, **kwrags
-awayed in the interface and the docs only call out mode
, what is supposed to be supported? Is it really reasonable for custom implementations (which may not be able to simply call out to builtins.open()
because they aren't using a traditional filesystem) to have to implement all this functionality? I would highly encourage reducing the scope of open()
to binary file reading. Remove buffering, encoding, errors, and newline support. If people really want to buffer or read resources as text, they can use io.BufferedReader
and/or io.TextIOWrapper
. And of course, helper APIs that wrap low-level binary-only file-like objects can exist to make read_text()
a trivial one-liner. Or consider making the text reading APIs optional and having the default resource reading code go through a polyfill that uses io.*
wrappers accordingly.
I want to say the new files()
+ Traverseable
API is a net improvement and thank everybody responsible. But at this point in time, I feel like the pile of warts I described in #58 (closed) has more or less been replaced by a different pile of warts :/ I failed to see the new API being designed and feel like I missed an opportunity to influence its design towards something more reasonable for advanced use cases like PyOxidizer. For that I feel bad and apologize. If it isn't too late to make backwards-incompatible changes to the interface, I would strongly vote for reducing the scope of open()
and removing read_text
and read_bytes
. If we do that and clarify the interface and semantics a bit more, I think this new API will be clearly better than what came before.