Skip to content

Add safer close handling for file objects

Cal Pratt requested to merge cpratt34/safer-writes into master

The begin_write method adds a lot of complexity for the safe handling of file-like objects. The contract of StorageABC states it accepts a IO[bytes], however, each implementation of StorageABC had hard requirements about the concrete implementation. Given that the interface states that the file object may only be readable or seekable, I've decided to enforce this behavior more strongly. begin_write has been replaced by a top level function create_write_section which is either a TemporaryFile or BytesIO. If the receiver requires a different format they can copy the data into a more appropriate structure.

So far the only case where this made any behavioral difference was the WithCache and Disk storage implementations.

For the Disk stoage a secondary copy must be made for linking into the objects directory. The Disk storage is only valid for testing setups, so this overhead seems reasonable for the reduction in complexity.

For the WithCache storage, the _ObjectTee class has been removed (which we recently discovered had a bug with it's close method, invoking flush after close when disposed of by the GC). When performing a deferred cache write, the file is copied such that the ByteStreamInstance class can perform is normal cleanup strategy without relying on implicit GC cleanups of the temp files. The deferred writes IMO are probably not a good idea in the first place, as this can mask bugs in writes which otherwise could be discovered and reported by clients.

Additionally, the outputs of bulk_read_blobs has changed from returning a Dict[str, IO[bytes]] to a simple Dict[str, bytes]. This makes the logic much more simple throughout the application. It should not introduce memory overhead as all objects in these scenarios are already going to be encapsulated in IO[bytes] objects (which are themselves in memory). This also removes need for callers to close the fetched results, which was done inconsistently throughout the application by callers of bulk_read_blobs.

Merge request reports