De-duplicated pipelining code
Keeping the organization of the pipelining encapsulated made it straightforward to deduplicate the spreading kernel call. The result probably runs slightly faster and is slightly easier to understand, too.
Also made minor related improvements.
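
For illustration only, here is a minimal sketch of the general idea, not the actual code in this change. It assumes the pipelined and non-pipelined paths previously each had their own kernel call; once a helper owns how the work is split into stages, the non-pipelined case is simply a single stage and one call site is enough. All names here (`Stage`, `makeStages`, `launchSpreadKernel`, `spread`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one pipeline stage: a half-open atom range.
struct Stage
{
    int begin;
    int end;
};

// Hypothetical helper that encapsulates how atoms are split into stages.
// With numStages == 1 it yields one stage covering all atoms, so the
// non-pipelined case needs no separate code path.
std::vector<Stage> makeStages(int numAtoms, int numStages)
{
    assert(numAtoms >= 0 && numStages > 0);
    std::vector<Stage> stages;
    // Round up so that every atom falls into some stage.
    const int chunk = std::max(1, (numAtoms + numStages - 1) / numStages);
    for (int begin = 0; begin < numAtoms; begin += chunk)
    {
        stages.push_back({ begin, std::min(begin + chunk, numAtoms) });
    }
    return stages;
}

// Stand-in for the real kernel launch: reports how many atoms it handled.
int launchSpreadKernel(const Stage& stage)
{
    return stage.end - stage.begin;
}

// The single call site: the loop neither knows nor cares whether
// pipelining is active, because makeStages() hides that decision.
int spread(int numAtoms, int numStages)
{
    int atomsSpread = 0;
    for (const Stage& stage : makeStages(numAtoms, numStages))
    {
        atomsSpread += launchSpreadKernel(stage);
    }
    return atomsSpread;
}
```

In this sketch, `spread(numAtoms, 1)` and `spread(numAtoms, 3)` go through the same loop and the same call, which is the property the deduplication relies on.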
Edited by Mark Abraham