In-place ops are not happening for simple operations

I have been trying to benchmark a WIP branch of the apint crate against num-bigint and rug. rug does outperform apint and num-bigint for addition, but only once the integers reach thousands of bits. Simply returning a clone of a small `Integer` takes about 70 ns/iter, the same as with apint and num-bigint.

    #![feature(test)]
    extern crate test;

    use rug::Integer;
    use test::{black_box, Bencher};

    // 17 hex digits = 65 bits: just over 64 bits, because if it were smaller,
    // small value optimization would kick in (for `apint` at least)
    const HEX: &str = "17f3feabf73e71234";

    #[bench]
    fn allocate_rug(b: &mut Bencher) {
        // Types are annotated explicitly to make sure that the inputs and outputs are completed operations
        let int0: Integer = black_box(Integer::from(Integer::parse_radix(HEX, 16).unwrap()));
        b.iter(|| {
            let o: Integer = Integer::from(&int0);
            o
        })
    }

However, as soon as any other operation is performed, another allocation appears to happen and rug's time doubles, putting it far behind in the benchmarks. This happens for everything I tried except repeated addition such as `Integer::from(&int0) + &int0 + &int0`, where each addition after the first costs only about 20 ns instead of 70 ns. Maybe the optimizer is doing something special, and I should really use a custom allocator to count the actual number of allocations, but either way something seems wrong with the performance.
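For the custom-allocator idea, a minimal std-only sketch of a counting global allocator might look like this. `CountingAlloc` and `count_allocs` are hypothetical names, and this sketch only counts allocations made through Rust's global allocator. Whether it would also see the GMP allocations inside rug depends on whether rug routes GMP's memory functions through Rust's global allocator, which I haven't verified; if it doesn't, a counter like this will miss them.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

// Hypothetical counting allocator: forwards everything to the system
// allocator and counts (re)allocations made on the current thread.
struct CountingAlloc;

thread_local! {
    // `const` initialization avoids lazy setup that could itself allocate.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `try_with` fails harmlessly if the thread-local is already torn down.
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result together with the number of
/// allocations and reallocations it performed on this thread.
fn count_allocs<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCS.with(|c| c.get());
    let result = f();
    let after = ALLOCS.with(|c| c.get());
    (result, after - before)
}
```

With this, cloning a two-element `Vec<u64>` inside `count_allocs` reports one allocation, while `clear()` followed by `extend_from_slice` into a vector that already has enough capacity reports zero. The same wrapper could be put around `Integer::from(&int0) + &int0` to check whether the extra time really comes from a second allocation, subject to the GMP caveat above.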

I like the idea of incomplete operations, which should reduce the number of functions needed and automatically pick the best choice between assigning and allocating, but I feel the documentation is missing some details. For example, why does `Integer::from(&int0) + &int0 + &int0` work but `&int0 + &int0 + &int0` does not, and why does the first form not need an outer `Integer::from`, as in `Integer::from(Integer::from(&int0) + &int0 + &int0)`?
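To make the question concrete, here is how I understand the pattern, modeled with a toy type. `BigInt` and `LazySum` are hypothetical stand-ins, not rug's actual types, and this is only my reading of the design. An owned left operand (`BigInt + &BigInt`) has a buffer it can add into, so it returns a complete value. `&BigInt + &BigInt` has no owned buffer, so it returns a lazy value that must be completed with `BigInt::from`.

```rust
use std::ops::Add;

/// Hypothetical stand-in for a heap-allocated big integer
/// (little-endian u64 limbs). Not rug's `Integer`.
#[derive(Clone, Debug, PartialEq)]
struct BigInt {
    limbs: Vec<u64>,
}

impl BigInt {
    /// Adds `rhs` into `self`, reusing `self`'s buffer when it has capacity.
    fn add_assign_ref(&mut self, rhs: &BigInt) {
        if self.limbs.len() < rhs.limbs.len() {
            self.limbs.resize(rhs.limbs.len(), 0);
        }
        let mut carry = 0u64;
        for (i, limb) in self.limbs.iter_mut().enumerate() {
            let r = rhs.limbs.get(i).copied().unwrap_or(0);
            let (s1, c1) = limb.overflowing_add(r);
            let (s2, c2) = s1.overflowing_add(carry);
            *limb = s2;
            carry = c1 as u64 + c2 as u64; // at most 1
        }
        if carry != 0 {
            self.limbs.push(carry);
        }
    }
}

/// `BigInt::from(&x)` clones, i.e. allocates once.
impl From<&BigInt> for BigInt {
    fn from(x: &BigInt) -> BigInt {
        x.clone()
    }
}

/// Owned + &ref: consumes the left operand and adds in place,
/// so the result is already a complete `BigInt`.
impl Add<&BigInt> for BigInt {
    type Output = BigInt;
    fn add(mut self, rhs: &BigInt) -> BigInt {
        self.add_assign_ref(rhs);
        self
    }
}

/// Hypothetical incomplete value: `&a + &b` only records its operands,
/// because there is no owned buffer to write the sum into yet.
struct LazySum<'a> {
    lhs: &'a BigInt,
    rhs: &'a BigInt,
}

impl<'a> Add<&'a BigInt> for &'a BigInt {
    type Output = LazySum<'a>;
    fn add(self, rhs: &'a BigInt) -> LazySum<'a> {
        LazySum { lhs: self, rhs }
    }
}

/// Completing the lazy value allocates one buffer, sized for a final carry.
impl From<LazySum<'_>> for BigInt {
    fn from(s: LazySum<'_>) -> BigInt {
        let n = s.lhs.limbs.len().max(s.rhs.limbs.len()) + 1;
        let mut limbs = Vec::with_capacity(n);
        limbs.extend_from_slice(&s.lhs.limbs);
        let mut out = BigInt { limbs };
        out.add_assign_ref(s.rhs);
        out
    }
}
```

In this model `&a + &b + &b` does not compile, because `LazySum` has no `Add<&BigInt>` impl. `BigInt::from(&a) + &b + &b` clones once and then adds in place twice, and each `+` already returns a `BigInt`, so no outer `from` is needed. That would match both behaviors described above, assuming rug's operators follow the same owned/borrowed split.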