Update to v106r76 release.
byuu says: I added some useful new functions to nall/primitives: auto Natural<T>::integer() const -> Integer<T>; auto Integer<T>::natural() const -> Natural<T>; These let you cast between signed and unsigned representation without having to care about the value of T (eg if you take a Natural<T> as a template parameter.) So for instance when you're given an unsigned type but it's supposed to be a sign-extended type (example: signed multiplication), eg Natural<T> → Integer<T>, you can just say: x = y.integer() * z.integer(); The TLCS900H core gained some more pesky instructions such as DAA, BS1F, BS1B. I stole an optimization from RACE for calculating the overflow flag on addition. Assuming: z = x + y + c; Before: ~(x ^ y) & (x ^ z) & signBit; After: (x ^ z) & (y ^ z) & signBit; Subtraction stays the same. Assuming: z = x - y - c; Same: (x ^ y) & (x ^ z) & signBit; However, taking a speed penalty, I've implemented the carry computation in a way that doesn't require an extra bit. Adding before: uint9 z = x + y + c; c = z & 0x100; Subtracting before: uint9 z = x - y - c; c = z & 0x100; Adding after: uint8 z = x + y + c; c = z < x || z == x && c; Subtracting after: uint8 z = x - y - c; c = z > x || z == x && c; I haven't been able to code golf the new carry computation to be any shorter, unless I include an extra bit, eg for adding: c = z < x + c; But that defeats the entire point of the change. I want the computation to work even when T is uintmax_t. If anyone can come up with a faster method, please let me know. Anyway ... I also had to split off INC and DEC because they compute flags differently (word and long modes don't set flags at all, byte mode doesn't set carry at all.) I also added division by zero support, although I don't know if it's actually hardware accurate. It's what other emulators do, though.
Showing with 211 additions and 116 deletions