Fix alignment issue in Xmemset on some ARM CPUs
This fixes an alignment trap error that shows up on the ARM926 CPU.
Xmemset was taken from ccgo_linux_amd64.go and tweaked for readability, plus a couple of optimizations.
From my tests, it is a tad slower than the previous implementation on the ARM926 but faster on other ARM CPUs like the i.MX6.