Use numpy parallel compilation if available
This speeds up compilation of the extension when the machine has multiple cores and numpy's parallel compile method is available, which should be the case since numpy 1.10.
Parallel compilation is skipped on Python 3.5, as the build fails there.
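
For reviewers, the approach can be sketched roughly as follows. This is a minimal illustration, not the exact code in this change: the helper name `enable_parallel_compilation` and its `version_info` parameter are hypothetical, and it assumes the "numpy compiler method" is `numpy.distutils.ccompiler.CCompiler_compile`, which reads its job count from the `NPY_NUM_BUILD_JOBS` environment variable.

```python
import os
import sys


def enable_parallel_compilation(version_info=None):
    """Try to make distutils use numpy's parallel compile method.

    Hypothetical helper for illustration. Returns True if the parallel
    method was installed, False if the serial build is kept.
    """
    if version_info is None:
        version_info = sys.version_info
    # Skip on Python 3.5, where the build fails.
    if tuple(version_info[:2]) == (3, 5):
        return False
    try:
        import distutils.ccompiler
        from numpy.distutils.ccompiler import CCompiler_compile
    except ImportError:
        # numpy is missing or too old, or distutils / numpy.distutils
        # is not available on this interpreter: keep the serial build.
        return False
    # Assumption: numpy.distutils takes the number of jobs from this
    # variable. A value already set by the user is left untouched.
    os.environ.setdefault("NPY_NUM_BUILD_JOBS", str(os.cpu_count() or 1))
    # Replace the default (serial) compile method with numpy's version.
    distutils.ccompiler.CCompiler.compile = CCompiler_compile
    return True
```

Such a helper would be called once in `setup.py` before `setup()` runs. If it returns False, the build proceeds serially as before.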
Closes #416