AVX512FP16: Optimize _Float16 reciprocal for div and sqrt
For the _Float16 type, add insns and expanders to optimize
x / y to x * rcp (y), and x / sqrt (y) to x * rsqrt (y).
As half-precision float has only a minor precision difference between
div and mul * rcp, there is no need for a Newton-Raphson iteration.
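For reference, the refinement step being skipped can be sketched in plain
C as below. This is only an illustration in single precision, not code from
the patch: approx_rcp, refine_rcp and abs_err are hypothetical helpers, and
approx_rcp merely emulates a low-precision estimate by clearing mantissa
bits rather than modelling the real instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: the exact
   float reciprocal with the low 12 mantissa bits cleared, which leaves
   roughly 11 to 12 significant bits.  This is an assumption made for
   illustration, not the semantics of any real instruction.  */
static float
approx_rcp (float y)
{
  assert (y != 0.0f);
  float r = 1.0f / y;
  uint32_t bits;
  memcpy (&bits, &r, sizeof bits);
  bits &= ~(uint32_t) 0xFFF;
  memcpy (&r, &bits, sizeof r);
  return r;
}

/* One Newton-Raphson step for the reciprocal: r' = r * (2 - y * r).
   Each step roughly doubles the number of correct bits.  */
static float
refine_rcp (float y, float r)
{
  return r * (2.0f - y * r);
}

/* Absolute difference, written out to avoid depending on libm.  */
static float
abs_err (float a, float b)
{
  return a > b ? a - b : b - a;
}
```

With these helpers, x * approx_rcp (y) corresponds to the unrefined form
and x * refine_rcp (y, approx_rcp (y)) to the refined form. The commit's
point is that for _Float16 the unrefined form is already close enough to
x / y, so no such step is emitted.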
gcc/ChangeLog:
* config/i386/i386.md (rcphf2): New define_insn.
(rsqrthf2): Likewise.
* config/i386/sse.md (div<mode>3): Change VF2H to VF2.
(div<mode>3): New expander for HF mode.
(rsqrt<mode>2): Likewise.
(*avx512fp16_vmrcpv8hf2<mask_scalar_name>): New define_insn
for rpad pass.
(*avx512fp16_vmrsqrtv8hf2<mask_scalar_name>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-recip-1.c: New test.
* gcc.target/i386/avx512fp16-recip-2.c: Ditto.