x-y-z to z-y-x (for a G->R transform) to make it consistent with the way
ffts are executed in the general parallel case.
- fft_dlay_scalar (in fft_types.f90)
- sticks_maps_scalar (in sticks_base.f90)
- all variants of fft_scalar.XXX.f90 (tested for XXX=FFTW3)
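Note on why the reordering is numerically safe: a 3D DFT is separable, so the 1D transforms along x, y and z commute, and only the data layout and stick bookkeeping depend on the order. A minimal Python sketch of that property follows. It is not QE code: the helper names dft_1d, transform_axis and fft3d are hypothetical, the grid is tiny, and the sign of the exponent is chosen arbitrarily.

```python
import cmath
import random

def dft_1d(v):
    """Naive O(n^2) 1D DFT of a list of complex numbers."""
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def transform_axis(a, axis):
    """Apply dft_1d along one axis (0=x, 1=y, 2=z) of a nested list a[x][y][z]."""
    n = (len(a), len(a[0]), len(a[0][0]))
    out = [[[0j] * n[2] for _ in range(n[1])] for _ in range(n[0])]
    others = [d for d in range(3) if d != axis]
    for i in range(n[others[0]]):
        for j in range(n[others[1]]):
            idx = [0, 0, 0]
            idx[others[0]], idx[others[1]] = i, j
            # Gather one line of data along the chosen axis.
            line = []
            for k in range(n[axis]):
                idx[axis] = k
                line.append(a[idx[0]][idx[1]][idx[2]])
            # Transform it and scatter the result back.
            for k, value in enumerate(dft_1d(line)):
                idx[axis] = k
                out[idx[0]][idx[1]][idx[2]] = value
    return out

def fft3d(a, order):
    """3D DFT built from 1D transforms applied in the given axis order."""
    for axis in order:
        a = transform_axis(a, axis)
    return a

random.seed(0)
nx, ny, nz = 2, 3, 4
data = [[[complex(random.random(), random.random()) for _ in range(nz)]
         for _ in range(ny)] for _ in range(nx)]

xyz = fft3d(data, (0, 1, 2))  # old scalar order: x, then y, then z
zyx = fft3d(data, (2, 1, 0))  # new order: z, then y, then x

max_diff = max(abs(xyz[x][y][z] - zyx[x][y][z])
               for x in range(nx) for y in range(ny) for z in range(nz))
print("orders agree:", max_diff < 1e-12)
```

The two results should differ only by floating-point round-off.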
It is no longer necessary to call fft_dlay_allocate with arguments like max(dfft%nr1x,dfft%nrx3) ...; dfft%nr1x should always be fine.
No changes should be needed in CPV and GWW, which use cfft3ds and initialize its data via the modified sticks_maps_scalar and fft_dlay_scalar.
Explicitly tested for CPV.
In PW, pw2blip.f90 uses cfft3ds. The new execution order needs a different definition of the auxiliary array (do_fft_x -> do_fft_z),
which should now be the same as dfft%isind. For now it is initialized following the same logic as the original routine.
Some auxiliary functions/subroutines (put_f_of_R, put_f_of_G, get_f_of_R, get_f_of_G) were added to fft_parallel to help
assign/retrieve values to/from a distributed FFT array in the parallel case.
These tools are NOT designed for efficiency but to make life easier in testing programs (see for instance test0.f90).
git-svn-id: http://qeforge.qe-forge.org/svn/q-e/trunk/espresso@12570 c92efa57-630b-4861-b058-cf58834340f0