MPI_ERR_COUNT: invalid count argument
I'm not familiar with running MPI, so this could well be user error on my part. Nonetheless, I am attempting to run wsclean-mp so that I can do wgridder gridding in parallel. I haven't found any documentation on the MPI implementation of wsclean, but it seems I don't need to set -parallel-gridding, as wsclean-mp will automatically attempt to grid across the available nodes.
I have built wsclean both with the stock OpenMPI library shipped with Debian 10 buster (3.1.3), and separately against 4.0.4 (the latest), which I built from source. Both give the following error:
mpirun -np 4 -mca orte_base_help_aggregate 0 ~/Downloads/wsclean/build/wsclean-mp -mwa-path /home/torrance/Code/mwaprocessing -name 1197977416-wsclean-briggs0 -apply-primary-beam -mgain 0.8 -pol i -weight briggs 0 -minuv-l 15 -size 13000 13000 -scale .007246 -niter 9999999 -auto-threshold 1 -auto-mask 3 -channels-out 4 -fit-spectral-pol 2 -deconvolution-channels 2 -join-channels -nmiter 12 -use-wgridder -padding 1.8 -multiscale -parallel-deconvolution 2048 1197977416.ms
Node 3, PID 19096 on pravic
Node 2, PID 19095 on pravic
Node 1, PID 19094 on pravic
Node 0, PID 19093 on pravic
WSClean version 2.10.1 (2020-07-20)
This software package is released under the GPL version 3.
Author: André Offringa (offringa@gmail.com).
First measurement set has corrected data: tasks will be applied on the corrected data column.
First measurement set has corrected data: tasks will be applied on the corrected data column.
=== IMAGING TABLE ===
# Pol Ch JG ²G In Freq(MHz)
| Independent group:
+-+-J- 0 I 0 0 0 0 72-80 (192)
|
+-J- 1 I 1 0 1 0 80-88 (192)
|
+-J- 2 I 2 0 2 0 88-95 (192)
|
+-J- 3 I 3 0 3 0 95-103 (192)
First measurement set has corrected data: tasks will be applied on the corrected data column.
First measurement set has corrected data: tasks will be applied on the corrected data column.
Reordering 1197977416.ms into 4 x 1 parts.
Reordering: 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Initializing model visibilities: 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Precalculating MF weights for Briggs'(0) weighting...
Opening reordered part 0 spw 0 for 1197977416.ms
Opening reordered part 1 spw 0 for 1197977416.ms
Opening reordered part 2 spw 0 for 1197977416.ms
Opening reordered part 3 spw 0 for 1197977416.ms
== Constructing PSF ==
== Constructing PSF ==
== Constructing PSF ==
== Constructing PSF ==
Finishing scheduler.
Sending gridding task to : 3
[pravic:19093] *** An error occurred in MPI_Send
[pravic:19093] *** reported by process [3982688257,0]
[pravic:19093] *** on communicator MPI_COMM_WORLD
[pravic:19093] *** MPI_ERR_COUNT: invalid count argument
[pravic:19093] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[pravic:19093] *** and potentially your MPI job)
[pravic:19096] *** An error occurred in MPI_Recv
[pravic:19096] *** reported by process [3982688257,3]
[pravic:19096] *** on communicator MPI_COMM_WORLD
[pravic:19096] *** MPI_ERR_COUNT: invalid count argument
[pravic:19096] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[pravic:19096] *** and potentially your MPI job)
[pravic:19081] PMIX ERROR: UNREACHABLE in file ../../../src/server/pmix_server.c at line 2079