Question about Hubbard projector types for DFT+U in PAW and "could not set nbH" error

I'm trying to implement DFT+U within the framework of PAW in a home-made code and am taking QE's implementation as a reference. I'm aware that the key formulas are well described in an article by Bengone et al, while Amadon et al proposes an implementation differ in the occupation matrix. In either case, the +U correction to hpsi seems to be merely an extra term in the D matrix of the original non-local term, instead of a new non-local term.

I notice that QE offers several projector types. After digging into the code, I guess the option "pseudo" might be the one in line with those PAW DFT+U literature, which indeed introduces +U correction by adding an extra term to the D matrix (add_vhub_to_deeq.f90). Furthermore, judging from the expression

                 deeq(ih,jh,na,1:nspin) = deeq(ih,jh,na,1:nspin) + &
                    v%ns(m1,m2,1:nspin,na)*q_ae(ow1,ih,na)*q_ae(ow2,jh,na)

I tend to think Amadon's formalism is employed, i.e., the occupation matrix is defined with respect to some atomic ground state orbitals.

However, when I tried to run some example (Ni2MnGa_paw.tar), the following error occurs:

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     Error in routine init_q_aeps (1):
     could not set nbH
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

     stopping ...

This error comes from the following code snippet in init_q_aeps.f90. If my understanding above is correct, it seems to fail to identify the atomic ground state orbital. But I don't understand what's going on below...

IF ( lH >= 0 ) THEN                                                                           
   !                                                                                          
   ! NOTE: one might run into troubles when using a PP with semicore                          
   ! states with same l as valence states (also otherwhere for DFT+Hubbard)                   
   DO nb = 1, upf(nt)%nwfc                                                                    
      IF (upf(nt)%lchi(nb) == lH .AND. upf(nt)%oc(nb) >= 0.d0) nchiH = nb                     
   ENDDO                                                                                      
   !                                                                                          
   DO nb = 1, upf(nt)%nbeta                                                                   
      !                                                                                       
      IF (upf(nt)%lll(nb) == lH) THEN                                                         
         ! check if chi and pswfc have the same sign or not                                   
         aux(1:msh(nt)) = upf(nt)%pswfc(1:msh(nt),nb)*upf(nt)%chi(1:msh(nt),nchiH)            
         CALL simpson( msh(nt), aux, rgrid(nt)%rab, psint )                                   
         wsgn = SIGN(1.0_DP,psint)                                                            
         ! compute norm of the difference [pswfc(r) - chi(r)]                                 
         aux(1:msh(nt)) = (upf(nt)%pswfc(1:msh(nt),nb) - wsgn*upf(nt)%chi(1:msh(nt),nchiH))**2
         CALL simpson( msh(nt), aux, rgrid(nt)%rab, psint )                                   
         IF ( ABS(psint) <= 1.d-9 ) nbH = nb                                                  
      ENDIF                                                                                   
      !                                                                                       
   ENDDO                                                                                      
   !                                                                                          
   ! DEBUG                                                                                    
   IF ( ionode .AND. iverbosity == 1 ) THEN                                                   
      WRITE(*,*) '> QQ_AE matrix:'                                                            
      DO nb = 1, upf(nt)%nbeta                                                                
         WRITE(*,'(99F9.6)') qq_ae(nb,1:upf(nt)%nbeta,nt)                                     
      ENDDO                                                                                   
      WRITE(*,*) "nbH=", nbH, ", lH", lH                                                      
   ENDIF                                                                                      
   !                                                                                          
   IF ( nbH == -1 ) CALL errore( "init_q_aeps", "could not set nbH", 1 )                      
   !                                                                                          
ENDIF

My questions are:

Does QE implement PAW DFT+U with Amadon's formalism, or do I miss some important literature on QE's implementation?
Is the current behavior expected, or a bug?
Is there any particular reason to choose the current one over Bengone's formalism?

This example can be run without error by switching to other projector type like "ortho-atomic", yet I'm not sure if it's the right way to go. The current behavior can be reproduced by various QE versions, including the latest develop branch.