Causal mask when cross-attn is used
Hi, thanks for this great repo. I'm wondering whether the causal mask should be used in the second forward pass when cross-attn is used. Or am I missing something here? Thank you.