The paper says an additional LayerNorm was placed in the downsampling layer, but was it placed before or after the downsampling operation?