Multi-GPU training

Hongtao Yang requested to merge multi-gpu into main

Fix repeated logging when using multi-GPU. The repetition happens because each GPU runs as a separate process, so each process logs its own copy of every message.
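One common way to suppress the duplicates is to attach a filter that lets records through only on rank 0. The sketch below is illustrative and not necessarily what this merge request does: `get_rank`, `RankZeroFilter`, and `make_logger` are hypothetical names, and it assumes the launcher exports a `RANK` environment variable (as `torchrun` does).

```python
import logging
import os


def get_rank() -> int:
    # Assumption: the launcher sets RANK for each process (torchrun does).
    # Fall back to 0 so single-process runs still log.
    return int(os.environ.get("RANK", "0"))


class RankZeroFilter(logging.Filter):
    """Drop every log record unless this process is rank 0."""

    def filter(self, record: logging.LogRecord) -> bool:
        return get_rank() == 0


def make_logger(name: str = "train") -> logging.Logger:
    # Hypothetical helper: builds a logger whose handler is rank-filtered.
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.addFilter(RankZeroFilter())
        logger.addHandler(handler)
    # Avoid a second copy of each message via the root logger.
    logger.propagate = False
    return logger
```

With this in place, every process can call `make_logger().info(...)` unconditionally, and only the rank 0 process emits output.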

Fix the OOM error when running multi-GPU validation. In the current implementation, all GPUs still run validation, but only GPU 0 logs the validation loss. This means a lot of the computation is redundant, but at least it works.
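The behavior described above (every rank runs validation, only rank 0 reports) can be sketched roughly as follows. This is a framework-free illustration under assumptions, not the code in this merge request: `run_validation`, `loss_fn`, and the `log` callback are hypothetical names, and the sketch does not show how the OOM itself was resolved.

```python
from typing import Callable, Iterable


def run_validation(
    batches: Iterable,
    loss_fn: Callable[[object], float],
    rank: int,
    log: Callable[[str], None] = print,
) -> float:
    """Compute the mean validation loss on this rank; log it only on rank 0."""
    total, count = 0.0, 0
    for batch in batches:
        # Convert to a plain float so only a scalar is accumulated.
        total += float(loss_fn(batch))
        count += 1
    mean_loss = total / max(count, 1)
    if rank == 0:
        # Other ranks do the same work but stay silent (redundant computation).
        log(f"val_loss={mean_loss:.4f}")
    return mean_loss
```

Every rank returns the same kind of value, so callers do not need rank-specific branches; only the logging side effect is gated.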
