
Weight normalization in prioritized experience replay

In prioritized experience replay, the importance sampling weights are normalized just as the paper describes. However, I noticed that the weights returned by sample are always small values, around 0.2. I think this is because the weights are normalized by the maximum weight ever computed, not by the maximum weight within the sampled batch.

Because of these small IS weights, using prioritized experience replay worsened performance in every environment I tried. Maybe the weights should be normalized by the maximum weight within the batch itself, as this repo does: https://github.com/rlcode/per/blob/master/prioritized_memory.py
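
In case it helps illustrate the difference, here is a minimal sketch (function names, buffer size, and probabilities are made-up placeholders, not taken from any particular implementation) contrasting the two ways of normalizing the importance-sampling weights w_i = (N * P(i))^(-beta): dividing by the maximum weight over the whole buffer versus dividing by the maximum weight inside the sampled batch, as the linked repo does.

```python
import numpy as np

def is_weights_batch_max(batch_probs, buffer_size, beta):
    """Normalize by the largest weight inside the sampled batch
    (rlcode/per style): the largest weight in every batch is exactly 1."""
    w = (buffer_size * np.asarray(batch_probs, dtype=np.float64)) ** (-beta)
    return w / w.max()

def is_weights_global_max(batch_probs, min_prob_in_buffer, buffer_size, beta):
    """Normalize by the maximum weight over the whole buffer, i.e. the weight
    of the lowest-probability transition ever stored. All weights stay <= 1,
    but the values in a typical batch can be much smaller (e.g. ~0.2)."""
    w = (buffer_size * np.asarray(batch_probs, dtype=np.float64)) ** (-beta)
    max_w = (buffer_size * min_prob_in_buffer) ** (-beta)
    return w / max_w

if __name__ == "__main__":
    # Toy sampling probabilities for one batch; buffer-wide minimum probability
    # is assumed to be much smaller than anything in the batch.
    probs = np.array([0.05, 0.02, 0.01])
    print(is_weights_batch_max(probs, buffer_size=1000, beta=0.4))
    print(is_weights_global_max(probs, min_prob_in_buffer=1e-4,
                                buffer_size=1000, beta=0.4))
```

With these toy numbers the batch-max version gives weights in roughly the 0.5 to 1.0 range, while the global-max version gives weights around 0.1 to 0.2, which matches the small values I am seeing.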