A flat feature importance distribution conveys little information on its own.
It only tells us that all features have similar importance, without indicating whether they are all important or all unimportant.
During iterative feature selection, removing features changes the relative importances: some remaining features may appear more important and should therefore be kept.
By repeatedly keeping only the best features across successive training runs, the importance distribution of the selected features tends to flatten. In this case, however, the flat distribution is informative: we know that all remaining features have similar importance, and that they are all important.
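
As an illustration, here is a minimal sketch of this iterative procedure, assuming a scikit-learn style model that exposes a `feature_importances_` attribute. The dataset, the number of rounds, and the fraction of features kept per round are illustrative choices, not prescribed above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 50 features, only 10 of which are informative.
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=10, random_state=0)
selected = np.arange(X.shape[1])  # start with all features

for round_idx in range(4):
    model = RandomForestClassifier(random_state=0)
    model.fit(X[:, selected], y)
    importances = model.feature_importances_

    # Keep the top half of the surviving features each round. As the
    # weakest features are dropped, the importance distribution of the
    # survivors flattens: similar importances, all of them meaningful.
    order = np.argsort(importances)[::-1]
    keep = order[: max(1, len(selected) // 2)]
    selected = selected[keep]
    print(f"round {round_idx}: kept {len(selected)} features, "
          f"importance std = {importances[keep].std():.4f}")
```

The printed standard deviation of the kept importances typically shrinks from round to round, which is the flattening effect described above.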