Feature engineering for the merchant group and category features
Note: All the changes apply to both merchant group and category.
Current Scenario
- Analysis of default rate per merchant group and default rate across merchant groups is done.
- The feature is frequency encoded rather than one-hot encoded, to avoid creating a high-dimensional training set.
- This transformation brings down the performance of the final model, so the feature is removed from the training set.
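As a point of reference, frequency encoding can be sketched as below. This is a minimal illustration, not the project's actual code: it assumes pandas is available, and the column name `merchant_group` and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the column name "merchant_group" is an assumption.
df = pd.DataFrame({
    "merchant_group": ["Food", "Food", "Clothing", "Electronics", "Food", "Clothing"],
})

# Relative frequency of each level, learned from the training data.
freq = df["merchant_group"].value_counts(normalize=True)

# Frequency encoding: each level is replaced by its relative frequency,
# so the feature stays a single numeric column.
df["merchant_group_freq"] = df["merchant_group"].map(freq)

# For comparison, one-hot encoding adds one column per distinct level.
one_hot = pd.get_dummies(df["merchant_group"])

print(df)
print("one-hot shape:", one_hot.shape)
```

Note that levels with equal frequency receive the same encoded value, and the encoding carries no information about the default rate of a level.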
Improvement (Modified Strategy)
- Highlight the top 5 groups and categories that have the largest default rate and within-group default rate.
- Carry out hypothesis testing to determine whether the merchant group and default features are independent.
  - Perform a chi-square test of independence.
  - If the test shows a significant dependence between the two features, perform post-hoc testing.
    - Form groups and find the combinations of merchant groups that are responsible for the high significance level.
    - Treat each merchant group as a separate group, test the alternative hypothesis for each, and compute a p-value.
    - Use the Bonferroni correction to adjust the p-values across all the hypothesis tests.
- Comment on prominent merchant groups.
- Include those merchant groups in decision making for training the final model.
- Create a new custom transformer that implements the modified strategy.
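The hypothesis-testing steps above can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes pandas and SciPy are available, uses hypothetical column names (`merchant_group`, `default`) and made-up counts, and runs the post-hoc step as a one-vs-rest chi-square test per group with a Bonferroni correction (each raw p-value multiplied by the number of tests).

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up counts for illustration: group -> (non-defaults, defaults).
counts = {"A": (85, 15), "B": (60, 40), "C": (84, 16), "D": (86, 14)}
df = pd.DataFrame(
    [(g, d) for g, (n0, n1) in counts.items() for d in [0] * n0 + [1] * n1],
    columns=["merchant_group", "default"],  # hypothetical column names
)

alpha = 0.05  # assumed significance level

# Step 1: overall chi-square test of independence on the full table.
table = pd.crosstab(df["merchant_group"], df["default"])
chi2, p_overall, dof, _ = chi2_contingency(table)
print(f"overall: chi2={chi2:.2f}, dof={dof}, p={p_overall:.3g}")

# Step 2: post-hoc testing, only if the overall test is significant.
significant_groups = []
if p_overall < alpha:
    n_tests = table.shape[0]  # one test per merchant group
    for group in table.index:
        # 2x2 table: this group vs. all other groups, against default.
        is_group = df["merchant_group"] == group
        sub_table = pd.crosstab(is_group, df["default"])
        _, p_raw, _, _ = chi2_contingency(sub_table)
        # Bonferroni correction: scale the p-value by the number of tests.
        p_adj = min(p_raw * n_tests, 1.0)
        print(f"{group}: raw p={p_raw:.3g}, adjusted p={p_adj:.3g}")
        if p_adj < alpha:
            significant_groups.append(group)

print("groups driving the dependence:", significant_groups)
```

The groups collected in `significant_groups` are candidates for the "prominent merchant groups" step; a custom transformer could, for example, keep those levels as indicator features and pool the rest, though that design choice is an assumption here.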
Edited by Amoli Rajgor