
Feature engineering for the merchant group and category features

Note: All changes apply to both merchant group and category.

Current Scenario

  1. Default rates have been analyzed both within each merchant group and across merchant groups.
  2. The feature is frequency encoded rather than one-hot encoded, to avoid creating a high-dimensional training set.
  3. This transformation lowers the performance of the final model, so the feature is removed from the training set.
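
For reference, the difference between the two encodings can be sketched as follows. This is a minimal illustration on toy data, assuming pandas; the column names `merchant_group` and `default` and the values are hypothetical and not taken from the project data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "merchant_group": ["Clothing", "Clothing", "Food", "Electronics", "Food", "Clothing"],
    "default": [0, 1, 0, 0, 1, 0],
})

# Frequency encoding: replace each level with its relative frequency.
# The result is a single numeric column, whatever the number of levels.
freq = df["merchant_group"].value_counts(normalize=True)
df["merchant_group_freq"] = df["merchant_group"].map(freq)

# One-hot encoding, for comparison: one new column per level,
# which grows quickly for high-cardinality features.
one_hot = pd.get_dummies(df["merchant_group"], prefix="merchant_group")
```

Frequency encoding keeps the training set narrow, but levels with similar frequencies become indistinguishable and the encoding carries no information about the default rate, which may be one reason for the performance drop noted above.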

Improvement (Modified Strategy)

  1. Highlight the top 5 groups and categories with the largest default rate and the largest within-group default rate.
  2. Carry out hypothesis testing to check whether the merchant group and default features are independent.
    • Perform a chi-square test of independence.
  3. If the test shows a significant dependence between the two features, perform post-hoc testing.
    • Form groups and find the combinations of merchant groups that are responsible for the significant result.
      1. Treat each merchant group as a separate group, test the alternative hypothesis for each, and compute a p-value.
      2. Use the Bonferroni correction to adjust the p-values for the number of hypothesis tests performed.
  4. Comment on the prominent merchant groups.
  5. Take those merchant groups into account when training the final model.
  6. Create a new custom transformer that implements the modified strategy.
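
The steps above could be sketched roughly as follows. This is one possible reading, not the final implementation. It assumes pandas, SciPy (`chi2_contingency`) and scikit-learn. The class name `SignificantGroupEncoder`, the helper names, the one-vs-rest form of the post-hoc tests, the choice to emit one indicator column per significant group, and the toy data are all assumptions made for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.base import BaseEstimator, TransformerMixin


def chi_square_independence(df, group_col, target_col):
    """Overall chi-square test of independence; returns the p-value."""
    table = pd.crosstab(df[group_col], df[target_col])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value


def post_hoc_one_vs_rest(df, group_col, target_col, alpha=0.05):
    """Test each level against all others and Bonferroni-adjust the p-values."""
    levels = df[group_col].dropna().unique()
    rows = []
    for level in levels:
        is_level = df[group_col] == level
        table = pd.crosstab(is_level, df[target_col])  # 2x2: level vs. rest
        _, p_value, _, _ = chi2_contingency(table)
        rows.append({"level": level, "p_value": p_value})
    result = pd.DataFrame(rows)
    # Bonferroni: multiply each p-value by the number of tests, capped at 1.
    result["p_adjusted"] = (result["p_value"] * len(levels)).clip(upper=1.0)
    result["significant"] = result["p_adjusted"] < alpha
    return result


class SignificantGroupEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: keeps indicators only for significant groups."""

    def __init__(self, column, alpha=0.05):
        self.column = column
        self.alpha = alpha

    def fit(self, X, y):
        data = pd.DataFrame({
            "group": X[self.column].to_numpy(),
            "target": pd.Series(y).to_numpy(),
        })
        self.significant_levels_ = []
        # Run post-hoc tests only if the overall test rejects independence.
        if chi_square_independence(data, "group", "target") < self.alpha:
            post_hoc = post_hoc_one_vs_rest(data, "group", "target", self.alpha)
            self.significant_levels_ = post_hoc.loc[
                post_hoc["significant"], "level"
            ].tolist()
        return self

    def transform(self, X):
        X = X.copy()
        for level in self.significant_levels_:
            X[f"{self.column}_is_{level}"] = (X[self.column] == level).astype(int)
        return X.drop(columns=[self.column])


def make_rows(group, n_default, n_total):
    """Build toy rows for one group (illustrative data only)."""
    target = [1] * n_default + [0] * (n_total - n_default)
    return pd.DataFrame({"merchant_group": group, "default": target})


# Toy data: group C defaults at exactly the overall rate, A and B do not.
demo = pd.concat(
    [make_rows("A", 40, 100), make_rows("B", 10, 100), make_rows("C", 25, 100)],
    ignore_index=True,
)
encoder = SignificantGroupEncoder(column="merchant_group")
encoder.fit(demo[["merchant_group"]], demo["default"])
encoded = encoder.transform(demo[["merchant_group"]])
```

On this toy data the encoder keeps indicator columns for groups A and B and drops group C, whose default rate matches that of the remaining groups. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, and that the chi-square approximation is unreliable for merchant groups with very few observations, so rare groups may need to be pooled before testing.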
Edited by Amoli Rajgor