
WIP: Feature selection comp

Jan-Peter Ceglarek requested to merge FeatureSelection_comp into master

Right now, feature selection is possible with one of three methods in selection.py -> filter_importances(). I would like to build a pipeline (or, even better, integrate this into an existing one) that tries each method, validates it, and chooses the one with the best outcome.

The basic steps:

  1. Take the transformed features as input
  2. Make a prediction with each method
    1. Save the outcome using MLflow
  3. Validate the predictions
  4. Find the selection method with the best result
  5. Store the best selected features
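
The steps above can be sketched roughly as follows. This is only an illustration, not the code in this branch: `top_k`, `above_mean`, `make_toy_scorer` and `compare_selection_methods` are hypothetical names, the two selectors are stand-ins for the three methods in filter_importances(), and the toy scorer replaces the real "train a model and validate its predictions" step. Logging each outcome with MLflow would happen where the scores are collected.

```python
from typing import Callable, Dict, List, Tuple

Importances = Dict[str, float]
Selector = Callable[[Importances], List[str]]
Scorer = Callable[[List[str]], float]


def top_k(k: int) -> Selector:
    """Hypothetical selector: keep the k most important features."""
    def select(importances: Importances) -> List[str]:
        ranked = sorted(importances, key=importances.get, reverse=True)
        return ranked[:k]
    return select


def above_mean(importances: Importances) -> List[str]:
    """Hypothetical selector: keep features above the mean importance."""
    mean = sum(importances.values()) / len(importances)
    return [name for name, value in importances.items() if value > mean]


def make_toy_scorer(importances: Importances) -> Scorer:
    """Stand-in for training and validating a model (lower is better)."""
    def score(features: List[str]) -> float:
        missed = 1.0 - sum(importances[f] for f in features)
        return missed + 0.1 * len(features)  # small penalty per feature
    return score


def compare_selection_methods(
    importances: Importances,
    selectors: Dict[str, Selector],
    score: Scorer,
) -> Tuple[str, List[str], Dict[str, float]]:
    """Run every selector, score its feature set, return the best one."""
    selected = {name: select(importances) for name, select in selectors.items()}
    # This is the point where each outcome could be logged to MLflow.
    scores = {name: score(features) for name, features in selected.items()}
    best = min(scores, key=scores.get)
    return best, selected[best], scores


# Made-up importances, for illustration only.
importances = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
selectors = {"top_3": top_k(3), "above_mean": above_mean}
best_name, best_features, scores = compare_selection_methods(
    importances, selectors, make_toy_scorer(importances)
)
print(best_name, best_features)
```

Keeping the selectors and the scorer as plain callables means a new selection method only needs a new entry in the `selectors` dictionary.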

Since the selection process is currently done manually and per column, I used the following commands to test the implementation:
polaris fetch -s 2019-08-10 -e 2019-09-5 LightSail-2 /tmp/normalized_frames.json
polaris learn -g /tmp/new_graph.json /tmp/normalized_frames.json -c bat0_temp

Questions/problems raised during the work on this issue:
Q1. Purpose of FeaturesImportanceOptimization: Does the purpose of "flattening feature importance distribution from xgboost" (learn-feature-selection.py) change if the class is also used for choosing the feature selection method?
Q2. Currently the feature selection process runs column by column. What can be done to automate this for all columns/features?
P1. The feature selection method all_best only yields 2 features after the selection.
P2. The feature selection currently differs from run to run. I believe this is because of how train_test_split is used in this setting (the DataFrame is split again for each run and method, with shuffling and randomization).
P4. I would like to implement the comparison of the predictions with MLflow. So far, however, I could only do such a comparison via the UI. Hence I decided to do the comparison "manually".
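
Regarding P2, one possible way to make runs comparable is to fix the seed of the split. The sketch below assumes scikit-learn's `train_test_split` and uses made-up data; `SEED` and `split` are hypothetical names, and this is not how the branch currently does it.

```python
from sklearn.model_selection import train_test_split

# Made-up data, for illustration only.
X = [[i, i * 2] for i in range(20)]
y = [i % 2 for i in range(20)]

SEED = 42  # hypothetical fixed seed shared by all runs and methods


def split(seed):
    """Shuffle and split with a fixed random_state so the result repeats."""
    return train_test_split(X, y, test_size=0.25, random_state=seed)


first = split(SEED)
second = split(SEED)

# With the same random_state both calls produce identical partitions.
same_split = first == second
print(same_split)
```

An alternative would be to split once and pass the same train and test sets to every selection method, so that differences in the results come from the methods and not from the data partition.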

Disclaimer: This is a prototyping branch. The commit history is not final.

Addresses issue #72 (closed)

I switched branches to !98

The corresponding issue for the last problems is https://gitlab.com/librespacefoundation/polaris/polaris/-/issues/103

Edited by Red Boumghar
