Tags mark specific points in the repository's history as important.
2.1.2
28e659ca
- Fixed model monitoring metrics column order
- Removed duplicate prediction_drift_status
2.1.0
b3b3dedd
- Added monitoring metrics functions to allow for model observability
- Deprecated `model_metrics`
- Updated testing pipeline to allow for local testing
- Tidied up README
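The release notes don't show the monitoring API itself. Purely as an illustration of the kind of drift metric such observability functions typically compute, here is a minimal population stability index (PSI) check; the function name and thresholds are assumptions, not this library's actual interface:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a training-time score distribution against new scores.

    Common rule of thumb: PSI < 0.1 is stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift. (Illustrative sketch, not the library's API.)
    """
    # Bin edges come from the training (expected) distribution
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range scores

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Floor the proportions to avoid log(0) and division by zero
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))
```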
2.0.0
ff00d70e

ModelEvaluator Class
* Comprehensive replacement for the older model_metrics function
* Supports a wide variety of classification and regression models, including multi-class problems
* Extensive visualizations (ROC, PR curves, lift charts, SHAP values)
* Detailed metrics and performance analytics in a single, cohesive interface
* Ability to add custom metrics, save plots, and export metrics to file

Apply Functions
* New functions to consistently transform new data using patterns from training:
  * apply_outliers() - Apply existing outlier limits
  * apply_missing_values() - Apply existing missing value handling
  * apply_dummy() - Apply existing dummy coding
* These enable production pipelines to use transformations identical to training
* No separate .py file is required for scoring transformations - all transformations are handled directly in the configuration yaml file

ConfigGenerator Class
* Automatically creates scoring configuration files modularly
* Supports nested parameters and complex configuration structures
* Well suited to version controlling your model parameters and preprocessing steps

Memory Optimization
* New memory_optimization() function dramatically reduces DataFrame memory usage
* Significantly reduces time to train XGBoost models by taking advantage of sparse arrays
* Configurable precision modes to balance memory usage vs. numeric precision

Other New Functions Added
* generate_sql_trend_query() - Generate SQL for time-period analysis
* trend_analysis() - Analyze time-series data for patterns

Other Notable Improvements
* Consistent return patterns (functions now return both data AND metadata)
* Standardized function names and improved parameter handling
* More robust outlier detection with skew adjustment options
* Better missing value handling with more filling methods
* Enhanced dummy coding with better prefix handling
* Improved correlation/feature reduction with multiple correlation methods
* Enhanced split_data() with stratification options and better sampling

Breaking Changes
* Many calls from versions prior to 2.0.0 will not work correctly without slight modifications. Consult the documentation for exact changes.
* inplace parameters removed from all functions to conform with pandas best practices
* missing_fill() and missing_check() combined into missing_values()
* dv_proxies() renamed to remove_outcome_proxies() for better clarity
* memory_usage() renamed to memory_optimization()
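The internals of memory_optimization() aren't shown in these notes. As a minimal sketch of the general technique (downcasting each numeric column to the smallest pandas dtype that still holds its values), with the function name below being a hypothetical stand-in rather than the library's implementation:

```python
import pandas as pd

def downcast_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative only: shrink numeric columns to the smallest dtype
    that can represent their values. The library's real function also
    offers configurable precision modes and sparse-array support."""
    out = df.copy()
    # Downcast integers, e.g. int64 -> int8 when values fit in [-128, 127]
    for col in out.select_dtypes(include="integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    # Downcast floats, e.g. float64 -> float32 (trades precision for memory)
    for col in out.select_dtypes(include="float").columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out
```

On wide training frames this alone often cuts memory several-fold, which is where the XGBoost training speedups come from.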
1.0.19
e74993c3
- Improved support for `model_metrics` outside of XGB and RF; it should now work for all scikit-learn models.
1.0.17
223eac1b
- model_metrics now fully supports xgboost. Additionally, you can specify an integer for deciles_n if you want to look at lift for non-decile splits (e.g. vigintiles).
- mad_outers, missing_fill, dummy_code, and dummy_top now support an additional output_file argument that lets you specify a file name to dump the Python syntax needed to recreate the process elsewhere (in scoring, for example). You could also do this via a pipeline, but in my experience a hardcoded file like this helps ensure, in a transparent, compatible, and easy-to-understand manner, that the data prep and transformations that occurred during training are also correctly applied to scoring in an automated fashion. More details in the README.
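The exact syntax these functions write to output_file is documented in the README. Purely to illustrate the pattern, a fit step that caps MAD-based outliers and emits plain Python to re-apply the same limits at scoring time might look like this (the function name and generated code below are hypothetical, not the library's actual output):

```python
import pandas as pd

def cap_outliers_and_emit_code(df, cols, output_file, n_mads=3.0):
    """Cap each column at median +/- n_mads * MAD, and write plain Python
    that re-applies the exact same limits to a scoring DataFrame `df`.
    (Illustrative sketch of the output_file idea, not the real library.)"""
    lines = ["# Auto-generated outlier capping - apply to scoring data `df`"]
    capped = df.copy()
    for col in cols:
        med = capped[col].median()
        mad = (capped[col] - med).abs().median()
        lo, hi = med - n_mads * mad, med + n_mads * mad
        capped[col] = capped[col].clip(lo, hi)
        # Hardcode the fitted limits so scoring needs no training data
        lines.append(f"df['{col}'] = df['{col}'].clip({lo!r}, {hi!r})")
    with open(output_file, "w") as f:
        f.write("\n".join(lines) + "\n")
    return capped
```

Because the limits are written out as literals, the generated file can be reviewed in code review and executed in a scoring job with no dependency on the training data.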