Improve approach for finding optimal alpha in LASSO
Description
At the moment the optimal \alpha value for the LASSO fit method is obtained by scanning a grid of 50 (fifty!) \alpha values.
This is obviously computationally very inefficient.
While one could employ a Bayesian approach, we should first optimize the present implementation.
A substantial improvement should be possible by replacing the scan with a standard optimizer (e.g., the Nelder-Mead simplex method) to find the \alpha value that minimizes the RMSE over the training set.
Such methods are readily available in scipy.
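For orientation, a minimal sketch of the scipy call involved (the quadratic objective here is only a placeholder for the real cross-validated RMSE):

    from scipy.optimize import minimize

    def objective(x):
        # placeholder for RMSE(alpha): any smooth 1-D function with a single minimum
        return (x[0] - 0.3) ** 2

    res = minimize(objective, x0=[1.0], method='nelder-mead')
    print(res.x)     # location of the minimum, here ~0.3
    print(res.nfev)  # number of function evaluations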
Sub-tasks
- replace the alpha grid search in fit_lasso_optimize_alpha
    # current implementation: k-fold cross-validated mean RMSE for every alpha on the grid
    RMSE_path = []
    for i, alpha in enumerate(alphas):
        lasso.alpha = alpha
        cv_fold = []
        for train, test in kf.split(X):
            X_train, X_test = X[train], X[test]
            y_train, y_test = y[train], y[test]
            lasso.fit(X_train, y_train)
            RMSE = compute_rmse(X_test, lasso.coef_, y_test)
            cv_fold.append(RMSE)
        RMSE_path.append(np.mean(cv_fold))
    RMSE_path = np.array(RMSE_path)
    alpha_min = alphas[np.argmin(RMSE_path)]
with a call to the Nelder-Mead simplex method (a fuller, self-contained sketch follows this list)
    from scipy.optimize import minimize

    def lasso_with_fixed_alpha(...):
        ...

    res = minimize(lasso_with_fixed_alpha, x0, method='nelder-mead',
                   options={'xtol': 1e-8, 'disp': True})
    ...
- carry out testing to determine suitable values for x0 (the starting guess) and xtol (the convergence tolerance)
- provide some documentation of the performance compared to grid search in the discussion thread of this issue; the key metric is the number of function calls
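For concreteness, here is a self-contained sketch of how the pieces could fit together. The data, the Lasso estimator, and compute_rmse below are stand-ins for the objects used by the existing grid search (X, y, kf, lasso); the body of lasso_with_fixed_alpha is one plausible way to flesh out the skeleton above; and the tolerance option is spelled xatol because newer scipy releases use that name in place of xtol for Nelder-Mead:

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import KFold

    # stand-in data and estimator; the real code would reuse the existing objects
    X, y = make_regression(n_samples=200, n_features=20, noise=0.5, random_state=0)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    lasso = Lasso(fit_intercept=False, max_iter=10000)

    def compute_rmse(X_test, coef, y_test):
        # stand-in for the repo's helper: RMSE of the linear prediction
        return np.sqrt(np.mean((X_test @ coef - y_test) ** 2))

    def lasso_with_fixed_alpha(x):
        # objective for the optimizer: mean cross-validated RMSE at alpha = x[0]
        lasso.alpha = abs(x[0])  # alpha must be non-negative
        cv_fold = []
        for train, test in kf.split(X):
            lasso.fit(X[train], y[train])
            cv_fold.append(compute_rmse(X[test], lasso.coef_, y[test]))
        return np.mean(cv_fold)

    x0 = [0.1]  # starting guess; suitable values to be determined (sub-task 2)
    res = minimize(lasso_with_fixed_alpha, x0, method='nelder-mead',
                   options={'xatol': 1e-8, 'disp': True})
    print('alpha_min =', abs(res.x[0]))
    print('function calls:', res.nfev)  # the key metric vs. the 50-point grid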
Demonstration
- tests pass
- documentation of performance analysis (in the discussion thread of this issue)