Created January 3, 2016 08:46
sklearn's GCV is formally correct, i.e. equivalent to the standard leave-one-out MSE, but only in a specific setting (fit_intercept=False, data pre-centered). That setting makes the result wrong, at least when n_features >= n_samples - 1.
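One way to see why the pre-centered, no-intercept setting goes wrong when n_features >= n_samples - 1: after centering on the full data set, each row of X is minus the sum of the other rows, and the same holds for y. A model that interpolates the n_samples - 1 training points therefore also predicts the held-out point almost exactly, even when y is pure noise. The sketch below is a NumPy-only illustration under those assumptions (random Gaussian data, a very small alpha); the helper name `loo_mse_no_intercept` is hypothetical and not part of scikit-learn.

```python
import numpy as np


def loo_mse_no_intercept(X, y, alpha):
    """Brute-force leave-one-out MSE of ridge regression without intercept."""
    n_samples = X.shape[0]
    errors = np.empty(n_samples)
    for i in range(n_samples):
        train = np.arange(n_samples) != i
        X_tr, y_tr = X[train], y[train]
        # dual form of ridge: coef = X_tr.T @ (K + alpha * I)^-1 @ y_tr
        K = X_tr @ X_tr.T
        dual_coef = np.linalg.solve(K + alpha * np.eye(K.shape[0]), y_tr)
        errors[i] = y[i] - X[i] @ X_tr.T @ dual_coef
    return np.mean(errors ** 2)


rng = np.random.RandomState(1)
n_samples, n_features = 10, 20
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)  # pure noise: X carries no information about y

# center once on the full data set, as in the pre-centered setting
X_c = X - X.mean(axis=0)
y_c = y - y.mean()

mse_centered = loo_mse_no_intercept(X_c, y_c, alpha=1e-8)
mse_raw = loo_mse_no_intercept(X, y, alpha=1e-8)
print("LOO MSE, pre-centered data:", mse_centered)
print("LOO MSE, raw data:", mse_raw)
```

On pre-centered data the leave-one-out error collapses towards zero, because the held-out sample leaks into the training fold through the global mean. On the raw data it does not.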
"""
Experiment on GCV to understand the issue:
It is indeed equivalent to leave-one-out selection with mean squared error,
but in a setting without fit_intercept, and with y and X initially centered.
In theory, this is correct.

Author: Bertrand Thirion, 2016
"""
# Note: written against the scikit-learn API of early 2016;
# sklearn.cross_validation and grid_scores_ were removed in later releases.
import numpy as np
from sklearn import linear_model
from sklearn.cross_validation import LeaveOneOut
from sklearn.model_selection import GridSearchCV
from sklearn.utils.testing import assert_array_almost_equal

# set the parameters
alphas = np.logspace(-5, 5, 6)
n_samples, n_features = 10, 20

# generate the data
np.random.seed([1])
y, X = (np.random.randn(n_samples), np.random.randn(n_samples, n_features))
# center X and y once, on the full data set
X, y, _, _, _ = linear_model.Ridge._center_data(X, y, True)

for n_features_ in [5, 20]:
    X_ = X[:, :n_features_]
    # sklearn's GCV
    gcv = linear_model.RidgeCV(
        alphas=alphas, store_cv_values=True, gcv_mode='svd',
        scoring='mean_squared_error').fit(X_, y)
    # explicit leave-one-out grid search, without intercept
    gs = GridSearchCV(linear_model.Ridge(fit_intercept=False),
                      param_grid={'alpha': alphas},
                      cv=LeaveOneOut(n_samples), scoring='mean_squared_error')
    gs.fit(X_, y)
    score_gs = np.array([x[1] for x in gs.grid_scores_])
    # cv_values_ is treated here as per-sample leave-one-out predictions
    score_gcv = - np.mean((gcv.cv_values_.T - y) ** 2, 1)
    assert_array_almost_equal(score_gs, score_gcv)
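The equivalence that the script checks through scikit-learn can also be illustrated with NumPy alone. For ridge regression without intercept, the leave-one-out residual of sample i equals the full-fit residual divided by 1 - H_ii, where H = X (X^T X + alpha I)^-1 X^T is the hat matrix. The sketch below compares that shortcut with an explicit refit per held-out sample. It only illustrates the identity and is not scikit-learn's implementation; both helper names are hypothetical.

```python
import numpy as np


def loo_errors_brute_force(X, y, alpha):
    """Refit ridge (no intercept) once per sample, leaving that sample out."""
    n_samples, n_features = X.shape
    errors = np.empty(n_samples)
    for i in range(n_samples):
        train = np.arange(n_samples) != i
        X_tr, y_tr = X[train], y[train]
        coef = np.linalg.solve(
            X_tr.T @ X_tr + alpha * np.eye(n_features), X_tr.T @ y_tr)
        errors[i] = y[i] - X[i] @ coef
    return errors


def loo_errors_shortcut(X, y, alpha):
    """Same errors from a single fit, using the diagonal of the hat matrix."""
    n_features = X.shape[1]
    hat = X @ np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T)
    return (y - hat @ y) / (1.0 - np.diag(hat))


rng = np.random.RandomState(1)
X = rng.randn(10, 20)
y = rng.randn(10)
# pre-center X and y, as in the script above
X -= X.mean(axis=0)
y -= y.mean()

for n_features_ in [5, 20]:
    X_ = X[:, :n_features_]
    for alpha in [1e-2, 1.0, 1e2]:
        brute = loo_errors_brute_force(X_, y, alpha)
        fast = loo_errors_shortcut(X_, y, alpha)
        assert np.allclose(brute, fast)
```

The identity holds for any X and y, centered or not, as long as no intercept is refit inside each fold, which is exactly the setting the script isolates.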