2. Hyperparameter tuning with scikit-learn and Python (Python code, dataset included)
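Resource description:
In this tutorial, you will learn how to tune model hyperparameters using scikit-learn and Python.
We start by discussing what hyperparameter tuning is and why it matters.
From there, we configure the development environment and review the project directory structure.
We then run three Python scripts:
1. One that trains a model without any hyperparameter tuning (so we have a baseline)
2. One that uses an algorithm called "grid search" to exhaustively check every combination of hyperparameters; this guarantees a full sweep of the listed hyperparameter values, but it is also slow
3. One that uses "random search", drawing hyperparameter values from distributions (it is not guaranteed to cover every hyperparameter value, but in practice it is typically as accurate as grid search and runs much faster)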
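The difference between the two search strategies can be sketched with the standard library alone. The block below is only an illustration under stated assumptions: the search space loosely mirrors the SVR script that follows, the finite `tol_values` list and the helper names (`grid_candidates`, `random_candidates`, `sample_log_uniform`) are hypothetical, and no model is trained. It shows that a grid enumerates every combination, while a random search draws a fixed budget of settings and can sample a continuous range directly.

```python
import itertools
import math
import random

# Hypothetical search space, loosely modelled on the SVR example:
# two discrete hyperparameters and one continuous one.
kernels = ["linear", "rbf", "sigmoid", "poly"]
c_values = [1, 1.5, 2, 2.5, 3]
tol_values = [1e-6, 1e-5, 1e-4, 1e-3]  # a grid needs a finite list of values


def grid_candidates():
    """Enumerate every combination, as an exhaustive grid search would."""
    return [
        {"kernel": k, "C": c, "tol": t}
        for k, c, t in itertools.product(kernels, c_values, tol_values)
    ]


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_candidates(n_iter, seed=0):
    """Draw n_iter independent settings, as a random search would."""
    rng = random.Random(seed)
    return [
        {
            "kernel": rng.choice(kernels),
            "C": rng.choice(c_values),
            "tol": sample_log_uniform(rng, 1e-6, 1e-3),
        }
        for _ in range(n_iter)
    ]


grid = grid_candidates()
sampled = random_candidates(n_iter=10)
print("grid search candidates:", len(grid))       # 4 * 5 * 4 combinations
print("random search candidates:", len(sampled))  # fixed budget of 10 draws
```

The grid grows multiplicatively with every hyperparameter added, whereas the random search cost is set by the number of draws, which is why it tends to finish sooner.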
# USAGE
# python train_svr_random.py
# import the necessary packages
from pyimagesearch import config
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import RepeatedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from scipy.stats import loguniform
import pandas as pd
# load the dataset, separate the features and labels, and perform a
# training and testing split using 85% of the data for training and
# 15% for evaluation
print("[INFO] loading data...")
dataset = pd.read_csv(config.CSV_PATH, names=config.COLS)
dataX = dataset[dataset.columns[:-1]]
dataY = dataset[dataset.columns[-1]]
(trainX, testX, trainY, testY) = train_test_split(dataX,
    dataY, random_state=3, test_size=0.15)
# standardize the feature values by computing the mean, subtracting
# the mean from the data points, and then dividing by the standard
# deviation
scaler = StandardScaler()
trainX = scaler.fit_transform(trainX)
testX = scaler.transform(testX)
# initialize model and define the space of the hyperparameters to
# sample from during the randomized search
model = SVR()
kernel = ["linear", "rbf", "sigmoid", "poly"]
tolerance = loguniform(1e-6, 1e-3)
C = [1, 1.5, 2, 2.5, 3]
grid = dict(kernel=kernel, tol=tolerance, C=C)
# initialize a cross-validation fold and perform a randomized search
# to tune the hyperparameters (RandomizedSearchCV samples 10
# candidate settings by default)
print("[INFO] random searching over the hyperparameters...")
cvFold = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
randomSearch = RandomizedSearchCV(estimator=model, n_jobs=-1,
    cv=cvFold, param_distributions=grid,
    scoring="neg_mean_squared_error")
searchResults = randomSearch.fit(trainX, trainY)
# extract the best model and evaluate it
print("[INFO] evaluating...")
bestModel = searchResults.best_estimator_
print("R2: {:.2f}".format(bestModel.score(testX, testY)))
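Note that the script selects hyperparameters with `scoring="neg_mean_squared_error"` but reports R2 at the end, because `score` on a scikit-learn regressor returns the coefficient of determination. The following is a minimal hand-written sketch of both metrics, intended only to show how they relate; the function names `mse` and `r_squared` and the toy values are assumptions made for illustration and are not part of the original project.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets and predictions (made up for this illustration)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

# The search maximizes the negated MSE, so values closer to 0 are better
print("neg MSE: {:.3f}".format(-mse(y_true, y_pred)))
# R2 is scale-free: 1.0 is a perfect fit, 0.0 matches predicting the mean
print("R2: {:.3f}".format(r_squared(y_true, y_pred)))
```

Because MSE depends on the units of the target while R2 does not, the two numbers are not directly comparable, although for a fixed test set a lower MSE always corresponds to a higher R2.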