1. Important parameter: criterion

| criterion | meaning |
| --- | --- |
| mse | mean squared error (the default): minimizes the L2 loss, using the mean of each leaf to predict |
| friedman_mse | Friedman's MSE, which uses Friedman's improvement score to evaluate potential splits |
| mae | mean absolute error: minimizes the L1 loss, using the median of each leaf to predict |

mse:
In regression trees, MSE is not only the criterion for judging split quality; it is also the metric we most often use to evaluate how well the fitted tree regresses. Note, however, that the regression tree's score interface returns R-squared, not MSE.
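To make this concrete, a minimal sketch on synthetic data (the data and names here are illustrative) showing that `score()` returns R-squared, which is a different number from the MSE:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.RandomState(0)
X = rng.rand(100, 1)                       # 100 samples, 1 feature
y = X.ravel() + 0.1 * rng.randn(100)       # noisy linear target

reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
pred = reg.predict(X)

print(reg.score(X, y))              # R-squared, what score() returns
print(r2_score(y, pred))            # identical to score()
print(mean_squared_error(y, pred))  # MSE: a different quantity
```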
2. (Extension) Cross-validation: split the data into training and test sets multiple times and evaluate the model each time, to measure its stability
from sklearn.datasets import load_boston
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
boston = load_boston()
regressor = DecisionTreeRegressor(random_state=0)
cross_val_score(regressor
                , boston.data
                , boston.target
                , cv=10
                , scoring='neg_mean_squared_error'
                )
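A sketch of how to summarize the ten scores with their mean and standard deviation to judge stability; `make_regression` stands in for the boston data here, since `load_boston` was removed in scikit-learn 1.2:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# illustrative synthetic data in place of the removed boston dataset
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

regressor = DecisionTreeRegressor(random_state=0)
scores = cross_val_score(regressor, X, y, cv=10,
                         scoring="neg_mean_squared_error")

# neg_mean_squared_error is the negative MSE, so values are <= 0
# and closer to 0 is better; the std gauges stability across splits
print(scores.mean(), scores.std())
```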
3. Plotting a one-dimensional regression curve
1. Import the libraries
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
2. Create a noisy sine curve
rng = np.random.RandomState(1)            # fixed random seed for reproducibility
X = np.sort(5 * rng.rand(80, 1), axis=0)  # rng.rand(80, 1): 80 rows x 1 column in [0, 1); features must be 2-D
y = np.sin(X).ravel()                     # labels must be 1-D
y[::5] += 3 * (0.5 - rng.rand(16))        # 0.5 - rand is in (-0.5, 0.5]; scaled by 3, noise hits every 5th point
3. Instantiate and fit the models
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)
4. Generate a test set and feed it to the models
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis] # np.arange(start, stop, step); [:, np.newaxis] adds the feature dimension
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
5. Plot the results
plt.figure()
plt.scatter(X, y, s=20, edgecolor="black",c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue",label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()
4. (Extension) Tuning parameters with grid search
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

impurity_thresholds = np.linspace(0, 0.5, 20)
parameters = {
    "splitter": ("best", "random")
    , "criterion": ("squared_error", "friedman_mse")  # regression criteria; "mse" was renamed "squared_error" in scikit-learn 1.0
    , "max_depth": [*range(1, 10)]
    , "min_samples_leaf": [*range(1, 50, 5)]
    , "min_impurity_decrease": [*impurity_thresholds]
}
clf = DecisionTreeRegressor(random_state=20)
GS = GridSearchCV(clf, parameters, cv=10)
GS.fit(Xtrain, Ytrain)  # Xtrain, Ytrain come from an earlier train/test split
GS.best_params_  # best parameter combination found
GS.best_score_   # mean cross-validated score of that combination
5. Summary: strengths and weaknesses of decision trees

| Strengths | 1. Handles both regression and classification. 2. A white-box model: how a given case is decided is easy to observe and interpret. 3. Can handle multi-output problems. |
| Weaknesses | 1. Overfitting easily produces overly complex trees, and tuning is difficult. 2. Splits are chosen by greedy local optimization in the hope of reaching a global optimum, which cannot be guaranteed; the tree is also unstable, since small changes in the data can produce a very different tree. |