Fixing sklearn.exceptions.NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
While working with the Titanic dataset, I found a blogger's code and ran into the following bug when reproducing it:
import numpy as np
import pandas as pd
df = pd.read_csv('C:\\Users\\Administrator\\Desktop\\titanic\\train.csv')
# print(df.shape)
# Data preprocessing
df.drop(["PassengerId", "Cabin", "Name", "Ticket"], inplace=True, axis=1)  # drop the four columns that are not used as features
df["Age"] = df["Age"].fillna(df["Age"].mean())  # fill missing Age values with the column mean
df = df.dropna()
df["Sex"] = (df["Sex"] == "male").astype("int")
labels = df["Embarked"].unique().tolist()
df["Embarked"] = df["Embarked"].apply(lambda x: labels.index(x))
# print(df.isnull().sum())
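# Build the model
df_x = df.drop(['Survived'], axis=1)  # features: every column except Survived
df_y = df['Survived']  # target: only the Survived column
for i in [df_x, df_y]:
    i.index = range(i.shape[0])  # renumber rows 0..n-1 after dropna
# Decision tree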
from sklearn import tree
# Cross-validation
from sklearn.model_selection import cross_val_score
# from sklearn.model_selection import GridSearchCV
#
clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=25)
a = cross_val_score(clf, df_x, df_y, cv=10).mean()
# Test
df_test = pd.read_csv('C:\\Users\\Administrator\\Desktop\\titanic\\test.csv')
df_test.drop(["PassengerId","Cabin","Name","Ticket"],inplace=True,axis=1)
df_test["Age"] = df_test["Age"].fillna(df_test["Age"].mean())
df_test = df_test.dropna()
df_test["Sex"] = (df_test["Sex"]== "male").astype("int")
labels = df_test["Embarked"].unique().tolist()
df_test["Embarked"] = df_test["Embarked"].apply(lambda x: labels.index(x))
df_test['Survived']=pd.DataFrame(clf.predict(df_test))
df_survive = df_test['Survived'].value_counts(normalize=True)
n0_sample = df_survive[0]
n1_sample = df_survive[1]
print('Died: {:.2%}; survived: {:.2%}'.format(n0_sample, n1_sample))
The error output is as follows:
Traceback (most recent call last):
File "C:/Users/Administrator/Documents/WeChat Files/wxid_esk5nhrz913f22/FileStorage/File/2021-09/titanic.py", line 65, in <module>
df_test['Survived']=pd.DataFrame(clf.predict(df_test))
File "C:\xhx\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 441, in predict
check_is_fitted(self)
File "C:\xhx\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\xhx\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 1098, in check_is_fitted
raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
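After some digging, the main cause turned out to be this:
The model was never trained before it was used on the test data. As I understand it, the code only defines the decision tree model (creates the instance) and never feeds it training data through fit. The cross_val_score call does not count: it fits clones of clf on each fold and leaves clf itself unfitted. So the model has no idea how to make predictions.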
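To see the cause in isolation, here is a minimal sketch. It assumes synthetic data from make_classification as a stand-in for the Titanic features (not the real data from this post), and the variable names fitted_after_cv and preds are only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data standing in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=25)

# cross_val_score fits a clone of clf on each fold; clf itself is not fitted.
scores = cross_val_score(clf, X, y, cv=10)

try:
    clf.predict(X[:5])
    fitted_after_cv = True
except NotFittedError:
    fitted_after_cv = False

print("fitted after cross_val_score:", fitted_after_cv)

# Calling fit on clf itself is what makes predict usable.
clf.fit(X, y)
preds = clf.predict(X[:5])
print("predictions after fit:", preds)
```

The cross-validation score is still useful as an estimate of accuracy, it just does not leave you with a trained model.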
Fix:
clf.fit(df_x,df_y)
df_test['Survived']=pd.DataFrame(clf.predict(df_test))
Adding clf.fit(df_x, df_y) before predicting trains the model on the training data, and the problem is solved.
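Two other things in the script may be worth a look once the error is gone, though I have not verified them against the real CSV files. The Embarked labels are rebuilt from the test set, so the same port can get a different integer than in training. And wrapping the predictions in pd.DataFrame aligns them by index, which can shift values if dropna removed a test row. Below is a rough sketch of one way to handle both. The preprocess helper and the tiny inline tables are hypothetical, made up for illustration only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def preprocess(df, embarked_labels=None):
    """Hypothetical helper that mirrors the preprocessing steps in this post."""
    df = df.drop(columns=["PassengerId", "Cabin", "Name", "Ticket"])
    df["Age"] = df["Age"].fillna(df["Age"].mean())
    df = df.dropna().reset_index(drop=True)  # keep the index contiguous
    df["Sex"] = (df["Sex"] == "male").astype("int")
    if embarked_labels is None:
        embarked_labels = df["Embarked"].unique().tolist()
    # Raises ValueError if a label was never seen in the training data.
    df["Embarked"] = df["Embarked"].apply(lambda x: embarked_labels.index(x))
    return df, embarked_labels


# Assumption: made-up rows with the same columns as the Titanic CSV files.
train_raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Name": ["a", "b", "c", "d", "e", "f"],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, 27.0, 54.0],
    "SibSp": [1, 1, 0, 1, 0, 0],
    "Parch": [0, 0, 0, 0, 0, 0],
    "Ticket": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 11.13, 51.86],
    "Cabin": [None, "C85", None, "C123", None, "E46"],
    "Embarked": ["S", "C", "S", "S", "Q", "S"],
})
test_raw = pd.DataFrame({
    "PassengerId": [7, 8, 9, 10],
    "Pclass": [3, 3, 2, 3],
    "Name": ["g", "h", "i", "j"],
    "Sex": ["male", "female", "male", "male"],
    "Age": [34.5, 47.0, None, 27.0],
    "SibSp": [0, 1, 0, 0],
    "Parch": [0, 0, 0, 0],
    "Ticket": ["t7", "t8", "t9", "t10"],
    "Fare": [7.83, None, 9.69, 8.66],
    "Cabin": [None, None, None, None],
    "Embarked": ["Q", "S", "S", "C"],
})

train, labels = preprocess(train_raw)
df_x = train.drop(columns=["Survived"])
df_y = train["Survived"]

# Reuse the training labels so each port keeps the same integer code.
test, _ = preprocess(test_raw, labels)

clf = DecisionTreeClassifier(criterion="entropy", random_state=25)
clf.fit(df_x, df_y)  # the actual fix for NotFittedError

# Assigning the array directly is positional, so no index alignment happens.
test["Survived"] = clf.predict(test)
print(test)
```

Passing the training labels into the second call is the design choice that matters here: the tree only ever saw the integer codes, so the test set has to use the same mapping.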
Code reference link: source code