[Error Fix 01] Stratified sampling raises ValueError: The least populated class in y has only 1 member
When doing stratified sampling with pandas and train_test_split, the call fails with ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
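The program below does stratified sampling in Python. The dataset holds 14 variables for a set of stocks, such as the opening price and the highest price.
Following examples found online, my code looked like this: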
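from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# Raw string so the backslashes in the Windows path are not read as escapes
data = pd.read_csv(r'D:\PROGRAM\hs300if0915.txt', sep='\t')
# Stratifying on the continuous 'open' column is what triggers the error
stratified_sample, _ = train_test_split(data, train_size=0.2, stratify=data['open'])
print(stratified_sample)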
But it fails with:
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
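A quick way to see where the message comes from is to count how many rows each distinct value of the stratify column has. The sketch below uses a small made-up DataFrame in place of the original file (the values, row count, and variable names `min_open`, `min_month`, and `error_raised` are assumptions for illustration); only the column names `open` and `month` come from the post.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the stock file: 200 rows, every price distinct
data = pd.DataFrame({
    "open": np.linspace(2900.0, 3100.0, 200),  # continuous column
    "month": np.arange(200) % 12 + 1,          # discrete column, values 1..12
})

# Size of the smallest "class" in each candidate stratify column
min_open = data["open"].value_counts().min()
min_month = data["month"].value_counts().min()
print("smallest class in 'open': ", min_open)
print("smallest class in 'month':", min_month)

# Stratifying on a column whose smallest class has one row raises ValueError
try:
    train_test_split(data, train_size=0.2, stratify=data["open"])
    error_raised = False
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)
```

If the smallest count is 1, that class cannot be represented in both the train and the test part, which is what the error is complaining about.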
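I looked it up, and train_test_split should only be stratified on discrete or categorical variables (reference: http://cn.voidcc.com/question/p-efrnblbp-tp.html). The 'open' column is a continuous price, so almost every value is unique and ends up as a class with a single member.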
I then changed stratify=data['open'] to stratify=data['month'] in the second-to-last line of the code, and it ran successfully.
The modified code:
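If the goal really is to stratify by price rather than by month, one possible alternative (not used in the original post) is to bin the continuous column first and stratify on the bins. This is a minimal sketch on made-up data; the choice of 5 equal-frequency bins via `pd.qcut`, the `random_state`, and the name `open_bin` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical prices standing in for the 'open' column of the real file
data = pd.DataFrame({"open": np.linspace(2900.0, 3100.0, 200)})

# Cut the continuous column into 5 equal-frequency bins (integer codes 0..4)
open_bin = pd.qcut(data["open"], q=5, labels=False)

# Stratify on the bin labels instead of the raw prices
stratified_sample, _ = train_test_split(
    data, train_size=0.2, stratify=open_bin, random_state=0
)

# Rows drawn from each bin
bin_counts = open_bin.loc[stratified_sample.index].value_counts().sort_index()
print(stratified_sample.shape)
print(bin_counts)
```

The number of bins is a trade-off: more bins track the price distribution more closely, but every bin still needs at least 2 rows for the split to work.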
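from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
# Raw string so the backslashes in the Windows path are not read as escapes
data = pd.read_csv(r'D:\PROGRAM\hs300if0915.txt', sep='\t')
# 'month' is a discrete column, so each class has enough members to stratify on
stratified_sample, _ = train_test_split(data, train_size=0.2, stratify=data['month'])
print(stratified_sample)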
The original post attached the run output as a screenshot, which is not reproduced here.
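One way to confirm that a split was actually stratified is to compare the share of each month in the full data with its share in the sample. The sketch below again uses made-up data (240 rows with 20 per month is an assumption, as are the `random_state` and the name `comparison`); with the real file, the two columns should be close but not necessarily identical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 240 rows, 20 rows for each month 1..12
data = pd.DataFrame({
    "open": np.linspace(2900.0, 3100.0, 240),
    "month": np.arange(240) % 12 + 1,
})

stratified_sample, _ = train_test_split(
    data, train_size=0.2, stratify=data["month"], random_state=0
)

# Share of each month in the full data versus in the 20% sample
full_share = data["month"].value_counts(normalize=True).sort_index()
sample_share = stratified_sample["month"].value_counts(normalize=True).sort_index()
comparison = pd.DataFrame({"full": full_share, "sample": sample_share})
print(comparison)
```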