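The program below does stratified sampling in Python. The dataset holds 14 variables for a set of stocks, such as the opening price and the daily high.

Based on references I found online, my code looked like this: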

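from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escape sequences.
data = pd.read_csv(r'D:\PROGRAM\hs300if0915.txt', sep='\t')
# Take a 20% sample, stratified on the opening price (this is the call that fails).
stratified_sample, _ = train_test_split(data, train_size=0.2, stratify=data['open'])
print(stratified_sample)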

But it fails with this error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

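After some searching, I found that train_test_split should only stratify on discrete or categorical variables (see http://cn.voidcc.com/question/p-efrnblbp-tp.html). The opening price is continuous, so almost every value forms its own class with a single member, which is exactly what the error complains about.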
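One way to see this before calling train_test_split is to count the rows per distinct value of the column you plan to stratify on. The snippet below is a minimal sketch on synthetic data: the values, the 120-row size, and names such as smallest_open_class are made up for illustration and do not come from the original file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the stock data (hypothetical values, not the real file).
rng = np.random.default_rng(0)
n = 120
data = pd.DataFrame({
    "open": rng.normal(3000.0, 50.0, size=n),        # continuous: values are effectively unique
    "month": np.repeat(np.arange(1, 13), n // 12),   # discrete: 10 rows per month
})

# Stratification treats every distinct value as a class, so count the rows per class.
smallest_open_class = data["open"].value_counts().min()
smallest_month_class = data["month"].value_counts().min()
print(smallest_open_class, smallest_month_class)

# Stratifying on the continuous column fails because its classes have a single member each.
try:
    train_test_split(data, train_size=0.2, stratify=data["open"], random_state=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If the smallest class has fewer than 2 rows, the stratified split cannot place that class in both parts, so the column is not usable as-is.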

I then changed stratify=data['open'] on the second-to-last line to stratify=data['month'], and it worked.

The revised code:
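If the goal really is to stratify by price rather than by month, one possible workaround (not what this post ends up doing) is to bin the continuous column first and stratify on the bin labels. Here is a minimal sketch using pd.qcut, assuming synthetic data and 5 quantile bins; both are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the stock data (hypothetical values, not the real file).
rng = np.random.default_rng(0)
data = pd.DataFrame({"open": rng.normal(3000.0, 50.0, size=200)})

# Cut the continuous price into 5 equal-frequency bins; the bin label is a discrete class.
open_bin = pd.qcut(data["open"], q=5, labels=False)

# Stratify on the bin labels instead of the raw prices.
stratified_sample, _ = train_test_split(
    data, train_size=0.2, stratify=open_bin, random_state=0
)

# Count how many sampled rows came from each price bin.
sample_bin_counts = open_bin.loc[stratified_sample.index].value_counts().sort_index()
print(sample_bin_counts)
```

The number of bins is a trade-off: more bins track the price distribution more closely, but every bin still needs at least 2 rows.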

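from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escape sequences.
data = pd.read_csv(r'D:\PROGRAM\hs300if0915.txt', sep='\t')
# Take a 20% sample, stratified on the discrete month column.
stratified_sample, _ = train_test_split(data, train_size=0.2, stratify=data['month'])
print(stratified_sample)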
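To confirm that the sample really is stratified, you can compare each month's share in the full data with its share in the sample. The following is a minimal sketch on synthetic data; the three months and their 50/30/20 row counts are invented for illustration, not taken from the original file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with uneven months (hypothetical counts, not the real file).
months = np.repeat([1, 2, 3], [50, 30, 20])
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "month": months,
    "open": rng.normal(3000.0, 50.0, size=months.size),
})

stratified_sample, _ = train_test_split(
    data, train_size=0.2, stratify=data["month"], random_state=0
)

# Compare the share of each month in the full data and in the 20% sample.
full_share = data["month"].value_counts(normalize=True).sort_index()
sample_share = stratified_sample["month"].value_counts(normalize=True).sort_index()
comparison = pd.DataFrame({"full": full_share, "sample": sample_share})
print(comparison)
```

With real data the two columns will usually differ slightly, because the per-month counts have to be rounded to whole rows.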

The output from running the revised code:

 
