Use the Booster.feature_importance() method to rank the features of a trained LightGBM model by importance.
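
The snippet below assumes a trained Booster named clf and a list of feature names train_features. For context, a hypothetical training step that would produce both might look like the following sketch; X_train, y_train, and the parameters are assumptions, not the original pipeline:

import lightgbm as lgb

# Hypothetical setup: X_train is a DataFrame of engineered features,
# y_train the labels (both assumed; the objective is a placeholder).
train_features = list(X_train.columns)
clf = lgb.train({'objective': 'binary', 'verbosity': -1},
                lgb.Dataset(X_train, label=y_train),
                num_boost_round=100)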

import pandas as pd

# Pair each feature name with its importance score from the trained
# Booster, then sort in descending order of importance.
feature_importance             = pd.DataFrame()
feature_importance['fea_name'] = train_features
feature_importance['fea_imp']  = clf.feature_importance()
feature_importance             = feature_importance.sort_values('fea_imp', ascending=False)

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=[20, 10], dpi=100)
ax = sns.barplot(x='fea_name', y='fea_imp', data=feature_importance)
# Rotate the x tick labels taken from the sorted DataFrame itself,
# rather than hard-coding a label list that can fall out of sync.
plt.xticks(rotation=45, fontsize=15)
plt.yticks(fontsize=15)
plt.xlabel('fea_name', fontsize=18)
plt.ylabel('fea_imp', fontsize=18)
# plt.tight_layout()
plt.savefig('D:/A_graduation_project/pictures/2_baseline1/特征重要性')
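
As an alternative to assembling the chart by hand, LightGBM also ships a built-in plotting helper, lightgbm.plot_importance(), which draws the same ranking directly from the Booster; a minimal sketch (the output filename is just an example):

import lightgbm as lgb
import matplotlib.pyplot as plt

# Horizontal bar chart of the Booster's importances;
# importance_type='split' matches clf.feature_importance() above.
lgb.plot_importance(clf, importance_type='split',
                    max_num_features=20, figsize=(20, 10))
plt.tight_layout()
plt.savefig('feature_importance_builtin.png')  # example path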

From the official documentation:

feature_importance(importance_type='split', iteration=None)

Get feature importances.

  • Parameters:
    • importance_type (string, optional (default="split")) – How the importance is calculated. If "split", result contains numbers of times the feature is used in a model. If "gain", result contains total gains of splits which use the feature.
    • iteration (int or None, optional (default=None)) – Limit number of iterations in the feature importance calculation. If None, if the best iteration exists, it is used; otherwise, all trees are used. If <= 0, all trees are used (no limits).
  • Returns:
    • result – Array with feature importances.
  • Return type:
    • numpy array
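
Note that the two importance_type modes can rank features differently: a feature that appears in many splits scores high under "split" even if each split contributes little, while "gain" weights features by how much they actually improve the objective. Reusing clf and train_features from above, a quick comparison might look like this:

# Compare count-based ("split") and gain-based ("gain") importance,
# sorted by total gain in descending order.
imp_split = clf.feature_importance(importance_type='split')
imp_gain  = clf.feature_importance(importance_type='gain')
for name, s, g in sorted(zip(train_features, imp_split, imp_gain),
                         key=lambda t: -t[2]):
    print(f'{name}: split={s}, gain={g:.1f}')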
