Feature Selection with Fisher Score: The Algorithm Idea and a Python Implementation
I. The Algorithm Idea
1. Feature selection
Feature selection removes irrelevant or redundant features while keeping the other original features, so as to obtain a feature subset that describes the given problem with the smallest possible loss in performance.
Feature selection methods can be divided into three families: filter methods, wrapper methods, and embedded methods.
The Fisher Score introduced in this article is a filter-type feature selection algorithm.
For other filter-type algorithms in this series, see my other articles:
Feature Selection: the Chi-Square Test
Feature Selection: Mutual Information
2. Fisher score
Fisher Score in feature selection
The Fisher Score is one of the effective methods for feature selection. Its main idea is that a strongly discriminative feature shows within-class distances that are as small as possible and between-class distances that are as large as possible.
This is easy to relate to everyday life: people of the same age group have more to talk about, while there is a generation gap between people of different age groups; people from the same province greet each other like old friends, while people from different provinces may hold prejudices against one another. People of the same age group belong to one class, and people from the same province belong to one class, so their within-class distance is small; people of different age groups, or from different provinces, belong to different classes, so their between-class distance is larger.
With this example in mind, let us continue.
More concretely, given a set of d features, denoted $\mathcal{S}$, the goal of Fisher-score filter feature selection is to choose a subset of m features (m < d), denoted $\mathcal{T}$, that maximizes some criterion F:
$$\mathcal{T}^{*}=\arg \max_{\mathcal{T} \subseteq \mathcal{S}} F(\mathcal{T}), \quad \text{s.t. } |\mathcal{T}| = m$$
where s.t. denotes the constraint and | · | is the number of elements in a set.
This is a combinatorial optimization problem, and finding its global optimum is NP-hard. A common heuristic solves it by first scoring each feature independently according to the criterion F and then selecting the m features with the highest scores. Features selected by this heuristic have drawbacks, however. On the one hand, because each feature is scored in isolation, combinations of features (evaluating two or more features jointly) are ignored. For example, feature a and feature b may each have a low Fisher score while the combination ab has a very high one; in that case the heuristic discards a and b even though, in principle, they should be selected. On the other hand, the heuristic cannot handle redundant features. For example, feature a and feature b may both have very high Fisher scores while being strongly correlated; the heuristic then selects both, even though either a or b could be dropped with no loss in subsequent learning performance.
The content above is drawn from the paper:
Gu, Q., Z. Li, and J. Han, 2012: Generalized Fisher score for feature selection. UAI.
Since solving this NP-hard problem exactly is too complex, this article uses the heuristic rule and computes the Fisher score of each feature independently.
Next comes the computation rule for the Fisher score, which requires a bit of careful reading.
Suppose the dataset contains n samples belonging to C classes $\omega_1, \omega_2, \dots, \omega_C$, where class i contains $n_i$ samples. The table below helps make this concrete.
| | Feature 1 | Feature 2 | Feature 3 | Class |
|---|---|---|---|---|
| Sample 1 | 2 | 1 | 3 | 0 |
| Sample 2 | 4 | 5 | 7 | 1 |
| Sample 3 | 7 | 3 | 0 | 0 |
| Sample 4 | 9 | 2 | 5 | 0 |
| Sample 5 | 18 | 5 | 3 | 1 |
The dataset in this table contains 5 samples belonging to two classes, 0 and 1.
Class 0 contains three samples: sample 1, sample 3, and sample 4.
Class 1 contains two samples: sample 2 and sample 5.
Let $x^{(k)}$ denote the value of a sample x on the k-th feature, $m_i^{(k)}$ the mean value of the class-i samples on the k-th feature, and $m^{(k)}$ the mean value of all samples on the k-th feature. (These definitions need to be understood precisely; if they are still unclear, keep reading, where they are worked out on the table above.)
Define the between-class variance of the k-th feature on the dataset as $S_B^{(k)}$:
$$S_{B}^{(k)}=\sum_{i=1}^{C} \frac{n_{i}}{n}\left(m_{i}^{(k)}-m^{(k)}\right)^{2}$$
Define the within-class variance of the k-th feature on the dataset as $S_w^{(k)}$:
$$S_{w}^{(k)}=\frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(k)}-m_{i}^{(k)}\right)^{2}$$
Finally, we define the Fisher score of the k-th feature on the dataset as $J_{fisher}(k)$:
$$J_{fisher}(k) = \frac{S_{B}^{(k)}}{S_{w}^{(k)}}$$
In this way we can compute the Fisher score of the k-th feature.
OK, let us take the table above as an example and compute the Fisher score of feature 1.
| | Feature 1 | Feature 2 | Feature 3 | Class |
|---|---|---|---|---|
| Sample 1 | 2 | 1 | 3 | 0 |
| Sample 2 | 4 | 5 | 7 | 1 |
| Sample 3 | 7 | 3 | 0 | 0 |
| Sample 4 | 9 | 2 | 5 | 0 |
| Sample 5 | 18 | 5 | 3 | 1 |
First look at
$$S_{B}^{(k)}=\sum_{i=1}^{C} \frac{n_{i}}{n}\left(m_{i}^{(k)}-m^{(k)}\right)^{2}$$
Here n is the number of samples, which is 5, and $n_i$ is the number of samples in class i: class 0 has 3 samples and class 1 has 2.
$m_i^{(k)}$ is the mean value of the class-i samples on the k-th feature.
So $m_0^{(1)}$, the mean of the class-0 samples on feature 1, is $\frac{2+7+9}{3} = 6$ from the table above.
$m_1^{(1)}$, the mean of the class-1 samples on feature 1, is $\frac{4+18}{2} = 11$.
$m^{(k)}$ is the mean of all samples, regardless of class, on the k-th feature.
So $m^{(1)}$, the mean of all samples on feature 1, is
$$m^{(1)} = \frac{2+4+7+9+18}{5} = 8.$$
And here n = 5, $n_0$ = 3, $n_1$ = 2.
$S_B^{(1)}$ denotes the between-class variance of feature 1, so
$$S_{B}^{(1)} = \frac{n_0}{n}\left(m_{0}^{(1)} - m^{(1)}\right)^2 + \frac{n_1}{n}\left(m_{1}^{(1)} - m^{(1)}\right)^2 = \frac{3}{5}(6-8)^2 + \frac{2}{5}(11-8)^2 = \frac{12}{5} + \frac{18}{5} = 6.$$
Now look at
$$S_{w}^{(k)}=\frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(k)}-m_{i}^{(k)}\right)^{2}$$
This one is slightly more involved.
First, n = 5.
$x^{(k)}$ is the value of sample x on the k-th feature, so $x^{(1)}$ is the value of a sample on feature 1; from the table, these values are 2, 4, 7, 9, 18.
When $x \in \omega_0$, i.e., when the sample belongs to class 0, the values of $x^{(1)}$ are 2, 7, 9.
Similarly, when $x \in \omega_1$, the values of $x^{(1)}$ are 4, 18.
And we have already computed $m_0^{(1)} = 6$ and $m_1^{(1)} = 11$.
$S_w^{(1)}$ denotes the within-class variance of feature 1, so
$$S_{w}^{(1)}=\frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(1)}-m_{i}^{(1)}\right)^{2} = \frac{1}{5}\left((2-6)^2+(7-6)^2+(9-6)^2+(4-11)^2+(18-11)^2\right) = \frac{124}{5}.$$
At last we have the Fisher score of feature 1:
$$J_{fisher}(1) = \frac{S_{B}^{(1)}}{S_{w}^{(1)}} = \frac{6}{124/5} = \frac{15}{62}.$$
Repeat the same procedure to compute the Fisher scores of feature 2 and feature 3, and the job is done!
OK, I think the explanation above is quite detailed; if anything is still unclear, work carefully through the computation again. Once you do, you will see why one quantity is called the between-class variance and the other the within-class variance.
Finally, because we use the heuristic rule and score each feature independently, we need to compute the Fisher score of every feature in the dataset and then rank them.
Recall that the main idea of the Fisher score is that a strongly discriminative feature has within-class distances that are as small as possible and between-class distances that are as large as possible.
So the larger the between-class variance and the smaller the within-class variance, the larger the Fisher score. Features are therefore ranked in descending order of their scores, and the higher a feature ranks, the stronger its discriminative power is in theory (subject, of course, to the drawbacks described earlier).
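As a quick sanity check of the arithmetic above, here is a small NumPy snippet (my own addition, not part of the original article) that recomputes $S_B^{(1)}$, $S_w^{(1)}$ and the Fisher score of feature 1 for the toy table directly from the formulas:

import numpy as np

f1 = np.array([2, 4, 7, 9, 18], dtype=float)   # feature 1 of the five samples
y = np.array([0, 1, 0, 0, 1])                  # class labels

n = len(y)
m = f1.mean()                                  # overall mean m^(1)
S_B = sum((y == c).sum() / n * (f1[y == c].mean() - m) ** 2 for c in np.unique(y))
S_W = sum(((f1[y == c] - f1[y == c].mean()) ** 2).sum() for c in np.unique(y)) / n

print(S_B, S_W, S_B / S_W)   # expected: 6.0, 24.8 and 0.2419... (= 15/62)

Replacing f1 with the other columns of the table gives the scores of feature 2 and feature 3 in the same way.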
II. Code Implementation
1. An implementation from GitHub
I have tidied it up a little; the code is as follows.
First, construct_W.py:
import numpy as np
from scipy.sparse import *
from sklearn.metrics.pairwise import pairwise_distances
def construct_W(X, **kwargs):
    """
    Construct the affinity matrix W through different ways
    Notes
    -----
    if kwargs is null, use the default parameter settings;
    if kwargs is not null, construct the affinity matrix according to parameters in kwargs
    Input
    -----
    X: {numpy array}, shape (n_samples, n_features)
        input data
    kwargs: {dictionary}
        parameters to construct different affinity matrix W:
        y: {numpy array}, shape (n_samples, 1)
            the true label information needed under the 'supervised' neighbor mode
        metric: {string}
            choices for different distance measures
            'euclidean' - use euclidean distance
            'cosine' - use cosine distance (default)
        neighbor_mode: {string}
            indicates how to construct the graph
            'knn' - put an edge between two nodes if and only if they are among the
                    k nearest neighbors of each other (default)
            'supervised' - put an edge between two nodes if they belong to same class
                    and they are among the k nearest neighbors of each other
        weight_mode: {string}
            indicates how to assign weights for each edge in the graph
            'binary' - 0-1 weighting, every edge receives weight of 1 (default)
            'heat_kernel' - if nodes i and j are connected, put weight W_ij = exp(-norm(x_i - x_j)/2t^2)
                            this weight mode can only be used under 'euclidean' metric and you are required
                            to provide the parameter t
            'cosine' - if nodes i and j are connected, put weight cosine(x_i,x_j).
                       this weight mode can only be used under 'cosine' metric
        k: {int}
            choices for the number of neighbors (default k = 5)
        t: {float}
            parameter for the 'heat_kernel' weight_mode
        fisher_score: {boolean}
            indicates whether to build the affinity matrix in a fisher score way, in which W_ij = 1/n_l if yi = yj = l;
            otherwise W_ij = 0 (default fisher_score = false)
        reliefF: {boolean}
            indicates whether to build the affinity matrix in a reliefF way, NH(x) and NM(x,y) denotes a set of
            k nearest points to x with the same class as x, and a different class (the class y), respectively.
            W_ij = 1 if i = j; W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y) (default reliefF = false)
    Output
    ------
    W: {sparse matrix}, shape (n_samples, n_samples)
        output affinity matrix W
    """

    # default metric is 'cosine'
    if 'metric' not in kwargs.keys():
        kwargs['metric'] = 'cosine'

    # default neighbor mode is 'knn' and default neighbor size is 5
    if 'neighbor_mode' not in kwargs.keys():
        kwargs['neighbor_mode'] = 'knn'
    if kwargs['neighbor_mode'] == 'knn' and 'k' not in kwargs.keys():
        kwargs['k'] = 5
    if kwargs['neighbor_mode'] == 'supervised' and 'k' not in kwargs.keys():
        kwargs['k'] = 5
    if kwargs['neighbor_mode'] == 'supervised' and 'y' not in kwargs.keys():
        print('Warning: label is required in the supervised neighborMode!!!')
        exit(0)

    # default weight mode is 'binary', default t in heat kernel mode is 1
    if 'weight_mode' not in kwargs.keys():
        kwargs['weight_mode'] = 'binary'
    if kwargs['weight_mode'] == 'heat_kernel':
        if kwargs['metric'] != 'euclidean':
            kwargs['metric'] = 'euclidean'
        if 't' not in kwargs.keys():
            kwargs['t'] = 1
    elif kwargs['weight_mode'] == 'cosine':
        if kwargs['metric'] != 'cosine':
            kwargs['metric'] = 'cosine'

    # default fisher_score and reliefF mode are 'false'
    if 'fisher_score' not in kwargs.keys():
        kwargs['fisher_score'] = False
    if 'reliefF' not in kwargs.keys():
        kwargs['reliefF'] = False

    n_samples, n_features = np.shape(X)

    # choose 'knn' neighbor mode
    if kwargs['neighbor_mode'] == 'knn':
        k = kwargs['k']
        if kwargs['weight_mode'] == 'binary':
            if kwargs['metric'] == 'euclidean':
                # compute pairwise euclidean distances
                D = pairwise_distances(X)
                D **= 2
                # sort the distance matrix D in ascending order
                dump = np.sort(D, axis=1)
                idx = np.argsort(D, axis=1)
                # choose the k-nearest neighbors for each instance
                idx_new = idx[:, 0:k+1]
                G = np.zeros((n_samples*(k+1), 3))
                G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
                G[:, 1] = np.ravel(idx_new, order='F')
                G[:, 2] = 1
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

            elif kwargs['metric'] == 'cosine':
                # normalize the data first
                X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
                for i in range(n_samples):
                    X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
                # compute pairwise cosine distances
                D_cosine = np.dot(X, np.transpose(X))
                # sort the distance matrix D in descending order
                dump = np.sort(-D_cosine, axis=1)
                idx = np.argsort(-D_cosine, axis=1)
                idx_new = idx[:, 0:k+1]
                G = np.zeros((n_samples*(k+1), 3))
                G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
                G[:, 1] = np.ravel(idx_new, order='F')
                G[:, 2] = 1
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

        elif kwargs['weight_mode'] == 'heat_kernel':
            t = kwargs['t']
            # compute pairwise euclidean distances
            D = pairwise_distances(X)
            D **= 2
            # sort the distance matrix D in ascending order
            dump = np.sort(D, axis=1)
            idx = np.argsort(D, axis=1)
            idx_new = idx[:, 0:k+1]
            dump_new = dump[:, 0:k+1]
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k+1), 3))
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
            G[:, 1] = np.ravel(idx_new, order='F')
            G[:, 2] = np.ravel(dump_heat_kernel, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

        elif kwargs['weight_mode'] == 'cosine':
            # normalize the data first
            X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
            for i in range(n_samples):
                X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
            # compute pairwise cosine distances
            D_cosine = np.dot(X, np.transpose(X))
            # sort the distance matrix D in descending order
            dump = np.sort(-D_cosine, axis=1)
            idx = np.argsort(-D_cosine, axis=1)
            idx_new = idx[:, 0:k+1]
            dump_new = -dump[:, 0:k+1]
            G = np.zeros((n_samples*(k+1), 3))
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
            G[:, 1] = np.ravel(idx_new, order='F')
            G[:, 2] = np.ravel(dump_new, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

    # choose supervised neighborMode
    elif kwargs['neighbor_mode'] == 'supervised':
        k = kwargs['k']
        # get true labels and the number of classes
        y = kwargs['y']
        label = np.unique(y)
        n_classes = np.unique(y).size
        # construct the weight matrix W in a fisherScore way, W_ij = 1/n_l if yi = yj = l, otherwise W_ij = 0
        if kwargs['fisher_score'] is True:
            W = lil_matrix((n_samples, n_samples))
            for i in range(n_classes):
                class_idx = (y == label[i])
                class_idx_all = (class_idx[:, np.newaxis] & class_idx[np.newaxis, :])
                W[class_idx_all] = 1.0/np.sum(np.sum(class_idx))
            return W

        # construct the weight matrix W in a reliefF way, NH(x) and NM(x,y) denotes a set of k nearest
        # points to x with the same class as x, a different class (the class y), respectively. W_ij = 1 if i = j;
        # W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y)
        if kwargs['reliefF'] is True:
            # when xj in NH(xi)
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                D = pairwise_distances(X[class_idx, :])
                D **= 2
                idx = np.argsort(D, axis=1)
                idx_new = idx[:, 0:k+1]
                n_smp_class = (class_idx[idx_new[:]]).size
                if len(class_idx) <= k:
                    k = len(class_idx) - 1
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = 1.0/k
                id_now += n_smp_class
            W1 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            # when i = j, W_ij = 1
            for i in range(n_samples):
                W1[i, i] = 1
            # when x_j in NM(x_i, y)
            G = np.zeros((n_samples*k*(n_classes - 1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx1 = np.column_stack(np.where(y == label[i]))[:, 0]
                X1 = X[class_idx1, :]
                for j in range(n_classes):
                    if label[j] != label[i]:
                        class_idx2 = np.column_stack(np.where(y == label[j]))[:, 0]
                        X2 = X[class_idx2, :]
                        D = pairwise_distances(X1, X2)
                        idx = np.argsort(D, axis=1)
                        idx_new = idx[:, 0:k]
                        n_smp_class = len(class_idx1)*k
                        G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx1, (k, 1)).reshape(-1)
                        G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx2[idx_new[:]], order='F')
                        G[id_now:n_smp_class+id_now, 2] = -1.0/((n_classes-1)*k)
                        id_now += n_smp_class
            W2 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W2) > W2
            W2 = W2 - W2.multiply(bigger) + np.transpose(W2).multiply(bigger)
            W = W1 + W2
            return W

        if kwargs['weight_mode'] == 'binary':
            if kwargs['metric'] == 'euclidean':
                G = np.zeros((n_samples*(k+1), 3))
                id_now = 0
                for i in range(n_classes):
                    class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                    # compute pairwise euclidean distances for instances in class i
                    D = pairwise_distances(X[class_idx, :])
                    D **= 2
                    # sort the distance matrix D in ascending order for instances in class i
                    idx = np.argsort(D, axis=1)
                    idx_new = idx[:, 0:k+1]
                    n_smp_class = len(class_idx)*(k+1)
                    G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                    G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                    G[id_now:n_smp_class+id_now, 2] = 1
                    id_now += n_smp_class
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

            if kwargs['metric'] == 'cosine':
                # normalize the data first
                X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
                for i in range(n_samples):
                    X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
                G = np.zeros((n_samples*(k+1), 3))
                id_now = 0
                for i in range(n_classes):
                    class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                    # compute pairwise cosine distances for instances in class i
                    D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
                    # sort the distance matrix D in descending order for instances in class i
                    idx = np.argsort(-D_cosine, axis=1)
                    idx_new = idx[:, 0:k+1]
                    n_smp_class = len(class_idx)*(k+1)
                    G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                    G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                    G[id_now:n_smp_class+id_now, 2] = 1
                    id_now += n_smp_class
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

        elif kwargs['weight_mode'] == 'heat_kernel':
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                # compute pairwise distances for instances in class i
                D = pairwise_distances(X[class_idx, :])
                D **= 2
                # sort the distance matrix D in ascending order for instances in class i
                dump = np.sort(D, axis=1)
                idx = np.argsort(D, axis=1)
                idx_new = idx[:, 0:k+1]
                dump_new = dump[:, 0:k+1]
                t = kwargs['t']
                # compute pairwise heat kernel distances for instances in class i
                dump_heat_kernel = np.exp(-dump_new/(2*t*t))
                n_smp_class = len(class_idx)*(k+1)
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_heat_kernel, order='F')
                id_now += n_smp_class
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

        elif kwargs['weight_mode'] == 'cosine':
            # normalize the data first
            X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
            for i in range(n_samples):
                X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                # compute pairwise cosine distances for instances in class i
                D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
                # sort the distance matrix D in descending order for instances in class i
                dump = np.sort(-D_cosine, axis=1)
                idx = np.argsort(-D_cosine, axis=1)
                idx_new = idx[:, 0:k+1]
                dump_new = -dump[:, 0:k+1]
                n_smp_class = len(class_idx)*(k+1)
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_new, order='F')
                id_now += n_smp_class
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W
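As a quick illustration (my own addition, not from the original post), calling construct_W with fisher_score=True on the labels of the five-sample toy table should return the block-structured affinity matrix described in the docstring, with W_ij = 1/n_l whenever samples i and j both belong to class l:

import numpy as np
from construct_W import construct_W

X = np.array([[2, 1, 3], [4, 5, 7], [7, 3, 0], [9, 2, 5], [18, 5, 3]], dtype=float)
y = np.array([0, 1, 0, 0, 1])

W = construct_W(X, neighbor_mode='supervised', fisher_score=True, y=y)
print(W.todense())
# expected: 1/3 between the three class-0 samples (samples 1, 3, 4)
# and 1/2 between the two class-1 samples (samples 2 and 5)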
Then the main module:
import numpy as np
from scipy.sparse import *
from construct_W import construct_W
def fisher_score(X, y):
    """
    This function implements the fisher score feature selection, steps are as follows:
    1. Construct the affinity matrix W in fisher score way
    2. For the r-th feature, we define fr = X(:,r), D = diag(W*ones), ones = [1,...,1]', L = D - W
    3. Let fr_hat = fr - (fr'*D*ones)*ones/(ones'*D*ones)
    4. Fisher score for the r-th feature is score = (fr_hat'*D*fr_hat)/(fr_hat'*L*fr_hat)-1
    Input
    -----
    X: {numpy array}, shape (n_samples, n_features)
        input data
    y: {numpy array}, shape (n_samples,)
        input class labels
    Output
    ------
    score: {numpy array}, shape (n_features,)
        fisher score for each feature
    Reference
    ---------
    He, Xiaofei et al. "Laplacian Score for Feature Selection." NIPS 2005.
    Duda, Richard et al. "Pattern classification." John Wiley & Sons, 2012.
    """
    # Construct weight matrix W in a fisherScore way
    kwargs = {"neighbor_mode": "supervised", "fisher_score": True, 'y': y}
    W = construct_W(X, **kwargs)

    # build the diagonal D matrix from affinity matrix W
    D = np.array(W.sum(axis=1))
    L = W
    tmp = np.dot(np.transpose(D), X)
    D = diags(np.transpose(D), [0])
    Xt = np.transpose(X)
    t1 = np.transpose(np.dot(Xt, D.todense()))
    t2 = np.transpose(np.dot(Xt, L.todense()))
    # compute the numerator of Lr
    D_prime = np.sum(np.multiply(t1, X), 0) - np.multiply(tmp, tmp)/D.sum()
    # compute the denominator of Lr
    L_prime = np.sum(np.multiply(t2, X), 0) - np.multiply(tmp, tmp)/D.sum()
    # avoid the denominator of Lr to be 0
    D_prime[D_prime < 1e-12] = 10000
    lap_score = 1 - np.array(np.multiply(L_prime, 1/D_prime))[0, :]
    # compute fisher score from laplacian score, where fisher_score = 1/lap_score - 1
    score = 1.0/lap_score - 1
    return np.transpose(score)


def feature_ranking(score):
    """
    Rank features in descending order according to fisher score, the larger the fisher score, the more important the
    feature is
    """
    idx = np.argsort(score, 0)
    return idx[::-1]
labels_pred=[[2,2,3,4,5,7],[3,0,1,1,2,4],[4,5,6,9,7,10]]
labels_true=[0,0,1]
z=fisher_score(labels_pred,labels_true)
print(z)
Output:
[3. 5.33333333 5.33333333 6.25925926 1.81481481 3. ]
As you can see, the input arguments are the feature matrix and the list of class labels, and the output is a list of Fisher score values, one per feature.
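The feature_ranking helper defined above is not exercised in this example; as a small supplement (my addition), it can be applied to the returned scores to obtain the feature indices sorted from most to least discriminative:

ranked_idx = feature_ranking(z)
print(ranked_idx)
# with the scores printed above, feature index 3 should come first and feature index 4 last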
The original author's GitHub source code is linked below.
Source code link
2. My own implementation
Without further ado, here is the code.
import pandas as pd
import numpy as np
# Labels must be 0 or 1; the feature values can be arbitrary.
# sample: an n*m list (n samples, m features)
# label:  an n*1 list
# Returns a list with the Fisher score of each feature.
# e.g. sample = [[1,2,3],[1,0,1],[1,5,6]]
#      label  = [1, 0, 1]
#      returns lst = [nan, 1.8148148148148149, 1.8148148148148149]
def binary_fisher_score(sample, label):
    if len(sample) != len(label):
        print('Sample does not match label')
        exit()
    df1 = pd.DataFrame(sample)
    df2 = pd.DataFrame(label, columns=['label'])
    data = pd.concat([df1, df2], axis=1)  # merge features and labels into one DataFrame
    data0 = data[data.label == 0]  # split by label into a class-0 DataFrame and a class-1 DataFrame
    data1 = data[data.label == 1]
    n = len(label)   # number of samples
    n1 = sum(label)  # number of class-1 samples
    n0 = n - n1      # number of class-0 samples
    lst = []         # list of scores to return
    features_list = list(data.columns)[:-1]
    for feature in features_list:
        # statistics of class 0
        m0_feature_mean = data0[feature].mean()  # mean of class 0 on the m-th feature
        # within-class scatter of class 0 on the m-th feature
        m0_SW = sum((data0[feature] - m0_feature_mean) ** 2)
        # statistics of class 1
        m1_feature_mean = data1[feature].mean()  # mean of class 1 on the m-th feature
        # within-class scatter of class 1 on the m-th feature
        m1_SW = sum((data1[feature] - m1_feature_mean) ** 2)
        # statistics of the whole dataset
        m_all_feature_mean = data[feature].mean()  # mean of all samples on the m-th feature
        m0_SB = n0 / n * (m0_feature_mean - m_all_feature_mean) ** 2
        m1_SB = n1 / n * (m1_feature_mean - m_all_feature_mean) ** 2
        # between-class variance SB
        m_SB = m1_SB + m0_SB
        # within-class variance SW
        m_SW = (m0_SW + m1_SW) / n
        if m_SW == 0:
            # a 0/0 case also returns nan
            m_fisher_score = np.nan
        else:
            # Fisher score of this feature
            m_fisher_score = m_SB / m_SW
        # append the Fisher score to the list
        lst.append(m_fisher_score)
    return lst
sample = [[1,2,3],[1,0,1],[1,5,6]]
label = [1, 0, 1]
print(binary_fisher_score(sample,label))
Output:
[nan, 1.8148148148148149, 1.8148148148148149]
The code above only implements the simple case where the label values are 0 and 1, so please pay attention to the arguments you pass in. Of course, the actual magnitudes of the labels do not matter, since they only serve to distinguish the two classes.
Later I will publish a general Python implementation of the Fisher score that works for labels taking any number of values.
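Until then, here is a minimal sketch of such a generalization (my own addition, only checked against the toy examples in this article); it follows the $S_B^{(k)}$ and $S_w^{(k)}$ formulas above and accepts labels with any number of distinct values:

import numpy as np
import pandas as pd

def fisher_score_multiclass(sample, label):
    # Return a list with the Fisher score of each feature; labels may take any number of distinct values.
    data = pd.DataFrame(sample)
    y = pd.Series(label)
    n = len(y)
    scores = []
    for feature in data.columns:
        col = data[feature]
        overall_mean = col.mean()  # m^(k)
        s_b, s_w = 0.0, 0.0
        for c in y.unique():
            values = col[y == c]   # samples of class c on this feature
            s_b += len(values) / n * (values.mean() - overall_mean) ** 2  # contribution to S_B^(k)
            s_w += ((values - values.mean()) ** 2).sum() / n              # contribution to S_w^(k)
        scores.append(np.nan if s_w == 0 else s_b / s_w)
    return scores

print(fisher_score_multiclass([[1, 2, 3], [1, 0, 1], [1, 5, 6]], [1, 0, 1]))
# should reproduce [nan, 1.8148..., 1.8148...] from the binary example above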
I hope this article is helpful to you!