I. Algorithm Idea

1. Feature selection

Feature selection removes irrelevant or redundant features while keeping the remaining original features, yielding a feature subset that describes the given problem with as little loss of performance as possible.

Feature selection methods fall into three families: filter methods, wrapper methods, and embedded methods.

The Fisher Score introduced in this article is a filter-type feature selection algorithm.

For other filter-type feature selection algorithms, see my other articles:
Feature Selection: the Chi-Square Test

Feature Selection: Mutual Information

2. Fisher Score


Fisher Score is one of the effective methods for feature selection. Its main idea is that a strongly discriminative feature should make the within-class distances as small as possible and the between-class distances as large as possible.

This is easy to understand, and everyday life works the same way: people of the same age group have more in common to talk about, while people of different age groups have a generation gap; people from the same province greet each other like old friends, while people from different provinces may hold prejudices against one another. People of the same age group form one class, and people from the same province form one class, so the within-class distances are small; people of different age groups or from different provinces belong to different classes, so the between-class distances are large.

With this example in mind, let's continue.

More formally, given a set $\mathcal{S}$ of d features, the goal of Fisher Score filter-type feature selection is to choose a subset $\mathcal{T}$ of m features (m < d) that maximizes some criterion F:

$$\mathcal{T}^{*}=\arg \max_{\mathcal{T} \subseteq \mathcal{S}} F(\mathcal{T}), \quad \text{s.t. } |\mathcal{T}|=m$$

where "s.t." denotes the constraint and $|\cdot|$ is the cardinality (number of elements) of a set.

This equation is a combinatorial optimization problem, and finding its globally optimal solution is NP-hard. A common heuristic sidesteps it by first scoring each feature independently according to the criterion F and then selecting the m features with the highest scores. Features selected by this heuristic have drawbacks, however. On the one hand, because the heuristic scores each feature in isolation, it ignores feature combinations, that is, evaluating two or more features together. For example, feature a and feature b may each have a low Fisher score while the combination ab has a very high one; in that case the heuristic discards both a and b even though, in principle, they should be selected. On the other hand, it cannot handle redundant features. For example, feature a and feature b may both have very high Fisher scores but be strongly correlated; in that case the heuristic selects both, even though either a or b could be dropped with no loss to the subsequent learning performance.

The discussion above is drawn from the paper:

Gu, Q., Li, Z., and Han, J., 2012: Generalized Fisher score for feature selection. UAI.

Since solving this NP-hard problem exactly is too expensive, this article uses the heuristic rule and computes the Fisher Score of each feature independently.

Next comes the rule for computing the Fisher Score, which takes a bit of work to understand.

Suppose the dataset contains n samples belonging to C classes $\omega_1, \omega_2, \ldots, \omega_C$, and class i contains $n_i$ samples. Use the following table to follow along.

|          | Feature 1 | Feature 2 | Feature 3 | Class |
|----------|-----------|-----------|-----------|-------|
| Sample 1 | 2         | 1         | 3         | 0     |
| Sample 2 | 4         | 5         | 7         | 1     |
| Sample 3 | 7         | 3         | 0         | 0     |
| Sample 4 | 9         | 2         | 5         | 0     |
| Sample 5 | 18        | 5         | 3         | 1     |

The dataset in this table contains 5 samples belonging to two classes, 0 and 1 (the last column).

Class 0 contains three samples: Sample 1, Sample 3, and Sample 4.
Class 1 contains two samples: Sample 2 and Sample 5.

Let $x^{(k)}$ denote the value of sample x on the k-th feature, $m_{i}^{(k)}$ the mean value of the k-th feature over the samples of class i, and $m^{(k)}$ the mean value of the k-th feature over all samples. (These definitions need to be understood precisely; if they are still unclear, keep reading, where I walk through them with the table above.)

Define the between-class variance of the k-th feature on the dataset as $S_{B}^{(k)}$:
$$S_{B}^{(k)}=\sum_{i=1}^{C} \frac{n_{i}}{n}\left(m_{i}^{(k)}-m^{(k)}\right)^{2}$$

Define the within-class variance of the k-th feature on the dataset as $S_{w}^{(k)}$:
$$S_{w}^{(k)}=\frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(k)}-m_{i}^{(k)}\right)^{2}$$

Finally, define the Fisher Score of the k-th feature on the dataset as $J_{fisher}(k)$:

$$J_{fisher}(k)=\frac{S_{B}^{(k)}}{S_{w}^{(k)}}$$

This gives us the Fisher Score of the k-th feature.
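Before working through an example by hand, here is a minimal NumPy sketch of the three definitions above (the function name fisher_scores and the array layout are my own choices for illustration; this is not the library code shown later):

import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher Score J_fisher(k) = S_B(k) / S_w(k).
    X: (n_samples, n_features) array-like, y: (n_samples,) class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    overall_mean = X.mean(axis=0)          # m^(k) for every feature k
    S_B = np.zeros(X.shape[1])
    S_W = np.zeros(X.shape[1])
    for c in np.unique(y):                 # loop over the C classes
        Xc = X[y == c]                     # samples of class c
        n_c = Xc.shape[0]                  # n_i
        class_mean = Xc.mean(axis=0)       # m_i^(k)
        S_B += n_c / n * (class_mean - overall_mean) ** 2
        S_W += ((Xc - class_mean) ** 2).sum(axis=0) / n
    return S_B / S_W                       # a constant feature gives 0/0 -> nan

Applied to the table above, the first entry of the result matches the hand calculation for Feature 1 that follows.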

OK, let's use that table to work out the Fisher Score of Feature 1; here it is again:

|          | Feature 1 | Feature 2 | Feature 3 | Class |
|----------|-----------|-----------|-----------|-------|
| Sample 1 | 2         | 1         | 3         | 0     |
| Sample 2 | 4         | 5         | 7         | 1     |
| Sample 3 | 7         | 3         | 0         | 0     |
| Sample 4 | 9         | 2         | 5         | 0     |
| Sample 5 | 18        | 5         | 3         | 1     |

First look at the between-class variance:
$$S_{B}^{(k)}=\sum_{i=1}^{C} \frac{n_{i}}{n}\left(m_{i}^{(k)}-m^{(k)}\right)^{2}$$

Here n is the number of samples, which is 5. $n_i$ is the number of samples in class i: class 0 has 3 samples and class 1 has 2.

$m_{i}^{(k)}$ is the mean of the k-th feature over the samples of class i.

So $m_{0}^{(1)}$ is the mean of Feature 1 over the class-0 samples; from the table, $m_{0}^{(1)} = \frac{2+7+9}{3} = 6$.

$m_{1}^{(1)}$ is the mean of Feature 1 over the class-1 samples; from the table, $m_{1}^{(1)} = \frac{4+18}{2} = 11$.

$m^{(k)}$ is the mean of the k-th feature over all samples.

So $m^{(1)}$, the mean of Feature 1 over all samples, is
$$m^{(1)} = \frac{2+4+7+9+18}{5} = 8$$

Also, n = 5, $n_0$ = 3, $n_1$ = 2.

$S_{B}^{(1)}$, the between-class variance of Feature 1, is then
$$S_{B}^{(1)}=\frac{n_0}{n}\left(m_{0}^{(1)}-m^{(1)}\right)^{2}+\frac{n_1}{n}\left(m_{1}^{(1)}-m^{(1)}\right)^{2}=\frac{3}{5}(6-8)^{2}+\frac{2}{5}(11-8)^{2}=\frac{12}{5}+\frac{18}{5}=6$$

Now look at the within-class variance:
$$S_{w}^{(k)}=\frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(k)}-m_{i}^{(k)}\right)^{2}$$

This one is a bit more involved.

First, n = 5.

$x^{(k)}$ denotes the value of sample x on the k-th feature, so $x^{(1)}$ is a sample's value on Feature 1; from the table these values are 2, 4, 7, 9, 18.

When $x \in \omega_{0}$, i.e. the sample belongs to class 0, the values of $x^{(1)}$ are 2, 7, 9.

Similarly, when $x \in \omega_{1}$, the values of $x^{(1)}$ are 4, 18.

And we have already computed $m_{0}^{(1)} = 6$ and $m_{1}^{(1)} = 11$.

$S_{w}^{(1)}$, the within-class variance of Feature 1, is then
$$S_{w}^{(1)}=\frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(1)}-m_{i}^{(1)}\right)^{2}=\frac{1}{5}\left((2-6)^{2}+(7-6)^{2}+(9-6)^{2}+(4-11)^{2}+(18-11)^{2}\right)=\frac{124}{5}$$

Finally, we have the Fisher Score of Feature 1:
$$J_{fisher}(1)=\frac{S_{B}^{(1)}}{S_{w}^{(1)}}=\frac{6}{124/5}=\frac{15}{62}$$
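As a quick sanity check, the same arithmetic can be typed straight into Python (15/62 ≈ 0.2419):

# Feature 1 of the table: values 2, 4, 7, 9, 18 with class labels 0, 1, 0, 0, 1
m0 = (2 + 7 + 9) / 3              # class-0 mean = 6
m1 = (4 + 18) / 2                 # class-1 mean = 11
m = (2 + 4 + 7 + 9 + 18) / 5      # overall mean = 8
S_B = 3/5 * (m0 - m) ** 2 + 2/5 * (m1 - m) ** 2                          # = 6
S_W = ((2-m0)**2 + (7-m0)**2 + (9-m0)**2 + (4-m1)**2 + (18-m1)**2) / 5   # = 124/5
print(S_B / S_W)                  # 0.24193..., i.e. 15/62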

Repeat the same procedure for Feature 2 and Feature 3, and the job is done!

OK, I feel this has been explained in enough detail; if it is still unclear, go through the computation above again carefully. Once you work through it yourself, you will see why one quantity is called the between-class variance and the other the within-class variance.

Finally, since we use the heuristic rule that scores each feature independently, we need to compute the Fisher Score of every feature in the dataset and rank them.

And the main idea of the Fisher Score is that a strongly discriminative feature has a within-class distance that is as small as possible and a between-class distance that is as large as possible.

So the larger the between-class variance and the smaller the within-class variance, the larger the Fisher Score. The features are therefore ranked in descending order of score, and the higher a feature ranks, the stronger its discriminative power is in theory (subject, of course, to the drawbacks of the heuristic described earlier). A small ranking sketch follows below.
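For instance, the ranking step is just an argsort in descending order (the score values below are made up purely for illustration):

import numpy as np

scores = np.array([0.24, 5.40, 0.32])   # hypothetical Fisher Scores of three features
ranking = np.argsort(scores)[::-1]      # feature indices, highest score first
print(ranking)                          # [1 2 0]: the second feature ranks first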

II. Code Implementation

1. An existing implementation from GitHub

I tidied it up a bit; the code is as follows.
First, construct_W.py:

import numpy as np
from scipy.sparse import *
from sklearn.metrics.pairwise import pairwise_distances


def construct_W(X, **kwargs):
    """
    Construct the affinity matrix W through different ways
    Notes
    -----
    if kwargs is null, use the default parameter settings;
    if kwargs is not null, construct the affinity matrix according to parameters in kwargs
    Input
    -----
    X: {numpy array}, shape (n_samples, n_features)
        input data
    kwargs: {dictionary}
        parameters to construct different affinity matrix W:
        y: {numpy array}, shape (n_samples, 1)
            the true label information needed under the 'supervised' neighbor mode
        metric: {string}
            choices for different distance measures
            'euclidean' - use euclidean distance
            'cosine' - use cosine distance (default)
        neighbor_mode: {string}
            indicates how to construct the graph
            'knn' - put an edge between two nodes if and only if they are among the
                    k nearest neighbors of each other (default)
            'supervised' - put an edge between two nodes if they belong to same class
                    and they are among the k nearest neighbors of each other
        weight_mode: {string}
            indicates how to assign weights for each edge in the graph
            'binary' - 0-1 weighting, every edge receives weight of 1 (default)
            'heat_kernel' - if nodes i and j are connected, put weight W_ij = exp(-norm(x_i - x_j)/2t^2)
                            this weight mode can only be used under 'euclidean' metric and you are required
                            to provide the parameter t
            'cosine' - if nodes i and j are connected, put weight cosine(x_i,x_j).
                        this weight mode can only be used under 'cosine' metric
        k: {int}
            choices for the number of neighbors (default k = 5)
        t: {float}
            parameter for the 'heat_kernel' weight_mode
        fisher_score: {boolean}
            indicates whether to build the affinity matrix in a fisher score way, in which W_ij = 1/n_l if yi = yj = l;
            otherwise W_ij = 0 (default fisher_score = false)
        reliefF: {boolean}
            indicates whether to build the affinity matrix in a reliefF way, NH(x) and NM(x,y) denotes a set of
            k nearest points to x with the same class as x, and a different class (the class y), respectively.
            W_ij = 1 if i = j; W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y) (default reliefF = false)
    Output
    ------
    W: {sparse matrix}, shape (n_samples, n_samples)
        output affinity matrix W
    """

    # default metric is 'cosine'
    if 'metric' not in kwargs.keys():
        kwargs['metric'] = 'cosine'

    # default neighbor mode is 'knn' and default neighbor size is 5
    if 'neighbor_mode' not in kwargs.keys():
        kwargs['neighbor_mode'] = 'knn'
    if kwargs['neighbor_mode'] == 'knn' and 'k' not in kwargs.keys():
        kwargs['k'] = 5
    if kwargs['neighbor_mode'] == 'supervised' and 'k' not in kwargs.keys():
        kwargs['k'] = 5
    if kwargs['neighbor_mode'] == 'supervised' and 'y' not in kwargs.keys():
        print ('Warning: label is required in the supervised neighborMode!!!')
        exit(0)

    # default weight mode is 'binary', default t in heat kernel mode is 1
    if 'weight_mode' not in kwargs.keys():
        kwargs['weight_mode'] = 'binary'
    if kwargs['weight_mode'] == 'heat_kernel':
        if kwargs['metric'] != 'euclidean':
            kwargs['metric'] = 'euclidean'
        if 't' not in kwargs.keys():
            kwargs['t'] = 1
    elif kwargs['weight_mode'] == 'cosine':
        if kwargs['metric'] != 'cosine':
            kwargs['metric'] = 'cosine'

    # default fisher_score and reliefF mode are 'false'
    if 'fisher_score' not in kwargs.keys():
        kwargs['fisher_score'] = False
    if 'reliefF' not in kwargs.keys():
        kwargs['reliefF'] = False

    n_samples, n_features = np.shape(X)

    # choose 'knn' neighbor mode
    if kwargs['neighbor_mode'] == 'knn':
        k = kwargs['k']
        if kwargs['weight_mode'] == 'binary':
            if kwargs['metric'] == 'euclidean':
                # compute pairwise euclidean distances
                D = pairwise_distances(X)
                D **= 2
                # sort the distance matrix D in ascending order
                dump = np.sort(D, axis=1)
                idx = np.argsort(D, axis=1)
                # choose the k-nearest neighbors for each instance
                idx_new = idx[:, 0:k+1]
                G = np.zeros((n_samples*(k+1), 3))
                G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
                G[:, 1] = np.ravel(idx_new, order='F')
                G[:, 2] = 1
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

            elif kwargs['metric'] == 'cosine':
                # normalize the data first
                X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
                for i in range(n_samples):
                    X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
                # compute pairwise cosine distances
                D_cosine = np.dot(X, np.transpose(X))
                # sort the distance matrix D in descending order
                dump = np.sort(-D_cosine, axis=1)
                idx = np.argsort(-D_cosine, axis=1)
                idx_new = idx[:, 0:k+1]
                G = np.zeros((n_samples*(k+1), 3))
                G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
                G[:, 1] = np.ravel(idx_new, order='F')
                G[:, 2] = 1
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

        elif kwargs['weight_mode'] == 'heat_kernel':
            t = kwargs['t']
            # compute pairwise euclidean distances
            D = pairwise_distances(X)
            D **= 2
            # sort the distance matrix D in ascending order
            dump = np.sort(D, axis=1)
            idx = np.argsort(D, axis=1)
            idx_new = idx[:, 0:k+1]
            dump_new = dump[:, 0:k+1]
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k+1), 3))
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
            G[:, 1] = np.ravel(idx_new, order='F')
            G[:, 2] = np.ravel(dump_heat_kernel, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

        elif kwargs['weight_mode'] == 'cosine':
            # normalize the data first
            X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
            for i in range(n_samples):
                X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
            # compute pairwise cosine distances
            D_cosine = np.dot(X, np.transpose(X))
            # sort the distance matrix D in ascending order
            dump = np.sort(-D_cosine, axis=1)
            idx = np.argsort(-D_cosine, axis=1)
            idx_new = idx[:, 0:k+1]
            dump_new = -dump[:, 0:k+1]
            G = np.zeros((n_samples*(k+1), 3))
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
            G[:, 1] = np.ravel(idx_new, order='F')
            G[:, 2] = np.ravel(dump_new, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

    # choose supervised neighborMode
    elif kwargs['neighbor_mode'] == 'supervised':
        k = kwargs['k']
        # get true labels and the number of classes
        y = kwargs['y']
        label = np.unique(y)
        n_classes = np.unique(y).size
        # construct the weight matrix W in a fisherScore way, W_ij = 1/n_l if yi = yj = l, otherwise W_ij = 0
        if kwargs['fisher_score'] is True:
            W = lil_matrix((n_samples, n_samples))
            for i in range(n_classes):
                class_idx = (y == label[i])
                class_idx_all = (class_idx[:, np.newaxis] & class_idx[np.newaxis, :])
                W[class_idx_all] = 1.0/np.sum(np.sum(class_idx))
            return W

        # construct the weight matrix W in a reliefF way, NH(x) and NM(x,y) denotes a set of k nearest
        # points to x with the same class as x, a different class (the class y), respectively. W_ij = 1 if i = j;
        # W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y)
        if kwargs['reliefF'] is True:
            # when xj in NH(xi)
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                D = pairwise_distances(X[class_idx, :])
                D **= 2
                idx = np.argsort(D, axis=1)
                idx_new = idx[:, 0:k+1]
                n_smp_class = (class_idx[idx_new[:]]).size
                if len(class_idx) <= k:
                    k = len(class_idx) - 1
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = 1.0/k
                id_now += n_smp_class
            W1 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            # when i = j, W_ij = 1
            for i in range(n_samples):
                W1[i, i] = 1
            # when x_j in NM(x_i, y)
            G = np.zeros((n_samples*k*(n_classes - 1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx1 = np.column_stack(np.where(y == label[i]))[:, 0]
                X1 = X[class_idx1, :]
                for j in range(n_classes):
                    if label[j] != label[i]:
                        class_idx2 = np.column_stack(np.where(y == label[j]))[:, 0]
                        X2 = X[class_idx2, :]
                        D = pairwise_distances(X1, X2)
                        idx = np.argsort(D, axis=1)
                        idx_new = idx[:, 0:k]
                        n_smp_class = len(class_idx1)*k
                        G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx1, (k, 1)).reshape(-1)
                        G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx2[idx_new[:]], order='F')
                        G[id_now:n_smp_class+id_now, 2] = -1.0/((n_classes-1)*k)
                        id_now += n_smp_class
            W2 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W2) > W2
            W2 = W2 - W2.multiply(bigger) + np.transpose(W2).multiply(bigger)
            W = W1 + W2
            return W

        if kwargs['weight_mode'] == 'binary':
            if kwargs['metric'] == 'euclidean':
                G = np.zeros((n_samples*(k+1), 3))
                id_now = 0
                for i in range(n_classes):
                    class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                    # compute pairwise euclidean distances for instances in class i
                    D = pairwise_distances(X[class_idx, :])
                    D **= 2
                    # sort the distance matrix D in ascending order for instances in class i
                    idx = np.argsort(D, axis=1)
                    idx_new = idx[:, 0:k+1]
                    n_smp_class = len(class_idx)*(k+1)
                    G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                    G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                    G[id_now:n_smp_class+id_now, 2] = 1
                    id_now += n_smp_class
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

            if kwargs['metric'] == 'cosine':
                # normalize the data first
                X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
                for i in range(n_samples):
                    X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
                G = np.zeros((n_samples*(k+1), 3))
                id_now = 0
                for i in range(n_classes):
                    class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                    # compute pairwise cosine distances for instances in class i
                    D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
                    # sort the distance matrix D in descending order for instances in class i
                    idx = np.argsort(-D_cosine, axis=1)
                    idx_new = idx[:, 0:k+1]
                    n_smp_class = len(class_idx)*(k+1)
                    G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                    G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                    G[id_now:n_smp_class+id_now, 2] = 1
                    id_now += n_smp_class
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

        elif kwargs['weight_mode'] == 'heat_kernel':
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                # compute pairwise cosine distances for instances in class i
                D = pairwise_distances(X[class_idx, :])
                D **= 2
                # sort the distance matrix D in ascending order for instances in class i
                dump = np.sort(D, axis=1)
                idx = np.argsort(D, axis=1)
                idx_new = idx[:, 0:k+1]
                dump_new = dump[:, 0:k+1]
                t = kwargs['t']
                # compute pairwise heat kernel distances for instances in class i
                dump_heat_kernel = np.exp(-dump_new/(2*t*t))
                n_smp_class = len(class_idx)*(k+1)
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_heat_kernel, order='F')
                id_now += n_smp_class
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

        elif kwargs['weight_mode'] == 'cosine':
            # normalize the data first
            X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
            for i in range(n_samples):
                X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                # compute pairwise cosine distances for instances in class i
                D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
                # sort the distance matrix D in descending order for instances in class i
                dump = np.sort(-D_cosine, axis=1)
                idx = np.argsort(-D_cosine, axis=1)
                idx_new = idx[:, 0:k+1]
                dump_new = -dump[:, 0:k+1]
                n_smp_class = len(class_idx)*(k+1)
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_new, order='F')
                id_now += n_smp_class
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

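To make the fisher_score mode of this helper concrete, here is a small check I added (the toy data and labels are arbitrary; as the docstring says, W_ij = 1/n_l when samples i and j both belong to class l, and 0 otherwise):

import numpy as np
from construct_W import construct_W

X = np.array([[2., 2., 3.], [3., 0., 1.], [4., 5., 6.]])
y = np.array([0, 0, 1])
W = construct_W(X, neighbor_mode="supervised", fisher_score=True, y=y)
print(W.todense())
# expected block structure:
# [[0.5 0.5 0. ]
#  [0.5 0.5 0. ]
#  [0.  0.  1. ]]
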
Then comes the main function:

import numpy as np
from scipy.sparse import *
from construct_W import construct_W


def fisher_score(X, y):
    """
    This function implements the fisher score feature selection, steps are as follows:
    1. Construct the affinity matrix W in fisher score way
    2. For the r-th feature, we define fr = X(:,r), D = diag(W*ones), ones = [1,...,1]', L = D - W
    3. Let fr_hat = fr - (fr'*D*ones)*ones/(ones'*D*ones)
    4. Fisher score for the r-th feature is score = (fr_hat'*D*fr_hat)/(fr_hat'*L*fr_hat)-1
    Input
    -----
    X: {numpy array}, shape (n_samples, n_features)
        input data
    y: {numpy array}, shape (n_samples,)
        input class labels
    Output
    ------
    score: {numpy array}, shape (n_features,)
        fisher score for each feature
    Reference
    ---------
    He, Xiaofei et al. "Laplacian Score for Feature Selection." NIPS 2005.
    Duda, Richard et al. "Pattern classification." John Wiley & Sons, 2012.
    """

    # Construct weight matrix W in a fisherScore way
    kwargs = {"neighbor_mode": "supervised", "fisher_score": True, 'y': y}
    W = construct_W(X, **kwargs)

    # build the diagonal D matrix from affinity matrix W
    D = np.array(W.sum(axis=1))
    L = W
    tmp = np.dot(np.transpose(D), X)
    D = diags(np.transpose(D), [0])
    Xt = np.transpose(X)
    t1 = np.transpose(np.dot(Xt, D.todense()))
    t2 = np.transpose(np.dot(Xt, L.todense()))
    # compute the numerator of Lr
    D_prime = np.sum(np.multiply(t1, X), 0) - np.multiply(tmp, tmp)/D.sum()
    # compute the denominator of Lr
    L_prime = np.sum(np.multiply(t2, X), 0) - np.multiply(tmp, tmp)/D.sum()
    # avoid the denominator of Lr to be 0
    D_prime[D_prime < 1e-12] = 10000
    lap_score = 1 - np.array(np.multiply(L_prime, 1/D_prime))[0, :]

    # compute fisher score from laplacian score, where fisher_score = 1/lap_score - 1
    score = 1.0/lap_score - 1
    return np.transpose(score)


def feature_ranking(score):
    """
    Rank features in descending order according to fisher score, the larger the fisher score, the more important the
    feature is
    """
    idx = np.argsort(score, 0)
    return idx[::-1]

X = [[2, 2, 3, 4, 5, 7], [3, 0, 1, 1, 2, 4], [4, 5, 6, 9, 7, 10]]  # feature matrix: 3 samples, 6 features
y = [0, 0, 1]  # class labels
z = fisher_score(X, y)
print(z)

Output:

[3.         5.33333333 5.33333333 6.25925926 1.81481481 3.        ]

As you can see, the inputs are the feature matrix and the list of class labels, and the output is the list of Fisher Score values, one per feature.
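The feature_ranking function defined above turns these scores into a ranking; for the output above, the feature at index 3 (score ≈ 6.26) comes first:

# rank the features by descending Fisher Score, reusing feature_ranking from above
idx = feature_ranking(z)
print(idx)  # feature indices, most discriminative first (index 3 leads)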

Link to the original author's GitHub source code ↓
Source code link

2. My own implementation

Without further ado, here is the code.

import pandas as pd
import numpy as np
# Labels must be 0 or 1; the feature values themselves can be anything.
# sample: n*m list (n samples, m features)
# label:  list of n class labels
# Returns a list with the Fisher Score of each feature.
# e.g.  sample = [[1,2,3],[1,0,1],[1,5,6]]
#       label  = [1, 0, 1]
#       return  lst = [nan, 1.8148148148148149, 1.8148148148148149]
def binary_fisher_score(sample, label):

    if len(sample) != len(label):
        print('Sample does not match label')
        exit()
    df1 = pd.DataFrame(sample)
    df2 = pd.DataFrame(label, columns=['label'])
    data = pd.concat([df1, df2], axis=1)  # merge features and labels into one DataFrame

    data0 = data[data.label == 0]  # split by label into a class-0 DataFrame and a class-1 DataFrame
    data1 = data[data.label == 1]
    n = len(label)   # total number of samples
    n1 = sum(label)  # number of class-1 samples
    n0 = n - n1      # number of class-0 samples
    lst = []         # list of Fisher Scores to return
    features_list = list(data.columns)[:-1]
    for feature in features_list:

        # statistics of class 0 on this feature
        m0_feature_mean = data0[feature].mean()  # mean of class 0 on this feature
        # within-class sum of squares of class 0 on this feature
        m0_SW = sum((data0[feature] - m0_feature_mean) ** 2)
        # statistics of class 1 on this feature
        m1_feature_mean = data1[feature].mean()  # mean of class 1 on this feature
        # within-class sum of squares of class 1 on this feature
        m1_SW = sum((data1[feature] - m1_feature_mean) ** 2)
        # statistics of the whole dataset on this feature
        m_all_feature_mean = data[feature].mean()  # mean of all samples on this feature

        m0_SB = n0 / n * (m0_feature_mean - m_all_feature_mean) ** 2
        m1_SB = n1 / n * (m1_feature_mean - m_all_feature_mean) ** 2
        # between-class variance S_B
        m_SB = m1_SB + m0_SB
        # within-class variance S_W
        m_SW = (m0_SW + m1_SW) / n
        if m_SW == 0:
            # a 0/0 case (e.g. a constant feature) also yields nan
            m_fisher_score = np.nan
        else:
            # Fisher Score of this feature
            m_fisher_score = m_SB / m_SW
        # append the Fisher Score to the result list
        lst.append(m_fisher_score)

    return lst

sample = [[1,2,3],[1,0,1],[1,5,6]]
label = [1, 0, 1]
print(binary_fisher_score(sample,label))

Output:

[nan, 1.8148148148148149, 1.8148148148148149]

The code above only implements the simple case where the labels take the values 0 and 1, so please pay attention to the arguments you pass in. Of course the actual label values carry no meaning in themselves, since they only serve to distinguish the two classes, but this implementation does expect them to be exactly 0 and 1. (In the example above, the first feature is constant across all samples, so both its between-class and within-class variances are 0 and its score comes back as nan.)

Later on I will post a more general Python implementation of the Fisher Score that works for labels with any number of classes.

I hope this article helps you!
