[PyTorch] Two Uses of torch.nn.Dropout(): Preventing Overfitting & Data Augmentation

Iron_lyk · published 2022-07-13 10:53:58

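Dropout is a widely used trick when training models, and its purpose is to prevent overfitting. The idea is that, during training, each unit (activation) of a given layer is masked, i.e. set to 0, with some probability, so that only the remaining units take part in that training step. This discourages units from co-adapting and reduces overfitting.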
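The mechanism described above can be sketched by hand. This is a minimal illustration, not PyTorch's actual implementation: `manual_dropout` is a hypothetical helper that draws a Bernoulli keep-mask with `torch.bernoulli`, zeroes the dropped elements, and rescales the survivors by 1/(1-p) (the "inverted dropout" convention that `nn.Dropout` also follows, as its docstring at the end of this article states).

```python
import torch

def manual_dropout(x: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Hypothetical helper: inverted dropout written by hand for illustration."""
    if not 0.0 <= p < 1.0:
        # p = 1 would divide by zero below; this sketch simply rejects it
        raise ValueError("p must be in [0, 1)")
    # Keep each element with probability 1 - p (a mask of 0s and 1s)
    keep_mask = torch.bernoulli(torch.full_like(x, 1.0 - p))
    # Zero the dropped elements and rescale the survivors by 1 / (1 - p)
    return x * keep_mask / (1.0 - p)

torch.manual_seed(0)
x = torch.ones(4, 5)
y = manual_dropout(x, p=0.5)
print(y)  # every entry is either 0.0 or 2.0
```

With p = 0.5 and an all-ones input, every surviving entry becomes 1 / 0.5 = 2.0, so the average over many entries stays close to 1.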

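Taking PyTorch's implementation as an example, we usually use torch.nn.Dropout(p=0.5, inplace=False), which calls the lower-level function torch.nn.functional.dropout(). The official source code is reproduced at the end of this article.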
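Because `nn.Dropout.forward` simply calls `F.dropout(input, self.p, self.training, self.inplace)` (see the source at the end), the module and the functional form should produce identical results when they consume the same random numbers. A minimal check, assuming a CPU tensor; the seed value 42 is an arbitrary choice:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(3, 4)

layer = nn.Dropout(p=0.5)  # module form; a freshly built module is in training mode
torch.manual_seed(42)
out_module = layer(x)

torch.manual_seed(42)      # reset the RNG so both calls draw the same Bernoulli mask
out_functional = F.dropout(x, p=0.5, training=True)

print(torch.equal(out_module, out_functional))  # True
```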

In practice, there are two main ways to use it, depending on the situation:

1. Using it when building a network, to prevent overfitting

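When building a network, a dropout layer is usually placed after a fully connected layer (nn.Linear), so that during training the outputs of that layer are randomly dropped with some probability, which helps prevent overfitting. Keep the following points in mind: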
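A minimal sketch of this placement, assuming a hypothetical MLP classifier for flattened 28×28 inputs (784 features, 10 classes); the layer sizes and p = 0.5 are illustrative choices, not taken from the original article:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Hypothetical classifier: sizes and p are illustrative only."""

    def __init__(self, in_dim: int = 784, hidden: int = 256,
                 num_classes: int = 10, p: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p),              # drops hidden activations, in training mode only
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = MLP()
logits = model(torch.randn(8, 784))
print(logits.shape)  # torch.Size([8, 10])
```

Here the dropout layer sits after the ReLU that follows the first nn.Linear. Since the dropout mask and the 1/(1-p) factor are non-negative, ReLU(dropout(z)) equals dropout(ReLU(z)) for the same mask, so placing it directly after nn.Linear or after the activation gives the same result.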

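- Dropout is meant for training only, so in PyTorch an nn.Dropout() layer is active only in model.train() mode and is automatically disabled (it passes its input through unchanged) in model.eval() mode.
- The parameter p is the probability that each element (neuron output) is zeroed, i.e. left inactive; the default is 0.5.
- During training, nn.Dropout() not only zeroes each element with probability p but also rescales the remaining non-zero elements by a factor of 1/(1-p), so that the expected value of each output equals its input: (1-p) · x/(1-p) = x.
- The input to nn.Dropout() can have any shape, and the output has the same shape as the input.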
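The points above can be checked directly. A minimal sketch, assuming p = 0.2 and a strictly positive input so that any zero in the output must come from dropout rather than from the data:

```python
import torch
import torch.nn as nn

p = 0.2
drop = nn.Dropout(p=p)
x = torch.rand(2, 3, 4) + 1.0  # values in [1, 2), so a zero can only come from dropout

drop.train()                    # training mode: random mask + rescale
y_train = drop(x)
kept = y_train != 0
# Surviving elements are scaled by 1 / (1 - p); dropped ones are exactly 0
print(torch.allclose(y_train[kept], x[kept] / (1 - p)))  # True
print(y_train.shape == x.shape)                          # True

drop.eval()                     # evaluation mode: dropout is a no-op
y_eval = drop(x)
print(torch.equal(y_eval, x))                            # True
```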

2. Applying it to an output tensor, for data augmentation

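You can also apply dropout directly to the tensor produced by some layer of the network (with nn.Dropout() or torch.nn.functional.dropout()). Each element of the tensor is then zeroed with probability p, which simulates missing data in real-world inputs and serves as a form of data augmentation. As before, the elements that are not zeroed are scaled by 1/(1-p).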
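A minimal sketch of this usage, assuming the tensor comes from some earlier layer (random features stand in for it here) and using a hypothetical helper `feature_dropout_augment` with an illustrative p = 0.1. The functional form `F.dropout` does not consult any module's train/eval flag; it applies the mask whenever `training=True`, which is also its default:

```python
import torch
import torch.nn.functional as F

def feature_dropout_augment(features: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Hypothetical augmentation helper: randomly zero entries of a feature tensor."""
    # training=True means the random mask and 1/(1-p) rescale are always applied
    return F.dropout(features, p=p, training=True)

torch.manual_seed(0)
feats = torch.randn(32, 128)            # stand-in for a batch of 32 feature vectors
aug1 = feature_dropout_augment(feats)
aug2 = feature_dropout_augment(feats)   # a second call draws a fresh mask
zero_fraction = (aug1 == 0).float().mean().item()
print(f"fraction zeroed in aug1: {zero_fraction:.3f}")  # roughly p = 0.1
```

Each call yields a differently masked copy of the same features. If you use an nn.Dropout module for this instead, remember that it stops masking once the module is switched to eval() mode.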

The official source code is as follows:

```python
# Excerpt from torch/nn/modules/dropout.py (PyTorch source).
# The imports below are added here so the snippet can run on its own.
import torch.nn.functional as F
from torch import Tensor
from torch.nn.modules.dropout import _DropoutNd


class Dropout(_DropoutNd):
    r"""During training, randomly zeroes some of the elements of the input
    tensor with probability :attr:`p` using samples from a Bernoulli
    distribution. Each channel will be zeroed out independently on every forward
    call.

    This has proven to be an effective technique for regularization and
    preventing the co-adaptation of neurons as described in the paper
    `Improving neural networks by preventing co-adaptation of feature
    detectors`_ .

    Furthermore, the outputs are scaled by a factor of :math:`\frac{1}{1-p}` during
    training. This means that during evaluation the module simply computes an
    identity function.

    Args:
        p: probability of an element to be zeroed. Default: 0.5
        inplace: If set to ``True``, will do this operation in-place. Default: ``False``

    Shape:
        - Input: :math:`(*)`. Input can be of any shape
        - Output: :math:`(*)`. Output is of the same shape as input

    Examples::

        >>> m = nn.Dropout(p=0.2)
        >>> input = torch.randn(20, 16)
        >>> output = m(input)

    .. _Improving neural networks by preventing co-adaptation of feature
        detectors: https://arxiv.org/abs/1207.0580
    """

    def forward(self, input: Tensor) -> Tensor:
        # self.training is toggled by .train() / .eval(); F.dropout is a no-op when it is False
        return F.dropout(input, self.p, self.training, self.inplace)
```
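The docstring says the survivors are scaled by 1/(1-p). A quick empirical check of what that buys, assuming a large all-ones input (the size 1,000,000 and the seed are arbitrary choices): roughly a fraction p of the entries are zeroed, yet the mean of the output stays close to the mean of the input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m = nn.Dropout(p=0.2)          # same setting as the docstring example
x = torch.ones(1_000_000)

y = m(x)                       # a new module is in training mode by default
zero_fraction = (y == 0).float().mean().item()
output_mean = y.mean().item()
print(f"fraction zeroed: {zero_fraction:.3f}")  # close to p = 0.2
print(f"mean of output:  {output_mean:.3f}")    # close to the input mean of 1.0
```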