PyTorch notes: how to use nn.ModuleList and nn.Sequential, and how they differ

慕思侣 · Published 2021-01-14 16:43:50

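I have only recently started with PyTorch and many of its basic concepts are still not clear to me, so I am collecting what I learned from other people's write-ups as notes for later reference (a poor pen beats a good memory, after all). If anything here is wrong or unclear, corrections and discussion in the comments are welcome.

When building a network, a few basic PyTorch classes matter a lot: nn.Module, nn.ModuleList and nn.Sequential. These classes are called containers; see containers in the official documentation. This post covers nn.ModuleList and nn.Sequential and when each one is the better fit.

1. Introduction to nn.ModuleList and nn.Sequential

nn.ModuleList

nn.ModuleList is a container that stores modules and automatically adds the parameters of each one to the network. Any subclass of nn.Module (nn.Conv2d, nn.Linear and so on) can be put into it, using the same operations as a built-in Python list, such as append and extend. Unlike a plain list, modules added to an nn.ModuleList are registered as submodules of the enclosing network, so their parameters become part of the network's parameters as well. A plain Python list does not do this, which causes problems.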

import torch
import torch.nn as nn

class net1(nn.Module):
    def __init__(self):
        super(net1, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10,10) for i in range(2)])
    def forward(self, x):
        for m in self.linears:
            x = m(x)
        return x

net = net1()
print(net)
# net1(
#   (linears): ModuleList(
#     (0): Linear(in_features=10, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#   )
# )

for param in net.parameters():
    print(type(param.data), param.size())
# <class 'torch.Tensor'> torch.Size([10, 10])
# <class 'torch.Tensor'> torch.Size([10])
# <class 'torch.Tensor'> torch.Size([10, 10])
# <class 'torch.Tensor'> torch.Size([10])

As the output shows, the weights and biases of both layers belong to the network. Here is the same example with a built-in Python list instead:

class net2(nn.Module):
    def __init__(self):
        super(net2, self).__init__()
        self.linears = [nn.Linear(10,10) for i in range(2)]
    def forward(self, x):
        for m in self.linears:
            x = m(x)
        return x

net = net2()
print(net)
# net2()
print(list(net.parameters()))
# []

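Clearly, the linear layers stored in a Python list, and their parameters, are not registered with the network. forward still computes an output, but training does not work: net.parameters() is empty, so an optimizer built from it has nothing to update.

Note, however, that nn.ModuleList does not define a network. It only stores modules together, and the position of a module in the list says nothing about the order in which the modules are executed. For example: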
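To make the consequences concrete, here is a minimal sketch comparing what PyTorch can see in each case. The class names `WithList` and `WithModuleList` are made up for this illustration, and `torch.optim.SGD` with `lr=0.1` is just an arbitrary choice of optimizer; it refuses to be constructed from an empty parameter list.

```python
import torch
import torch.nn as nn


class WithList(nn.Module):
    """Hypothetical example: layers kept in a plain Python list."""
    def __init__(self):
        super().__init__()
        self.linears = [nn.Linear(10, 10) for _ in range(2)]


class WithModuleList(nn.Module):
    """Hypothetical example: the same layers kept in an nn.ModuleList."""
    def __init__(self):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for _ in range(2)])


plain = WithList()
registered = WithModuleList()

# Count the parameter tensors that each network exposes.
n_plain = len(list(plain.parameters()))
n_registered = len(list(registered.parameters()))
print(n_plain, n_registered)

# state_dict() is what gets saved to disk; unregistered layers are missing from it.
plain_keys = list(plain.state_dict().keys())
registered_keys = list(registered.state_dict().keys())
print(plain_keys)
print(registered_keys)

# An optimizer cannot be built when there is nothing to optimize.
try:
    torch.optim.SGD(plain.parameters(), lr=0.1)
    optimizer_error = None
except ValueError as err:
    optimizer_error = str(err)
print(optimizer_error)
```

Besides training, the same registration mechanism is what saving and loading a model rely on, so layers kept in a plain list would also be missing from a checkpoint.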

class net3(nn.Module):
    def __init__(self):
        super(net3, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10,20), nn.Linear(20,30), nn.Linear(5,10)])
    def forward(self, x):
        x = self.linears[2](x)
        x = self.linears[0](x)
        x = self.linears[1](x) 
        return x

net = net3()
print(net)
# net3(
#   (linears): ModuleList(
#     (0): Linear(in_features=10, out_features=20, bias=True)
#     (1): Linear(in_features=20, out_features=30, bias=True)
#     (2): Linear(in_features=5, out_features=10, bias=True)
#   )
# )
input = torch.randn(32, 5)
print(net(input).shape)
# torch.Size([32, 30])

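The result of net3 shows that the order inside the ModuleList decides nothing: the execution order is determined by the forward function. In practice it is still best to keep the order in the ModuleList the same as in forward, which makes the code easier to read.

Now consider another case. Since the entries of a ModuleList can be called by index, the same module can be called several times in forward. Keep in mind that a module called several times uses the same set of parameters each time, i.e. the parameters are shared.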
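The list-style operations mentioned earlier (append, extend, indexing) can be tried out on a standalone ModuleList. The following is only a small sketch; the layer sizes and the batch size of 4 are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(5, 10)])

# append adds one module, extend adds several, just like a Python list.
layers.append(nn.Linear(10, 20))
layers.extend([nn.Linear(20, 30), nn.Linear(30, 40)])

n_layers = len(layers)
print(n_layers)
print(layers[1])    # positive index
print(layers[-1])   # negative index, counted from the end

# Running the layers in storage order is a decision made by the calling code,
# not something the ModuleList does by itself.
x = torch.randn(4, 5)
for layer in layers:
    x = layer(x)
out_shape = tuple(x.shape)
print(out_shape)
```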

class net4(nn.Module):
    def __init__(self):
        super(net4, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(5, 10), nn.Linear(10, 10)])
    def forward(self, x):
        x = self.linears[0](x)
        x = self.linears[1](x)
        x = self.linears[1](x)
        return x

net = net4()
print(net)
# net4(
#   (linears): ModuleList(
#     (0): Linear(in_features=5, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#   )
# )
for name, param in net.named_parameters():
    print(name, param.size())
# linears.0.weight torch.Size([10, 5])
# linears.0.bias torch.Size([10])
# linears.1.weight torch.Size([10, 10])
# linears.1.bias torch.Size([10])
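A quick way to check the sharing is to count parameters and look at the gradient after a backward pass. This sketch repeats the structure of net4 under the made-up name `SharedNet`; the batch size of 3 is arbitrary.

```python
import torch
import torch.nn as nn


class SharedNet(nn.Module):  # hypothetical name, same structure as net4
    def __init__(self):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(5, 10), nn.Linear(10, 10)])

    def forward(self, x):
        x = self.linears[0](x)
        x = self.linears[1](x)
        x = self.linears[1](x)  # second use of the same module and weights
        return x


net = SharedNet()

# Calling a module twice does not duplicate its parameters:
# 5*10 + 10 for the first layer, 10*10 + 10 for the second.
total = sum(p.numel() for p in net.parameters())
print(total)

# Both uses of linears[1] contribute to one and the same .grad tensor.
net(torch.randn(3, 5)).sum().backward()
grad_shape = tuple(net.linears[1].weight.grad.shape)
print(grad_shape)
```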

nn.Sequential

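Unlike nn.ModuleList, nn.Sequential implements forward internally, and its modules are run in the order in which they are stored. The output size of each module therefore has to match the input size of the next one.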

class net5(nn.Module):
    def __init__(self):
        super(net5, self).__init__()
        self.block = nn.Sequential(nn.Conv2d(1,20,5),
                                    nn.ReLU(),
                                    nn.Conv2d(20,64,5),
                                    nn.ReLU())
    def forward(self, x):
        x = self.block(x)
        return x

net = net5()
print(net)
# net5(
#   (block): Sequential(
#     (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
#     (1): ReLU()
#     (2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
#     (3): ReLU()
#   )
# )
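The difference in behavior can be observed directly by putting the same layers into both containers. This is a sketch only; the batch size of 2 and the 28x28 input size are assumptions made for the example.

```python
import torch
import torch.nn as nn

layers = [nn.Conv2d(1, 20, 5), nn.ReLU(), nn.Conv2d(20, 64, 5), nn.ReLU()]
seq = nn.Sequential(*layers)
mlist = nn.ModuleList(layers)  # same layer objects, different container

x = torch.randn(2, 1, 28, 28)

# Sequential can be called directly. Each 5x5 convolution without padding
# shrinks height and width by 4: 28 -> 24 -> 20.
seq_shape = tuple(seq(x).shape)
print(seq_shape)

# ModuleList has no forward of its own, so calling it fails.
try:
    mlist(x)
    modulelist_callable = True
except NotImplementedError:
    modulelist_callable = False
print(modulelist_callable)

# With a ModuleList the chaining has to be written by hand.
y = x
for layer in mlist:
    y = layer(y)
manual_matches = torch.allclose(y, seq(x))
print(manual_matches)
```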

Below are two ways to initialize an nn.Sequential. The second one uses an OrderedDict to give each module a name.

# Example of using Sequential
model1 = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )
print(model1)
# Sequential(
#   (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
#   (1): ReLU()
#   (2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
#   (3): ReLU()
# )

# Example of using Sequential with OrderedDict
import collections
model2 = nn.Sequential(collections.OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))
print(model2)
# Sequential(
#   (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
#   (relu1): ReLU()
#   (conv2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
#   (relu2): ReLU()
# )
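Naming the layers pays off when individual layers have to be accessed later. A short sketch of what the names give you, using the same layers as model2:

```python
import collections
import torch.nn as nn

model = nn.Sequential(collections.OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU()),
]))

# Named layers can be reached as attributes, and still by position.
print(model.conv1)
same_layer = model.conv1 is model[0]
print(same_layer)

# The names also show up in parameter names (and therefore in state_dict keys).
param_names = [name for name, _ in model.named_parameters()]
print(param_names)
```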

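Some readers may ask: is there any difference between model1 and the net instantiated from the class net5? There is not. The two networks have the same layers and compute the same kind of function (apart from their random initial weights, and the fact that net5 wraps the Sequential in an attribute called block), because nn.Sequential is itself a subclass of nn.Module and therefore has every method that nn.Module has. And when nn.Sequential is used directly, there is no forward function to write, because it already provides one.

Others may then say: if nn.Sequential is this convenient, I will just use it everywhere. If the order inside the nn.Sequential is exactly what you want, and no extra processing is needed in between (for example functions from nn.functional; see the related question "What is the difference between nn and nn.functional?"), then using nn.Sequential directly is perfectly fine. The price is some flexibility, since the contents of forward can no longer be customized.

The typical use of nn.Sequential is to build convolutional blocks and then assemble the whole network from those blocks like building bricks, which keeps the code concise and well structured.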
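The block-building style could look roughly like the sketch below. The helper `conv_block`, the class `BlockNet`, the channel counts and the 28x28 input are all invented for this illustration. The custom forward is where the functional steps go that do not live inside the Sequential blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_channels, out_channels):
    """Hypothetical helper: one conv + ReLU block packaged as a Sequential."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
    )


class BlockNet(nn.Module):  # illustrative name, not from the original post
    def __init__(self, num_classes=10):
        super().__init__()
        # Blocks are themselves modules, so they can be nested in a Sequential.
        self.features = nn.Sequential(conv_block(1, 16), conv_block(16, 32))
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.features(x)
        # Functional steps written by hand in forward.
        x = F.adaptive_avg_pool2d(x, 1)  # -> (N, 32, 1, 1)
        x = torch.flatten(x, 1)          # -> (N, 32)
        return self.classifier(x)


net = BlockNet()
out_shape = tuple(net(torch.randn(4, 1, 28, 28)).shape)
print(out_shape)
```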

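2. Differences between nn.Sequential and nn.ModuleList

Difference 1: nn.Sequential implements forward internally, so no forward function has to be written for it, whereas nn.ModuleList does not implement forward.

Difference 2: nn.Sequential can name each layer through an OrderedDict.

Difference 3: the modules in an nn.Sequential are run in the order in which they are stored, so the output size of each module has to match the input size of the next. nn.ModuleList does not define a network; it only stores modules together, and their position in the list does not imply any execution order.

Difference 4: a network sometimes contains many similar or repeated layers. These are usually created in a for loop rather than written out one line at a time, for example: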

layers = [nn.Linear(10, 10) for i in range(5)]

In that case we use a ModuleList:

class net6(nn.Module):
    def __init__(self):
        super(net6, self).__init__()
        layers = [nn.Linear(10, 10) for i in range(5)]
        self.linears = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.linears:
            x = layer(x)
        return x

net = net6()
print(net)
# net6(
#   (linears): ModuleList(
#     (0): Linear(in_features=10, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#     (2): Linear(in_features=10, out_features=10, bias=True)
#     (3): Linear(in_features=10, out_features=10, bias=True)
#     (4): Linear(in_features=10, out_features=10, bias=True)
#   )
# )

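This is the more general approach. If that feels like too much work, the same thing can be done with Sequential, as net7 shows. Note the * operator, which unpacks a list into separate arguments. The modules in the list must already be in the order in which they should run. In scenario one (many similar or repeated layers run one after another), I personally find the net7 approach more convenient and tidier.

class net7(nn.Module):
    def __init__(self):
        super(net7, self).__init__()
        # A local list is enough: nn.Sequential registers the layers.
        linear_list = [nn.Linear(10, 10) for i in range(3)]
        self.linears = nn.Sequential(*linear_list)

    def forward(self, x):
        x = self.linears(x)
        return x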

net = net7()
print(net)
# net7(
#   (linears): Sequential(
#     (0): Linear(in_features=10, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#     (2): Linear(in_features=10, out_features=10, bias=True)
#   )
# )
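As a small check of what the * operator does here, the sketch below runs the same three layers once through a Sequential and once through a hand-written loop, and also shows that passing the list without the star is rejected. The batch size of 4 is arbitrary.

```python
import torch
import torch.nn as nn

layers = [nn.Linear(10, 10) for _ in range(3)]

# The star unpacks the list, so this is the same as
# nn.Sequential(layers[0], layers[1], layers[2]).
seq = nn.Sequential(*layers)
n_children = len(seq)

x = torch.randn(4, 10)

# Manual loop over the same layer objects, as a ModuleList-based forward would do.
y = x
for layer in layers:
    y = layer(y)
outputs_match = torch.allclose(seq(x), y)
print(n_children, outputs_match)

# Passing the list itself (without the star) does not work,
# because a Python list is not an nn.Module.
try:
    nn.Sequential(layers)
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)
```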

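Now consider scenario two, where information from earlier layers is needed, as in the shortcut structure of ResNets or the skip architecture used in FCN. The result of the current layer has to be fused with the results of earlier layers, and a ModuleList is usually the more convenient choice. A very simple example:

class net8(nn.Module):
    def __init__(self):
        super(net8, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 20), nn.Linear(20, 30), nn.Linear(30, 50)])
        self.trace = []

    def forward(self, x):
        # Start from an empty trace so outputs from earlier calls do not pile up.
        self.trace = []
        for layer in self.linears:
            x = layer(x)
            self.trace.append(x)
        return x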

net = net8()
input  = torch.randn(32, 10) # input batch size: 32
output = net(input)
for each in net.trace:
    print(each.shape)
# torch.Size([32, 20])
# torch.Size([32, 30])
# torch.Size([32, 50])

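A list called trace stores the output of every layer, so that later layers can easily pick up whichever earlier result they need.

References:

Official documentation: containers

Zhihu: "PyTorch 中的 ModuleList 和 Sequential: 区别和使用场景" (ModuleList and Sequential in PyTorch: differences and use cases)

Zhihu: "详解PyTorch中的ModuleList和Sequential" (A detailed look at ModuleList and Sequential in PyTorch)

PyTorch forums: When should I use nn.ModuleList and when should I use nn.Sequential?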
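For the shortcut case itself, a heavily simplified sketch might look as follows. `ResidualMLP` is a made-up toy, not an actual ResNet: every layer keeps the same width so that its input can be added back to its output, which a plain nn.Sequential of Linear layers cannot express.

```python
import torch
import torch.nn as nn


class ResidualMLP(nn.Module):  # hypothetical toy example, not a real ResNet
    def __init__(self, width=16, depth=3):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(width, width) for _ in range(depth)])

    def forward(self, x):
        for layer in self.layers:
            # Shortcut: the layer input is added back to the layer output.
            x = x + torch.relu(layer(x))
        return x


net = ResidualMLP()
x = torch.randn(32, 16)
out_shape = tuple(net(x).shape)
print(out_shape)

# With all weights and biases set to zero, every layer contributes nothing
# and only the shortcut path remains, so the input passes through unchanged.
with torch.no_grad():
    for p in net.parameters():
        p.zero_()
identity_when_zero = torch.equal(net(x), x)
print(identity_when_zero)
```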