Error log: TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

小王做笔记 · Published 2022-04-06 11:04:34

TypeError: dropout(): argument 'input' (position 1) must be Tensor, not str

Background
Solution 1 (modify the output directly)
- Full code
Solution 2 (modify the model directly)
References

Background

This error appeared while using a pre-trained model from Hugging Face for a text classification task.

While tracing the problem, I found that the failure occurs at the cls_layer() call defined here.

The error is a data type mismatch, so the next step is to check where pooler_output is produced and what type it has at that point.

Solution 1 (modify the output directly)

Locate the line where pooler_output is produced.

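This pooler_output is the output vector for the [CLS] token from bert_layer. By default the model call returns a dict-like object rather than a tuple, and unpacking a dict-like object yields its keys, which is why pooler_output ended up as a str instead of a Tensor. The fix is to control whether the return value is a dict by passing return_dict=False to the call.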
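The unpacking behaviour can be reproduced without loading any model. The sketch below is only an illustration: it uses a plain OrderedDict as a stand-in for the model's dict-like output, and the nested lists are placeholder values standing in for the real tensors (both are assumptions made for this example, not the library's actual objects).

```python
from collections import OrderedDict

# Stand-in for the dict-like object returned by the model call.
# The nested lists are placeholders for the real tensors.
outputs = OrderedDict(
    last_hidden_state=[[[0.1, 0.2], [0.3, 0.4]]],
    pooler_output=[[0.5, 0.6]],
)

# Unpacking a dict iterates over its KEYS, so both names are bound to str.
cont_reps, pooler_output = outputs
print(type(pooler_output).__name__, repr(pooler_output))

# Unpacking the values instead binds the stored objects, which mirrors
# the plain tuple that return_dict=False hands back.
cont_reps_ok, pooler_output_ok = tuple(outputs.values())
print(type(pooler_output_ok).__name__, pooler_output_ok)
```

Passing the first pooler_output to a dropout layer is what would trigger the "must be Tensor, not str" message, since it holds the key name rather than the data.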

Full code

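import torch.nn as nn
from torch.cuda.amp import autocast  # assumed source of the @autocast() decorator used below
from transformers import AutoModel


class SentencePairClassifier(nn.Module):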
    def __init__(self, bert_model="albert-base-v2", freeze_bert=False):
        super(SentencePairClassifier, self).__init__()
        #  Instantiating BERT-based model object
        self.bert_layer = AutoModel.from_pretrained(bert_model)
        
        #  Fix the hidden-state size of the encoder outputs (If you want to add other pre-trained models here, search for the encoder output size)
        if bert_model == "albert-base-v2":  # 12M parameters
            hidden_size = 768
        elif bert_model == "albert-large-v2":  # 18M parameters
            hidden_size = 1024
        elif bert_model == "albert-xlarge-v2":  # 60M parameters
            hidden_size = 2048
        elif bert_model == "albert-xxlarge-v2":  # 235M parameters
            hidden_size = 4096
        elif bert_model == "bert-base-uncased": # 110M parameters
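            hidden_size = 768
        else:  # any other checkpoint: read the size from the loaded model's config
            hidden_size = self.bert_layer.config.hidden_size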
            
        # Freeze bert layers and only train the classification layer weights
        if freeze_bert:
            for p in self.bert_layer.parameters():
                p.requires_grad = False
                
        # Classification layer
        self.cls_layer = nn.Linear(hidden_size, 1)
        self.dropout = nn.Dropout(p=0.1)
        
        
    @autocast()  # run in mixed precision
    def forward(self, input_ids, attn_masks, token_type_ids):
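        '''
        Inputs:
            -input_ids : Tensor containing token ids
            -attn_masks : Tensor containing attention masks to be used to focus on non-padded values
            -token_type_ids : Tensor containing token type ids to be used to identify sentence1 and sentence2

        Encoder outputs:
            - last_hidden_state: hidden-state representations from the last layer
            - pooler_output: last-layer hidden state of the [CLS] token after a Linear layer and Tanh
            - all_hidden_state: hidden-state representations from every layer
        Note: the last 4 layers of all_hidden_state can be taken out, averaged, and then concatenated onto the classifier input.
        '''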
        # Feeding the inputs to the BERT-based model to obtain contextualized representations
        # return_dict=False makes the model return a plain tuple, so unpacking yields tensors instead of key names
        cont_reps, pooler_output = self.bert_layer(input_ids, attn_masks, token_type_ids, return_dict=False)
        
        # Feeding to the classifier layer the last layer hidden-state of the [CLS] token further processed by a
        # Linear Layer and a Tanh activation. The Linear layer weights were trained from the sentence order prediction (ALBERT) or next sentence prediction (BERT)
        # objective during pre-training.
        logits = self.cls_layer(self.dropout(pooler_output))
        
        return logits
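
As a quick check of the fix, the sketch below runs the same dropout-then-linear step on a tiny, randomly initialized BERT encoder built from a config, so nothing has to be downloaded. The config sizes, batch shape, and standalone layer variables are illustrative assumptions and are not taken from the classifier above; it also assumes torch and transformers are installed.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

# Tiny random-weight encoder; every size here is an illustrative assumption.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)
bert_layer = BertModel(config).eval()
dropout = nn.Dropout(p=0.1)
cls_layer = nn.Linear(config.hidden_size, 1)

# Dummy batch: 2 sequences of 8 token ids each.
input_ids = torch.randint(0, config.vocab_size, (2, 8))
attn_masks = torch.ones_like(input_ids)
token_type_ids = torch.zeros_like(input_ids)

with torch.no_grad():
    # Default call: iterating over the dict-like output yields key names (str).
    outputs = bert_layer(input_ids, attn_masks, token_type_ids)
    key_names = list(outputs)

    # With return_dict=False the call returns a tuple of tensors instead.
    cont_reps, pooler_output = bert_layer(
        input_ids, attn_masks, token_type_ids, return_dict=False
    )
    logits = cls_layer(dropout(pooler_output))

print(key_names)
print(pooler_output.shape, logits.shape)
```

The first print shows the key names that tuple unpacking would have picked up by default, and the second shows that the tuple form passes through dropout and the linear layer without the TypeError.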