torchvision.transforms.ToTensor explained | UserWarning when using transforms.ToTensor() | What H, W, C mean for an image


LolitaAnn

33,061 views · 2021-11-09 11:23:36


Let's look at what `torchvision.transforms.ToTensor` does:

In short, it converts a PIL.Image or an ndarray of shape (H x W x C) into a tensor of shape (C x H x W).

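If either of the following holds:

- the numpy.ndarray has dtype = np.uint8 before conversion
- the PIL.Image is in one of the modes L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1

the values are also scaled from [0, 255] to [0.0, 1.0], and the result is a torch.FloatTensor.

In every other case the values are not scaled.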

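[h, w, c]: the outermost level of the array is the height, with one entry per row of pixels; the second level is the width, with one entry per column within that row; the innermost level holds the value of each channel for that pixel.
[c, h, w]: the outermost level has one entry per channel; the second level is one row of pixels within that channel; the innermost level is the pixel value at each column of that row.
For example: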
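- ```
 import numpy as np
 from torchvision import transforms
 
 # uint8 values in [0, 255]; the upper bound of randint is exclusive
 data = np.random.randint(0, 256, size=6, dtype=np.uint8)
 img = data.reshape(2, 1, 3)  # (H, W, C): 2 rows, 1 column, 3 channels
 print(img)
 img_tensor = transforms.ToTensor()(img)  # convert to a tensor of shape (C, H, W) = (3, 2, 1)
 print(img_tensor)
```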
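- In the ndarray, the outermost level is the rows of pixels, two in total; the middle level is the columns within each row, one in this example; the innermost level is the three channel values, i.e. RGB.
- In the tensor, the outermost level is the channels, three for RGB; the second level has one entry per row of the image; the third level holds the pixel values along that row, one per column.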
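The reordering itself can be checked with plain NumPy. The sketch below is not how torchvision implements it internally, just an equivalent transpose on a hand-written 2-row, 1-column, 3-channel image, so that each value is easy to follow by eye. The pixel values are arbitrary.

```python
import numpy as np

# (H=2, W=1, C=3): two rows, one column, three channel values per pixel.
img = np.array([[[10, 20, 30]],
                [[40, 50, 60]]], dtype=np.uint8)

# Move the channel axis to the front: (H, W, C) -> (C, H, W).
chw = img.transpose(2, 0, 1)

# Same scaling that ToTensor applies to uint8 input.
scaled = chw.astype(np.float32) / 255.0

print(img.shape, "->", chw.shape)
print(chw[:, :, 0])  # one row per channel, one entry per image row
```

The rule to remember is `chw[c, h, w] == img[h, w, c]`: the first channel now holds 10 and 40, the second 20 and 50, the third 30 and 60.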

UserWarning

With the mechanics clear, here is a user warning I ran into today.

D:\Program Files\python\python3.8\lib\site-packages\torchvision\datasets\mnist.py:498: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at …\torch\csrc\utils\tensor_numpy.cpp:180.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)

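Roughly, this means:
Line 498 of torchvision\datasets\mnist.py raises a UserWarning: the given NumPy array is read-only, but PyTorch has no notion of a read-only tensor. Because the tensor shares memory with the array, writing to the tensor would write into the supposedly read-only NumPy array. Before converting the array to a tensor, you may want to copy it to protect its data, or make it writable.
Warnings of this type are suppressed for the rest of the program. (Triggered internally at …\torch\csrc\utils\tensor_numpy.cpp:180.)
The line that triggers it is `return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)`.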
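The mechanism behind the warning can be sketched in a few lines. This is only an illustration with a made-up array: in the case above the call sits inside torchvision's own mnist.py, so you would not normally patch it there. Whether the warning is actually printed may also depend on your PyTorch version.

```python
import numpy as np
import torch

# Simulate a read-only buffer, like the one mnist.py ends up with.
arr = np.arange(6, dtype=np.uint8)
arr.flags.writeable = False

# Shares memory with the read-only array; this is the kind of call that warns.
shared = torch.from_numpy(arr)

# Copying first gives a writable array that the tensor can own safely.
safe = torch.from_numpy(arr.copy())
safe[0] = 99

print(arr[0], safe[0].item())  # the original array is unchanged
```

Copying costs one extra allocation, but it means later writes to the tensor cannot leak into data that was supposed to stay read-only.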