Problem

While training a model today, I ran into a bug.
First, the DataLoader raised this error:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
followed by:
RuntimeError: Trying to resize storage that is not resizable
The full traceback:

Traceback (most recent call last):
  File "train_temp.py", line 100, in <module>
    for data in train_dataloader:
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
    return self.collate_fn(data)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 265, in default_collate
    return collate(batch, collate_fn_map=default_collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in collate
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 143, in <listcomp>
    return [collate(samples, collate_fn_map=collate_fn_map) for samples in transposed]  # Backwards compatibility.
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 172, in collate_numpy_array_fn
    return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 120, in collate
    return collate_fn_map[elem_type](batch, collate_fn_map=collate_fn_map)
  File "/data0/thw/anaconda3/envs/Denoising2/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 162, in collate_tensor_fn
    out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable
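The last frames of the traceback show what goes wrong: `default_collate` ultimately tries to pack all samples in a batch into one tensor, which only works if every sample has exactly the same shape. A numpy-only sketch of the same failure mode (here `np.stack` plays the role of torch's collate, and the shapes are the ones from this bug):

```python
import numpy as np

# Two samples with mismatched shapes, like the input/label mix-up below.
sample_a = np.zeros((384, 384, 1))
sample_b = np.zeros((256, 256, 1))

try:
    # Batching requires stacking samples into one array; mismatched
    # shapes make this impossible.
    batch = np.stack([sample_a, sample_b])
except ValueError as e:
    print("stacking failed:", e)
```

torch's collate raises a `RuntimeError` with a less obvious message, but the root cause is the same: samples of different shapes cannot be batched.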

Solution

At first, a blog post suggested the problem was num_workers: set it to 0, or to the same number as your GPUs.
I was skeptical, since I had previously used num_workers=16 with 4 GPUs without any error, but I tried it anyway (I was out of ideas at that point). Sure enough, the error persisted.
Then I found another blog post (link), thanks to the author. At the end, it mentions that inconsistent data dimensions can cause this. So I printed the shapes of my data in the DataLoader and found that the input and label shapes were actually different!!!!
One was 384×384×1, the other 256×256×1.
Enough to make you question your life choices >_<
After fixing the crop size, everything worked ^_^
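A quick way to catch this class of bug before training: scan the whole dataset once and record every distinct shape. The helper below is a hypothetical sketch (the name `audit_shapes` and the assumption that the dataset yields `(input, label)` pairs of arrays are mine, not from any library):

```python
import numpy as np

def audit_shapes(dataset):
    """Collect every distinct input and label shape in the dataset,
    so a mismatch like 384x384x1 vs 256x256x1 shows up immediately.
    `dataset` is anything iterable that yields (input, label) pairs."""
    input_shapes, label_shapes = set(), set()
    for inp, lab in dataset:
        input_shapes.add(np.asarray(inp).shape)
        label_shapes.add(np.asarray(lab).shape)
    return input_shapes, label_shapes

# Example with a deliberately inconsistent pair, like the bug above:
samples = [(np.zeros((384, 384, 1)), np.zeros((256, 256, 1)))]
print(audit_shapes(samples))
```

If either returned set contains more than one shape, or the two sets disagree when they shouldn't, the crop/resize pipeline is the first place to look.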

Notes

1 num_workers is the number of worker processes loading data; it has nothing to do with the number of GPUs, even though the two are often set equal. You can gradually increase num_workers and keep the smallest value beyond which data loading no longer gets noticeably faster.
2 The dataset, the dataset! Always check the dataset first!
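The tuning procedure in point 1 can be sketched with a small timing helper. This is a generic sketch using only the standard library; `measure_loading` is a hypothetical name, and in practice the iterable would be a DataLoader built with each candidate num_workers value:

```python
import time

def measure_loading(iterable, max_batches=100):
    """Time how long it takes to pull up to `max_batches` items from
    an iterable (e.g. a DataLoader). Returns (seconds, items_pulled)."""
    start = time.perf_counter()
    n = 0
    for _ in iterable:
        n += 1
        if n >= max_batches:
            break
    return time.perf_counter() - start, n

# Usage sketch (assumes `dataset` exists):
# for nw in (0, 2, 4, 8, 16):
#     loader = DataLoader(dataset, batch_size=16, num_workers=nw)
#     print(nw, measure_loading(loader))
```

Stop increasing num_workers once the measured time plateaus; past that point extra workers only cost memory and startup time.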