0. Overview:

These study notes cover Chapter 5 (Deep learning for computer vision), Section 3 (Using a pretrained convnet) of "Deep Learning with Python".

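Key points:

  1. How to reuse a pretrained model;
  2. A pretrained network is a saved network that was previously trained on a large dataset. In principle, if that dataset is large and general enough, the features the network has learned are general enough to transfer well to other problems.
  3. Two ways to use a pretrained model: feature extraction and fine-tuning.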

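Premise: this example assumes a large convnet that has already been trained on the ImageNet dataset (ImageNet contains images of many kinds of animals, including cats and dogs, so the network can be expected to perform well on the cats-vs-dogs classification problem too).

The example uses the VGG16 architecture (developed by Karen Simonyan and Andrew Zisserman in 2014).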

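1. Using a pretrained convnet:

Usage 1: Feature extraction

Feature extraction uses the representations learned by a previous network to extract features from new samples. These features are then fed to a new classifier, which is trained from scratch.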

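A convnet used for image classification has two parts:

1). A series of convolution and pooling layers (this part is called the convolutional base of the model);

2). A densely connected classifier;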

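In a convnet, how general the representations extracted by a convolution layer are depends on the depth of that layer in the model. Layers closer to the input extract local, highly generic features (such as visual edges, colors, and textures); layers closer to the output extract more abstract concepts (such as "cat ear" or "dog eye"). So if the new dataset differs a lot from the dataset the original model was trained on, it may be better to extract features with only the first few layers of the model rather than the entire convolutional base. Here the two datasets are closely related, so the entire convolutional base is used.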

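Approach: use the convolutional base of the VGG16 network trained on ImageNet to extract useful features from the cat and dog images, then train a cats-vs-dogs classifier on top of those features.

VGG16 and similar models ship with Keras and can be imported directly from the keras.applications module.

The keras.applications module also includes other image classification models (all pretrained on the ImageNet dataset):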

Xception

Inception V3

ResNet50

VGG16

VGG19

MobileNet

1. Instantiating the VGG16 convolutional base:

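from keras.applications import VGG16

conv_base = VGG16(
    # Weight checkpoint used to initialize the model
    weights="imagenet",
    # Whether to include the densely connected classifier on top
    # (by default it corresponds to the 1,000 ImageNet classes)
    include_top=False,
    # Shape of the image tensors fed to the network (optional; if omitted,
    # the network can process inputs of any size)
    input_shape=(150, 150, 3),
)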
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 [==============================] - 10s 0us/step
conv_base.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 150, 150, 3)]     0
 block1_conv1 (Conv2D)       (None, 150, 150, 64)      1792
 block1_conv2 (Conv2D)       (None, 150, 150, 64)      36928
 block1_pool (MaxPooling2D)  (None, 75, 75, 64)        0
 block2_conv1 (Conv2D)       (None, 75, 75, 128)       73856
 block2_conv2 (Conv2D)       (None, 75, 75, 128)       147584
 block2_pool (MaxPooling2D)  (None, 37, 37, 128)       0
 block3_conv1 (Conv2D)       (None, 37, 37, 256)       295168
 block3_conv2 (Conv2D)       (None, 37, 37, 256)       590080
 block3_conv3 (Conv2D)       (None, 37, 37, 256)       590080
 block3_pool (MaxPooling2D)  (None, 18, 18, 256)       0
 block4_conv1 (Conv2D)       (None, 18, 18, 512)       1180160
 block4_conv2 (Conv2D)       (None, 18, 18, 512)       2359808
 block4_conv3 (Conv2D)       (None, 18, 18, 512)       2359808
 block4_pool (MaxPooling2D)  (None, 9, 9, 512)         0
 block5_conv1 (Conv2D)       (None, 9, 9, 512)         2359808
 block5_conv2 (Conv2D)       (None, 9, 9, 512)         2359808
 block5_conv3 (Conv2D)       (None, 9, 9, 512)         2359808
 block5_pool (MaxPooling2D)  (None, 4, 4, 512)         0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

The summary shows that the final feature map of conv_base has shape (4, 4, 512). A densely connected classifier is added on top of this feature.
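The shapes and parameter counts in the summary can be checked by hand. The sketch below is only an illustration and does not call Keras: it assumes 3x3 convolutions with "same" padding and 2x2 max pooling with stride 2, and the helper names (`conv_params`, `trace_vgg16`) are made up for this example.

```python
# Sketch: reproduce the final shape and total parameter count of the
# VGG16 convolutional base with plain arithmetic (no Keras required).
# Assumption: 3x3 convolutions, "same" padding, 2x2 max pooling (stride 2).

# (number of conv layers, output channels) for the five VGG16 blocks
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def conv_params(in_channels, out_channels, kernel=3):
    """Kernel weights (kernel*kernel*in*out) plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def trace_vgg16(size=150, channels=3):
    """Return (final_size, final_channels, total_params) for the conv base."""
    total = 0
    for n_convs, out_channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            total += conv_params(channels, out_channels)
            channels = out_channels
        size //= 2  # max pooling halves the spatial size, rounding down
    return size, channels, total


final_size, final_channels, total_params = trace_vgg16()
print(final_size, final_channels, total_params)  # 4 512 14714688
```

The spatial size goes 150, 75, 37, 18, 9, 4, which is why a 150x150 input ends up as a (4, 4, 512) feature map, or 4*4*512 = 8192 values once flattened.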

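There are two ways to proceed:

  1. Run the convolutional base directly over the dataset, save its output as a Numpy array, and then feed that array to a densely connected classifier (fast and computationally cheap, but data augmentation cannot be used).

  2. Extend the convolutional base by adding Dense layers on top, and run the whole model on the input data (data augmentation can be used, but the computational cost is high).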
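The cost difference between the two approaches can be made concrete by counting how many times an image passes through the convolutional base. This is a rough sketch only: it uses the sample counts and the 30 epochs used later in these notes, and it ignores the difference between forward-only and forward-plus-backward passes.

```python
# Rough count of conv-base forward passes for the two approaches.
# Numbers follow this example: 2000 / 1000 / 1000 images, 30 epochs.
train_samples, val_samples, test_samples = 2000, 1000, 1000
epochs = 30

# Approach 1: every image goes through the conv base exactly once,
# and the resulting features are cached as Numpy arrays.
passes_cached = train_samples + val_samples + test_samples

# Approach 2: training and validation images go through the conv base
# again in every epoch (augmented images differ each time, so nothing
# can be cached).
passes_end_to_end = epochs * (train_samples + val_samples)

print(passes_cached, passes_end_to_end)  # 4000 90000
```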

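2. Running the convolutional base to extract features, then feeding them to a densely connected classifier (no data augmentation):

Extracting features with conv_base: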

import numpy as np
# ImageDataGenerator turns image files on disk into batches of preprocessed tensors
from keras.preprocessing.image import ImageDataGenerator

train_dir = "./dogs-vs-cats/small_dt/train/"
validation_dir = "./dogs-vs-cats/small_dt/validation/"
test_dir = "./dogs-vs-cats/small_dt/test/"

datagen = ImageDataGenerator(rescale=1./255)
batch_size = 20


# Extract image features with conv_base
def extract_features(directory, sample_count):
    # Preallocate the features, shape (sample_count, 4, 4, 512)
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    # Preallocate the matching labels, shape (sample_count,)
    labels = np.zeros(shape=(sample_count))
    # Create the generator that yields batches of tensors
    generator = datagen.flow_from_directory(
        directory=directory,
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode="binary")
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        # Fill rows i*batch_size to (i+1)*batch_size of the preallocated
        # arrays with the features from conv_base and the matching labels
        features[i * batch_size : (i+1) * batch_size] = features_batch
        labels[i * batch_size : (i+1) * batch_size] = labels_batch
        i += 1
        # The generator yields data indefinitely, so stop once every
        # image has been seen once
        if i * batch_size >= sample_count:
            break
    return features, labels


# 2,000 training images give features of shape (2000, 4, 4, 512)
train_features, train_labels = extract_features(train_dir, 2000)
# 1,000 images for validation
validation_features, validation_labels = extract_features(validation_dir, 1000)
# 1,000 images for testing
test_features, test_labels = extract_features(test_dir, 1000)

# Flatten the features: (samples, 4, 4, 512) => (samples, 8192)
train_features = np.reshape(train_features, (2000, 4*4*512))
validation_features = np.reshape(validation_features, (1000, 4*4*512))
test_features = np.reshape(test_features, (1000, 4*4*512))
Found 2000 images belonging to 2 classes.
1/1 [==============================] - 1s 649ms/step
1/1 [==============================] - 0s 17ms/step
1/1 [==============================] - 0s 16ms/step
Found 1000 images belonging to 2 classes.
1/1 [==============================] - 0s 14ms/step
1/1 [==============================] - 0s 16ms/step
1/1 [==============================] - 0s 16ms/step
Found 1000 images belonging to 2 classes.
1/1 [==============================] - 0s 13ms/step
1/1 [==============================] - 0s 14ms/step
1/1 [==============================] - 0s 14ms/step
1/1 [==============================] - 0s 15ms/step
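The slice-filling loop in extract_features can be tried out without Keras. In the sketch below, `endless_batches` is a hypothetical stand-in for the directory generator (it wraps around forever, but unlike the real one it does not shuffle), and `collect` mirrors the fill-then-break pattern. It assumes, as in the code above, that the sample count is a multiple of the batch size.

```python
import itertools


def endless_batches(samples, batch_size):
    """Yield batches forever, wrapping around like an endless data generator."""
    for start in itertools.cycle(range(0, len(samples), batch_size)):
        yield samples[start:start + batch_size]


def collect(samples, sample_count, batch_size):
    """Fill a preallocated list batch by batch, then stop explicitly."""
    collected = [None] * sample_count
    i = 0
    for batch in endless_batches(samples, batch_size):
        # Same slice arithmetic as features[i*batch_size : (i+1)*batch_size]
        collected[i * batch_size:(i + 1) * batch_size] = batch
        i += 1
        if i * batch_size >= sample_count:
            break  # without this check the loop would never end
    return collected


data = list(range(100))
result = collect(data, sample_count=60, batch_size=20)
print(len(result), result[0], result[-1])  # 60 0 59
```

If the sample count were not a multiple of the batch size, the last slice would not line up with the preallocated storage, so the counts passed to extract_features should match the directory contents.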

Defining and training the densely connected classifier:

from keras import models
from keras import layers
from keras import optimizers

model = models.Sequential()
model.add(layers.Dense(256, activation="relu", input_dim=4*4*512))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation="sigmoid"))

model.compile(optimizer=optimizers.RMSprop(learning_rate=2e-5),
              loss="binary_crossentropy",
              metrics=["acc"])

history = model.fit(train_features, train_labels,
                    epochs=30,
                    batch_size=20,
                    validation_data=(validation_features, validation_labels))
Epoch 1/30
100/100 [==============================] - 2s 14ms/step - loss: 0.5930 - acc: 0.6755 - val_loss: 0.4406 - val_acc: 0.8440
Epoch 2/30
100/100 [==============================] - 1s 10ms/step - loss: 0.3957 - acc: 0.8385 - val_loss: 0.3564 - val_acc: 0.8740
Epoch 30/30
100/100 [==============================] - 1s 10ms/step - loss: 0.0735 - acc: 0.9820 - val_loss: 0.2407 - val_acc: 0.9020

Plotting the loss and accuracy:

import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)

plt.plot(epochs, loss_values, "bo", label="Training loss")       # "bo" means blue dots
plt.plot(epochs, val_loss_values, "b", label="Validation loss")  # "b" means a solid blue line
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

# Training and validation accuracy
import matplotlib.pyplot as plt

history_dict = history.history
acc_values = history_dict["acc"]
val_acc_values = history_dict["val_acc"]
epochs = range(1, len(acc_values) + 1)

plt.plot(epochs, acc_values, "bo", label="Training accuracy")       # "bo" means blue dots
plt.plot(epochs, val_acc_values, "b", label="Validation accuracy")  # "b" means a solid blue line
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

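This result is much better than that of the earlier small model trained from scratch. However, despite the fairly large dropout rate, the model starts to overfit after only a few epochs (around epoch 5).

Next, let's see how the approach with data augmentation performs.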

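3. Feature extraction with data augmentation:

Extend the conv_base model, then feed the data through the whole model.

(P.S. The book says this approach is computationally expensive and has to be run on a GPU, so all the code in this part was run on a Mac M1.)

Adding a densely connected classifier on top of conv_base: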

from keras import models
from keras import layers

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))
model.summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 vgg16 (Functional)          (None, 4, 4, 512)         14714688
 flatten_2 (Flatten)         (None, 8192)              0
 dense_6 (Dense)             (None, 256)               2097408
 dense_7 (Dense)             (None, 1)                 257
=================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
_________________________________________________________________

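"Freezing" the convolutional base:

Freezing one or more layers means keeping their weights unchanged during training. Without freezing, the representations previously learned by the convolutional base would be modified during training.

In Keras, a network is frozen by setting its trainable attribute to False (note: freeze the network before calling compile; a change to trainable made after compiling has no effect until the model is compiled again).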

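print("Trainable weight tensors before freezing:", len(model.trainable_weights))
conv_base.trainable = False  # freeze the network
print("Trainable weight tensors after freezing:", len(model.trainable_weights))

Trainable weight tensors before freezing: 30
Trainable weight tensors after freezing: 4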

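After freezing, only the weights of the two added Dense layers are trained, while the weights of conv_base are not. Each Dense layer has a weight matrix and a bias vector, which gives 4 trainable weight tensors in total.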
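The counts 30 and 4 can be derived by hand, as can the parameter counts of the two Dense layers in the model summary. This is a plain-arithmetic sketch (no Keras involved); it relies on each Conv2D or Dense layer holding one kernel tensor and one bias tensor.

```python
# Each Conv2D or Dense layer holds two weight tensors: a kernel and a bias.
TENSORS_PER_LAYER = 2

vgg16_conv_layers = 2 + 2 + 3 + 3 + 3  # conv layers in blocks 1 to 5
dense_layers = 2                       # the classifier added on top

before_freeze = (vgg16_conv_layers + dense_layers) * TENSORS_PER_LAYER
after_freeze = dense_layers * TENSORS_PER_LAYER

# Parameter counts of the two Dense layers (weights + biases)
dense_256_params = 4 * 4 * 512 * 256 + 256  # 8192 inputs -> 256 units
dense_1_params = 256 * 1 + 1                # 256 inputs -> 1 unit

print(before_freeze, after_freeze)       # 30 4
print(dense_256_params, dense_1_params)  # 2097408 257
```

Pooling, Flatten, and input layers carry no weights, so they do not contribute to either count.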

Training the model end to end with the frozen convolutional base:

from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers

# Apply data augmentation to the training images
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode="nearest")

# Validation data must not be augmented
test_datagen = ImageDataGenerator(rescale=1./255)

# Create the batch generators
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),  # resize all training images to 150x150
    batch_size=20,
    class_mode="binary")

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode="binary")

# Compile the model
model.compile(loss="binary_crossentropy",
              optimizer=optimizers.RMSprop(learning_rate=2e-5),
              metrics=["acc"])

# Note: fit_generator is deprecated in recent Keras versions;
# model.fit accepts generators with the same arguments
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,  # 2,000 training samples / batch size 20 = 100 batches
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50   # 1,000 validation samples / batch size 20 = 50 batches
)
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Epoch 1/30
/var/folders/0w/m6x2g_g94sqfmg3k8dldpwgm0000gn/T/ipykernel_32812/1274639931.py:39: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  history = model.fit_generator(
100/100 [==============================] - 27s 267ms/step - loss: 0.5966 - acc: 0.6880 - val_loss: 0.4588 - val_acc: 0.8160
Epoch 2/30
100/100 [==============================] - 27s 270ms/step - loss: 0.4856 - acc: 0.7865 - val_loss: 0.3761 - val_acc: 0.8560
Epoch 3/30
100/100 [==============================] - 27s 269ms/step - loss: 0.4422 - acc: 0.7970 - val_loss: 0.3432 - val_acc: 0.8610
Epoch 30/30
100/100 [==============================] - 34s 341ms/step - loss: 0.2724 - acc: 0.8890 - val_loss: 0.2406 - val_acc: 0.9020
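The values 100 and 50 passed as steps_per_epoch and validation_steps follow from dividing the sample count by the batch size. A minimal sketch of that arithmetic, where `steps_for` is a hypothetical helper and not a Keras function:

```python
import math


def steps_for(sample_count, batch_size):
    """Number of batches needed to see every sample once (last one may be partial)."""
    return math.ceil(sample_count / batch_size)


train_steps = steps_for(2000, 20)
val_steps = steps_for(1000, 20)
print(train_steps, val_steps)  # 100 50

# With a batch size that does not divide the sample count evenly,
# rounding up keeps the final partial batch:
print(steps_for(1000, 32))  # 32
```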

Plotting the loss and accuracy:

import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)

plt.plot(epochs, loss_values, "bo", label="Training loss")       # "bo" means blue dots
plt.plot(epochs, val_loss_values, "b", label="Validation loss")  # "b" means a solid blue line
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

# Training and validation accuracy
import matplotlib.pyplot as plt

history_dict = history.history
acc_values = history_dict["acc"]
val_acc_values = history_dict["val_acc"]
epochs = range(1, len(acc_values) + 1)

plt.plot(epochs, acc_values, "bo", label="Training accuracy")       # "bo" means blue dots
plt.plot(epochs, val_acc_values, "b", label="Validation accuracy")  # "b" means a solid blue line
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

Compared with the run without data augmentation, the validation accuracy is essentially the same, but overfitting is clearly reduced when data augmentation is used.
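One simple way to quantify this is the gap between training and validation accuracy at the last epoch. The sketch below uses the epoch-30 values copied from the two training logs above; the `runs` dictionary and its labels are just illustration scaffolding.

```python
# Final-epoch (epoch 30) metrics copied from the two training logs
runs = {
    "no augmentation": {"acc": 0.9820, "val_acc": 0.9020},
    "with augmentation": {"acc": 0.8890, "val_acc": 0.9020},
}

# A large positive gap (training far above validation) signals overfitting
gaps = {name: round(m["acc"] - m["val_acc"], 4) for name, m in runs.items()}

for name, gap in gaps.items():
    print(f"{name}: train - val accuracy gap = {gap:+.4f}")
```

Without augmentation the gap is about 8 percentage points. With augmentation the training accuracy is slightly below the validation accuracy, which is plausible because the training accuracy is measured on the harder, augmented images.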

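Usage 2: Fine-tuning

Another technique for model reuse is fine-tuning, which is complementary to feature extraction.

How to think about fine-tuning:

Starting from the frozen model base used for feature extraction, fine-tuning means unfreezing a few of its top layers and training those unfrozen layers jointly with the newly added part.

It is called "fine-tuning" because it only slightly adjusts the reused model, so that the representations it has learned become more relevant to the problem at hand.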

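1. Steps for fine-tuning a model:

  1. Add a custom network on top of an already-trained base network;
  2. Freeze the base network;
  3. Train the part that was added;
  4. Unfreeze some layers of the base network;
  5. Jointly train these unfrozen layers and the added part;

(P.S. The first three steps were already completed during feature extraction above, so only the last two remain.)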

conv_base.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 150, 150, 3)]     0
 block1_conv1 (Conv2D)       (None, 150, 150, 64)      1792
 block1_conv2 (Conv2D)       (None, 150, 150, 64)      36928
 block1_pool (MaxPooling2D)  (None, 75, 75, 64)        0
 block2_conv1 (Conv2D)       (None, 75, 75, 128)       73856
 block2_conv2 (Conv2D)       (None, 75, 75, 128)       147584
 block2_pool (MaxPooling2D)  (None, 37, 37, 128)       0
 block3_conv1 (Conv2D)       (None, 37, 37, 256)       295168
 block3_conv2 (Conv2D)       (None, 37, 37, 256)       590080
 block3_conv3 (Conv2D)       (None, 37, 37, 256)       590080
 block3_pool (MaxPooling2D)  (None, 18, 18, 256)       0
 block4_conv1 (Conv2D)       (None, 18, 18, 512)       1180160
 block4_conv2 (Conv2D)       (None, 18, 18, 512)       2359808
 block4_conv3 (Conv2D)       (None, 18, 18, 512)       2359808
 block4_pool (MaxPooling2D)  (None, 9, 9, 512)         0
 block5_conv1 (Conv2D)       (None, 9, 9, 512)         2359808
 block5_conv2 (Conv2D)       (None, 9, 9, 512)         2359808
 block5_conv3 (Conv2D)       (None, 9, 9, 512)         2359808
 block5_pool (MaxPooling2D)  (None, 4, 4, 512)         0
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________

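The architecture of the convolutional base is shown above. Here the last three convolution layers are fine-tuned: every layer from input_1 through block4_pool stays frozen, while block5_conv1, block5_conv2, and block5_conv3 are trained (block5_pool is unfrozen as well, but it has no weights).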
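The freeze boundary described here can be sketched without Keras. In the example below, `Layer` is a hypothetical stand-in that only carries a name and a trainable flag, and `unfreeze_from` is an illustrative helper; the layer names are rebuilt from the summary above.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a Keras layer: just a name and a flag."""
    name: str
    trainable: bool = True


# Rebuild the 19 layer names of the VGG16 conv base from the summary
names = ["input_1"]
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    names += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
    names.append(f"block{block}_pool")
layers = [Layer(name) for name in names]


def unfreeze_from(layers, first_trainable):
    """Freeze every layer before `first_trainable`, unfreeze it and all after."""
    reached = False
    for layer in layers:
        if layer.name == first_trainable:
            reached = True
        layer.trainable = reached


unfreeze_from(layers, "block5_conv1")
trainable_names = [layer.name for layer in layers if layer.trainable]
print(trainable_names)
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

Walking the layers in order and flipping a flag at the boundary avoids hard-coding layer indices, so the same helper works if a different starting layer is chosen.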

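2. Choosing which layers to fine-tune:

  1. Choose layers close to the output, because the features they encode are more specialized, and fine-tuning these specialized features helps more with the specific problem;

  2. Do not fine-tune too many layers, because the more parameters are trained on a small dataset, the higher the risk of overfitting; typically only the last 2-3 layers of the base network are fine-tuned;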

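# Fine-tune the model, building on the "feature extraction with data augmentation" setup above
# Unfreeze the base network
conv_base.trainable = True

# Freeze the layers from input_1 through block4_pool
set_trainable = False
for layer in conv_base.layers:
    if layer.name == "block5_conv1":
        set_trainable = True
    if set_trainable:
        layer.trainable = True   # block5_conv1 and the layers after it stay unfrozen
    else:
        layer.trainable = False  # layers before block5_conv1 go from unfrozen back to frozen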
# Fine-tune the model
model.compile(loss="binary_crossentropy",
              optimizer=optimizers.RMSprop(learning_rate=1e-5),
              metrics=["acc"])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)
Epoch 1/100
/var/folders/0w/m6x2g_g94sqfmg3k8dldpwgm0000gn/T/ipykernel_32812/3508158211.py:6: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  history = model.fit_generator(
100/100 [==============================] - 31s 302ms/step - loss: 0.2867 - acc: 0.8755 - val_loss: 0.2337 - val_acc: 0.9060
Epoch 2/100
100/100 [==============================] - 30s 299ms/step - loss: 0.2673 - acc: 0.8870 - val_loss: 0.2094 - val_acc: 0.9160
Epoch 100/100
100/100 [==============================] - 37s 366ms/step - loss: 0.0218 - acc: 0.9925 - val_loss: 0.3191 - val_acc: 0.9300

3. Plotting the loss and accuracy:

import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)

plt.plot(epochs, loss_values, "bo", label="Training loss")       # "bo" means blue dots
plt.plot(epochs, val_loss_values, "b", label="Validation loss")  # "b" means a solid blue line
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

# Training and validation accuracy
import matplotlib.pyplot as plt

history_dict = history.history
acc_values = history_dict["acc"]
val_acc_values = history_dict["val_acc"]
epochs = range(1, len(acc_values) + 1)

plt.plot(epochs, acc_values, "bo", label="Training accuracy")       # "bo" means blue dots
plt.plot(epochs, val_acc_values, "b", label="Validation accuracy")  # "b" means a solid blue line
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

Results after smoothing the curves:

# Smooth the curves (replace every loss and accuracy value with an exponential moving average)
def smooth_curve(points, factor=0.8):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points
import matplotlib.pyplot as plt

history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)

plt.plot(epochs, smooth_curve(loss_values), "bo", label="Training loss")       # "bo" means blue dots
plt.plot(epochs, smooth_curve(val_loss_values), "b", label="Validation loss")  # "b" means a solid blue line
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

# Training and validation accuracy
import matplotlib.pyplot as plt

history_dict = history.history
acc_values = history_dict["acc"]
val_acc_values = history_dict["val_acc"]
epochs = range(1, len(acc_values) + 1)

plt.plot(epochs, smooth_curve(acc_values), "bo", label="Training accuracy")       # "bo" means blue dots
plt.plot(epochs, smooth_curve(val_acc_values), "b", label="Validation accuracy")  # "b" means a solid blue line
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
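To make the smoothing concrete, here is a small worked example of the same exponential moving average on a made-up, deliberately noisy sequence. The function is repeated so the snippet runs on its own; the input values are hypothetical.

```python
def smooth_curve(points, factor=0.8):
    """Exponential moving average: new = previous * factor + point * (1 - factor)."""
    smoothed = []
    for point in points:
        if smoothed:
            smoothed.append(smoothed[-1] * factor + point * (1 - factor))
        else:
            smoothed.append(point)  # the first point is kept as is
    return smoothed


# Hypothetical noisy values that jump between 1 and 0
noisy = [1.0, 0.0, 1.0, 0.0]
smoothed = smooth_curve(noisy)
print([round(v, 3) for v in smoothed])  # [1.0, 0.8, 0.84, 0.672]
```

With factor=0.8 each new point contributes only 20% to the smoothed value, so the jumps of size 1 in the input shrink to steps of at most 0.2 in the output. A larger factor smooths more strongly but reacts more slowly to real changes.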

Compared with feature extraction with data augmentation, fine-tuning raises the validation accuracy from about 90% to about 94%.

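4. Evaluating the model on the test data:

The plots above show that the validation loss reaches its minimum at around epoch 20, so the model is trained again with epochs set to 20 and then evaluated on the test set.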
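Instead of reading the best epoch off the plot, it can also be picked programmatically. A minimal sketch, where `best_epoch` is a hypothetical helper and the loss values are invented for illustration; in a real run the list would come from history.history["val_loss"].

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Hypothetical validation losses, for illustration only
example_val_loss = [0.30, 0.24, 0.21, 0.23, 0.27, 0.32]
print(best_epoch(example_val_loss))  # 3
```

On noisy curves it may be safer to apply this to the smoothed values, since a single lucky epoch can otherwise be picked as the minimum.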

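# Train again with epochs set to 20
# Note: this recompiles and fits the same model object, so training continues
# from the current (already fine-tuned) weights rather than starting over
model.compile(loss="binary_crossentropy",
              optimizer=optimizers.RMSprop(learning_rate=1e-5),
              metrics=["acc"])

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=20,
    validation_data=validation_generator,
    validation_steps=50)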
Epoch 1/20
/var/folders/0w/m6x2g_g94sqfmg3k8dldpwgm0000gn/T/ipykernel_32812/71092168.py:6: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  history = model.fit_generator(
100/100 [==============================] - 31s 302ms/step - loss: 0.0239 - acc: 0.9900 - val_loss: 0.3687 - val_acc: 0.9280
Epoch 2/20
100/100 [==============================] - 30s 300ms/step - loss: 0.0141 - acc: 0.9940 - val_loss: 0.2587 - val_acc: 0.9420
Epoch 3/20
100/100 [==============================] - 30s 300ms/step - loss: 0.0181 - acc: 0.9935 - val_loss: 0.3359 - val_acc: 0.9350
Epoch 20/20
100/100 [==============================] - 36s 364ms/step - loss: 0.0150 - acc: 0.9935 - val_loss: 0.3387 - val_acc: 0.9390
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode="binary")

# Note: evaluate_generator is deprecated in recent Keras versions;
# model.evaluate accepts generators as well
test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test_acc: ', test_acc)

Found 1000 images belonging to 2 classes.
/var/folders/0w/m6x2g_g94sqfmg3k8dldpwgm0000gn/T/ipykernel_32812/2442981357.py:8: UserWarning: `Model.evaluate_generator` is deprecated and will be removed in a future version. Please use `Model.evaluate`, which supports generators.
  test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
test_acc:  0.9320000410079956

With epochs=20, the model trained again on the training set reaches an accuracy of 93.2% on the test set.