2020-01-19

Deep Learning with Python

Summary of environment issues

Running the Keras example: python3 examples/mnist_cnn.py

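The CUDA libraries could not be found. The cause was that the installed TensorFlow did not support CUDA 10.2. Installing CUDA 10.0 and adding its library directory to the linker search path fixed the problem (edit ld.conf, then run ldconfig).
The NVIDIA driver can still be the latest version.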
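
A quick way to confirm the fix is to ask the dynamic linker whether it can now resolve the CUDA libraries. The sketch below uses only the standard library. The list of library names is an assumption about what a TensorFlow GPU build typically loads, and missing_libraries is a hypothetical helper, not part of TensorFlow. It only checks that some version of each library is visible, not that the version is the one TensorFlow wants.

```python
from ctypes.util import find_library

# Assumed list of CUDA-related libraries; adjust it to the names in the error message.
CUDA_LIBS = ["cudart", "cublas", "cufft", "curand", "cusolver", "cusparse", "cudnn"]


def missing_libraries(names, finder=find_library):
    """Return the library names that the dynamic linker cannot resolve."""
    return [name for name in names if finder(name) is None]


# An empty list means every library in CUDA_LIBS was found on the search path.
print("missing:", missing_libraries(CUDA_LIBS))
```

If the list is not empty after running ldconfig, the directory added to the linker configuration is probably not the one that contains the libraries.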

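Running the Keras example: python3 examples/mnist_cnn.py

If the GPU cannot be used correctly (memory allocation fails), adding the following snippet at the top of the script resolves it.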

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
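
The same on-demand allocation can also be requested through the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which TensorFlow reads when it initialises the GPU. This is a sketch of an alternative, not what was used in these notes, and it assumes the variable is set before TensorFlow is first imported in the process.

```python
import os

# Set this before TensorFlow initialises the GPU; doing it before the
# first `import tensorflow` is the safe choice.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

This is convenient for notebooks and third-party example scripts, because the variable can be exported in the shell without editing the code.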

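TensorFlow reported the error: Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

A reboot fixed it. The error may be related to poor handling of suspend.
While the notebook was in use and the machine went to sleep, it twice failed to wake up again.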

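Experiment issues

5.3.1: could not reproduce the 96% accuracy from the book

Symptoms

With the notebook that ships with the book, fast feature extraction reaches the expected 90% accuracy.
However, the version with data augmentation also reaches only 90%.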

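Analysis

I rechecked the code and the dataset several times and reran the notebook repeatedly, but accuracy stayed at 90%.
Searching Google for using-a-pretrained-convnet turns up https://github.com/fchollet/deep-learning-with-python-notebooks/issues/21.
Reading through that discussion eventually leads to https://forums.manning.com/posts/list/42880.page, which gives a fairly complete explanation: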
When I run the code given in the book, my validation accuracy plateaus around 0.90, exactly like the original poster described. I think the problem is that in the book’s code, the images are being read in from disk as numpy arrays of float32 values in the range [0.0, 1.0], due to the keyword argument rescale=1./255 in the ImageDataGenerator constructors. These images are then automatically fed directly to the VGG16 convolutional base when the model’s fit_generator method is called.

However, the original VGG16 network was trained on images that were preprocessed by zero-centering each color channel with respect to the ImageNet database. In Keras, there is a function available (in keras.applications.vgg16) that does this transformation, called preprocess_input. In fact, if you test the full pretrained VGG16 network by itself, you will find that in order to get accurate classification results, you must call preprocess_input first before calling the predict method. Furthermore, preprocess_input must be called on images of float32 values in the range [0., 255.], not [0., 1.].

So to summarize, there are two problems with the book’s example: first, it uses images with values in the range [0., 1.], and second, it does not call preprocess_input before feeding the images to the VGG16 base.
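The root cause is a basic mistake by the book's author in this section: when reusing the pretrained VGG16 model, the data fed in for further training was not preprocessed the way VGG16 requires. In addition, the older Keras version the author used had a bug where conv_base.trainable = False did not take effect, so the author was in fact retraining the whole VGG16 model. The network adapted to the new value range on its own, and accuracy still reached 96%.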
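
To make the mismatch concrete, here is a minimal pure-Python sketch of what the VGG16 preprocessing does to a single pixel: reorder RGB to BGR and subtract the ImageNet per-channel means, with no rescaling. The helper vgg16_preprocess_pixel is hypothetical and exists only for illustration; the real preprocess_input works on whole arrays. The mean values are the ones Keras uses for this style of preprocessing, so check them against the installed version if exact numbers matter.

```python
# ImageNet channel means in BGR order, as used by Keras for VGG16-style preprocessing.
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def vgg16_preprocess_pixel(rgb):
    """Convert one RGB pixel with values in [0, 255] to zero-centred BGR."""
    r, g, b = rgb
    return tuple(v - m for v, m in zip((b, g, r), IMAGENET_MEAN_BGR))


gray = (128.0, 128.0, 128.0)
expected_by_vgg16 = vgg16_preprocess_pixel(gray)   # roughly (24.1, 11.2, 4.3)
fed_by_book_code = tuple(v / 255.0 for v in gray)  # roughly (0.5, 0.5, 0.5)

print(expected_by_vgg16)
print(fed_by_book_code)
```

Over the full pixel range, correctly preprocessed values span roughly -124 to +151, while rescale=1./255 produces values in [0, 1]. A frozen convolutional base therefore sees inputs far from the distribution it was trained on.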

After changing the notebook code as suggested above, the expected 96% accuracy is indeed reached.
There is no material difference between preprocessing_function=keras.applications.vgg16.preprocess_input and preprocessing_function=preprocess_input.

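import keras
from keras.applications.imagenet_utils import preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers

# Training generator: data augmentation plus VGG16 preprocessing.
# The original rescale=1./255 is disabled because preprocess_input
# expects values in [0, 255].
train_datagen = ImageDataGenerator(
    # rescale=1./255,
    # preprocessing_function=preprocess_input,
    preprocessing_function=keras.applications.vgg16.preprocess_input,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Validation data gets the same preprocessing but no augmentation.
# test_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)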

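Following the later part of the discussion, I also applied the correct preprocessing to the fast feature extraction code, and it too quickly reaches 96% accuracy. The later posters appear to be right: for this problem, data augmentation is not necessary.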

from keras.applications.imagenet_utils import preprocess_input

#datagen = ImageDataGenerator(rescale=1./255)
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

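Takeaways

Preparing the input data is critically important (correct type, correct value range, consistent preprocessing); getting it wrong can seriously hurt model accuracy.
Deep networks really are good at learning feature representations on their own. A network pretrained on inputs derived from the 0-255 range still keeps most of its classification ability when retrained on inputs in the 0-1 range, as long as its weights are not frozen. The network adapts to the mismatched input range by itself.
Deep learning really is more about practice than theory, and even experts make basic mistakes. The book's author got the input data preparation wrong here. Because of the bug in the older Keras version (freezing did not take effect, so the pretrained network was in fact fully retrained) and the strong learning ability of deep networks, the author still reached 96% accuracy despite feeding data in the wrong range. Since that result matched expectations, the mistake went unnoticed.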
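
The last point suggests verifying that freezing really took effect instead of trusting the assignment. A minimal sketch, assuming only that the object exposes a trainable_weights list the way Keras layers and models do. The function assert_frozen and the stand-in objects are hypothetical and exist only for illustration.

```python
from types import SimpleNamespace


def assert_frozen(layer):
    """Raise if the layer still exposes trainable weights."""
    count = len(layer.trainable_weights)
    if count:
        raise RuntimeError(
            "expected a frozen layer, found %d trainable weights" % count
        )


# Stand-ins for a frozen and an unfrozen conv_base.
# With Keras, conv_base itself would be passed instead.
frozen_base = SimpleNamespace(trainable_weights=[])
unfrozen_base = SimpleNamespace(trainable_weights=["kernel", "bias"])

assert_frozen(frozen_base)  # passes silently
```

Calling such a check right after conv_base.trainable = False would have exposed the problem before any training time was spent.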

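Notes on problems from Chapter 6

6.1.3 Frozen versus unfrozen embedding layer
As an exercise, section 6.1.3 suggests comparing performance with the embedding layer frozen and unfrozen for different numbers of training samples.
In practice, I mistakenly did not reset the trained model between runs. Over repeated runs I found that with the embedding layer frozen, a model that had first been trained with a larger sample count and was then retrained with a smaller one still kept high accuracy. In this exercise, with 10000 samples available, a network that had already been trained on several thousand samples and was then trained on 500 samples still held 95% accuracy. With the embedding layer unfrozen, retraining on the small sample set could not sustain that accuracy, presumably because backpropagation disrupted the good representation in the embedding layer.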
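
The carry-over can be avoided by building a fresh model for every sample size instead of reusing one object. The sketch below shows only the control flow with a stand-in model. ToyModel, build_model, and run_experiments are hypothetical names; in the real exercise build_model would construct and compile the Keras network.

```python
class ToyModel:
    """Stand-in that only counts how many samples it has been trained on."""

    def __init__(self):
        self.samples_seen = 0

    def fit(self, n_samples):
        self.samples_seen += n_samples
        return self.samples_seen


def build_model():
    # In the real exercise this would build and compile a new network.
    return ToyModel()


def run_experiments(sample_sizes, factory=build_model):
    """Train one freshly built model per sample size."""
    results = {}
    for n in sample_sizes:
        model = factory()  # new weights every time, nothing carried over
        results[n] = model.fit(n)
    return results


print(run_experiments([5000, 500]))
```

Each entry in the result reflects only the samples of its own run, whereas a reused model would have seen 5500 samples by the time the 500-sample run finished.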

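LSTM training speed
LSTM training is very slow, and GPU utilization only reaches about 35%. The performance bottleneck here is worth investigating later.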