I was curious about the real influence of the parameters on a CNN's training results, and whether those results align with what one would estimate beforehand. I used the MNIST dataset [1] and tuned the parameters of an old network, LeNet-5 [3].
CNN Layers
The CNN layout is mostly the same as LeNet-5, with small modifications such as adding Dropout to avoid overfitting. I used LeNet-5 mainly because it is small and trains very fast with little GPU requirement, which matters since I had to change the parameters over a few hundred training runs.
# | Layer | Description |
1 | Image Input | 28x28x1 images with 'zerocenter' normalization |
2 | Convolution | 6 9x9x1 convolutions with stride [1 1] and padding [0 0 0 0] |
3 | Batch Normalization | Batch normalization with 6 channels |
4 | ReLU | ReLU |
5 | Average Pooling | 2x2 average pooling with stride [2 2] and padding [0 0 0 0] |
6 | Convolution | 16 9x9x6 convolutions with stride [1 1] and padding [0 0 0 0] |
7 | Batch Normalization | Batch normalization with 16 channels |
8 | ReLU | ReLU |
9 | Average Pooling | 2x2 average pooling with stride [2 2] and padding [0 0 0 0] |
10 | Fully Connected | 128-unit fully connected layer |
11 | ReLU | ReLU |
12 | Fully Connected | 64-unit fully connected layer |
13 | ReLU | ReLU |
14 | Dropout | 50% dropout |
15 | Fully Connected | 10-unit fully connected layer |
16 | Regression Output | mean-squared-error with response 'Response' |
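For reference, the table above corresponds roughly to the following layer array in MATLAB's Deep Learning Toolbox, which the table is printed from. This is a minimal sketch rather than the exact training script:

    % Minimal sketch of the modified LeNet-5 layout from the table above,
    % assuming MATLAB's Deep Learning Toolbox.
    layers = [
        imageInputLayer([28 28 1], 'Normalization', 'zerocenter')
        convolution2dLayer(9, 6, 'Stride', 1, 'Padding', 0)
        batchNormalizationLayer
        reluLayer
        averagePooling2dLayer(2, 'Stride', 2)
        convolution2dLayer(9, 16, 'Stride', 1, 'Padding', 0)
        batchNormalizationLayer
        reluLayer
        averagePooling2dLayer(2, 'Stride', 2)
        fullyConnectedLayer(128)
        reluLayer
        fullyConnectedLayer(64)
        reluLayer
        dropoutLayer(0.5)
        fullyConnectedLayer(10)
        regressionLayer
    ];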
Testing Results
Convolutional Layer Filter Size
The filter size is varied from 1 to 9, the maximum this network allows: with no padding, a 9x9 filter in the second convolutional layer reduces its 10x10 input to 2x2, which is just enough for the final 2x2 pooling layer. The result is shown in Figure 1.
The result shows that networks with a filter size of 3 or 5 reach the best accuracy. A filter of size 1 is likely inaccurate because it cannot correlate a pixel with its neighbors. The accuracy drop beyond size 5 may come from the growing number of parameters, which can lead to overfitting or slow training (the maximum number of training iterations is reached before convergence). This aligns with the suggestion by Szegedy et al. [2] that relatively small convolution kernels can be sufficient for training. Considering that increasing the filter size significantly increases the amount of computation, 3 is a good enough filter size.
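A sweep like this can be scripted by parameterizing the layer array. The sketch below assumes a hypothetical helper lenetLayers(filterSize, numFilters1, numFilters2, fc1, fc2) that rebuilds the layer array above with the given settings, and variables XTrain, YTrain, XTest, and testLabels holding the MNIST training images, one-hot training responses, test images, and test digit labels:

    % Hypothetical sweep over the convolutional filter size; both layers
    % use the same size. lenetLayers is an assumed helper, not a real API.
    options = trainingOptions('sgdm', 'MaxEpochs', 10, 'Verbose', false);
    accuracy = zeros(1, 9);
    for k = 1:9
        layers = lenetLayers(k, 6, 16, 128, 64);
        net = trainNetwork(XTrain, YTrain, layers, options);
        YPred = predict(net, XTest);                 % N-by-10 regression outputs
        [~, digit] = max(YPred, [], 2);              % strongest of the 10 outputs
        accuracy(k) = mean(digit - 1 == testLabels); % labels assumed to be 0..9
    end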
Convolutional Layer Filter Number
I changed the filter number of each convolutional layer separately and of both together. The results for the first convolutional layer are shown in Figure 2 and those for the second in Figure 3. Figure 4 shows both layers changed together, with the second layer holding twice as many filters as the first.
The general conclusion is that increasing the filter number results in better accuracy. However, the gain from adding filters drops dramatically after a certain point, so it may not be worth trading training speed for a very large filter number.
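The joint sweep behind Figure 4 can be sketched the same way, again with the hypothetical lenetLayers helper; the filter counts here are illustrative, not the exact grid I ran:

    % Joint filter-number sweep: the second convolutional layer always
    % holds twice the filters of the first. Counts are illustrative.
    for n = [2 4 8 16 32 64]
        layers = lenetLayers(9, n, 2*n, 128, 64);
        net = trainNetwork(XTrain, YTrain, layers, options);
        % ...evaluate accuracy as in the filter-size sweep above
    end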
Fully Connected Layer Size
The sizes of both fully connected layers are changed together, with the first layer twice the size of the second. The result is shown in Figure 5.
The accuracy remains almost flat, with a slight increase after a certain point; in Figure 5, that point is 40. The interesting thing is that when the fully connected layer size is small, some training runs end with significantly lower accuracy. I am not sure about the reason behind this.
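The corresponding sketch for this sweep, with the first fully connected layer kept at twice the size of the second (sizes illustrative):

    % Fully connected layer sweep: the first FC layer is twice the second.
    for m = [10 20 40 80 160]
        layers = lenetLayers(9, 6, 16, 2*m, m);
        net = trainNetwork(XTrain, YTrain, layers, options);
        % ...evaluate accuracy as in the filter-size sweep above
    end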
References
[1] Kaggle. (2018) Digit recognizer. [Online]. Available: https://www.kaggle.com/c/digit-recognizer#description
[2] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.