Tricks to prevent overfitting in a CNN model trained on a small dataset

Jinwen
5 min read · May 23, 2021


When using a deep learning model to process images, we generally choose a convolutional neural network (CNN). But when the amount of data is small and the network is complex, overfitting occurs: the model achieves a small loss and high prediction accuracy on the training data, but a large loss and low prediction accuracy on the test data.
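To make the symptom concrete, a quick way to check for overfitting is to compare the loss on the training data with the loss on held-out data. The snippet below is only an illustrative sketch, assuming a compiled Keras model (cnnmodel) and an existing train/test split:

# sketch: rough overfitting check, assuming cnnmodel, X_train/y_train and X_test/y_test already exist
train_loss, train_acc = cnnmodel.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = cnnmodel.evaluate(X_test, y_test, verbose=0)
# a much smaller training loss (and much higher training accuracy) than on the
# test data is exactly the symptom of overfitting described above
print('train loss:', train_loss, ' test loss:', test_loss)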

In this article, using a convolutional neural network for 15-Scene classification as an example, I introduce some tricks for optimizing a CNN model trained on a small dataset. The full 15-Scene Dataset can be obtained here.

To classify the 15-Scene Dataset, the basic procedure is as follows.
1) Shuffling and splitting the data
2) Design and implement a CNN
3) Training the CNN on the training and validation data

1) Shuffling and splitting the data

Randomly shuffle the training data

To load the image data, first grab the image paths and randomly shuffle them with a fixed random seed. It is commonly believed that training data should be shuffled before splitting in order to break possible biases introduced during data preparation. Randomly shuffling the training data helps improve accuracy even when the dataset is quite small: on the 15-Scene Dataset, accuracy improved by 10% after shuffling the data.
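As a minimal sketch of this step (the directory layout, the file pattern, and the use of scikit-learn's train_test_split are my own assumptions for illustration, not the original loading code):

# sketch: shuffle the image paths with a fixed seed before splitting
import random
from glob import glob
from sklearn.model_selection import train_test_split  # assumed utility, not from the original pipeline

image_paths = glob('15-Scene/**/*.jpg', recursive=True)  # hypothetical dataset location
random.seed(100)             # fixed seed so the shuffle is reproducible
random.shuffle(image_paths)  # break any ordering bias before the split
train_paths, test_paths = train_test_split(image_paths, test_size=0.2, random_state=100)

Shuffling can also be applied on the fly during training by setting shuffle=True when the batches are generated: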

# example of random shuffle the training data
# set shuffle=True
history = cnnmodel.fit_generator(
    datagen.flow(X_train, y_train, batch_size=32,
                 shuffle=True,
                 sample_weight=None,
                 seed=100,
                 save_to_dir=None,
                 subset=None),
    epochs=epochs,
    validation_data=(X_test, y_test),
    verbose=1,
    steps_per_epoch=X_train.shape[0] // batch_size,
    callbacks=[ckpt])

2) Design and implement a CNN

The design of the CNN model is as follows:

# building a CNN model
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Dropout, Flatten, Dense
from keras.regularizers import l2
from keras.optimizers import Adam

def buildmodel(HEIGHT, WIDTH, N_CHANNELS):
    model = Sequential()
    model.add(Convolution2D(32, (2, 2),
                            activation='relu',
                            kernel_initializer='he_normal',
                            input_shape=(HEIGHT, WIDTH, N_CHANNELS)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Convolution2D(64, (2, 2), padding='same'))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Convolution2D(32, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Convolution2D(32, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Dropout(0.15))
    model.add(Flatten())
    model.add(Dense(32,
                    kernel_regularizer=l2(0.01),
                    bias_regularizer=l2(0.01)))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Dense(32,
                    kernel_regularizer=l2(0.01),
                    bias_regularizer=l2(0.01)))
    model.add(Activation('tanh'))
    model.add(Dropout(0.25))
    model.add(Dense(15, activation='softmax'))

    opt = Adam(lr=0.001, decay=1e-6)
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])
    print(model.summary())
    return model

The structure of the CNN model is as follows (take the input shape (28, 28, 1) as an example):

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_12 (Conv2D) (None, 27, 27, 32) 160
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 13, 13, 32) 0
_________________________________________________________________
activation_16 (Activation) (None, 13, 13, 32) 0
_________________________________________________________________
dropout_16 (Dropout) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, 13, 13, 64) 8256
_________________________________________________________________
activation_17 (Activation) (None, 13, 13, 64) 0
_________________________________________________________________
dropout_17 (Dropout) (None, 13, 13, 64) 0
_________________________________________________________________
conv2d_14 (Conv2D) (None, 13, 13, 32) 18464
_________________________________________________________________
activation_18 (Activation) (None, 13, 13, 32) 0
_________________________________________________________________
dropout_18 (Dropout) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_15 (Conv2D) (None, 13, 13, 32) 9248
_________________________________________________________________
activation_19 (Activation) (None, 13, 13, 32) 0
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 6, 6, 32) 0
_________________________________________________________________
dropout_19 (Dropout) (None, 6, 6, 32) 0
_________________________________________________________________
flatten_3 (Flatten) (None, 1152) 0
_________________________________________________________________
dense_6 (Dense) (None, 32) 36896
_________________________________________________________________
activation_20 (Activation) (None, 32) 0
_________________________________________________________________
dropout_20 (Dropout) (None, 32) 0
_________________________________________________________________
dense_7 (Dense) (None, 32) 1056
_________________________________________________________________
activation_21 (Activation) (None, 32) 0
_________________________________________________________________
dropout_21 (Dropout) (None, 32) 0
_________________________________________________________________
dense_8 (Dense) (None, 15) 495
=================================================================
Total params: 74,575
Trainable params: 74,575
Non-trainable params: 0
_________________________________________________________________

Regularization

Regularization forces the neural network to become simpler: it optimizes the model by penalizing complexity, thereby minimizing both the loss and the model complexity. In this case the l2 regularizer, which is the most common one, is applied to a fully connected (Dense) layer.

# example of l2 on a dense layer
from keras.layers import Dense
from keras.regularizers import l2
...
model.add(Dense(32, kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
...

Regularization can also be added to convolutional layers:

# example of l2 on a convolutional layer
from keras.layers import Conv2D
from keras.regularizers import l2
...
model.add(Conv2D(32, (3,3), kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
...

Dropout

Dropout regularization ignores a random subset of the units in a layer by setting their outputs to zero during each training pass. Because of dropout, two given neurons will not always be present in the thinned network at the same time, so the weight updates no longer depend on the joint action of hidden nodes with fixed relationships. Dropout forces the network to learn more robust features that remain useful in combination with random subsets of other neurons. In other words, when the network makes a prediction, it should not be too sensitive to any particular clue; even if a specific clue is lost, it should still be able to learn common features from many other clues. Dropout reduces the effective complexity of the neural network and thereby helps prevent overfitting.

# example of a drop out layer
from keras.layers import Dropout
...
model.add(Dropout(0.25))
...

3) Training the CNN on the training and validation data

Early Stopping

In practice, it is common that the training loss keeps decreasing while the validation error stays the same (or increases). To stop training before the validation error plateaus (or increases), and thus prevent the model from overfitting, early stopping is a good choice. In the CNN model, a callback (an object that can perform actions at various stages of training) is used to do early stopping.

# example of a callback list (early stopping)
import tensorflow as tf

callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)
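Since the stated goal is to stop once the validation error stops improving, a variant that monitors the validation loss is often preferred; the following is my own suggested sketch, not the configuration used in the original experiment:

# sketch: early stopping on the validation loss (assumed variant)
callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',          # watch the validation loss instead of the training loss
    patience=10,                 # wait 10 epochs without improvement before stopping
    restore_best_weights=True)   # roll back to the best weights seen during training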

Data augmentation

Data augmentation "increases" the dataset in order to reduce the generalization gap (the gap between the model's performance on the training set and its performance on the test set).

In the CNN model, ImageDataGenerator is used to do data augmentation. In this example, setting horizontal_flip=True gives the best result, while the other parameters do not seem to make much difference.

# example of data augmentation
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(horizontal_flip=True)
datagen.fit(X_train)
history = cnnmodel.fit_generator(
    datagen.flow(X_train, y_train, batch_size=32,
                 shuffle=True,
                 sample_weight=None,
                 seed=100,
                 save_to_dir=None,
                 subset=None),
    epochs=epochs,
    validation_data=(X_test, y_test),
    verbose=1,
    steps_per_epoch=X_train.shape[0] // batch_size,
    callbacks=[callback])
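To check how much these tricks actually narrow the generalization gap, the training and validation curves stored in history can be plotted; this is a generic sketch using matplotlib, not part of the original write-up:

# sketch: plot training vs. validation accuracy to inspect the generalization gap
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='train accuracy')        # the key may be 'acc' in older Keras versions
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()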

About me:

I am a graduate student at the National University of Singapore studying Industry 4.0.
