Tricks to prevent overfitting in a CNN model trained on a small dataset

Jinwen
5 min read · May 23, 2021


When using a deep learning model to process images, we generally choose a convolutional neural network (CNN). But when the amount of data is small and the network is complex, overfitting occurs: the model achieves a small loss and high prediction accuracy on the training data, but a large loss and low prediction accuracy on the test data.
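To make the symptom concrete, a quick way to check for overfitting is to compare the loss on the training data with the loss on held-out data. The snippet below is only an illustrative sketch, assuming a compiled Keras model (cnnmodel) and an existing train/test split:

# sketch: rough overfitting check, assuming cnnmodel, X_train/y_train and X_test/y_test already exist
train_loss, train_acc = cnnmodel.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = cnnmodel.evaluate(X_test, y_test, verbose=0)
# a much smaller training loss (and much higher training accuracy) than on the
# test data is exactly the symptom of overfitting described above
print('train loss:', train_loss, ' test loss:', test_loss)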

In this article, using a convolutional neural network for 15-Scene classification as an example, I introduce some tricks for optimizing a CNN model trained on a small dataset. The full 15-Scene Dataset can be obtained here.

To classify the 15-Scene Dataset, the basic procedure is as follows.
1) Shuffling and splitting the data
2) Design and implement a CNN
3) Training the CNN on the training and validation data

1) Shuffling and splitting the data

Randomly shuffle the training data

To load the image data, first grab the image paths and randomly shuffle them with a fixed random seed. It is commonly believed that training data should be shuffled before splitting in order to break possible biases introduced during data preparation. Randomly shuffling the training data helps improve accuracy even when the dataset is quite small: on the 15-Scene Dataset, accuracy improved by 10% after shuffling the data.
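As a minimal sketch of this step (the directory layout, the file pattern, and the use of scikit-learn's train_test_split are my own assumptions for illustration, not the original loading code):

# sketch: shuffle the image paths with a fixed seed before splitting
import random
from glob import glob
from sklearn.model_selection import train_test_split  # assumed utility, not from the original pipeline

image_paths = glob('15-Scene/**/*.jpg', recursive=True)  # hypothetical dataset location
random.seed(100)             # fixed seed so the shuffle is reproducible
random.shuffle(image_paths)  # break any ordering bias before the split
train_paths, test_paths = train_test_split(image_paths, test_size=0.2, random_state=100)

Shuffling can also be applied on the fly during training by setting shuffle=True when the batches are generated: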

# example of random shuffle the training data
# set shuffle=True
history = cnnmodel.fit_generator(
    datagen.flow(X_train, y_train, batch_size=32,
                 shuffle=True,
                 sample_weight=None,
                 seed=100,
                 save_to_dir=None,
                 subset=None),
    epochs=epochs,
    validation_data=(X_test, y_test),
    verbose=1,
    steps_per_epoch=X_train.shape[0] // batch_size,
    callbacks=[ckpt])

2) Design and implement a CNN

The design of the CNN model is as follows:

# building a CNN model
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Dropout, Flatten, Dense
from keras.regularizers import l2
from keras.optimizers import Adam

def buildmodel(HEIGHT, WIDTH, N_CHANNELS):
    model = Sequential()
    model.add(Convolution2D(32, (2, 2),
                            activation='relu',
                            kernel_initializer='he_normal',
                            input_shape=(HEIGHT, WIDTH, N_CHANNELS)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Convolution2D(64, (2, 2), padding='same'))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Convolution2D(32, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Convolution2D(32, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Dropout(0.15))
    model.add(Flatten())
    model.add(Dense(32,
                    kernel_regularizer=l2(0.01),
                    bias_regularizer=l2(0.01)))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    model.add(Dense(32,
                    kernel_regularizer=l2(0.01),
                    bias_regularizer=l2(0.01)))
    model.add(Activation('tanh'))
    model.add(Dropout(0.25))
    model.add(Dense(15, activation='softmax'))

    opt = Adam(lr=0.001, decay=1e-6)
    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])
    print(model.summary())
    return model

The structure of the CNN model is as follows (take the input shape (28, 28, 1) as an example):

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_12 (Conv2D) (None, 27, 27, 32) 160
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 13, 13, 32) 0
_________________________________________________________________
activation_16 (Activation) (None, 13, 13, 32) 0
_________________________________________________________________
dropout_16 (Dropout) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_13 (Conv2D) (None, 13, 13, 64) 8256
_________________________________________________________________
activation_17 (Activation) (None, 13, 13, 64) 0
_________________________________________________________________
dropout_17 (Dropout) (None, 13, 13, 64) 0
_________________________________________________________________
conv2d_14 (Conv2D) (None, 13, 13, 32) 18464
_________________________________________________________________
activation_18 (Activation) (None, 13, 13, 32) 0
_________________________________________________________________
dropout_18 (Dropout) (None, 13, 13, 32) 0
_________________________________________________________________
conv2d_15 (Conv2D) (None, 13, 13, 32) 9248
_________________________________________________________________
activation_19 (Activation) (None, 13, 13, 32) 0
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 6, 6, 32) 0
_________________________________________________________________
dropout_19 (Dropout) (None, 6, 6, 32) 0
_________________________________________________________________
flatten_3 (Flatten) (None, 1152) 0
_________________________________________________________________
dense_6 (Dense) (None, 32) 36896
_________________________________________________________________
activation_20 (Activation) (None, 32) 0
_________________________________________________________________
dropout_20 (Dropout) (None, 32) 0
_________________________________________________________________
dense_7 (Dense) (None, 32) 1056
_________________________________________________________________
activation_21 (Activation) (None, 32) 0
_________________________________________________________________
dropout_21 (Dropout) (None, 32) 0
_________________________________________________________________
dense_8 (Dense) (None, 15) 495
=================================================================
Total params: 74,575
Trainable params: 74,575
Non-trainable params: 0
_________________________________________________________________

Regularization

Regularization forces the neural network to become simpler: it optimizes the model by penalizing complexity, thereby minimizing both the loss and the model complexity. In this case the l2 regularizer, which is the most common one, is applied to a fully connected (Dense) layer.

# example of l2 on a dense layer
from keras.layers import Dense
from keras.regularizers import l2
...
model.add(Dense(32, kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
...

Regularization can also be added to convolutional layers:

# example of l2 on a convolutional layer
from keras.layers import Conv2D
from keras.regularizers import l2
...
model.add(Conv2D(32, (3,3), kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
...

Dropout

Dropout regularization ignores a random subset of the units in a layer by setting their outputs to zero during each training pass. Because of dropout, two given neurons will not always be present in the thinned network at the same time, so the weight updates no longer depend on the joint action of hidden nodes with fixed relationships. Dropout forces the network to learn more robust features that remain useful in combination with random subsets of other neurons. In other words, when the network makes a prediction, it should not be too sensitive to any particular clue; even if a specific clue is lost, it should still be able to learn common features from many other clues. Dropout reduces the effective complexity of the neural network and thereby helps prevent overfitting.

# example of a drop out layer
from keras.layers import Dropout
...
model.add(Dropout(0.25))
...

3) Training the CNN on the training and validation data

Early Stopping

In practice, it is common that the training loss keeps decreasing while the validation error stays the same (or increases). To stop training before the validation error plateaus (or increases), and thus prevent the model from overfitting, early stopping is a good choice. In the CNN model, a callback (an object that can perform actions at various stages of training) is used to do early stopping.

# example of a callback list (early stopping)
import tensorflow as tf

callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)
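Since the stated goal is to stop once the validation error stops improving, a variant that monitors the validation loss is often preferred; the following is my own suggested sketch, not the configuration used in the original experiment:

# sketch: early stopping on the validation loss (assumed variant)
callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',          # watch the validation loss instead of the training loss
    patience=10,                 # wait 10 epochs without improvement before stopping
    restore_best_weights=True)   # roll back to the best weights seen during training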

Data augmentation

Data augmentation "increases" the dataset in order to reduce the generalization gap (the gap between the model's performance on the training set and its performance on the test set).

In the CNN model, ImageDataGenerator is used to do data augmentation. In this example, setting horizontal_flip=True gives the best result, while the other parameters do not seem to make much difference.

# example of data augmentation
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(horizontal_flip=True)
datagen.fit(X_train)
history = cnnmodel.fit_generator(
    datagen.flow(X_train, y_train, batch_size=32,
                 shuffle=True,
                 sample_weight=None,
                 seed=100,
                 save_to_dir=None,
                 subset=None),
    epochs=epochs,
    validation_data=(X_test, y_test),
    verbose=1,
    steps_per_epoch=X_train.shape[0] // batch_size,
    callbacks=[callback])
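To check how much these tricks actually narrow the generalization gap, the training and validation curves stored in history can be plotted; this is a generic sketch using matplotlib, not part of the original write-up:

# sketch: plot training vs. validation accuracy to inspect the generalization gap
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='train accuracy')        # the key may be 'acc' in older Keras versions
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()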

About me:

I am a graduate student at the National University of Singapore studying Industry 4.0.
