MNIST Digit Recognition with CNN
Recognizing handwritten digits from the MNIST dataset using a CNN in TensorFlow 2.
June 2020
My first project on Convolutional Neural Networks (CNNs) with TensorFlow 2.
The MNIST dataset consists of 28x28 pixel grayscale images of handwritten digits.
- 32 feature maps, with a stride of 1x1
- 2 convolutional layers and 2 max-pooling layers
- 1 dropout layer to mitigate overfitting
- Trained with a mean squared error loss (see the data-preparation sketch after this list)
- Optimized using Stochastic Gradient Descent (SGD)
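Since the network is trained with mean squared error over a 10-way softmax, the labels need to be one-hot encoded and the images reshaped to (28, 28, 1). Below is a minimal data-preparation sketch; the x_train/y_train/x_test/y_test variable names are my own, not taken from the original code.

# Data-preparation sketch (variable names assumed, not from the original project)
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values to [0, 1] and add the single grayscale channel expected by Conv2D
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode the digit labels (10 classes), as required by the MSE loss over 10 softmax outputs
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)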
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Building the network
convnet = Sequential()
# 32 feature maps, one per 4x4 filter
convnet.add(Conv2D(32, (4, 4), activation='relu', input_shape=(28,28,1)))
print("Layer 1:", convnet.output_shape)
convnet.add(MaxPooling2D(pool_size=(2,2)))
print("Layer 2:", convnet.output_shape)
# Keras automatically adjusts input shape to match the output shape of previous layer
convnet.add(Conv2D(32, (3, 3), activation='relu'))
print("Layer 3:", convnet.output_shape)
convnet.add(MaxPooling2D(pool_size=(2,2)))
print("Layer 4:", convnet.output_shape)
convnet.add(Dropout(0.3))
convnet.add(Flatten())
print("Layer 5:", convnet.output_shape)
convnet.add(Dense(10, activation='softmax'))
# Compile network
convnet.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
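The layer shapes printed above and the parameter counts worked out below can be cross-checked against Keras's built-in summary:

# Per-layer output shapes and parameter counts, for comparison with the manual calculations below
convnet.summary()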
- Conv2D: 4x4 receptive field, 32 feature maps, ReLU.
- Output is 32 feature maps of 25x25 pixels (25 = 28 - 4 + 1), each picking up different features.
- No. parameters = 32 * (4 * 4 * 1 + 1) = 544
- Output dim = (None, 25, 25, 32)
- Max-pooling: 2x2 pool size; padding is not specified, so Keras falls back to its default of 'valid'.
- Splits each feature map into 2x2 blocks and keeps the maximum value in each block.
- Default stride is equal to the pool size.
- Since padding == 'valid', no padding is added and the output is 32 feature maps of 12x12 pixels (floor(25/2) = 12).
- If padding == 'same', zero-padding would be added and the output would be 32 feature maps of 13x13 pixels (ceil(25/2) = 13).
- Output dim = (None, 12, 12, 32)
- Conv2D: 3x3 receptive field, 32 feature maps, ReLU.
- Output is 32 feature maps of 10x10 pixels (10 = 12 - 3 + 1).
- No. parameters = 32 * (3 * 3 * 32 + 1) = 9248
- Output dim = (None, 10, 10, 32)
- Max-pooling: 2x2 pool size; padding again defaults to 'valid'.
- Splits each feature map into 2x2 blocks and keeps the maximum value in each block.
- Default stride is equal to the pool size.
- Output is 32 feature maps of 5x5 pixels (10 / 2 = 5).
- Output dim = (None, 5, 5, 32)
- Flatten (None, 5, 5, 32) into (None, 800), an 800-dimensional vector per image (5 * 5 * 32 = 800).
- Dense layer: 10 neurons, Softmax.
- No. parameters = 800 * 10 + 10 = 8010 (weights plus biases).
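Training and evaluation then run roughly as follows. This is a sketch assuming the x_train/y_train and x_test/y_test arrays prepared earlier; the epoch count and batch size match the results reported below.

# Train for 20 epochs with a batch size of 32, validating on the held-out test set
convnet.fit(x_train, y_train, epochs=20, batch_size=32, validation_data=(x_test, y_test))

# Final evaluation on the 10,000 MNIST test images
test_loss, test_acc = convnet.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)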
Results
- 94.61% accuracy after 20 epochs with a batch size of 32, training on the 60,000 MNIST training samples.