Building a Convolutional Neural Network based Dog Breed Classifier

Can modern Deep Learning approaches be used to build an algorithm that detects dog breeds based on a given input image?

The answer is most certainly: “Yes!”

Can the Convolutional Neural Network (CNN) achieving this task be built from scratch on a common local machine?

The answer here has to be a little bit more vague:

“Theoretically yes, but it takes a lot of time, data and resources.”

In this article I want to show how I built a CNN-based dog breed classifier while comparing two approaches:

1. Building the CNN from scratch
2. Using a transfer-learning based approach

The Dataset

The labeled dog images depict dogs in front of different backgrounds and from different angles.

Images are provided in different sizes and formats and were chosen randomly. The average width is 571px and the average height is 532px. There are some outliers with heights or widths greater than 3500px.

Distribution of width and height in training set as whisker plot

The number of images per dog breed shows the following general distribution.

On average, 60.6 images are available per dog breed, with a standard deviation of 37.5. While we can observe a slight right skew in the data, we can generally assume a rather balanced dataset.

Metrics

The main focus of this experiment is to identify which approach yields the higher ratio of correct predictions to total predictions. Since the dataset is rather evenly distributed and we are not concerned with generating exceptionally high precision or recall in this experimental setup, I am choosing accuracy as the main metric for model comparison.

I am using 20 epochs for both models to generate comparable results.
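As a quick illustration of the metric itself, accuracy is simply the share of correct predictions over all predictions. The toy labels below are made up purely for demonstration:

```python
import numpy as np

# Toy example: predicted vs. true breed indices (0..132) for five images.
y_true = np.array([0, 5, 5, 12, 40])
y_pred = np.array([0, 5, 7, 12, 40])

# Accuracy = correct predictions / total predictions.
accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.2%}")  # 80.00% in this toy example
```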

Deep Learning Libraries Used

Due to its intuitive, high-level syntax for building CNNs, I am using the Python-based deep learning library Keras, with TensorFlow serving as the backend. Keras provides direct access to pre-trained CNNs, which is the prerequisite for using a transfer-learning based approach in the latter part of the project.

Preparing the data

In order to feed the images to the CNN, I am splitting the dataset into three subsets: a training set (6,680 images), a validation set (835 images) and a test set (836 images).

For better comparability, it is common practice to resize all images to an equally shaped square format. Additionally, TensorFlow requires the images to be passed in as a 4D array (or 4D tensor).

To accomplish the necessary data transformations, I am implementing a function which converts a given image to a 244x244 square and outputs a 4D tensor with the shape (1, 244, 244, 3), 1 being the number of samples (or number of images) and 3 being the number of color channels per image (red, green, blue).
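A minimal sketch of such a conversion function could look like this; the function name path_to_tensor and the exact Keras helpers are illustrative choices, not necessarily the original implementation:

```python
import numpy as np
from keras.preprocessing import image

def path_to_tensor(img_path):
    """Load an image, resize it to 244x244 and return a 4D tensor
    of shape (1, 244, 244, 3)."""
    # Resize the image to the square target format.
    img = image.load_img(img_path, target_size=(244, 244))
    # Convert the PIL image to a 3D array of shape (244, 244, 3).
    x = image.img_to_array(img)
    # Add a leading "samples" dimension -> (1, 244, 244, 3).
    return np.expand_dims(x, axis=0)
```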

Now that functions to provide the images in the required format are prepared, I am loading the train, validation and test sets. Before passing the images to the functions, all pixel values need to be normalized so that they are represented on the same scale, in a range between 0 and 1. This rescaling can be achieved by dividing all pixel values by 255.
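Building on the path_to_tensor sketch above, loading and rescaling the three sets might look roughly as follows; train_files, valid_files and test_files are assumed to be lists of image paths:

```python
import numpy as np

def paths_to_tensor(img_paths):
    """Convert a list of image paths into a single 4D tensor."""
    tensors = [path_to_tensor(path) for path in img_paths]
    return np.vstack(tensors)

# Rescale pixel values from [0, 255] to [0, 1] before training.
train_tensors = paths_to_tensor(train_files).astype('float32') / 255
valid_tensors = paths_to_tensor(valid_files).astype('float32') / 255
test_tensors = paths_to_tensor(test_files).astype('float32') / 255
```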

Now that the images have been converted into the right format, I am setting up the CNN architecture from scratch.

CNN Architecture

I am using the Keras Sequential API to set up the CNN. The model is set up with the following properties:

Below I am displaying the model summary as printed by model.summary().
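As an illustration of what such a Sequential setup can look like, here is a minimal sketch; the specific layers and filter counts below are assumptions, not necessarily the architecture in the summary above:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

model = Sequential([
    # Convolution + pooling blocks to detect increasingly complex patterns.
    Conv2D(16, (3, 3), activation='relu', input_shape=(244, 244, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Collapse the spatial dimensions before the classification layer.
    GlobalAveragePooling2D(),
    # One output node per dog breed (133 classes).
    Dense(133, activation='softmax'),
])

model.summary()
```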

After compiling the model with rmsprop as the optimizer and categorical cross-entropy as the loss function, I am training it on the training set for 20 epochs using a GPU.
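A sketch of this compile-and-train step is shown below; the checkpoint filename is illustrative, and train_targets / valid_targets are assumed to be one-hot encoded breed labels:

```python
from keras.callbacks import ModelCheckpoint

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Keep only the weights with the best (lowest) validation loss.
checkpointer = ModelCheckpoint(filepath='weights.best.from_scratch.hdf5',
                               save_best_only=True, verbose=1)

model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=20, batch_size=32,
          callbacks=[checkpointer])
```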

When loaded with the weights that achieved the best validation loss, the model reaches a test accuracy of approx. 7.18%.

An accuracy of 7.18% after 20 epochs is already better than pure random chance across 133 potential dog breeds (approx. 0.75%). But it seems that building the CNN from scratch is not a very efficient way to reach good performance.

The model currently seems to memorize the training set, showing a strong tendency to overfit the data. Accuracy on the validation set is stagnating and we don’t see real improvements.

To achieve reasonably good results, the CNN will likely require a multitude of connected convolutional and pooling layers, dropout layers, etc., which will result in rather high computation times. Taking a glimpse at award-winning CNN architectures reveals the complexity and resources required.

Therefore, instead of continuing down this path, I will try a transfer-learning based approach.

2. Using a transfer-learning based approach

I will use ResNet-50 mainly as a shape, pattern and feature detector for the images. The model itself is therefore kept intact with its pre-trained weights; only the output layers are modified and trained. I am adding two additional layers to the end of the model.
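One possible way to set this up in Keras is sketched below; the pooling and dense layers are assumptions for what the two added output layers could be, not a confirmed reproduction of the original model:

```python
from keras.applications.resnet50 import ResNet50
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# Load ResNet-50 with its pre-trained ImageNet weights,
# but without the original classification head.
base_model = ResNet50(weights='imagenet', include_top=False,
                      input_shape=(244, 244, 3))

# Keep the pre-trained weights intact: freeze all ResNet-50 layers.
for layer in base_model.layers:
    layer.trainable = False

model = Sequential([
    base_model,
    # Two added output layers: pooling plus a softmax over 133 breeds.
    GlobalAveragePooling2D(),
    Dense(133, activation='softmax'),
])
```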

The architecture as returned by model.summary() is displayed below:

To compile and train the model, I am using a similar setup as above when building the CNN from scratch: I am again training for the same 20 epochs on the same GPU.

I am then evaluating the best trained model on the test images.
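This evaluation step could look roughly like the following; the weights filename is illustrative and test_targets is assumed to hold one-hot encoded test labels:

```python
import numpy as np

# Load the checkpointed weights with the best validation loss
# and evaluate on the held-out test set.
model.load_weights('weights.best.transfer.hdf5')

predictions = model.predict(test_tensors)
test_accuracy = np.mean(np.argmax(predictions, axis=1) ==
                        np.argmax(test_targets, axis=1))
print(f"Test accuracy: {test_accuracy:.4f}")
```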

The performance is quite remarkable: approx. 81.4% of the test images were classified correctly by the best model, without adding a complex model architecture and within a limited processing time of approx. 5 minutes on a GPU. With transfer learning, the model receives a jumpstart with an accuracy of approx. 77% out of the box.

Unlike with the above approach of building the model from scratch, we see an improvement on the validation set within the given range of 20 epochs.

I am testing the output of the model on the following image. The dog breed is correctly identified as a Labrador Retriever.

Image of sitting Labrador Retriever
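A single-image prediction along these lines might look as follows; dog_names is assumed to be the list of the 133 breed names in the same order as the training labels, and the file path is hypothetical:

```python
import numpy as np

def predict_breed(img_path, model, dog_names):
    """Return the predicted breed name for a single image."""
    # Reuse the same preprocessing as for training: resize and rescale.
    tensor = path_to_tensor(img_path).astype('float32') / 255
    probabilities = model.predict(tensor)[0]
    return dog_names[np.argmax(probabilities)]

# Example usage with a hypothetical file name:
# print(predict_breed('images/labrador.jpg', model, dog_names))
```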

Improvement Points

Although the performance of the transfer-learning based CNN is already quite remarkable, the classification accuracy could certainly be improved further. Among others, the following approaches come to mind:

Since the goal of this project is not to come up with the perfect dog breed image classifier, but rather to evaluate the feasibility of two different approaches, the optimization of the network won't be carried out here and is left for a later point in time.

Learnings

In this project I laid out two different approaches to building a deep learning based CNN that classifies dog breeds from given input images. The first approach, building the CNN from scratch, proved to be theoretically possible, but not quite reasonable due to limited resources and the amount of time required to reach acceptable results. The second, transfer-learning based approach showed quite remarkable results even with almost no model tweaking and limited training resources.

Therefore, transfer learning appears to be a reasonable and valid approach. By leveraging the power of pre-trained CNNs, the creation of image classifiers can be jump-started right away, while the model can still be adapted to the concrete use case, such as detecting dog breeds, in a very cost-efficient way.
