The goal of this post is to share my experience with one-shot learning, which is normally used when only a small training data set is available for face recognition. After testing various implementations shared on GitHub and in the posts listed in the references, I wrote this post to present my observations and the results I collected.
One shot learning :
It is a classification / categorization / similarity-identification technique used when only a small training data set is available for computer vision tasks such as object detection, face recognition and handwriting recognition. Computer vision models are normally very large deep neural networks, which are not easy to train and require a lot of resources and training data. But real-world problems often don't have enough data to train such large models. One-shot learning is the commonly recommended solution for this kind of problem.
Sample structure - One Shot Learning - source : reference_1
Here, two pre-trained convolutional networks are followed by a custom layer and a sigmoid-activated final layer, which together learn the similarity between the two input images. The workflow is as follows:
- Each pre-trained network generates a feature vector from its final layer.
- The custom layer, which joins the two pre-trained models, computes the Euclidean distance between the generated feature vectors.
- The sigmoid activation at the end maps the distance value to its target label.
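The last two steps of this workflow can be sketched in plain Python. This is only a minimal illustration, not the code I used in the experiments: the feature vectors are made-up numbers standing in for the outputs of the pre-trained networks, and `weight` and `bias` are hypothetical values that a real network would learn during training.

```python
import math


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def same_person_probability(feat_a, feat_b, weight=-4.0, bias=2.0):
    """Map the distance between two feature vectors to a score in (0, 1).

    weight and bias are hypothetical placeholders; in a trained Siamese
    network they are learned. A negative weight means that a larger
    distance gives a lower "same person" score.
    """
    return sigmoid(weight * euclidean_distance(feat_a, feat_b) + bias)


# Made-up feature vectors, standing in for the pre-trained networks' outputs.
anchor = [0.1, 0.9, 0.3]
same_face = [0.1, 0.9, 0.3]
other_face = [0.9, 0.1, 0.8]

p_same = same_person_probability(anchor, same_face)
p_diff = same_person_probability(anchor, other_face)
print(p_same > 0.5, p_diff < 0.5)
```

With these placeholder values, identical vectors score above 0.5 and distant vectors score below it, which is the behaviour the sigmoid layer is supposed to learn.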
The pre-trained deep neural model Keras-VGG-Face-ResNet-50 is trained again to learn the faces in our custom data. The reason for choosing ResNet50 was discussed in the evaluation of Face Authentication. A custom final layer followed by a sigmoid activation function was implemented on the tensor layers to calculate the Euclidean distance.
The Siamese network's test accuracy and real-time scores did not live up to what the theory promises. I saw this with various data sets and while varying the size of the data (still small for a single class) and the number of training epochs. Until I increased the amount of training data per class, I could not find any improvement in test-set accuracy or real-time accuracy.
The results are shown below.
With a low number of epochs: 5
With a higher number of epochs: 50
After increasing the number of epochs, the model looks good on the cross-validation test data. But when this Siamese model is trained and then loaded in a real-time test, it may even get 0 % accuracy.
$ Loaded model accuracy : 0 %
The point is that, in my observations, a Siamese network for face authentication with the one-shot learning technique discussed here is not reliable, or maybe my implementation is wrong (if so, please correct me). Contrary to what the theory says, the Siamese network built on a transfer-learned deep neural network could not learn from very little data (4-5 images per class; some sources even mention 1 image per class, and I don't know how), even with a high-performing transfer-learned model loaded.
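To get a feeling for how little a Siamese network has to learn from at this scale, here is a small sketch that counts the training pairs available from 3 people with 4 images each. The pairing scheme (every unordered pair of images, labelled 1 for the same person and 0 otherwise) is an assumption on my part about how such a data set is usually built, and `make_pairs` is a hypothetical helper, not a function from any library.

```python
from itertools import combinations


def make_pairs(labels):
    """Return (i, j, target) for every unordered pair of image indices.

    target is 1 when both images belong to the same class, else 0.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        pairs.append((i, j, 1 if labels[i] == labels[j] else 0))
    return pairs


# Hypothetical tiny data set: 3 people ("a", "b", "c"), 4 images each.
labels = [person for person in ("a", "b", "c") for _ in range(4)]

pairs = make_pairs(labels)
positives = sum(target for _, _, target in pairs)
negatives = len(pairs) - positives
print(positives, negatives)  # prints "18 48"
```

Only 18 of the 66 pairs show the same person, so the positive examples are few and the pair set is imbalanced, which may be one reason a large network struggles with so few images per class.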
One-shot learning with a Siamese network may work well with simple convolutional neural networks that have only a few layers. This kind of architecture seems suited only to similarity-detection tasks such as handwriting recognition, shape-similarity scoring, etc.
If we increase the size of the convolutional network, the learning phase requires more system resources and takes much longer. So continuous / online learning is difficult in this kind of situation. Please correct me if anything is wrong.