Deep learning models are becoming more prevalent in image recognition and prediction tasks.
While these models have achieved breakthroughs on tasks that were previously computationally
infeasible, many users find it hard to trust their outcomes because the underlying mechanism
is opaque. This trust issue makes the explainability of deep learning models vital.
When hiking, encountering snakes on a trail is fairly common; however, ascertaining which species a snake belongs to can be daunting for most people, as there are a plethora of snake species.
A tool that identifies a snake's species, and thereby whether it is venomous, could come in handy to hikers worldwide.
However, current methods of snake classification are opaque, given that people do not understand the "black box"
of Convolutional Neural Networks (CNNs). Thus, a more interpretable method might be able to change the status quo.
This project focuses on building explainable image classification models on snake images from
AiCrowd. To do this,
we applied Grad-CAM and Neural-Backed Decision Trees (NBDTs) to a snake species classification task,
in an attempt to remove the "black box" around neural networks through visualization.
Our data come from the
AiCrowd Snake Classification Challenge.
This data included...
The data can be categorized into three types:
Certain snakes can look completely different at the juvenile versus adult stage, or between female and male adults.
For example, the images below show snakes that are adult female (top left and bottom right), adult male (top right), and juvenile (bottom left).
One method of analysis, Gradient-Weighted Class Activation Mapping (Grad-CAM), uses the gradients of a class score with respect to a CNN's convolutional feature maps to produce a coarse localization map of the important regions in the image. This tells the user which parts of the image the model deems important for its classification. However, it only tells the user where the model is looking, not why the model thinks the image belongs to a certain class.
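As a rough illustration of how such a map can be computed, the sketch below uses a PyTorch forward hook on the last convolutional block: the gradients of the target class score with respect to that layer's activations are global-average-pooled into per-channel weights, and the weighted activation maps are summed and passed through a ReLU. The ResNet-18 backbone, layer choice, and random input are placeholders for illustration, not the exact model or preprocessing used in this project.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Placeholder backbone; here the weights are untrained, whereas the real
# snake classifier would be a trained CNN loaded from a checkpoint.
model = models.resnet18()
model.eval()
target_layer = model.layer4[-1]  # last convolutional block

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output.detach()
    # Capture the gradient flowing back into this layer's output.
    output.register_hook(lambda grad: gradients.update({"value": grad.detach()}))

target_layer.register_forward_hook(save_activation)

def grad_cam(image, class_idx=None):
    """Return a coarse localization map for class_idx (predicted class if None)."""
    logits = model(image)                      # image: (1, 3, H, W)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    # Global-average-pool the gradients into per-channel weights, then take a
    # weighted sum of the activation maps; ReLU keeps only positive influence.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # (1, C, 1, 1)
    cam = F.relu((weights * activations["value"]).sum(dim=1))     # (1, h, w)
    return cam / (cam.max() + 1e-8), class_idx

# Stand-in for a preprocessed snake image; upsample the map to overlay it.
image = torch.randn(1, 3, 224, 224)
cam, cls = grad_cam(image)
heatmap = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:], mode="bilinear")
```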
NBDTs are a method to jointly improve the accuracy and interpretability of neural networks by building a decision tree from an already-trained model and then fine-tuning it.
Applying an NBDT can be summarized in three steps (sketched in code below):
1. Build an induced hierarchy by hierarchically clustering the weight vectors (rows) of the trained network's final fully-connected layer, so that each leaf corresponds to a class.
2. Fine-tune the network with a tree-supervision loss so that its predictions respect the induced hierarchy.
3. Run inference by featurizing the image with the network backbone and traversing the hierarchy, either greedily (HardNBDT) or by weighting every root-to-leaf path (SoftNBDT).
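To make these steps concrete, here is a minimal, self-contained sketch of the inference side of an NBDT as we understand it: leaves take the rows of the trained network's final fully-connected layer as representative vectors, inner nodes average their children's vectors, and a sample is classified by traversing the hierarchy either greedily (HardNBDT) or by multiplying path probabilities (SoftNBDT). The toy two-level tree, random weights, and feature vector are placeholders, not our actual snake hierarchy or backbone.

```python
import torch

# `features` plays the role of the CNN's penultimate-layer output; the tree
# and weights below are illustrative stand-ins only.

class Node:
    def __init__(self, children=None, class_idx=None, weight=None):
        self.children = children or []   # empty list for leaves
        self.class_idx = class_idx       # class index at leaves, None otherwise
        self.weight = weight             # representative vector of shape (D,)

def build_leaf(fc_weight, class_idx):
    # Each leaf's representative vector is the final FC layer's row for its class.
    return Node(class_idx=class_idx, weight=fc_weight[class_idx])

def build_inner(children):
    # An inner node's representative vector is the mean of its children's vectors.
    return Node(children=children,
                weight=torch.stack([c.weight for c in children]).mean(dim=0))

def hard_inference(features, root):
    """HardNBDT-style inference: at each node, follow the child whose
    representative vector has the largest inner product with the features."""
    node = root
    while node.children:
        scores = torch.stack([features @ c.weight for c in node.children])
        node = node.children[scores.argmax().item()]
    return node.class_idx

def soft_inference(features, root, num_classes):
    """SoftNBDT-style inference: a leaf's probability is the product of the
    softmaxed child scores along its root-to-leaf path."""
    probs = torch.zeros(num_classes)

    def descend(node, path_prob):
        if not node.children:
            probs[node.class_idx] = path_prob
            return
        scores = torch.stack([features @ c.weight for c in node.children])
        for child, p in zip(node.children, torch.softmax(scores, dim=0)):
            descend(child, path_prob * p)

    descend(root, torch.tensor(1.0))
    return probs

# Toy example: 4 classes, feature dimension 8.
fc_weight = torch.randn(4, 8)            # rows of the trained final FC layer
leaves = [build_leaf(fc_weight, i) for i in range(4)]
root = build_inner([build_inner(leaves[:2]), build_inner(leaves[2:])])
features = torch.randn(8)
print(hard_inference(features, root), soft_inference(features, root, 4))
```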
Below is the hierarchy generated by our model. However, due to the nature of the algorithm designed to assign meaning to the splits within the tree, the induced hierarchy failed to work the way this project intended (learn more here).
| Model | Accuracy | F1-Score |
| --- | --- | --- |
| Baseline CNN | 0.6617 | 0.516 |
| SoftNBDT | 0.4450 | 0.302 |
| HardNBDT | 0.6850 | 0.542 |