Snake Classification using Neural-Backed Decision Trees


About

Deep learning models are becoming more prevalent in image recognition and prediction tasks. Although these models have achieved breakthrough performance on tasks that were previously computationally infeasible, many users find it hard to trust their outputs because the underlying mechanism is opaque. This lack of trust makes the explainability of deep learning models vital.

Hikers commonly encounter snakes on the trail; however, identifying a snake’s species can be daunting for most people because there are so many species. A tool that identifies a snake’s species, and therefore whether it is venomous, could be useful to hikers worldwide. However, current snake classification methods are opaque, because most people do not understand the “black box” of Convolutional Neural Networks. A more interpretable method might change the status quo.

This project focuses on building explainable image classification models for snake images from AiCrowd. To do this, we applied Grad-CAM and Neural-Backed Decision Trees (NBDTs) to a snake species classification task, in an attempt to open up the "black box" around neural networks through visualization.

Data

Source

Our data come from the AiCrowd Snake Classification Challenge.
The dataset includes:

  • 45 different snake species to classify
  • 66,000 images in training set
  • 16,500 images in test set

Data types

The images can be grouped into three types:

  • Snake blends in with background
  • Snake contrasts with background
  • Snake on human hand or arm

Difficulties

Certain snake species look completely different as juveniles and as adults, or as adult females and adult males.

For example, the images below show adult females (top left and bottom right), an adult male (top right), and a juvenile (bottom left).

Algorithms

Grad-CAM

One method of analysis, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of a target class score with respect to the feature maps of a Convolutional Neural Network's (CNN) final convolutional layer to produce a coarse localization map of the important regions in the image. This map shows the user which regions the model deems important for classifying the image. However, it only shows where the model is looking, not why the model thinks the image belongs to a certain class.
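The core Grad-CAM computation can be sketched in a few lines. This is a minimal illustration rather than the project's code: it assumes the feature maps and their gradients have already been extracted from the network (in practice a deep learning framework would supply them), and the function name `grad_cam` and the toy values are made up for the example.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one heatmap, Grad-CAM style.

    activations[k][i][j]: value of feature map k at position (i, j)
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j])
    Both are assumed to be precomputed; plain nested lists keep the sketch
    free of any framework dependency.
    """
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weight = gradient averaged over all spatial positions.
    weights = [
        sum(sum(row) for row in grad_map) / (height * width)
        for grad_map in gradients
    ]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * fmap[i][j] for w, fmap in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input (hypothetical values): two 2x2 feature maps.
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 counts against it
]
heatmap = grad_cam(activations, gradients)
```

The resulting map has the spatial size of the final convolutional layer, which is why Grad-CAM output is coarse; it is normally upsampled to the input resolution and overlaid on the image.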


Neural-Backed Decision Trees

NBDTs are a method for jointly improving the accuracy and interpretability of neural networks by building a decision tree from an already trained model and then fine-tuning it.

Applying an NBDT can be summarized in three steps:

  • Induced Hierarchy
  • Tree Supervision Loss
  • Fine-tuning model
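To make the first step more concrete, here is a rough sketch of how a hierarchy might be induced from a trained classifier and then used for prediction. It follows the general idea described in the NBDT paper (cluster the per-class weight vectors of the final fully connected layer, then route a sample by inner products with each child's vector), but it is a simplified stand-in, not the NBDT library or this project's code. The greedy closest-pair clustering, the function names `build_hierarchy`, `hard_predict`, and `soft_predict`, and the toy 2-D weights are all assumptions for illustration. The tree supervision loss and fine-tuning steps are not shown.

```python
import math


def build_hierarchy(class_weights):
    """Greedy bottom-up clustering of per-class weight vectors into a binary tree."""
    nodes = [
        {"vector": list(w), "leaves": [k], "children": []}
        for k, w in enumerate(class_weights)
    ]
    while len(nodes) > 1:
        # Find the closest pair of current subtree roots (Euclidean distance).
        a, b = min(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda p: math.dist(nodes[p[0]]["vector"], nodes[p[1]]["vector"]),
        )
        left, right = nodes[a], nodes[b]
        leaves = left["leaves"] + right["leaves"]
        # Parent vector = mean of the weight vectors of all leaves beneath it.
        vector = [
            sum(class_weights[k][d] for k in leaves) / len(leaves)
            for d in range(len(class_weights[0]))
        ]
        parent = {"vector": vector, "leaves": leaves, "children": [left, right]}
        nodes = [n for idx, n in enumerate(nodes) if idx not in (a, b)] + [parent]
    return nodes[0]


def child_probs(node, features):
    """Softmax over inner products between the features and each child's vector."""
    scores = [
        sum(f * v for f, v in zip(features, child["vector"]))
        for child in node["children"]
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_predict(root, features):
    """Hard inference: commit to the most likely child at every split."""
    node = root
    while node["children"]:
        probs = child_probs(node, features)
        node = node["children"][probs.index(max(probs))]
    return node["leaves"][0]


def soft_predict(root, features):
    """Soft inference: multiply probabilities along each path, pick the best leaf."""
    leaf_probs = {}
    stack = [(root, 1.0)]
    while stack:
        node, prob = stack.pop()
        if not node["children"]:
            leaf_probs[node["leaves"][0]] = prob
            continue
        for child, p in zip(node["children"], child_probs(node, features)):
            stack.append((child, prob * p))
    return max(leaf_probs, key=leaf_probs.get), leaf_probs


# Toy example (hypothetical values): 4 classes, 2-D penultimate features.
class_weights = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
root = build_hierarchy(class_weights)
features = [2.0, 0.5]
hard_label = hard_predict(root, features)
soft_label, leaf_probs = soft_predict(root, features)
```

Hard inference yields a single sequence of decisions that can be read off the tree, while soft inference considers every path. The two rules can disagree when an early split is nearly tied, which is one reason hard and soft variants of the same tree can score differently.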


Below is the hierarchy generated by our model. However, owing to the nature of the algorithm used to assign meaning to the splits within the tree, the induced hierarchy did not work the way this project intended.


Results

Model Performance



Model          Accuracy   F1-Score
Baseline CNN   0.6617     0.516
SoftNBDT       0.4450     0.302
HardNBDT       0.6850     0.542
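For readers who want to reproduce these two metrics on their own predictions, the sketch below shows one way to compute them. The table does not state how the F1-score was averaged across the 45 classes, so macro averaging is an assumption here. The function names and the toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is assumed)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero because
        # every label considered appears in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy labels (hypothetical): class 0 dominates, class 1 is never predicted.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 2, 2]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

On imbalanced data a macro-averaged F1 can sit well below accuracy, because a class the model never predicts contributes a score of zero no matter how few images it has.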

References