Explainable Neural Networks: Recent Advancements, Part 4

Looking back at a decade (2010–2020), a four-part series

G Roshan Lal
Towards Data Science

--

Where are we?

This blog focuses on developments in the explainability of neural networks. We divide our presentation into a four-part blog series:

  • Part 1 talks about the effectiveness of Visualizing Gradients of the image pixels for explaining the pre-softmax class score of CNNs.
  • Part 2 talks about some more advanced/modified gradient-based methods like DeConvolution and Guided Back Propagation for explaining CNNs.
  • Part 3 talks about some shortcomings of gradient-based approaches and discusses alternative axiomatic approaches like Layer-wise Relevance Propagation, Taylor Decomposition and DeepLiFT.
  • Part 4 talks about some recent developments like Integrated Gradients (continuing from Part 3) and recent novelties in CNN architecture, like Class Activation Maps, developed to make the feature maps more interpretable.

Axiomatic Approaches

Continuing from the last section, we tie together axiomatic and gradient-based approaches. We discuss a gradient-based approach that follows all the desired axioms.

Integrated Gradients (2017)

In the last section, we saw how Taylor Decomposition assigns, as the relevance of an individual pixel, the product of the gradient and the difference between the input pixel value and the corresponding pixel of the baseline image. DeepLiFT assigns a similar product of the coarse gradient and the difference of pixel values between the input and baseline images. From the RevealCancel rule, we can observe that the relevance of an individual pixel is a discrete path integral of the (coarse) gradient along the positive part of Δx followed by the negative part of Δx. This raises the question:

  • How effective are path integrals over gradients of the score function at feature attribution of input image pixels?

This idea of using Integrated Gradients was studied by Mukund Sundararajan, Ankur Taly and Qiqi Yan in their work “Axiomatic Attribution for Deep Networks (ICML 2017)”. The authors critique the then-popular attribution schemes against two desirable axioms that they would like all feature attribution schemes to satisfy:

Axiom 1. Sensitivity: Whenever the input and baseline differ in exactly one feature but have different predictions, the differing feature should be given a non-zero attribution.

It can be shown that LRP and DeepLiFT satisfy Sensitivity due to the Conservation of Total Relevance. But gradient-based methods do not guarantee the Sensitivity axiom. This happens because of saturation at ReLU or MaxPool stages, when the score function is locally “flat” with respect to some input features. Passing relevance or attribution properly through saturated activations is a recurring theme in feature attribution research.
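To make this concrete, here is a minimal sketch (the toy function is my own choice, in the spirit of the example used in the Integrated Gradients paper): the score changes from 0 to 1 between baseline and input, yet the gradient at the input is zero, so a purely gradient-based attribution assigns nothing to the feature.

```python
import torch

# Toy function: f(x) = 1 - ReLU(1 - x)
def f(x):
    return 1 - torch.relu(1 - x)

baseline = torch.tensor(0.0)                # baseline x' = 0
x = torch.tensor(2.0, requires_grad=True)   # input x = 2

out = f(x)
out.backward()

print(f(baseline).item())  # 0.0 -> score at the baseline
print(out.item())          # 1.0 -> score at the input changed by 1
print(x.grad.item())       # 0.0 -> the ReLU is saturated at x = 2, so the
                           #        gradient (and gradient * input) is zero
```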

Axiom 2. Implementation Invariance: Whenever two models are functionally equivalent, they must have identical attributions to input features.

Implementation Invariance is mathematically guaranteed by “vanilla” gradients. But coarse approximations of the gradient, as used in LRP and DeepLiFT, can break this property. The authors show examples of LRP and DeepLiFT violating the Implementation Invariance axiom.

The authors propose using Integrated Gradients for feature attribution:

Integrated Gradients, Source: https://arxiv.org/pdf/1703.01365.pdf
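For reference, the definition shown in the image can be written out as follows, where $x$ is the input, $x'$ the baseline and $F$ the network's score function:

$$\mathrm{IntegratedGrads}_i(x) \;=\; (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha$$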

The authors further show that the above definition satisfies both of the desirable axioms:

  • Sensitivity: By the Fundamental Theorem of Calculus, the Integrated Gradients of all features sum up to the difference between the scores of the input and the baseline, just like the relevances in LRP and DeepLiFT. Hence they satisfy Sensitivity.
  • Implementation Invariance: Since the method is defined completely in terms of gradients of the function being explained, it satisfies Implementation Invariance.

The default path used by the integral is the straight-line path from the baseline to the input. The choice of path is immaterial with respect to the above axioms. The straight-line path has the additional property of being symmetry-preserving: features that play symmetric roles in the function and have identical values in the input and the baseline receive identical attributions.
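As a concrete illustration, here is a minimal PyTorch sketch of Integrated Gradients along the straight-line path, approximated by a trapezoidal sum (the function name, number of steps and tensor shapes are my own assumptions, not from the paper):

```python
import torch

def integrated_gradients(model, x, baseline, target_class, steps=50):
    """Approximate IG for one image along the straight-line path
    x' + alpha * (x - x'), alpha in [0, 1].

    Assumes `model` returns pre-softmax class scores and that `x` and
    `baseline` both have shape [1, C, H, W].
    """
    alphas = torch.linspace(0.0, 1.0, steps + 1).view(-1, 1, 1, 1)
    # All interpolation points along the path, stacked as one batch.
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)

    scores = model(path)[:, target_class]               # score at each point
    grads = torch.autograd.grad(scores.sum(), path)[0]  # d(score)/d(pixel)

    # Trapezoidal average of the gradients along the path, scaled by the
    # input-baseline difference, per the definition above.
    avg_grads = ((grads[:-1] + grads[1:]) / 2.0).mean(dim=0)
    attributions = (x - baseline).squeeze(0) * avg_grads
    return attributions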

Here are some of the results provided by the authors on a GoogLeNet model trained on the ImageNet dataset:

Integrated Gradients Vs Gradients, Source: https://arxiv.org/pdf/1703.01365.pdf

Novel Architectures

Along with developments in neural networks for better performance at image recognition and localization, there has also been interest in modifying the network architecture to be more interpretable. In the previous sections, we discussed some methods for visualizing the feature maps of CNNs. But beyond the feature maps, CNNs also have a stack of fully connected layers (on top of the last CNN layer) that converts the filtered feature maps into a pre-softmax score, and this stack is not very interpretable. This section talks about some works that try to make the CNN architecture itself more interpretable.

Class Activation Maps (2016)

Since the fully connected layers are not very easy to interpret, some researchers suggested replacing them with a Global Average Pooling (GAP) over each feature map of the last CNN layer, which reduces the feature maps to a one-dimensional tensor, followed by a single linear layer before the softmax. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva and Antonio Torralba pointed out that the feature maps of the last CNN layer in such an architecture are more interpretable in their work “Learning Deep Features for Discriminative Localization (CVPR 2016)”.

CNN Architecture with Global Average Pooling, Source: https://arxiv.org/pdf/1512.04150.pdf
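A minimal sketch of this kind of classification head in PyTorch (the class name and layer sizes are my own placeholders, not taken from the paper):

```python
import torch.nn as nn

class GapHead(nn.Module):
    """Replaces the fully connected stack: GAP over each of the K feature
    maps of the last conv layer, followed by a single linear layer."""

    def __init__(self, num_feature_maps=1024, num_classes=1000):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)          # [N, K, H, W] -> [N, K, 1, 1]
        self.fc = nn.Linear(num_feature_maps, num_classes)

    def forward(self, feature_maps):
        pooled = self.gap(feature_maps).flatten(1)  # [N, K]
        return self.fc(pooled)                      # pre-softmax class scores
```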

How does GAP make the CNN more interpretable? What interpretations can we derive from such an architecture?

The authors note that the GAP layer reduces each feature map to a single scalar, and that the weights of the linear layer following it can be interpreted as the relevance of each feature map towards a particular class. To illustrate this, the authors define the Class Activation Map (CAM) of each class as the sum of the feature maps weighted by that class's weights in the linear layer.

Class Activation Map, Source: https://arxiv.org/pdf/1512.04150.pdf
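As a rough sketch of the computation (the helper name and tensor layout are my assumptions; the paper only defines the weighted sum), a CAM can be computed from the last convolutional layer's activations and the weights of the linear layer that follows GAP:

```python
import torch
import torch.nn.functional as F

def class_activation_map(feature_maps, fc_weights, class_idx, out_size):
    """CAM_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: [K, H, W] activations of the last conv layer for one image
    fc_weights:   [num_classes, K] weights of the linear layer after GAP
    out_size:     (height, width) of the input image, for upsampling
    """
    weights = fc_weights[class_idx]                           # w_k^c, shape [K]
    cam = (weights[:, None, None] * feature_maps).sum(dim=0)  # [H, W]

    # Upsample the coarse map to the input resolution for visualization.
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode="bilinear", align_corners=False)[0, 0]
    return cam
```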

The authors show that CAMs are good at localization even though the network is trained only for image recognition. Here are some example CAMs of various classes for a single image, shown by the authors.

CAM visualizations for different classes, Source: https://arxiv.org/pdf/1512.04150.pdf

GradCAM and Guided GradCAM (2019)

One limitation of CAM visualization was that it could only be applied to architectures with Global Average Pooling (GAP). Ramprasaath Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra extended the CAM approach to other architectures and proposed the idea of GradCAM and Guided GradCAM in their work “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (IJCV 2019, ICCV 2017)”.

The authors note that in a GAP-based CNN, all the pixels in a feature map of the last CNN layer receive the same gradient from the layer above during back propagation. Building on this idea, the authors propose that in a general CNN, the average gradient received by each feature map of the last CNN layer can be used as the corresponding weight for defining the Class Activation Maps. The authors call this Gradient-weighted CAM, or GradCAM for short.

Local Linear Approximation of the Fully Connected layers to extract CAM weights, Source: https://arxiv.org/pdf/1610.02391.pdf

Visualizing the resulting GradCAM images (upsampled to the input image size) provides a heat map describing which portions of the image strongly influence the output.
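A minimal sketch of that computation (assuming the last conv layer's activations have been captured, e.g. via a forward hook; the function name and shapes are mine, not from the paper):

```python
import torch
import torch.nn.functional as F

def grad_cam(feature_maps, class_score, out_size):
    """GradCAM for one image.

    feature_maps: [K, H, W] activations of the last conv layer, still part of
                  the autograd graph that produced `class_score`
    class_score:  scalar pre-softmax score of the class being explained
    out_size:     (height, width) of the input image
    """
    # The global average of the gradients plays the role of the CAM weight w_k^c.
    grads = torch.autograd.grad(class_score, feature_maps)[0]  # [K, H, W]
    alphas = grads.mean(dim=(1, 2))                            # [K]

    # Weighted sum of feature maps, then ReLU to keep only positive evidence.
    cam = torch.relu((alphas[:, None, None] * feature_maps).sum(dim=0))

    # Upsample the coarse heat map to the input image size.
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode="bilinear", align_corners=False)[0, 0]
    return cam.detach()
```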

GradCAM Dog Vs Cat visualization, Source: https://arxiv.org/pdf/1610.02391.pdf

GradCAM only highlights coarse portions of the input image responsible for a particular class activation. For more fine-grained detail, the authors suggest running Guided BackProp and multiplying the resulting signal element-wise with the (upsampled) GradCAM map. The authors call this Guided GradCAM.
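In code, this last step is just an element-wise product (a sketch; both inputs are assumed to be at the input image resolution, with the Guided Backprop saliency of shape [C, H, W] and the upsampled GradCAM map of shape [H, W]):

```python
def guided_grad_cam(guided_backprop_grads, cam):
    """Fuse fine-grained Guided Backprop saliency with the coarse,
    class-discriminative GradCAM heat map."""
    return guided_backprop_grads * cam[None, :, :]
```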

Guided GradCAM, Source: https://arxiv.org/pdf/1610.02391.pdf

The authors show many interesting applications of Guided GradCAM, including visualizing a wide variety of tasks performed by neural networks such as image recognition and visual question answering, detecting gender bias, and establishing better human trust in the AI system.

The Road Ahead

This blog highlighted some of the remarkable work of the last decade (2010–2020) on visualizing the decisions made by a neural network, largely in the domain of image recognition and localization. Though this discussion has been limited to computer vision, most of the approaches discussed here have been successfully applied to many areas of Natural Language Processing, Genomics, etc. Explainability of neural networks is still evolving, and new research comes up every day to better explain the decisions made by neural networks.

The last decade has seen growing interest in questions of transparency, fairness, privacy and trust in AI. There are many interesting works in this more general domain of explainable AI, like LIME, SHAP, etc. Some researchers have been exploring new machine learning models like Soft Decision Trees and Neural-Backed Decision Trees, which are implicitly explainable and also powerful enough to extract complex features/representations. Most approaches listed in this blog do not involve retraining the network and merely peek into the network to provide visualizations. Some recent work, such as PatternNet, has challenged this assumption and explored learning explanation components from the data itself to generate more effective explanations. A lot of exciting work is surely on the way!

The Holy Grail of explainable AI is when AI can help humans make new discoveries from data and guide our decision making beyond just providing us with the “correct” answers.
