Visualizing Image Attributions using transformers_interpret library in PyTorch for image classification tasks with Google’s Vision Transformer (ViT) model pre-trained on ImageNet-21k


Now, if you’re like me and have been living under a rock for the past year or so, let me give you a quick rundown of what all these fancy words mean:

– transformers_interpret library: This is a Python package that lets us visualize and interpret the predictions made by pre-trained Transformers models using various attribution methods. It’s pretty cool because it can help us understand how the model arrived at its decision, which is useful for debugging or improving our models (there’s a tiny sketch of the basic idea right after this list).

– PyTorch: This is a popular open-source machine learning library that allows us to build and train deep neural networks in Python. We’re using it here because transformers_interpret runs on top of PyTorch, and so does the ViT model pre-trained on ImageNet-21k, which was introduced by Google researchers in the “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale” paper.
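To make “attribution” a little more concrete, here is a minimal, hand-rolled sketch of the simplest attribution idea, gradient × input, in plain PyTorch. This is illustrative only: transformers_interpret itself builds on Captum and uses integrated-gradients-style attributions under the hood, and the model and pixel_values names below are placeholders rather than anything from the library.

# Illustrative gradient-x-input attribution (NOT the method transformers_interpret uses)
import torch

def gradient_x_input(model, pixel_values, target_class):
    # Track gradients with respect to the input pixels
    pixel_values = pixel_values.clone().requires_grad_(True)
    logits = model(pixel_values=pixel_values).logits  # forward pass through the classifier
    logits[0, target_class].backward()                # gradient of the target class score
    # Each pixel's gradient times its value approximates its contribution to that score
    return (pixel_values.grad * pixel_values).detach()

More sophisticated methods like integrated gradients follow the same idea: attribute the model’s output score back to individual input pixels.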

So, let’s get started! First, you need to install both PyTorch and transformers_interpret using pip:

# Install PyTorch using pip
pip install torch

# Install the transformers and transformers_interpret libraries using pip
pip install transformers transformers-interpret

# Note: the ViT weights themselves are not installed through pip. They are hosted on
# the Hugging Face Hub (under IDs such as 'google/vit-base-patch16-224') and are
# downloaded automatically the first time you call from_pretrained(), as shown below.
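If you want a quick sanity check that everything installed cleanly before moving on, importing the packages from the command line is enough (the exact version number printed will depend on your environment):

# Quick import check; prints the installed PyTorch version
python -c "import torch, transformers, transformers_interpret; print(torch.__version__)"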

Once that’s done, we can load the ViT model pre-trained on ImageNet-21k like so:

# Import the necessary libraries
from transformers import AutoFeatureExtractor, AutoModelForImageClassification
from transformers_interpret import ImageClassificationExplainer
from PIL import Image
import requests  # for downloading the example image
import io  # for wrapping the downloaded bytes so PIL can open them

# Define the name of the pre-trained model
# This checkpoint was pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k,
# so it ships with a 1,000-class classification head
model_name = "google/vit-base-patch16-224"  # replace this with the name of your pre-trained model

# Load the pre-trained model for image classification
model = AutoModelForImageClassification.from_pretrained(model_name)

# Load the matching feature extractor, which resizes and normalizes input images
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
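As an optional sanity check, you can peek at the classification head that came with the checkpoint. The exact label strings live in the model config, so they will differ if you swapped in a different checkpoint:

# The ImageNet-1k head has 1,000 classes; id2label maps class indices to names
print(model.config.num_labels)     # 1000 for this checkpoint
print(model.config.id2label[207])  # e.g. "golden retriever"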

Now, let’s load an image and get its predictions using the ViT model:

# Load an image from a given URL
url = "https://example.com/image.jpg"  # replace this with the URL of your image
response = requests.get(url)  # send a GET request to the URL and store the response
img_bytes = response.content  # the raw image bytes

# Open the image with PIL (convert to RGB in case the source is grayscale or RGBA)
img = Image.open(io.BytesIO(img_bytes)).convert("RGB")

# Preprocess the image and run it through the ViT model
# The feature extractor resizes and normalizes the image and returns PyTorch tensors;
# the model returns an output object whose .logits field holds one raw score per class
inputs = feature_extractor(images=img, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
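If you’d rather see a human-readable prediction than a raw class index, you can turn the logits into probabilities and look the winner up in the model config (a small sketch; it assumes the checkpoint ships with ImageNet-1k label strings, which google/vit-base-patch16-224 does):

import torch

# Softmax the logits and report the most likely class with its probability
probs = torch.softmax(logits, dim=-1)
top_prob, top_idx = probs.max(dim=-1)
print(f"Predicted: {model.config.id2label[top_idx.item()]} ({top_prob.item():.2%})")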

Finally, we can use transformers_interpret to compute and visualize attributions for the predicted class:

# Create an explainer for the model; the feature extractor is passed in so the
# explainer can preprocess the image the same way the model expects
image_classification_explainer = ImageClassificationExplainer(
    model=model,
    feature_extractor=feature_extractor,
)

# Get the predicted label by finding the index of the highest logit
predicted_label = logits.argmax(-1).item()
print("Attributing the prediction for class index:", predicted_label)

# Calling the explainer on the image computes pixel-level attributions,
# by default with respect to the model's predicted class
attributions = image_classification_explainer(img)
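What comes back should be a tensor of per-pixel attribution scores laid out like the preprocessed input (roughly (1, 3, 224, 224) for this checkpoint), although the exact type and shape can vary between library versions, so it’s worth inspecting before you do anything clever with it:

# Inspect what the explainer returned before visualizing it
print(type(attributions))
print(getattr(attributions, "shape", None))  # expect something like (1, 3, 224, 224)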

And that’s it! You can now render the attributions in a few different styles, such as alpha scaling. Here’s an example:

# Visualize the attributions computed above

# Choose the rendering method; "alpha_scaling" scales each pixel's opacity by its attribution score
attribution_method = "alpha_scaling"

# Show the original image and its attributions side by side
display_option = True

# Only highlight pixels whose attribution exceeds this threshold
outlier_threshold = 0.03

# Call visualize on the explainer we created earlier, passing in the defined parameters
image_classification_explainer.visualize(
    method=attribution_method,
    side_by_side=display_option,
    outlier_threshold=outlier_threshold,
)

Hope this helps! Let us know if you have any questions or suggestions in the comments below.

SICORPS