Transformers for NLP 2nd Edition – Chapter 14: Interpreting Black Box Transformer Models


First things first: what is a black box model? It’s any machine learning model whose inner workings we can’t easily inspect or explain, even though we trust it to do its job well enough. And with their many stacked layers and hundreds of millions of parameters, transformers are definitely in the “black box” category.

But don’t freak out! There is a way to peek inside and see what’s going on without having to dive into the details of every layer. This is where interpretability tools come in handy. The one we will be discussing here is called Transformers Interpret, which lets us explain a transformer model’s predictions with just two lines of code!
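To see what those two lines look like in practice, here is a minimal sketch, using a sentiment model as the running example for the rest of this section. The checkpoint name and the example sentence are just illustrative choices:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from transformers_interpret import SequenceClassificationExplainer

    # Any fine-tuned sequence classification checkpoint should work;
    # this SST-2 sentiment model is just an illustrative choice.
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # The two lines that do the explaining:
    cls_explainer = SequenceClassificationExplainer(model, tokenizer)
    word_attributions = cls_explainer("This movie was great, the acting was perfect.")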

Let’s start with an example: say you have a sentiment analysis task and you want to know why the model classified a certain text as positive or negative. With Transformers Interpret, we can get a visual representation of which words in that text had the biggest impact on the final prediction. This is called a “saliency map” and it looks like this:

[Insert image here]

As you can see, the word “great” has the highest saliency score (meaning it contributed the most to the positive sentiment), followed by “amazing”, “perfect”, etc. This gives us a pretty good idea of what words were important in making that prediction.
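Continuing the sketch above, the scores behind a saliency map like this can also be read off directly from the explainer’s output. The predicted_class_name attribute assumes a recent version of the library, and the exact numbers you see will depend on your model and your text:

    # word_attributions is a list of (token, score) pairs; positive scores
    # push the prediction toward the predicted class, negative scores away from it.
    for token, score in word_attributions:
        print(f"{token:>12}  {score:+.3f}")

    # The class the explanation refers to:
    print(cls_explainer.predicted_class_name)  # e.g. "POSITIVE"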

Transformers Interpret can also render the explanation for an individual prediction directly on top of the input text, highlighting each word according to how much it contributed to the decision. The effect is similar to an attention visualization: it shows us which parts of the input the model was relying on when it made its decision. Here’s an example:

[Insert image here]

In this case, we can see that the model focused on the words “great” and “perfect” when making a positive prediction for this review. This is pretty cool because it lets us tie the model’s decision back to specific parts of the input text.
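Here is a minimal sketch of producing that in-context view with the same explainer from above. What gets highlighted are the attribution scores rendered over the input tokens, and the output file name is just an example:

    # Render the attributions as color-highlighted tokens. In a Jupyter notebook,
    # calling visualize() with no arguments displays the figure inline; passing a
    # path also saves it as a standalone HTML file.
    cls_explainer.visualize("review_explanation.html")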

With just a couple of lines of code, we can get detailed explanations for a transformer model’s predictions using Transformers Interpret. And best of all, the resulting visualizations are easy to read and don’t require any advanced knowledge of machine learning or data science. So give it a try; your inner nerd will thank you!
