BERTViz: Visualizing Attention in BERT

If you haven’t heard of it yet, let me give you a quick rundown. BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art pretrained language model that can understand the context and meaning behind words in a sentence. And now, thanks to BERTViz, we can visualize exactly how it’s doing this!

So what makes BERT so special? Well, for starters, it uses a transformer architecture (which is all the rage these days) whose self-attention mechanism lets it look at an entire sentence at once instead of reading one word at a time. Because every token can attend to every other token, the representation of each word is shaped by its full context, which is crucial for tasks like sentiment analysis or question answering. The short sketch below shows what that looks like in practice.
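Here's a minimal sketch, using the Hugging Face transformers library, of BERT producing a contextual vector for every token in a single forward pass (the model and tokenizer names are just the standard bert-base-uncased checkpoint):

# One forward pass yields a contextual embedding for every token in the sentence
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Shape is (batch, sequence_length, hidden_size): one 768-dimensional vector per token,
# each conditioned on the whole sentence rather than just the words to its left
print(outputs.last_hidden_state.shape)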

But how does BERTViz help us visualize this? Well, let's say you have a piece of text that you want to analyze, maybe something like "The quick brown fox jumps over the lazy dog."

Now, if we run this through BERT and ask it to predict whether the sentence is positive or negative (a task called sentiment analysis), the code might look something like this:

# Import necessary libraries
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# Load pretrained model and tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')  # Load the BERT tokenizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)  # BERT with a 2-label classification head (positive and negative)

# Preprocess input text
input_text = "The quick brown fox jumps over the lazy dog."
encoded_input = tokenizer(input_text, return_tensors='pt')  # Tokenize the input text into PyTorch tensors

# Run BERT and get predictions
outputs = model(**encoded_input)  # Pass the encoded input through the model
logits = outputs.logits  # Raw classification scores, one per label
prediction = torch.argmax(logits, dim=-1).item()  # Index of the highest-scoring label
print("Prediction:", prediction)  # 0 for negative, 1 for positive

This code loads the pretrained BERT model (available on the Hugging Face Hub) and runs our input text through it. One caveat: the classification head on top of a plain bert-base-uncased checkpoint is randomly initialized, so until the model is fine-tuned on labeled sentiment data, the 0 ("negative") or 1 ("positive") output is essentially a coin flip.
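If you want a prediction that actually reflects sentiment, swap in a checkpoint that has already been fine-tuned on a sentiment dataset. Here's a minimal sketch using the transformers pipeline API; the checkpoint name below is one publicly available SST-2 model and is just an example, not something BERTViz requires:

# Sentiment analysis with a checkpoint already fine-tuned on SST-2
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example fine-tuned checkpoint
)
result = sentiment("The quick brown fox jumps over the lazy dog.")
print(result)  # A list with one dict containing a 'label' and a confidence 'score'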

But what if we want to see how BERT arrived at this prediction? That's where BERTViz comes in! By adding a few lines to the same code, we can visualize which tokens the model attends to at each layer and head, which gives us a window into (though not a complete explanation of) how it processes the sentence.

Here’s what it might look like:

# Import necessary libraries
import torch
from bertviz import head_view
from transformers import BertTokenizerFast, BertForSequenceClassification

# Load pretrained model and tokenizer, asking the model to return its attention weights
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2, output_attentions=True)

# Preprocess input text
input_text = "The quick brown fox jumps over the lazy dog."
encoded_input = tokenizer(input_text, return_tensors='pt')  # Tokenize the input text into PyTorch tensors

# Run BERT and get predictions
outputs = model(**encoded_input)  # Forward pass; outputs now include logits and per-layer attention weights
logits = outputs.logits  # Raw classification scores
prediction = torch.argmax(logits, dim=-1).item()  # Index of the highest-scoring label
print("Prediction:", prediction)

# Visualize attention using BERTViz
tokens = tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][0])  # Token strings for labeling the visualization
head_view(outputs.attentions, tokens)  # Render the interactive head view (displays inline in a Jupyter notebook)

This code loads the pretrained BERT model and tokenizer (just like before), but then hands the model's attention weights and the token strings to BERTViz. The output is an interactive view rendered in a notebook: lines connect pairs of tokens, each color corresponds to a different attention head, line thickness shows how strongly one token attends to another, and a dropdown lets you flip through the layers.
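BERTViz also ships a bird's-eye alternative to the head view. If you want to scan every layer and head at once, the model view works from the same attention weights and tokens we computed above; this is a minimal sketch assuming outputs and tokens from the previous block are still in scope:

# Bird's-eye view: a grid of thumbnails, one cell per layer/head combination
from bertviz import model_view

model_view(outputs.attentions, tokens)  # Click a thumbnail to zoom into that layer and head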

With BERTViz, we can pair a sentiment prediction with a look inside the model's attention patterns, which goes a long way toward understanding how it arrived at its decision. And who knows? Maybe someday we'll be able to use this tool to analyze our own writing and see which words carry the most weight in getting our message across!
