Transformer Encoder Hidden States and Attentions

This is a fancy way of saying that we can see what the model was thinking while processing our input data.

So, let’s say you have some text that you want to analyze with a transformer. First, a tokenizer breaks your text down into smaller pieces called tokens, which get turned into embeddings. The encoder then feeds those embeddings through a series of layers (like a stack of pancakes) where they get transformed and mixed together in different ways.
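Just to make that concrete, here’s a minimal sketch of the tokenization step. The post doesn’t name a library or model, so this assumes the Hugging Face transformers library and bert-base-uncased as a stand-in checkpoint:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint choice; any BERT-style encoder works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "The quick brown fox jumps over the lazy dog"
inputs = tokenizer(text, return_tensors="pt")

# The text is now a sequence of token IDs (plus special [CLS]/[SEP] tokens).
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
```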

At the end of each layer, we can see what’s going on inside by looking at something called hidden states. This is like peeking into the brain of the model to see how it’s processing our input data. The output_hidden_states parameter tells the model whether to return those hidden-state tensors for all layers (handy if you want to do some fancy analysis).
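Here’s a short sketch of grabbing those hidden states, again assuming the Hugging Face transformers library and bert-base-uncased as an example model:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# A tuple: the embedding output plus one tensor per encoder layer,
# each shaped (batch_size, sequence_length, hidden_size).
for i, hs in enumerate(outputs.hidden_states):
    print(f"hidden state {i}: {tuple(hs.shape)}")
```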

We also have something called attentions, which are like little spotlights the model uses to focus on certain parts of our input. The attention weights tell us how much each token is looking at every other token, which helps us understand where the model is directing its focus as it reads our text. The output_attentions parameter tells the model whether to return those attention tensors for all layers too.
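Same idea for the attention weights; a sketch under the same assumed Hugging Face setup:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# A tuple with one tensor per layer, each shaped
# (batch_size, num_heads, sequence_length, sequence_length):
# row i says how much token i attends to every other token.
for i, attn in enumerate(outputs.attentions):
    print(f"layer {i + 1} attention: {tuple(attn.shape)}")
```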

So, let’s say we have a piece of text that says “The quick brown fox jumps over the lazy dog”. If we run this through our transformer model and look at the hidden states and attentions for each layer, we might see something like this:

– Layer 1: The model is spreading its attention fairly evenly across all the words in the input text. It isn’t really focusing on any one word or phrase yet.
– Layer 2: The model is starting to pay more attention to certain parts of the input, such as “quick” and “brown”, perhaps because these words matter for the meaning of the sentence.
– Layer 3: The model has now zeroed in on the parts of the input it thinks are most important for the overall meaning. For example, it might weight “jumps” and “lazy” more heavily.

By looking at the hidden states and attentions for each layer, we can see how the model is processing our input data and which words or phrases it’s paying special attention to. This information can be helpful if you want to do some fancy analysis on your text data; the sketch below shows one way to pull that layer-by-layer picture out of the attention weights.
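As a rough sketch (the exact tokens each layer favors will vary by model; this again assumes bert-base-uncased and averages the attention each token receives across heads and query positions):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for layer_idx, layer_attn in enumerate(outputs.attentions):
    # Average over heads and over query positions to get, per token,
    # roughly how much attention it receives in this layer.
    received = layer_attn[0].mean(dim=0).mean(dim=0)  # shape: (sequence_length,)
    top = received.argmax().item()
    print(f"Layer {layer_idx + 1}: most-attended token is '{tokens[top]}'")
```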
