Attention Masks for Machine Learning with PyTorch

An attention mask is a tensor that tells a model which positions in an input sequence it should pay attention to and which it should ignore.

For example, imagine we have a sentence: “The quick brown fox jumps over the lazy dog.” If we want our model to focus on the words that actually matter (like “quick” or “jumps”), we can use an attention mask to tell it which parts of the input sequence are important.
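Concretely, for this sentence the mask is just one boolean per word position. Here is a minimal sketch, assuming simple whitespace tokenization:

import torch

# One boolean per word: True for the words we want the model to focus on
# ("quick" and "jumps"), False everywhere else.
words = "The quick brown fox jumps over the lazy dog".split()
focus = torch.tensor([w in ("quick", "jumps") for w in words])
print(focus)  # tensor([False,  True, False, False,  True, False, False, False, False])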

Here’s how you might create and apply an attention mask for a whole batch of inputs in PyTorch:

import torch

# Define your input tensor and attention mask
input_tensor = torch.randn(10, 9)          # a random input tensor with shape (batch size, sequence length): 10 sequences of 9 tokens, matching the 9-word sentence above
attention_mask = torch.ones(10, 9).bool()  # an attention mask that is True for every position

# Apply the attention mask to your model's output
# (my_model is a placeholder for any model whose output has shape (batch size, sequence length))
output = my_model(input_tensor)
output *= attention_mask                   # zero out positions where the mask is False (here, none)

In this example, the attention mask is True for every position, so nothing is masked out and the model is free to attend to the entire input sequence.
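For reference, PyTorch’s built-in attention functions consume exactly this kind of boolean mask. Here is a minimal sketch using torch.nn.functional.scaled_dot_product_attention (available in PyTorch 2.x), where True in attn_mask means “attend to this position”; the tensor sizes are assumptions for illustration:

import torch
import torch.nn.functional as F

batch, seq_len, embed_dim = 10, 9, 16
q = torch.randn(batch, seq_len, embed_dim)           # query vectors
k = torch.randn(batch, seq_len, embed_dim)           # key vectors
v = torch.randn(batch, seq_len, embed_dim)           # value vectors

keep = torch.ones(batch, seq_len, dtype=torch.bool)  # attend to every key position
attn_mask = keep[:, None, :]                          # broadcastable to (batch, query positions, key positions)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)                                      # torch.Size([10, 9, 16])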

However, if you want your model to focus on specific parts of the input sequence (like “quick” or “jumps”), you can modify the attention mask accordingly:

# Define your input tensor and attention mask
input_tensor = torch.randn(10, 9)           # a random input tensor with shape (batch size, sequence length)
attention_mask = torch.zeros(10, 9).bool()  # an attention mask that is False for every position
attention_mask[:, 1] = True                 # attend to position 1 in every sequence, i.e. "quick"
attention_mask[:, 4] = True                 # attend to position 4 in every sequence, i.e. "jumps"

# Apply the attention mask to your model's output
output = my_model(input_tensor)             # pass the input tensor to the model and store the output
output *= attention_mask                    # zero out every position except those for "quick" and "jumps"

In this example, we start with an attention mask that is False for every position, which on its own would mean every part of the input sequence is ignored.

We then set the positions of “quick” (index 1) and “jumps” (index 4) to True across the whole batch, which tells our model to focus on those specific words. This can be useful when your data contains noise or irrelevant tokens that you want to filter out.
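As a side note, most attention implementations apply the mask to the attention scores rather than to the model’s output: positions that are masked out get -inf added to their raw scores before the softmax, which drives their attention weight to exactly zero. Here is a minimal sketch of that convention, reusing the positions of “quick” and “jumps”; the score tensor is random and the sizes are assumptions for illustration:

import torch

batch, seq_len = 10, 9
scores = torch.randn(batch, seq_len, seq_len)         # raw (unnormalised) attention scores
keep = torch.zeros(batch, seq_len, dtype=torch.bool)
keep[:, 1] = True                                     # "quick"
keep[:, 4] = True                                     # "jumps"

# Fill masked positions with -inf so softmax assigns them zero weight
scores = scores.masked_fill(~keep[:, None, :], float("-inf"))
weights = scores.softmax(dim=-1)                      # each row of weights sums to 1
print(weights[0, 0])                                  # non-zero only at positions 1 and 4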

I hope this helps clarify how attention masks work!
