So how does it work? Well, first we download the pretrained weights from the Hugging Face Hub (a repository of ready-made machine learning models) using the `from_pretrained()` method. Then we add task-specific layers on top, for question answering, language-modeling fine-tuning, and so on.
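As a minimal sketch of that first step, here is what downloading the weights looks like; `RobertaModel` and `RobertaTokenizer` come straight from the `transformers` library, and the 768-dimensional hidden size is the standard `roberta-base` configuration:

```python
from transformers import RobertaModel, RobertaTokenizer

# Download the pretrained weights and vocabulary from the Hugging Face Hub
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
encoder = RobertaModel.from_pretrained('roberta-base')  # bare encoder, no task head

# The encoder maps token IDs to contextual hidden states (768-dim for roberta-base)
batch = tokenizer("Hello world", return_tensors="pt")
hidden_states = encoder(**batch).last_hidden_state
print(hidden_states.shape)  # (1, seq_len, 768), where seq_len depends on the tokenization
```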
For example, say you want to use RoBERTa for extractive question answering, as in the SQuAD task. You would load its weights with `from_pretrained()` and then add a span-classification head on top: a linear layer that computes start and end logits for every input token.
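To make "span-classification head" concrete, here is a hand-rolled sketch of roughly what `RobertaForQuestionAnswering` adds on top of the encoder. The class name `RobertaSpanQA` is ours and the internals are simplified for illustration; the key idea is a single linear layer that gives every token a start score and an end score:

```python
import torch.nn as nn
from transformers import RobertaModel

class RobertaSpanQA(nn.Module):
    """Illustrative span-classification head on top of the RoBERTa encoder."""

    def __init__(self, model_name: str = 'roberta-base'):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)
        # One linear layer scores each token twice: once as a potential
        # answer start, once as a potential answer end.
        self.qa_outputs = nn.Linear(self.encoder.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.qa_outputs(hidden)                 # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```

At inference time, the highest-scoring start and end positions pick out the answer span in the input.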
Putting it all together, here's an example script:
```python
# Import the necessary libraries
import torch
from transformers import RobertaTokenizer, RobertaForQuestionAnswering

# Load the tokenizer and a RoBERTa model with a question-answering head.
# Note: 'roberta-base' has no fine-tuned QA head, so its answers will be
# arbitrary; for real use, load a SQuAD-fine-tuned checkpoint such as
# 'deepset/roberta-base-squad2'.
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForQuestionAnswering.from_pretrained('roberta-base')

# Define the context passage and the question to be answered
context = "Paris is the capital of France."
question = "What is the capital of France?"

# Tokenize the question and context together into a single input sequence
inputs = tokenizer(question, context, return_tensors="pt")

# Run inference to get start and end logits over the input tokens
with torch.no_grad():
    outputs = model(**inputs)
start_logits = outputs.start_logits  # shape: (batch, sequence_length)
end_logits = outputs.end_logits

# The most likely answer span runs from the highest-scoring start token
# to the highest-scoring end token
start_index = start_logits.argmax().item()
end_index = end_logits.argmax().item()

# Decode the token span back into a text answer
answer_ids = inputs["input_ids"][0][start_index:end_index + 1]
answer = tokenizer.decode(answer_ids, skip_special_tokens=True)

# Print the answer
print(answer)
```
Fine-tuned on a dataset like SQuAD, RoBERTa can read a passage and pull out accurate answer spans for natural-language questions.