Figure: Low-Rank Adaptation of a linear layer. Extra parameters (in orange) are added in parallel with the frozen layer (in blue), and their hidden states are added to the hidden states produced by the frozen layer.


Imagine you have a pre-trained model that already does really well on a general task, but you want to adapt it to something more specific, like classifying your own documents or translating text in a new domain. Instead of starting from scratch and training the entire model again (which can take forever), we're going to add some extra parameters next to the frozen layers in blue and train just those new ones in orange.
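Just to get a feel for how much smaller the orange part is, here's a rough back-of-the-envelope count (the 1,024-unit layer width and the rank of 8 are made-up numbers, purely for illustration):

d = 1024   # width of a hypothetical frozen Dense layer
r = 8      # LoRA rank (the "intermediate" dimension of the update)

full_params = d * d           # updating the whole weight matrix: 1,048,576 parameters
lora_params = d * r + r * d   # training only A (d x r) and B (r x d): 16,384 parameters

print(f"full fine-tuning: {full_params:,} parameters")
print(f"LoRA, rank {r}:   {lora_params:,} parameters")

That's roughly 64x fewer parameters to store and update for that one layer.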

Here’s what adding those LoRA layers might look like in code:

import tensorflow as tf

r = 8  # rank of the low-rank update; small values (e.g. 4-16) are typical

# Load a pre-trained model and freeze all of its weights
model = load_pretrained()  # placeholder: any pre-trained tf.keras.Model
model.trainable = False

# Rebuild the forward pass, adding a LoRA branch in parallel with each Dense layer.
# (This sketch assumes a simple sequential stack of layers.)
x = inputs = tf.keras.Input(shape=model.input_shape[1:])
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.InputLayer):
        continue  # skip the input layer when re-wiring
    if isinstance(layer, tf.keras.layers.Dense):
        # LoRA matrices: A projects down to rank r, B projects back up to the
        # layer's output size. B is zero-initialized, so fine-tuning starts
        # from exactly the frozen model's behaviour.
        A = tf.keras.layers.Dense(r, use_bias=False)
        B = tf.keras.layers.Dense(layer.units, use_bias=False,
                                  kernel_initializer='zeros')
        # Output of the frozen layer plus the trainable low-rank update
        x = layer(x) + B(A(x))
    else:
        x = layer(x)

lora_model = tf.keras.Model(inputs, x)

So basically we keep the frozen layers in blue exactly as they are (they already carry all the knowledge from pre-training), and add a small set of extra parameters next to them in orange that we train for our new task. Because the LoRA branch sits in parallel with the original weights and B starts out at zero, it contributes nothing at first and only gradually learns the update it needs, so it can't wreck the pre-trained behaviour but still steers the final output.
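If you want to convince yourself that only the orange parameters get updated, a quick sanity check on the lora_model built above might look like this (the optimizer, loss, and dataset are placeholders for whatever your task needs):

# All the frozen weights show up as non-trainable; only the A and B matrices are trainable.
print("trainable params:    ", sum(int(tf.size(w)) for w in lora_model.trainable_weights))
print("non-trainable params:", sum(int(tf.size(w)) for w in lora_model.non_trainable_weights))

# Fine-tune as usual: gradients only flow into the LoRA matrices.
lora_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# lora_model.fit(train_dataset, epochs=3)   # train_dataset: whatever your new task provides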

Hope this helps clarify things! Let me know if you have any other questions or if there’s anything else I can do for you.
