Here’s how it works:
1. Split the PRNG key (JAX's pseudo-random number generator state) with jax.random.split() into separate keys, including one reserved for Flax Linen Dropout. Giving each source of randomness its own stream keeps the model fully reproducible.
2. Define your model with dropout by subclassing flax.linen.Module, adding a flax.linen.Dropout layer, and passing the 'dropout' PRNG stream via the rngs argument when calling apply_fn(). Dropout is then enabled or disabled depending on whether the training flag (sometimes named train) is True or False.
3. During evaluation, reuse the same code with dropout disabled: set training=False in the forward pass and do not pass a dropout RNG key to state.apply_fn() (a one-line evaluation sketch follows the training example below).
4. In the training step function, derive a new PRNG key from the dropout_key with jax.random.fold_in() (folding in the step counter) or jax.random.split() (which returns fresh keys you carry forward); a short sketch of both options follows this list. Then pass this new key to state.apply_fn() via rngs={'dropout': ...} when performing the forward pass and computing the gradients, and update the parameters with state.apply_gradients().
5. Repeat the forward pass and update of steps 2-4 for each training step, so that dropout is applied randomly during training but never during evaluation. This helps prevent overfitting by forcing the model to learn more robust features that generalize well to new data.
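Before the full example, here is a minimal sketch of the key handling from step 4: the two ways to derive a fresh per-step dropout key. The names dropout_key and step are placeholders for whatever your training loop uses:
import jax

dropout_key = jax.random.PRNGKey(0)  # the key reserved for dropout (step 1)

# Option A: fold the step counter into the key. The base key never changes,
# and each step index deterministically yields a different per-step key.
step = 7  # illustrative step index
step_key = jax.random.fold_in(dropout_key, step)

# Option B: split the key. This returns fresh keys, so carry the new
# dropout_key forward to the next step and discard the old one.
dropout_key, step_key = jax.random.split(dropout_key)
Both options keep training reproducible; fold_in is convenient when a step counter (such as state.step) is already available, as in the example below.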
Here’s an example code snippet:
# Import the necessary libraries
import jax
import jax.numpy as jnp
import optax
from flax import linen as nn               # Flax Linen module API
from flax.training import train_state      # TrainState container for params and optimizer state
# Define the model as a Flax Linen module
class MyModel(nn.Module):

    @nn.compact
    def __call__(self, x, training: bool = False):
        # Define your model here; a small MLP stands in as a placeholder
        x = nn.Dense(features=128)(x)
        x = nn.relu(x)
        # Dropout is only stochastic when training=True; Flax draws its randomness
        # from the 'dropout' stream passed via apply(..., rngs={'dropout': key})
        x = nn.Dropout(rate=0.5, deterministic=not training)(x)
        return nn.Dense(features=10)(x)
# Split the PRNG key: one stream for parameter initialization, one for dropout
root_key = jax.random.PRNGKey(0)
params_key, dropout_key = jax.random.split(root_key)

# Initialize the parameters and create the train state (model, params, optimizer)
model = MyModel()
params = model.init(params_key, jnp.ones((1, 784)), training=False)['params']  # dummy input; shape is illustrative
state = train_state.TrainState.create(
    apply_fn=model.apply,
    params=params,
    tx=optax.adamw(learning_rate=1e-3),  # AdamW optimizer with a learning rate of 1e-3
)
# Define the training step
@jax.jit
def train_step(state, batch, dropout_key):
    # Called once per training step with a batch dict of 'image' and one-hot 'label' arrays.
    # Fold the step counter into the dropout key so every step gets fresh dropout masks.
    step_key = jax.random.fold_in(dropout_key, state.step)

    def loss_fn(params):
        logits = state.apply_fn(
            {'params': params}, batch['image'],
            training=True, rngs={'dropout': step_key})
        loss = optax.softmax_cross_entropy(logits=logits, labels=batch['label']).mean()
        return loss, logits

    # Update the parameters and optimizer state from the gradients of the loss
    (loss, logits), grads = jax.value_and_grad(loss_fn, has_aux=True)(state.params)
    state = state.apply_gradients(grads=grads)
    return state, loss, logits
# Run the training loop for a specified number of steps or until convergence is reached
num_steps = 1000  # total number of training steps
for i in range(num_steps):
    batch = next(batch_iterator)  # your data pipeline; yields {'image': ..., 'label': ...}
    state, loss, logits = train_step(state, batch, dropout_key)
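For evaluation (step 3), call the same apply_fn with training=False and no dropout RNG. A minimal sketch, assuming eval_batch comes from your evaluation data pipeline:
# Evaluation forward pass: dropout is deterministic, so no RNG key is needed
eval_logits = state.apply_fn({'params': state.params}, eval_batch['image'], training=False)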
And that's it! With Flax Linen Dropout, you can easily add regularizing randomness to your neural network during training, helping it generalize to unseen data while keeping evaluation fully deterministic and reproducible.