Getting Started with Maze RL Framework

If you haven’t heard of one before, let me explain in simple terms: a maze RL framework lets us train and test reinforcement learning agents on maze-solving tasks using Python.

Now, I know some of you might be thinking, “Why would anyone want to do that?” Well, the answer is simple: it’s fun! It also helps us understand how RL works in practice and sharpens our skills as developers. Plus, who doesn’t love a good challenge?

So without further ado, let’s roll with this maze RL framework and see what all the fuss is about! First, you need to install it, which can be done with either pip or conda (depending on your preference):

# Install mazerl-framework using pip
pip install mazerl-framework

# OR install it from the conda-forge channel using conda
conda install -c conda-forge mazerl-framework
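
To double-check that the install worked, a quick import from Python is usually enough (this assumes the package exposes the usual `__version__` attribute; adjust if the project documents a different check):

# Quick sanity check that the package is importable
import maze_rl
print(maze_rl.__version__)  # assumed attribute; most packages expose one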

Once you’ve installed it, you can start playing around with the framework. The first thing we need to do is define an agent for our maze environment (save this as myagent.py, since we’ll import it later):

# myagent.py
# Import the base agent class from the framework, plus numpy
from maze_rl import MazeAgent
import numpy as np

# MyAgent inherits from the framework's MazeAgent base class
class MyAgent(MazeAgent):
    def __init__(self):
        super().__init__()
        # Episode bookkeeping: current state, last reward, terminal flag
        self.state = None
        self.reward = 0
        self.done = False

    def reset_state(self, state):
        # Store the initial state and hand it back to the caller
        self.state = state
        return self.state

    def get_action(self, observation):
        # TODO: implement your own policy here!
        # Placeholder: pick a random move, assuming four discrete
        # actions (up/down/left/right)
        return np.random.randint(4)

In this example, we’re creating a new agent called `MyAgent`, which inherits from the `MazeAgent` class provided by the framework (note that `MazeAgent` has to be imported before we can subclass it). We also initialize some variables that will be used later on (state, reward, and done). The `reset_state()` method stores the given initial state and hands it back, while `get_action()` is where your policy lives; for now it just picks a random action.
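
If you want something a bit smarter than random moves, here’s a minimal epsilon-greedy sketch. Fair warning: everything framework-specific in it is an assumption on my part. I’m assuming the observation is a 16×16 grid where the agent’s cell is the largest entry (a one-hot-style encoding) and that there are four discrete actions; the `EpsilonGreedyAgent` name is mine, not the framework’s.

# A minimal epsilon-greedy policy sketch (assumptions noted in comments)
import numpy as np
from maze_rl import MazeAgent

class EpsilonGreedyAgent(MazeAgent):
    def __init__(self, n_actions=4, epsilon=0.1):
        super().__init__()
        self.n_actions = n_actions  # assumed: up/down/left/right
        self.epsilon = epsilon      # exploration rate
        # One Q-value per (cell, action) pair on an assumed 16x16 grid
        self.q_table = np.zeros((16 * 16, n_actions))

    def get_action(self, observation):
        # Assumed encoding: the agent's cell is the largest entry in the grid
        cell = int(np.argmax(observation))
        # Explore with probability epsilon, otherwise exploit the Q-table
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_table[cell]))

Epsilon-greedy is the classic trade-off between exploration (random moves) and exploitation (the best move the Q-table currently knows about); 0.1 is a common starting value for epsilon.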

Now let’s move on to the most important part: training! To do this, we need to create an instance of our agent and run it in the environment:

# train.py
from maze_rl import MazeEnv  # the environment class from the framework
from myagent import MyAgent  # the agent we defined above

# Create instances of the environment and the agent
env = MazeEnv()
agent = MyAgent()

# Run a single episode of up to 10 steps,
# assuming a Gym-style reset()/step() API
observation = env.reset()
state = agent.reset_state(observation)

for step in range(10):
    # Ask the agent for an action based on the current state
    action = agent.get_action(state)
    # Apply the action and receive feedback from the environment
    observation, reward, done, info = env.step(action)
    state = observation
    # Stop as soon as we reach a terminal state
    if done:
        break

In this example, we’re creating instances of `MazeEnv` and `MyAgent`, resetting both, and then looping through up to 10 steps. On each step, the agent picks an action based on the current state, the environment returns feedback in the form of an observation, a reward, and a done flag, and we break out of the loop as soon as we reach a terminal state.
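
One caveat: the loop above only runs the policy, it doesn’t learn anything yet. Actual training means updating the agent from each (state, action, reward, next state) transition it experiences. Here’s a minimal tabular Q-learning update that would work with the hypothetical `EpsilonGreedyAgent` sketch above; the `q_update` helper is my own name, `ALPHA` and `GAMMA` are placeholder values, and the one-hot grid encoding is still an assumption:

# A minimal tabular Q-learning update for the EpsilonGreedyAgent sketch
import numpy as np

ALPHA = 0.1   # learning rate (placeholder value)
GAMMA = 0.99  # discount factor (placeholder value)

def q_update(agent, state, action, reward, next_state):
    # Assumed one-hot grid encoding: the agent's cell is the largest entry
    cell = int(np.argmax(state))
    next_cell = int(np.argmax(next_state))
    # Q-learning target: r + gamma * max over a' of Q(s', a')
    target = reward + GAMMA * np.max(agent.q_table[next_cell])
    # Nudge the current estimate toward the target
    agent.q_table[cell, action] += ALPHA * (target - agent.q_table[cell, action])

You’d call `q_update(agent, state, action, reward, observation)` right after the `env.step()` line, before `state` gets overwritten.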

And that’s it! You now have a basic understanding of how to get started with a maze RL framework in Python. Of course, there are many more advanced techniques you can use (such as deep learning or policy gradients), but for the sake of simplicity, let’s stick to this basic example for now.
