Solving K-MDPs with the Ferrer-Mestres Algorithm

Today we’re going to talk about solving K-MDPs with the Ferrer-Mestres algorithm. This is a fancy way of saying that we’re gonna figure out how to get from point A to point B in the most efficient way possible when there are multiple paths and rewards involved.

Now, before you start rolling your eyes at me for using big words like “K-MDPs” and “Ferrer-Mestres algorithm,” let me explain what these terms mean in plain English. K-MDP stands for “Markov Decision Process with at most K states.” Essentially, we still have multiple options (actions) to choose from at each step along the way, but the problem is described using only a limited number of states (K).

The Ferrer-Mestres algorithm is just a fancy name for a mathematical procedure that helps us figure out which path is best. It involves some pretty complicated math, but don’t worry, we won’t be doing any of the calculations by hand today. We’re going to let the computer do all of that heavy lifting for us!

So how does this algorithm work? Well, first we need to define our problem in terms of a decision tree. This is essentially just a fancy flowchart that shows all of the possible paths and outcomes. Here’s an example:

// A decision tree with multiple paths and outcomes.
// Each number labels a node (a decision point or an outcome).
//
//            10
//          /    \
//        30      40
//       /  \    /  \
//     60    80 5    7
//             / \
//            9   11
//
// Root 10: the first decision; its children are 30 and 40.
// Node 30: children 60 and 80, both leaves.
// Node 40: children 5 and 7.
// Node 5: children 9 and 11, both leaves.
// Node 7: a leaf (no children).
//
// The leaves (60, 80, 9, 11, and 7) each mark the end of a path.
// The goal is to find the path from the root to a leaf that gives the best result.

In this example, we have a starting point (the top of the tree) and multiple paths that lead to different outcomes. Each node has a reward value associated with it; here we treat the node’s number as its reward. For instance, if we choose path A (10, then 30, then 60), our total reward is 10 + 30 + 60 = 100. If we choose path B (10, then 40, then 7), our total reward is 10 + 40 + 7 = 57.
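To make "multiple paths that lead to different outcomes" concrete, here is a minimal sketch that lists every root-to-leaf path in the tree described above. It assumes that each node's number doubles as its reward; the tuple encoding and the names `TREE` and `enumerate_paths` are made up for this illustration.

```python
# Hypothetical encoding of the tree above: (reward, left_child, right_child).
# A missing child is None. Assumption: a node's number is also its reward.
TREE = (10,
        (30, (60, None, None), (80, None, None)),
        (40, (5, (9, None, None), (11, None, None)), (7, None, None)))


def enumerate_paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of node rewards."""
    reward, left, right = node
    path = prefix + (reward,)
    children = [child for child in (left, right) if child is not None]
    if not children:  # Leaf reached: the path is complete.
        yield path
        return
    for child in children:
        yield from enumerate_paths(child, path)


if __name__ == "__main__":
    for path in enumerate_paths(TREE):
        print(" -> ".join(map(str, path)), "| total reward:", sum(path))
```

Listing every path like this only works for tiny trees, which is why the rest of the walkthrough computes values node by node instead.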

Now that we have defined our problem in terms of a decision tree, it’s time to apply the Ferrer-Mestres algorithm! This involves calculating something called the “value function” for each node on the tree. The value function tells us how good (or bad) each node is, measured by the best total reward we can collect from that node onward.
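One common way to write this idea down is sketched below. It assumes deterministic moves, no discounting, and a reward r(n) attached to each node n. The symbols V, r, and children are introduced here for illustration and are not taken from the algorithm's original description.

```latex
V(n) =
\begin{cases}
  r(n) & \text{if } n \text{ is a leaf}, \\[4pt]
  r(n) + \max\limits_{c \,\in\, \mathrm{children}(n)} V(c) & \text{otherwise.}
\end{cases}
```

Read it as: a leaf is worth its own reward, and any other node is worth its own reward plus the value of its best child.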

To calculate the value function, we start at the bottom of the tree and work our way up. A leaf such as node 80 is worth exactly its own reward. Every other node is worth its own reward plus the best value among its children:

# Build the decision tree as nested lists and compute its value function.
# Each node is [reward, left_child, right_child]; a leaf is just [reward].
# Rewards are integers, taken here to be the node numbers from the tree above.
# The value of a node is its own reward plus the best value among its children,
# so the recursion resolves the leaves first and works its way up to the root.

tree = [10,
        [30, [60], [80]],           # Node 30 with leaves 60 and 80.
        [40, [5, [9], [11]], [7]]]  # Node 40 with child 5 (leaves 9 and 11) and leaf 7.

def calculate_value_function(node):
    """Return the best total reward obtainable from this node downward."""
    if len(node) == 1:  # A leaf has no children, so its value is its own reward.
        return node[0]
    left_child, right_child = node[1], node[2]
    # Add this node's reward to the better of the two child values.
    return node[0] + max(calculate_value_function(left_child),
                         calculate_value_function(right_child))

print(calculate_value_function([80]))  # Value of leaf node 80: 80
print(calculate_value_function(tree))  # Value of the root: 10 + 30 + 80 = 120

For a hand-worked example, assume two candidate paths with illustrative rewards (these numbers are made up for the walkthrough and are not read off the tree above). Path C, which leads through nodes 30 and 80, earns 60 + 20 = 80. Path D, which leads through nodes 40 and 5, earns 10 + 25 + 30 = 65.

Now, let’s calculate the value function for node 80:

// Calculate the value function for node 80.
// The value is the larger of the two path rewards, minus the cost of choosing that path.

// Reward for path C: 60 + 20 = 80
const rewardC = 60 + 20;

// Reward for path D: 10 + 25 + 30 = 65
const rewardD = 10 + 25 + 30;

// The cost is not specified, so we assume it is 0 for simplicity.
const cost = 0;

// Take the larger reward and subtract the cost.
const valueFunction = Math.max(rewardC, rewardD) - cost;

// Prints "Value Function(node 80) = 80"
console.log("Value Function(node 80) = " + valueFunction);

In this case, the cost of choosing either path is zero. So our value function calculation looks like this:


// With a cost of zero, the value function is just the larger of the two path rewards.
// Path C collects rewards 60 and 20.
// Path D collects rewards 10, 25, and 30.

Value Function(node 80) = Max[60 + 20, 10 + 25 + 30] - 0 // Total each path, then subtract the zero cost
                        = Max[80, 65]                    // Compare the two totals
                        = 80                             // Path C wins, so the value function for node 80 is 80

So the value function for node 80 is 80. This tells us that if we start at the top of our decision tree and follow path C (which leads to node 30), then onward to node 80, we will receive a reward of 80.

Now let’s calculate the value function for node 7:

// Calculate the value function for node 7
Value Function(node 7) = Max[Reward for Path E, Reward for Path F] - Cost of Choosing Path E or F

In this case, the cost of choosing either path is zero. So our value function calculation looks like this:


// The first path (E) collects rewards 10 and 35; the second path (F) collects a single reward of 25.
// The cost of choosing either path is zero in this case.

Value Function(node 7) = Max[10 + 35, 25] - 0 // Total each path, then subtract the zero cost
                       = Max[45, 25]          // 10 + 35 = 45
                       = 45                   // The first path wins, so the value function for node 7 is 45

So the value function for node 7 is 45. This tells us that if we start at the top of our decision tree and follow the better of the two paths (path E) to node 7, we will receive a reward of 45.
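The two hand calculations above follow the same recipe, so they can be sketched as one small helper. The function name `node_value`, the dictionary layout, and the pairing of path E with 10 + 35 and path F with 25 are assumptions made for this illustration; the reward numbers are the ones used in the walkthrough.

```python
def node_value(path_rewards, cost=0):
    """Return (best_path_name, value), where value = best path total - cost.

    path_rewards maps a path name to the list of rewards collected along it.
    """
    totals = {name: sum(rewards) for name, rewards in path_rewards.items()}
    best = max(totals, key=totals.get)  # Name of the path with the largest total.
    return best, totals[best] - cost


if __name__ == "__main__":
    # Node 80: path C (60 + 20) versus path D (10 + 25 + 30), cost assumed to be 0.
    print(node_value({"C": [60, 20], "D": [10, 25, 30]}))
    # Node 7: path E (10 + 35) versus path F (25), cost assumed to be 0.
    print(node_value({"E": [10, 35], "F": [25]}))
```

Passing a non-zero `cost` simply lowers the winning total by that amount, which matches the "Max[...] - Cost" form used above.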

And so on… We can continue calculating value functions for each node until we reach the top of the tree (the root). Once we have calculated all of these values, we can use them to determine which path is best!

In this example, it’s clear that taking path C (which leads through nodes 30 and 80) will result in a higher reward than any other path. So if our goal is to maximize rewards, we should choose path C!
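As a rough sketch of how the computed values turn into an actual route, the snippet below returns both the value of a node and the path that achieves it. It assumes each node's number is its reward and uses a made-up tuple encoding. `TREE` and `best_path` are names invented for this illustration, and this is plain bottom-up evaluation of a tree rather than a reproduction of any published algorithm.

```python
# Hypothetical encoding: (reward, left_child, right_child); None means no child.
TREE = (10,
        (30, (60, None, None), (80, None, None)),
        (40, (5, (9, None, None), (11, None, None)), (7, None, None)))


def best_path(node):
    """Return (value, path): the best total reward from this node downward
    and the list of node rewards visited to collect it."""
    reward, left, right = node
    children = [child for child in (left, right) if child is not None]
    if not children:  # A leaf is worth its own reward.
        return reward, [reward]
    # Evaluate each child first (bottom-up), then keep the better one.
    value, path = max((best_path(child) for child in children),
                      key=lambda value_and_path: value_and_path[0])
    return reward + value, [reward] + path


if __name__ == "__main__":
    value, path = best_path(TREE)
    print("Best total reward:", value)
    print("Best path:", " -> ".join(map(str, path)))
```

Under these assumptions the best route runs from the root through node 30 to node 80, which agrees with the choice of path C.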

And there you have it: solving K-MDPs with the Ferrer-Mestres algorithm! It may seem complicated at first, but once you understand how it works, it’s actually pretty simple. Plus, computers can do all of the heavy lifting for us, so we don’t even need to worry about doing any of the calculations ourselves!

So next time you have a decision tree with multiple paths and rewards involved, remember to use the Ferrer-Mestres algorithm to find the best path!
