If you don’t know what an MDP is (and let’s be real, who does?), it stands for Markov Decision Process, and it’s a way of modelling decision-making situations where outcomes are partly random and depend on the current state and the action you take.
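To make that concrete, here’s a tiny sketch in plain Matlab (no package required) of the two ingredients every MDP boils down to: a transition matrix and a reward vector. The numbers are made up purely for illustration.
% A tiny two-state MDP sketched as raw matrices.
% P(i, j) is the probability of moving from state i to state j; each row sums to 1.
P = [0.9 0.1;   % from state 1: stay put with probability 0.9, move with probability 0.1
     0.4 0.6];  % from state 2: move back with probability 0.4, stay with probability 0.6
R = [0; 5];     % R(i) is the immediate reward collected in state i
% The Markov property: where you end up next depends only on where you are now,
% so one step of look-ahead is just a matrix-vector product.
here = [1 0];      % we start in state 1 with certainty
next = here * P;   % distribution over states after one step
disp(next);        % prints 0.9000 0.1000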
Now, before we dive into the details of how to install and use this package, let’s talk about why you should care in the first place. Well, for starters, it’s a lot easier than writing your own MDP solver from scratch (trust us, we’ve been there). Plus, it comes with all sorts of cool features like visualization tools and optimization algorithms that can help you make better decisions faster.
So, without further ado, let’s get started! To kick things off, head over to the Matlab Central website (https://www.mathworks.com/matlabcentral/) and search for “K-MDP” in the File Exchange section. Once you find it, click the download link and follow the instructions to install the package.
Now that we’ve got that out of the way, let’s look at how to use it! First, open up a new Matlab script (or just copy and paste this code into an existing one) and type in:
% Load K-MDP package
addpath('C:\Users\YourName\Downloads\K_MDP'); % Change the path to match your download location! This line adds the K-MDP package to the current Matlab session, allowing us to use its functions.
mdp = kmdp; % Create a new MDP object using the kmdp function from the K-MDP package. This object will be used to define and solve our Markov Decision Process.
% Define the MDP parameters
mdp.States = {'S1', 'S2', 'S3'}; % Define the states of the MDP as a cell array of strings.
mdp.Actions = {'A1', 'A2', 'A3'}; % Define the actions of the MDP as a cell array of strings.
mdp.Rewards = [1, 2, 3; 4, 5, 6; 7, 8, 9]; % Reward matrix: rows are states, columns are actions.
mdp.Transitions = [0.2, 0.3, 0.5; 0.4, 0.5, 0.1; 0.6, 0.2, 0.2]; % State-to-state transition matrix: rows are the current state, columns are the next state, and each row sums to 1.
% Solve the MDP
[V, policy] = mdp.solve(); % Use the solve function to find the optimal value function (V) and policy for the MDP.
% Print the results
disp('Optimal value function:'); % Display a message to indicate the following output is the optimal value function.
disp(V); % Display the optimal value function.
disp('Optimal policy:'); % Display a message to indicate the following output is the optimal policy.
disp(policy); % Display the optimal policy.
Running this loads the K-MDP package, defines a small three-state MDP, solves it, and prints the optimal value function and policy. Next, let’s build a model up step by step instead, adding states and actions one at a time:
% Load K-MDP package
addpath('K-MDP'); % Add the K-MDP folder to the Matlab search path (adjust the path to wherever you unpacked it)
% Create empty MDP object
mdp = mdp(2); % Create an MDP object with two states
% Add states to MDP object
mdp.addState('S1'); % Add state S1 to MDP object
mdp.addState('S2'); % Add state S2 to MDP object
% Add actions to MDP object
mdp.addAction('A1'); % Add action A1 to MDP object
mdp.addAction('A2'); % Add action A2 to MDP object
% Set transition probabilities for each state-action pair
mdp.setTransition('S1', 'A1', 'S1', 0.8); % Set transition probability from state S1 to state S1 when taking action A1 to 0.8
mdp.setTransition('S1', 'A1', 'S2', 0.2); % Set transition probability from state S1 to state S2 when taking action A1 to 0.2
mdp.setTransition('S1', 'A2', 'S2', 1); % Set transition probability from state S1 to state S2 when taking action A2 to 1
mdp.setTransition('S2', 'A1', 'S1', 0.5); % Set transition probability from state S2 to state S1 when taking action A1 to 0.5
mdp.setTransition('S2', 'A1', 'S2', 0.5); % Set transition probability from state S2 to state S2 when taking action A1 to 0.5
mdp.setTransition('S2', 'A2', 'S1', 0.3); % Set transition probability from state S2 to state S1 when taking action A2 to 0.3
mdp.setTransition('S2', 'A2', 'S2', 0.7); % Set transition probability from state S2 to state S2 when taking action A2 to 0.7
% Set reward values for each state-action pair
mdp.setReward('S1', 'A1', 1); % Set reward value of 1 for taking action A1 in state S1
mdp.setReward('S1', 'A2', 2); % Set reward value of 2 for taking action A2 in state S1
mdp.setReward('S2', 'A1', 3); % Set reward value of 3 for taking action A1 in state S2
mdp.setReward('S2', 'A2', 4); % Set reward value of 4 for taking action A2 in state S2
% Set discount factor
mdp.setDiscount(0.9); % Set discount factor to 0.9
% Set convergence threshold
mdp.setThreshold(0.01); % Set convergence threshold to 0.01
% Solve MDP using value iteration algorithm
mdp.solve(); % Solve MDP using value iteration algorithm
% Get optimal policy
policy = mdp.getPolicy(); % Get optimal policy from MDP object
% Display optimal policy
disp(policy); % Display optimal policy
This builds a small two-state, two-action MDP, solves it with value iteration, and prints the optimal policy. There’s also a more compact way to set up a model by passing whole matrices at once. Now, let’s add some transitions and rewards that way:
% Create an MDP with two states and two actions
model = mdp(2, 2); % the mdp function creates a Markov Decision Process object of the given size
% Add transitions to the MDP object
% Each row of the matrix is the current state and each column is the next state, so every row sums to 1
% 'P' tells the mdp function to store the matrix as the object's transition matrix
model = mdp(model, [0.8 0.2; 0.1 0.9], 'P');
% Add rewards to the MDP object
% [-2 4; 5 -3] gives the reward for each state-action pair (rows are states, columns are actions; the values are arbitrary)
% 'R' tells the mdp function to store the matrix as the object's reward matrix
model = mdp(model, [-2 4; 5 -3], 'R');
This sets the probability of transitioning from one state to another using a row-major matrix: each row is the current state and each column is the next state (so the first row describes what can happen when you start in state 1, and the second row when you start in state 2), and every row must sum to 1. The rewards are specified the same way in the last line, with one entry per state-action pair.
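Hand-typed transition matrices are easy to get wrong, so it’s worth a quick sanity check before solving. Here’s a tiny package-free snippet, assuming P holds whatever transition matrix you just defined:
% Sanity check: transition probabilities must be non-negative and each row must sum to 1.
P = [0.8 0.2; 0.1 0.9];                            % the matrix from the example above
assert(all(P(:) >= 0), 'Probabilities cannot be negative');
assert(all(abs(sum(P, 2) - 1) < 1e-9), 'Each row of P must sum to 1');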
Finally, let’s add some actions:
% Create MDP object
model = mdp(); % Create an empty MDP object
% Define states and rewards
states = [1, 2]; % Define the states as a vector
rewards = [0, 1]; % Define one reward per state as a vector
% Add transitions to MDP object
transitions = [0.7, 0.3; 0.4, 0.6]; % Row-major transition matrix: rows are the current state, columns are the next state
model = mdp(model, transitions); % Add transitions to the MDP object
% Add rewards to MDP object
model = mdp(model, rewards); % Add rewards to the MDP object
% Add actions to MDP object
actions = {'A', 'B'}; % Define the action names as a cell array
model = mdp(model, [1; 2], actions); % Number of actions available in each state (state 1 has one, state 2 has both)
This sets the number of actions available in each state using a row-major vector with one entry per state. In this case, we’re assuming that there are only two possible actions, ‘A’ and ‘B’.
And that’s it! You now have a fully functional MDP object in Matlab using the K-MDP package. From here, you can use various optimization algorithms, such as value iteration or policy iteration, to find the best policy (i.e., a mapping from states to actions) for your given situation.
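If you’re curious what a solver like this is actually doing under the hood, here’s a rough sketch of value iteration in plain Matlab. It doesn’t touch the K-MDP API at all; the transition matrices, rewards, discount factor, and threshold simply mirror the two-state, two-action example above, and all the variable names are just for illustration.
% Value iteration for a two-state, two-action MDP (no package required).
P = zeros(2, 2, 2);                % P(s, s', a): probability of moving from s to s' under action a
P(:, :, 1) = [0.8 0.2; 0.5 0.5];   % transitions for action A1 (rows are the current state)
P(:, :, 2) = [0.0 1.0; 0.3 0.7];   % transitions for action A2
R = [1 2; 3 4];                    % R(s, a): reward for taking action a in state s
gamma = 0.9;                       % discount factor
tol = 0.01;                        % convergence threshold
nStates = size(R, 1);
nActions = size(R, 2);
V = zeros(nStates, 1);             % initial guess for the value function
while true
    Q = zeros(nStates, nActions);
    for a = 1:nActions
        Q(:, a) = R(:, a) + gamma * P(:, :, a) * V;   % Bellman backup for each action
    end
    [Vnew, policy] = max(Q, [], 2);                   % best value and greedy action per state
    if max(abs(Vnew - V)) < tol                       % stop once the values settle down
        V = Vnew;
        break;
    end
    V = Vnew;
end
disp('Value function:');
disp(V);
disp('Greedy policy (action index per state):');
disp(policy);
The output should line up with what mdp.solve() and mdp.getPolicy() report for the same model, which is a handy way to double-check that you typed your transitions and rewards in correctly.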