Optimal Decision Making in MDPs

Alright! Are you ready for some optimal decision making?

First things first: what on earth is an MDP? An MDP (Markov decision process) is like playing a game where every move has consequences: each action earns a reward and moves you to a new state, which affects your future moves. But instead of just having fun, we want to make optimal decisions based on those rewards. And by “optimal,” I mean the decision that maximizes the total reward we expect to collect, given our current knowledge and resources.

Here’s an example: let’s say you have a vending machine with three options: Coke, Pepsi, or Sprite. Each drink costs $1.50, and each one gives you 20 points in your loyalty program. You want to maximize your points while minimizing what you spend on drinks.

To model this as a (very stripped-down) MDP, we’ll treat each drink as an action and create a table with three rows (one per drink) and two columns (one for rewards and one for costs).

| Drink | Reward Points | Cost ($) |
| --- | --- | --- |
| Coke | 20 | 1.50 |
| Pepsi | 20 | 1.50 |
| Sprite | 20 | 1.50 |
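To make the "drink as an action" idea concrete, here is a minimal Python sketch of the table. The `Drink` class, the `step` function, and the choice to track the remaining budget as the state are illustrative assumptions, not something the example itself specifies; prices are stored in cents to avoid floating-point money.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drink:
    name: str
    reward_points: int  # loyalty points earned per purchase
    cost_cents: int     # price in cents


# The three actions, copied from the table.
DRINKS = [
    Drink("Coke", 20, 150),
    Drink("Pepsi", 20, 150),
    Drink("Sprite", 20, 150),
]


def step(budget_cents: int, drink: Drink) -> tuple[int, int]:
    """Buy one drink and return (remaining budget, reward earned).

    Hypothetical transition function: it is deterministic and refuses
    purchases the budget cannot cover.
    """
    if drink.cost_cents > budget_cents:
        raise ValueError(f"cannot afford {drink.name}")
    return budget_cents - drink.cost_cents, drink.reward_points
```

For instance, `step(300, DRINKS[0])` returns `(150, 20)`: buying a Coke with $3.00 leaves $1.50 and earns 20 points.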

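Now, let’s add some decision-making logic to this MDP. We want to choose the drink that gives us the most points per dollar spent (i.e., the highest reward/cost ratio). To do this, we can calculate that ratio for each option. We’ll loosely call it the “expected value” here, although strictly speaking it is a plain ratio, since nothing in this example is random: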

Expected Value = Reward Points / Cost ($)

For Coke, Pepsi, and Sprite, respectively:

- Expected Value for Coke: 20 points / $1.50 = 13.33 points per dollar spent
- Expected Value for Pepsi: 20 points / $1.50 = 13.33 points per dollar spent
- Expected Value for Sprite: 20 points / $1.50 = 13.33 points per dollar spent
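The same arithmetic can be sketched in a few lines of Python. The names `drinks`, `points_per_dollar`, and `best_drinks` are made up for this illustration; the numbers come straight from the table.

```python
# Reward points and cost in dollars per drink, copied from the table.
drinks = {
    "Coke": (20, 1.50),
    "Pepsi": (20, 1.50),
    "Sprite": (20, 1.50),
}


def points_per_dollar(reward_points: float, cost: float) -> float:
    """Reward/cost ratio (what this post calls the 'expected value')."""
    return reward_points / cost


ratios = {name: points_per_dollar(r, c) for name, (r, c) in drinks.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} points per dollar")

# Collect every drink that ties for the best ratio.
best = max(ratios.values())
best_drinks = [name for name, ratio in ratios.items() if ratio == best]
```

Because all three ratios are identical, `best_drinks` ends up containing every drink, which is the tie described in the list.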

Hmm, looks like all three drinks have the same expected value! But wait: what if we’re really thirsty and want more than one drink? Let’s say we can afford two drinks (total cost of $3). Which combination gives us the most reward points for our money?

To answer this question, let’s create a new table that shows the expected value for each possible pairing:

| Drink 1 | Drink 2 | Expected Value |
| --- | --- | --- |
| Coke | Coke | (20 * 2) / $3 = 13.33 points per dollar spent |
| Pepsi | Pepsi | (20 * 2) / $3 = 13.33 points per dollar spent |
| Sprite | Sprite | (20 * 2) / $3 = 13.33 points per dollar spent |
| Coke | Pepsi | (20 + 20) / $3 = 13.33 points per dollar spent |
| Pepsi | Sprite | (20 + 20) / $3 = 13.33 points per dollar spent |
| Coke | Sprite | (20 + 20) / $3 = 13.33 points per dollar spent |

Well, it turns out that every pairing gives exactly the same expected value! Two different drinks still earn 20 + 20 = 40 points for $3, the same as buying one drink twice, so variety neither adds reward nor adds cost. With identical prices and identical points, any combination is optimal.
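A quick way to check the pairing arithmetic is to enumerate every unordered pair and compute its ratio. The sketch below assumes the order of purchase does not matter and uses hypothetical names (`POINTS`, `COST`, `pair_ratio`); with the numbers from the example, each pair earns 40 points for $3.

```python
from itertools import combinations_with_replacement

# Points and prices in dollars from the vending machine example.
POINTS = {"Coke": 20, "Pepsi": 20, "Sprite": 20}
COST = {"Coke": 1.50, "Pepsi": 1.50, "Sprite": 1.50}


def pair_ratio(first: str, second: str) -> float:
    """Total points divided by total cost for a two-drink purchase."""
    total_points = POINTS[first] + POINTS[second]
    total_cost = COST[first] + COST[second]
    return total_points / total_cost


# All unordered pairs, including buying the same drink twice.
pairs = list(combinations_with_replacement(POINTS, 2))
pair_ratios = {pair: pair_ratio(*pair) for pair in pairs}

for (first, second), ratio in pair_ratios.items():
    print(f"{first} + {second}: {ratio:.2f} points per dollar")
```

All six pairs print 13.33 points per dollar, so no combination beats any other under these prices and point values.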

Remember to always calculate the reward per dollar for each option and choose the one(s) with the highest ratio. When the options tie, as they do here, any choice is equally good, so if all else fails, just drink Coke twice!
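For larger budgets, the same choice can be phrased as a small sequential decision problem: the state is the remaining budget, each purchase is an action, and the value of a state is the best total reward reachable from it. The sketch below is one possible formulation, not the only one. It assumes the goal is simply to collect as many points as possible within a fixed budget, with deterministic purchases and no discounting; `best_points` and `optimal_actions` are hypothetical names.

```python
from functools import lru_cache

# Points and prices (in cents) from the vending machine example.
POINTS = {"Coke": 20, "Pepsi": 20, "Sprite": 20}
COST_CENTS = {"Coke": 150, "Pepsi": 150, "Sprite": 150}


@lru_cache(maxsize=None)
def best_points(budget_cents: int) -> int:
    """Value of a state: the most points reachable from this budget."""
    affordable = [d for d in POINTS if COST_CENTS[d] <= budget_cents]
    if not affordable:
        return 0  # terminal state: nothing left to buy
    # Bellman-style recursion: reward now plus the value of the next state.
    return max(
        POINTS[d] + best_points(budget_cents - COST_CENTS[d])
        for d in affordable
    )


def optimal_actions(budget_cents: int) -> list[str]:
    """Every drink whose purchase achieves the best value from this state."""
    target = best_points(budget_cents)
    return [
        d
        for d in POINTS
        if COST_CENTS[d] <= budget_cents
        and POINTS[d] + best_points(budget_cents - COST_CENTS[d]) == target
    ]
```

With a $3.00 budget, `best_points(300)` returns 40 and `optimal_actions(300)` lists all three drinks, matching the tie found by hand. The recursion only becomes interesting once prices or point values differ between drinks.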
