Now, if you’ve ever heard of this guy before, chances are you’ve already fallen asleep. But don’t freak out!
So what is KL Divergence? Well, let me put it this way: imagine you have two cats named Fluffy and Whiskers. You want to compare how much they like their food bowls by measuring something called "cat preference." But here's the catch: instead of just comparing which cat eats more out of each bowl (which would be too simple), we need to take into account how much each cat likes or dislikes each bowl.
That’s where KL Divergence comes in! It measures the difference between two probability distributions, and can help us figure out whether Fluffy prefers her food from Bowl A over Bowl B (or vice versa).
Now, if you're still with me, let's kick this off with some math. The formula for KL Divergence is:
D(P || Q) = sum [ P(x) * log2(P(x)/Q(x)) ]
Where P and Q are the two probability distributions we want to compare (in our case, Fluffy's bowl preferences vs. Whiskers'), and the sum runs over the possible outcomes x (here, Bowl A and Bowl B).
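If you'd rather see that formula as code, here's a minimal sketch in Python (the function name kl_divergence and the choice of math.log2 are just my own for illustration, not anything standard):

```python
import math

def kl_divergence(p, q):
    """KL divergence D(P || Q) in bits, for two discrete distributions
    given as lists of probabilities over the same outcomes."""
    return sum(
        p_x * math.log2(p_x / q_x)
        for p_x, q_x in zip(p, q)
        if p_x > 0  # terms where P(x) = 0 contribute nothing to the sum
    )
```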
So let's say that Fluffy eats from Bowl A with a probability of 0.75, and from Bowl B with a probability of 0.25. And let's also assume that Whiskers has the opposite preference: she likes Bowl B more than Bowl A (with probabilities of 0.75 for Bowl B and 0.25 for Bowl A).
To calculate KL Divergence, we would plug these numbers into our formula:
D(P || Q) = sum [ P(x) * log2(P(x)/Q(x)) ]
Comparing Fluffy's distribution (P) to Whiskers' (Q), and summing over both bowls (Whiskers is essentially the mirror image of Fluffy), we get:
D(Fluffy || Whiskers) = 0.75 * log2(0.75/0.25) + 0.25 * log2(0.25/0.75)
We have our KL Divergence score, which tells us how different Fluffy’s preferences are from Whiskers’. In this case, the result is:
D(Fluffy || Whiskers) ≈ 0.79 bits (or about 0.55 nats, if you'd used the natural log instead of log2)
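Plugging our two cats into the sketch from earlier confirms the number (again, the variable names are just for illustration):

```python
fluffy   = [0.75, 0.25]  # Fluffy's probabilities for Bowl A, Bowl B
whiskers = [0.25, 0.75]  # Whiskers' probabilities for Bowl A, Bowl B

print(kl_divergence(fluffy, whiskers))  # ~0.79 bits
```

One thing worth noticing: swapping the arguments happens to give the same answer here only because the two cats are exact mirror images of each other; in general, D(P || Q) and D(Q || P) are not equal.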
Now go forth and calculate some KL Divergences of your own. Just don't blame us if you fall asleep in the process!