What the heck is entropy?

I've read many articles online with differing, vague descriptions of entropy. But I recently found out that it has a clear mathematical representation!

Suppose you're performing an experiment which has 5 possible outcomes. We'll label them $i = 1, \dots, 5$. And suppose that they each occur with probability $p(i)$. $p$ is called a distribution, which is just a fancy name for a function that gives you probabilities.

You can associate a value with the distribution $p$ called entropy. It tells you how many of the outcomes are "reasonably likely". If only one outcome is likely, the entropy is low. If many are likely, it's high.

I'm going to show you the formula for entropy and then try to make sense of it.

Here it is...

$$E(p) = - \sum_{i=1}^{N} p(i) \log p(i)$$

where $N$ is the number of possible outcomes ($N = 5$ in our running example).

Let's pick this apart. There's a sum over all possible outcomes. For each term in the sum, you multiply the probability of the outcome by the log of that probability. And there's a negative sign out in front.
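Here's a minimal sketch of this formula in Python (the function name `entropy` is just my own choice). I'm using base-2 logs (`math.log2`) so the numbers line up with the demo further down the post; any base works, it just rescales the entropy.

```python
import math

def entropy(p):
    """Entropy of a distribution, given as a list of probabilities that sum to 1."""
    total = 0.0
    for p_i in p:
        if p_i > 0:  # skip impossible outcomes: the 0 * log(0) terms count as 0 (more on this below)
            total -= p_i * math.log2(p_i)  # the minus sign out in front of the sum
    return total

print(entropy([0.5, 0.25, 0.25]))  # 1.5
```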

In case you've forgotten what a log graph looks like, here it is...
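If you'd rather draw the graph yourself, here's a small matplotlib sketch (I'm plotting the base-2 log to match the rest of the post, but the shape is the same for any base; the plotting details are my own):

```python
import numpy as np
import matplotlib.pyplot as plt

# log(x) dips below zero for x < 1 and crosses zero exactly at x = 1
x = np.linspace(0.01, 2, 200)
plt.plot(x, np.log2(x))
plt.axhline(0, color="gray", linewidth=0.5)
plt.axvline(1, color="gray", linestyle="--", linewidth=0.5)
plt.xlabel("x")
plt.ylabel("log2(x)")
plt.show()
```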

Here's an important detail: probabilities are always between $0$ and $1$. And from the graph above, we can see that $\log(x)$ is negative when $x < 1$. So $\log p(i)$ is always negative (or zero, in the special case $p(i) = 1$).

This explains why there's a minus sign in our formula for entropy. Each term in the sum is negative (or zero), so the sum itself is negative. It feels good to have entropy be a positive number, so we stick a minus sign in front.
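A quick numerical check of that (base-2 logs again):

```python
import math

# probabilities are between 0 and 1, so their logs are never positive
for p_i in [0.1, 0.5, 0.9, 1.0]:
    print(p_i, round(math.log2(p_i), 4))
# 0.1 -3.3219
# 0.5 -1.0
# 0.9 -0.152
# 1.0 0.0
```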

Next, let's find the entropy for some simple distributions.

Suppose you have an experiment with $N$ outcomes. And let's say that one outcome has probability $1$ and all other outcomes have probability $0$. What's the entropy of this distribution?

Consider the outcome with probability $1$. Its term in the sum is $1 \log 1$, which is zero because $\log 1$ is zero. And all the other terms are zero because $p(i)$ is zero (strictly speaking, $\log 0$ is undefined, so we use the convention that $0 \log 0 = 0$, which is the limit of $p \log p$ as $p \to 0$). So the entropy is $0$.
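That matches the `entropy` sketch from earlier (its `if p_i > 0` check is exactly this convention in code); the nearly-certain distribution in the second line is just one I made up:

```python
# one certain outcome, everything else impossible: entropy is zero
print(entropy([1, 0, 0, 0, 0]))  # 0.0

# a nearly-certain outcome still gives a very low entropy
print(round(entropy([0.97, 0.01, 0.01, 0.01]), 2))  # 0.24
```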

Okay, let's consider another distribution. Suppose all outcomes have equal probability. Then $p(i) = \dfrac{1}{N}$ for each outcome. The sum becomes...

$$
\begin{aligned}
E(p) & = - \sum_{i=1}^{N} \dfrac{1}{N} \log \dfrac{1}{N} \\
& = - N \cdot \dfrac{1}{N} \log \dfrac{1}{N} \\
& = - \log \dfrac{1}{N} \\
& = \log N
\end{aligned}
$$

In words: when all $N$ outcomes have equal probability, the entropy is $\log N$. This is the maximum possible entropy for a set of $N$ outcomes.
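Checking this numerically with the same `entropy` sketch:

```python
import math

# uniform distribution over N outcomes: entropy comes out to log2(N)
for N in [2, 5, 8]:
    uniform = [1 / N] * N
    print(N, round(entropy(uniform), 6), round(math.log2(N), 6))
# 2 1.0 1.0
# 5 2.321928 2.321928
# 8 3.0 3.0
```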

These examples match what I said earlier! When entropy is high, many outcomes are likely. When the entropy is low, very few outcomes are likely. In the first example only one outcome is likely, so the entropy is low. In the second example, all outcomes are equally likely, so the entropy is high.

The demo below lets you play with a probability distribution with 5 outcomes and see its entropy. You can drag the gray handles to move a bar up or down.

Try making all the bars equal. You should see that the entropy is $\log 5$ (or roughly $2.32$), as per the formula above. (The logs here are base $2$, which is why $\log 5 \approx 2.32$.)

[Interactive demo: five bars with probabilities p = 0.05, 0.16, 0.31, 0.31, 0.16, showing Entropy 2.11]
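For what it's worth, plugging the bar values shown above into the `entropy` sketch reproduces the displayed number (the bars are rounded for display, which is presumably why they only sum to 0.99):

```python
print(round(entropy([0.05, 0.16, 0.31, 0.31, 0.16]), 2))  # 2.11
```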

Something tricky to be aware of: the sum of the probabilities of all outcomes should equal $1$. When you move a bar around, the demo will shift the probabilities of the other bars so that they all sum to $1$.
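I don't know exactly how the demo redistributes the probability, but one simple scheme is to rescale the other bars proportionally so that everything sums to $1$ again. Here's a sketch of that idea (the function name `set_bar` and the rescaling rule are my own guesses, not necessarily what the demo does):

```python
def set_bar(p, i, new_value):
    """Set outcome i to new_value, then rescale the other outcomes so the list sums to 1."""
    p = list(p)
    other_mass = sum(p) - p[i]   # probability currently held by the other bars
    remaining = 1.0 - new_value  # probability the other bars should hold afterwards
    if other_mass > 0:
        scale = remaining / other_mass
        p = [q * scale for q in p]
    else:
        # the other bars were all zero, so spread the remaining mass evenly
        p = [remaining / (len(p) - 1)] * len(p)
    p[i] = new_value
    return p

new_p = set_bar([0.2, 0.2, 0.2, 0.2, 0.2], 0, 0.5)
print([round(q, 3) for q in new_p])  # [0.5, 0.125, 0.125, 0.125, 0.125]
print(round(sum(new_p), 3))          # 1.0
```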

That's all for now! I'd like to talk about the applications of entropy, but I don't know enough about that yet. When I learn more, I'll write a part 2.


Interested in more posts like this one? Follow me on Twitter!