[CST-2] Information Theory
Anthony Jones
amj30@cam.ac.uk
Sun, 26 May 2002 12:26:18 +0100
Disclaimer: I might be talking crap here ;)
Firstly, strictly speaking, entropy is a property of the distribution you
are drawing from, not the string that you obtain. I think this might make a
difference to how you look at some of the problems:
On Sun, May 26, 2002 at 11:59:58AM +0100, Timothy Hospedales wrote:
> Good point! I'm still slightly confused though. You're right in that
> looking only at the frequency of symbol occurrence from a source is not
> how you want to work out its entropy. (An obvious case being that
> H(0101010...) would come out as 1 rather than 0.)
Err, I don't follow your reasoning here -- as far as I can see, if you're
drawing from a source that always gives 0, then 1, then 0 and so on, then
the entropy of that source is 0, just as it is for the digits of Pi...
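To make that concrete, here's a quick Python sketch (my own, not from the
notes): if you only count symbol frequencies in 0101..., the memoryless
estimate comes out as 1 bit/symbol, even though a source that
deterministically alternates has only one possible output and so has
entropy 0.

from collections import Counter
from math import log2

def memoryless_entropy(s):
    # Zeroth-order estimate: treat each symbol as an independent draw
    # from the observed symbol frequencies.
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(memoryless_entropy("01" * 50))  # 1.0 bit/symbol
# But a source that deterministically alternates 0, 1, 0, 1, ... has
# entropy 0 -- the frequency count just can't see the determinism.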
> But the notes say that English has an entropy of 4 bits - which it
> does based on the distribution of letters - but actually it's much less
> than that if you know about spelling / grammar constraints of the
> source. (Obviously so, or you would only be able to get 8:4
> compression for ASCII since H=K). So is (a) JDG using "entropy"
> loosely there, or (b) do you get different measures of entropy
> depending on how much you let yourself know about the source - like
> with English or Pi - or (c) something else entirely?
The *distribution of English letters* has an entropy of 4 bits, not the
English language itself. You're right to say a sentence in English has a
lower entropy than this (I think), but bear in mind that most of JDG's
notes are talking about *memoryless* sources -- a source producing correct
English isn't memoryless (q tends to be followed by u, etc.).
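As a rough illustration (again my own sketch, nothing from the notes): if
you estimate per-letter entropy from a decent chunk of English text, the
zeroth-order (memoryless) figure comes out at roughly 4 bits/letter, while
conditioning on just the previous letter already gives something smaller.
The text you pass in is whatever English sample you care to feed it.

from collections import Counter
from math import log2

def zeroth_order_entropy(text):
    # H(X): entropy of the single-letter frequency distribution.
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def first_order_entropy(text):
    # H(X_next | X_prev): average entropy of the next letter given the
    # previous one -- lower than H(X) when the source has memory.
    pair_counts = Counter(zip(text, text[1:]))
    prev_counts = Counter(text[:-1])
    n_pairs = len(text) - 1
    h = 0.0
    for (a, b), c in pair_counts.items():
        p_pair = c / n_pairs          # p(a, b)
        p_cond = c / prev_counts[a]   # p(b | a)
        h -= p_pair * log2(p_cond)
    return h

# For a reasonably long passage of English (letters only, say),
# zeroth_order_entropy() gives roughly 4 bits/letter and
# first_order_entropy() gives something noticeably smaller.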
> And what about the exercise questions that give you alphabets and the
> probabilities of symbols and ask you to calculate the entropy from
> that? - If the alphabets were actually being used to communicate
> something, there would be more grammatical constraints that would
> reduce the actual entropy to below the value you calculate?
Grammar => non-memoryless.
In all the questions you assume (or are told) that the source is
memoryless. If this isn't the case, the entropy you calculate will be wrong
(see page 25 of the notes).
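For those questions the calculation is just the memoryless formula
H = -sum p_i log2 p_i over the probabilities you're given. A trivial
sketch, with a made-up alphabet:

from math import log2

def entropy(probs):
    # H = -sum p * log2(p) for a memoryless source with the given
    # symbol probabilities (symbols with p = 0 contribute nothing).
    return -sum(p * log2(p) for p in probs if p > 0)

# Made-up four-symbol alphabet {a, b, c, d}:
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol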
> It looks like entropy suffers from the same impossibility of
> determination that Kolmogorov complexity suffers from. Any source you try
> to calculate the entropy of could be discovered to actually be
> deterministic (like pi) thereby making your earlier entropy
> calculation wrong.
If you get an arbitrary string, it's pretty difficult to work out if it's
actually the binary digits of some irrational number or whatever. But that
doesn't change the entropy of the distribution itself. If you try and work
out the entropy of the source based purely on symbol frequencies, you're
assuming it's memoryless, so you'll calculate it wrong. Basically, you can
only calculate the entropy of a source if you know exactly how the source
behaves, so yes, your entropy calculation may be wrong if you don't know
this. That doesn't mean it's impossible in principle, though: if you do
know exactly how the source behaves, you can compute its entropy.
Again, I'll just reiterate that this might all be crap...
HTH,
Ant