[CST-2] Kolmogorov complexity - cost of sorting

Alan Lawrence acl33@hermes.cam.ac.uk
Fri, 31 May 2002 14:25:39 +0100


Consider a permutation of N things, e.g. a permutation of a,b,c is 
c,a,b. This can be considered as a series of N numbers, the ith one of 
which tells you which of the original numbers goes in the ith position 
(e.g. 3 1 2 for the example above). Each number ranges between 0 and N-1 
so can be stored in log(N) bits, so the total amount of data is Nlog(N). 
(pedants will point out it can be stored in SUM(i=1...N)log(i) but this 
is not much different).

Clearly the permutation 1,2,3....N is highly compressible. Kolmogorov 
complexity says that some permutations are incompressible, i.e. cannot 
be reduced beyond Nlog(N) size. However, if we could sort an arbitrary 
permutation into order with only Nlog(log(N)) comparisons, we could 
compress it by writing out the constant cost of a decompressor followed 
by the list of the results of the comparisons it made, i.e. Nlog(log(N)) 
space. This is impossible as the permutation was uncompressible => can't 
sort in less than Nlog(N) by comparisons only.

The decompression algorithm (i.e. the reverse sort) is simple. Unless 
your sorting alg ignores some of the comparisons it makes (=> strictly 
better alg that doesn't make them), the sequence of comparison results 
produced by sorting every permutation is different - we are sorting 
ordinals not numbers. The reverse sort is to generate every possible 
permutation and sort each one to see which results in the correct 
sequence of comparisons. We aren't interested in how long the 
decompression process takes but it definitely terminates.

HTH
--Alan

On Friday, May 31, 2002, at 01:51  pm, Jamie Shotton wrote:

>
> I seem to be missing something:  What exactly are we trying to prove?
> That we can't sort faster than nlogn for arbitrary data?  And in the
> argument below, what's to stop you substituting e.g. nloglogn for nlogn?
> AFAICS that makes no difference to the argument but gives a different
> result...?
>
> Puzzled Jamie
>
>>> Imagine we could create a program (constant Kolmogorov complexity)
>>> which sorts in less than nlog(n). Consider the *inverse* of this
>>> program (still constant) which takes a sorted list and
>> outputs it in
>>> the original random ordering in less than
>>> nlog(n)
>>> This would allow us to compress the original random
>> permutation as the
>>> sorted bits + progran(constant). But since it was a *random*
>>> permutation, it's uncompressible! Contradiction -- so we
>> can't sort in
>>> less than nlog(n).
>>>
>>> This ignores the size of the constant of the program, but it still
>>> works if you include that. (Basically, the program would need to be
>>> able to sort in <<nlog(n)-c)
>
>
>
> _______________________________________________
> CST-2 mailing list
> CST-2@srcf.ucam.org
> http://www.srcf.ucam.org/mailman/listinfo/cst-2