Going In Circles

Alright, some time ago I promised to do more on the dots on a line puzzle, so I guess I'll do that now.

On the puzzle page over at cut-the-knot, they had a few writeups of possible ways to arrive at the solution. They varied in their degree of rigour, but they certainly were all more elegant than my method. One in particular led me to examine a few other things about the puzzle (that I honestly do not understand the implications of), and I will go over them here.

This solution was put forward by Stuart Anderson, and it beings by assuming that the dots do not necessarily follow a uniform distribution. Suppose x is the length of an interval between adjacent points, and let f(x)dx be the probability that a particular interval has length between x and x+dx. F(x) will be the cumulative probability distribution, so F'(x)=f(x) and F(x) ranges from 0 to 1, as x ranges from 0 to 1.

We will call an interval "small" if it is smaller than both its neighbours, "large" if it is larger than both its neighbours, and "medium" otherwise. Intervals that are small get painted twice, and medium ones get painted once, while large ones are unpainted. Lets try to find out how many of each region we have and how large each type is. Suppose we have an interval with size x, that occurs with chance f(x)dx, and its left hand neighbour will be smaller than it with chance F(x) (F(x) being the cumulative distribution function, it is the integral of f(y) from 0 to x). Similarly, the right hand neighbour will be smaller with chance F(x), so the total chance that we have a large region is
∫F(x) F(x) f(x) dx
=∫F(x)2 dF

Where we have used f(x)=F'(x). The limits of integration are 0 to 1, since that is what F ranges over. Thus we get 1/3 for the chance an interval is large. A similar calculation shows that you also get 1/3 for small and medium intervals. This means that if an interval is selected at random, it is equally likely to be small, medium, or large, and this is independent of the probability distribution that the points are selected with.

The fact that P(small)=P(large) is somewhat obvious if you consider doing a plot of interval lengths. Small intervals will be a local minimum of the plot, while large intervals will be local maxima. For any two local minima, there must be a local maxima in between and vice-versa, so the number of small intervals must be the same as the number of large intervals (well, within 1 anyway), so in the infinite limit they will have the same probability. It is not clear why medium intervals would be equally likely, but there is probably some underlying reason.

Moving on, the expected length of a large interval is given by
∫x F(x) F(x) f(x) dx

which we can integrate by parts. Let u=x and v=F(x)3-1, so then we have
x(F(x)3-1)+∫(1-F(x)3) dx

v was chosen so that the first term would vanish at the endpoints, so that last integral gives us the average size of a large interval.

One can do a similar calculation for small and medium intervals, and the expressions are only slightly uglier, but this is as far as we can go without giving a specific form to the function F(x).

To proceed, assume that we have distributed the N dots on the line, and we scale things up by a factor of N, so there is on average 1 dot per unit length, then one can show that the function F(x) will tend to 1-e-x for large N (I cannot really prove this, it makes sense, but people who wrote the solutions quoted at cut-the-knot took this result as obvious, so I suspect it is some elementary thing that I simply am unfamiliar with (to be honest, its why I found their solutions unconvincing and did my own awful one, I couldn't get past this point)).

Anyway, using that F for the integral, one gets that the average size of a large interval is 11/18. This must be multiplied by N to account for the fact that there are N intervals, but then divided by N because we scaled everything up by a factor of N.

A similar calculation gives that the average size of a medium interval is 5/18 and small intervals are 2/18.

So the ratio of small:medium:large intervals is 2:5:11. the small+medium intervals account for (2+5)/18 of the total length, giving the answer to the original problem. Note that if the small intervals are counted twice, then the total amount painted is (2+2+5)/18, which is exactly 1/2. I have not yet found a way to explain why this answer would be so clean.

When I first found out about the fact that small, medium, and large segments all appear with equal probability, I was inspired to try a slightly different problem. Suppose we take the unit circle and place 3 points on it, that will divide the circle into 3 parts, one small, one large, and one medium. On average, what is the relative sizes of these three parts?

This is simple enough to calculate, let the points be located at 0, x and y where x and y vary from 0 to 1 (1 being the full circle). By the symmetry of the problem, lets assume that x is smaller than y, and lets assume that the interval between 0 and x is the smallest. So y varies from 2x to 1-x and x varies from 0 to 1/3. The average length can be found from the integral
∫∫x dy dx

over the regions specified. We also need to divide by the integral of 1 over that same region, to divide by the probability that the region is actually a small one, essentially just normalizing properly.

Naturally, you can do a similar calculation for the largest interval and for the other one. In the end, you get the answers 2/18, 5/18, 11/18, unsurprising given that I chose to bring this whole thing up. So one does not need to work in the infinite case, somehow only using 3 points and 3 intervals gives the correct answer, but I have no idea why.

No comments: