First of all, we are going to select a single colour of ball from the bag (let it be called "red") and only watch the progress of that colour. If red is ever eliminated we will declare the experiment a failure and try again; if red is ever the only colour left, we ask how many steps it took. Naturally this means that only 1/N of our experiments even end in success, so I will have to account for that at the end, but we will cross that bridge when we come to it. Next, let us define a function F(k,m) as the probability that we are in a state with k red balls after m steps. Naturally F(1,0) = 1 and F(k,0) = 0 for k ≠ 1, and the asymptotic behaviour of F in m is
\lim_{m\to\infty} F(N,m) = 1/N
\lim_{m\to\infty} F(0,m) = (N-1)/N
\lim_{m\to\infty} F(k,m) = 0 \quad \text{for all other } k
In words: our initial condition has exactly 1 red ball, and in the long run we will either have all red balls (probability 1/N) or no red balls (otherwise). It is worth noting that we expect no experiment to run forever; that would mean \lim_{m\to\infty} F(k,m) is nonzero for some k other than N or 0. This is to be contrasted with the possibility that the expectation value of the path length is infinite, which I have not ruled out yet. Of course, summing F(k,m) over k always gives 1.
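If you would like to see these limits without any of the machinery below, here is a quick Monte Carlo sketch in Python. It assumes the drawing rule used throughout this post: each step draws two distinct balls and repaints the second to match the first. The function name run_once and the particular constants are just mine for illustration.

```python
import random

def run_once(N, rng):
    """Track one colour ("red"), starting from a single red ball.
    Each step: draw two distinct balls, repaint the second to match the
    first.  Returns True if red eventually fills the bag."""
    k = 1                                  # current number of red balls
    while 0 < k < N:
        first_red = rng.random() < k / N
        # the second ball is drawn from the remaining N-1 balls
        remaining_red = (k - 1) if first_red else k
        second_red = rng.random() < remaining_red / (N - 1)
        if first_red and not second_red:
            k += 1                         # a non-red ball was painted red
        elif second_red and not first_red:
            k -= 1                         # a red ball was painted non-red
    return k == N

N, trials = 10, 100_000
rng = random.Random(1)
successes = sum(run_once(N, rng) for _ in range(trials))
print(successes / trials)   # ~0.1, i.e. 1/N; the failures are the other (N-1)/N
```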
Alright, now that we have that, we need to find the recursion relation on F. If we are entering the (m+1)th step, the chance that we get to having k red balls is given by:
F(k,m+1) = \frac{k-1}{N}\cdot\frac{N-(k-1)}{N-1}\,F(k-1,m)
\quad + \frac{N-(k+1)}{N}\cdot\frac{k+1}{N-1}\,F(k+1,m)
\quad + \left[\frac{k}{N}\cdot\frac{k-1}{N-1} + \frac{N-k}{N}\cdot\frac{N-k-1}{N-1}\right]F(k,m)
that is, the chance we drew a red then a non-red from k-1, plus the chance we drew a non-red then a red from k+1, plus the chance we drew two reds or two non-reds from k. This can be written as:
F(k,m+1) = \frac{1}{N(N-1)}\Big[(k-1)(N-k+1)\,F(k-1,m) + (N-k-1)(k+1)\,F(k+1,m) - 2k(N-k)\,F(k,m)\Big] + F(k,m)
Why I separated out that F(k,m) term will be apparent later.
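As a quick sanity check on the recursion, here is a small Python sketch (my own code, nothing canonical about it) that iterates F(k,m) directly, keeping the absorbing states k = 0 and k = N alongside the interior ones. It confirms the conservation of probability and the limits from before.

```python
import numpy as np

N = 10
Z = N * (N - 1)
F = np.zeros(N + 1)          # F[k] holds F(k, m); k runs over 0 .. N
F[1] = 1.0                   # F(1, 0) = 1, all others 0
for m in range(5000):
    G = F.copy()
    for k in range(1, N):
        p = k * (N - k) / Z  # chance of stepping up; same as stepping down
        G[k]     -= 2 * p * F[k]
        G[k + 1] += p * F[k]
        G[k - 1] += p * F[k]
    F = G
print(F.sum())               # stays exactly 1
print(F[N], F[0])            # -> 1/N and (N-1)/N, here 0.1 and 0.9
```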
Let us now make a sequence of vectors V_m, each with N-1 elements, so that the kth element of V_m is F(k,m). Then the recursion relation gives us
V_{m+1} = M\, V_m
and the matrix M is specified by the recursion relation. Note that V only keeps track of having 1 through N-1 red balls left, so information "leaks" out the ends of it and the sum of the elements of V is not conserved. It is also worth noting that V_m tends to the zero vector as m tends to infinity, no matter what the initial data for V was (that point will turn out to be extremely important later; make sure you convince yourself of it now).
Ok, let us try to get a feel for the matrix M that iterates V. The diagonal of M is the term that carries F(k,m) into F(k,m+1); the kth element on the diagonal is 1 - 2k(N-k)/(N(N-1)). There are also off-diagonal terms adjacent to the diagonal, but everything else is zero. For notational convenience, I will denote Z = N(N-1). For some large N, the upper-left corner of the matrix looks like:
M = \begin{pmatrix}
1 - 2(N-1)/Z & 2(N-2)/Z & 0 & 0 & \cdots \\
(N-1)/Z & 1 - 2\cdot 2(N-2)/Z & 3(N-3)/Z & 0 & \cdots \\
0 & 2(N-2)/Z & 1 - 2\cdot 3(N-3)/Z & 4(N-4)/Z & \cdots \\
\vdots & & & & \ddots
\end{pmatrix}
The kth diagonal element is 1 - 2k(N-k)/Z; immediately above and below the kth diagonal element is k(N-k)/Z. Every column adds to 1, except for the first and last columns, which represent probability leaking out the ends.
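Here is a short Python sketch (again my own code) that builds this matrix for a modest N and checks the two advertised properties: the column sums, and the fact that M^p tends to the zero matrix.

```python
import numpy as np

N = 10
Z = N * (N - 1)
M = np.zeros((N - 1, N - 1))
for k in range(1, N):                       # row k tracks F(k, m)
    M[k - 1, k - 1] = 1 - 2 * k * (N - k) / Z
    if k > 1:
        M[k - 1, k - 2] = (k - 1) * (N - (k - 1)) / Z
    if k < N - 1:
        M[k - 1, k] = (k + 1) * (N - (k + 1)) / Z
print(M.sum(axis=0))      # interior columns sum to 1; the first and last
                          # fall short by the probability leaking out the ends
print(abs(np.linalg.matrix_power(M, 5000)).max())   # M^p -> 0
```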
Now that we have this, what the heck do we do with it? Well, we want to find the probability that we end on the mth step, times m, summed over all m; that gives us the expectation value of what step we end on. The probability that we end on the mth step is the last element of V_{m-1} (the penultimate state) times (N-1)/N (the chance we draw a red ball) times 1/(N-1) (the chance we then draw the final non-red ball). So we want
\sum_m \frac{m}{N}\,[0,0,\ldots,0,1]\, V_{m-1}
Actually this is not quite right: since only 1/N of the experiments even end the way we want, we must also multiply by N. Anyway, we know that V_{m-1} = M^{m-1} V_0, and V_0 is given as the column vector [1,0,0,\ldots]^T (the T for transpose, for lack of ability to typeset column vectors).
To put it together, we seek to find
\sum_m m\,[0,0,\ldots,0,1]\, M^{m-1}\,[1,0,0,\ldots]^T
summed over m = 1 to ∞, of course (the m = 0 term would vanish anyway).
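Before doing anything clever, we can brute-force this sum by truncating it at a large m. A Python sketch (my code; the cutoff of 20000 is arbitrary but comfortably past convergence), rebuilding M in a more compact but equivalent way:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z              # c[k-1] = k(N-k)/Z fills both off-diagonals
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)

V = np.zeros(N - 1)
V[0] = 1.0                       # V_0 = [1, 0, 0, ...]^T
total = 0.0
for m in range(1, 20000):
    total += m * V[-1] / N * N   # the 1/N and the conditioning factor N cancel
    V = M @ V                    # V now holds V_m
print(total)                     # prints 81.0 for N = 10
```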
Much of that is just constants relative to the sum, and we really just need to evaluate
\sum_m m\, M^{m-1}
Now, if instead of a matrix M we had a real number x, we could easily use
\frac{d}{dx} \sum_m x^m = \sum_m m\, x^{m-1}
so we would like to say:
\sum_m m\, M^{m-1} = \frac{d}{dM} \sum_m M^m
Ok, so can we take the derivative with respect to a matrix? Well, it's not at all uncommon to do that in physics: for a scalar function S(M), we define the derivative (dS/dM)_{ij} as the derivative of S with respect to the ijth component of M. This means dS/dM is a matrix. Following this up, we would expect that if we have a matrix function T(M) with two component indices (such as M^2 or M^{-1}), then dT/dM would have four indices. If we define the derivative dT/dM to have two indices by using the definition
\left(\frac{dT}{dM}\right)_{ij} = \sum_k \frac{\partial}{\partial M_{ik}}\, T_{kj}
summing over k (as we do in matrix multiplication), then using
\frac{\partial M_{kl}}{\partial M_{ij}} = \delta_{ik}\,\delta_{jl}
it is easy to show that
\frac{dM^n}{dM} = n\, M^{n-1}
for all integers n (to get the inverses, you must use that \frac{d}{dM}\left(M^{-1}M\right) = 0).
Alright, so we can now say with confidence that
\sum_m m\, M^{m-1} = \frac{d}{dM} \sum_m M^m
(If the matrix derivative makes you nervous, the conclusion we actually need also follows directly from the series itself: since M commutes with its own powers, \sum_{m\ge 1} m\, M^{m-1} = \big(\sum_{m\ge 0} M^m\big)^2, exactly as for real numbers.)
So now we just need a way to evaluate \sum_m M^m. Again, if this were a real number x, we would simply say that
\sum_{m=0}^{\infty} x^m = \frac{1}{1-x}
(as long as |x| < 1), so how can we do something similar for a matrix? First of all, let us consider the pth partial sum, \sum_{m=0}^{p} M^m. It is easy to see that
(1 - M) \sum_{m=0}^{p} M^m = 1 - M^{p+1}
so as long as (1-M) is invertible and \lim_{p\to\infty} M^{p+1} = 0, we can say that \sum_{m=0}^{\infty} M^m = (1-M)^{-1}.
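Both of those conditions are dealt with below, but the conclusion is easy to check numerically. A Python sketch (my own code) comparing partial sums against the inverse:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)

I = np.eye(N - 1)
partial, term = np.zeros_like(M), I.copy()
for _ in range(20000):          # partial sum I + M + M^2 + ... + M^p
    partial += term
    term = term @ M
print(np.allclose(partial, np.linalg.inv(I - M)))   # True
```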
Remember when I said that V_m tends to zero no matter what the initial data is? This fact can prove both of the statements that we need. A matrix is really just defined in terms of how it acts on vectors, so if a matrix obeys \lim_{p\to\infty} M^p V = 0 for all vectors V, then \lim_{p\to\infty} M^p = 0, since in the limit it acts as the zero matrix on everything. We knew from before that this is true for our matrix M. I'll admit that I have not really proven it, but it is apparent from the physics of the problem (people who know me will understand that when I say something is "apparent from the physics of the problem", I really mean "I don't know how to prove this, but it is so obvious from the way things are set up, I feel no need to").
So we can now say with confidence that
(1 - M) \sum_{m=0}^{\infty} M^m = 1
when we take the sum to infinity. Now, can we invert the matrix (1-M) (which I will call W)? Well, first let us assume that W has no inverse. This means that W must have a null space, that is to say there is a nonzero vector U such that WU = 0. This naturally implies that (1-M)U = 0, so MU = U: U is an eigenvector with eigenvalue 1. But this means that M^p U = U for all p, so it holds even in the limit of large p, where M^p must vanish. This is a contradiction, proving that W = 1 - M is invertible.
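For the suspicious, the spectrum of M backs up both claims at once: every eigenvalue lies strictly inside the unit circle, so M^p tends to zero and no eigenvalue equals 1. A one-line check in Python (my code as before):

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)
print(np.max(np.abs(np.linalg.eigvals(M))))   # strictly below 1
```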
So, earlier, we had seen that the answer to the question we seek was given by
\sum_m m\,[0,0,\ldots,0,1]\, M^{m-1}\,[1,0,0,\ldots]^T
which we can now rewrite as
[0,0,\ldots,0,1]\, W^{-2}\,[1,0,0,\ldots]^T
That is, we seek the inverse of the matrix (1-M), and we want the element in the last row and first column of the square of that inverse.
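At this point the answer is a mechanical computation, and we could simply stop and let the computer do it. A Python sketch (my own code) evaluating that element of W^{-2}:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)
W = np.eye(N - 1) - M
Winv = np.linalg.inv(W)
print((Winv @ Winv)[-1, 0])   # 81.0 again, matching the truncated sum above
```

But it is more satisfying to finish by hand, so let us look at W itself.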
W looks like
W = \begin{pmatrix}
2(N-1)/Z & 2(2-N)/Z & 0 & 0 & \cdots \\
(1-N)/Z & 2\cdot 2(N-2)/Z & 3(3-N)/Z & 0 & \cdots \\
0 & 2(2-N)/Z & 2\cdot 3(N-3)/Z & 4(4-N)/Z & \cdots \\
\vdots & & & & \ddots
\end{pmatrix}
Z being N(N-1) as before. We can actually just factor out 1/Z now, and get
ZW = \begin{pmatrix}
2(N-1) & 2(2-N) & 0 & 0 & 0 & \cdots \\
1-N & 2\cdot 2(N-2) & 3(3-N) & 0 & 0 & \cdots \\
0 & 2(2-N) & 2\cdot 3(N-3) & 4(4-N) & 0 & \cdots \\
0 & 0 & 3(3-N) & 2\cdot 4(N-4) & 5(5-N) & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}
as the expression for ZW.
Now, if we want the element in the last row and first column of W^{-2}, we only need the last row and the first column of W^{-1}, and we need no other information about W^{-1} except that it exists (and it does). To get the last row, find a row vector that gives 1 when it acts on the last column of W and gives 0 for all other columns (you might want to write W out on paper now, to get a feel for it). We find that the last row of W^{-1} is [1,2,3,\ldots,N-1]; you might want to try out W in a few simple cases, N = 3, 4, 5, to convince yourself of this. We also need the first column of W^{-1}, that is, a column vector that gives 1 when it acts on the first row of W and 0 when it acts on any other row. That vector is [Z/N, Z/(2N), Z/(3N), \ldots, Z/((N-1)N)]^T. Naturally, the last element of the first column and the first element of the last row agree.
Finally, we multiply this last row by this first column to get the element in the last row and first column of W^{-2}:
\sum_{k=1}^{N-1} k\cdot\frac{Z}{kN} = \underbrace{Z/N + Z/N + \cdots + Z/N}_{N-1\ \text{terms}} = (N-1)\,\frac{Z}{N}
and Z is N(N-1).
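Here is one last Python sketch (my own code) checking the claimed row and column of W^{-1}, and their product:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
W = np.eye(N - 1) - (np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1))

row = np.arange(1, N, dtype=float)     # claimed last row of W^{-1}
col = Z / (N * np.arange(1, N))        # claimed first column of W^{-1}
print(np.allclose(row @ W, np.eye(N - 1)[-1]))   # row picks out the last column
print(np.allclose(W @ col, np.eye(N - 1)[0]))    # col picks out the first row
print(row @ col)                                 # 81.0, i.e. (N-1)^2 for N = 10
```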
So we have (N-1)^2, our final answer. I have a bit more to write about this puzzle, but this seems like quite enough as it is.