Time to post the solution to the
balls in a bag problem from last time. The solution actually gets pretty crazy, and I actually had to make up a tiny amount of new math (new to me, anyway) in order to justify the solution that Matt came to me with, but all in all it was a good adventure.
First of all, we are going to select a single colour of ball from the bag (let it be called "red") and only watch the progress of that colour. If red is ever eliminated we will declare the experiment a failure and try again, if red is ever the only colour left we ask how many steps it took. Naturally this means that only 1/N of our experiments even end in success, so I will have to account for that at the end, but we will cross that bridge when we come to it. Next, let us define a function F(k,m) as the probability that we are in a state with k red balls after m steps. Naturally F(1,0)=1, F(k,0)=0 for k &ne 1, and the asymptotic behaviour of F in m is
limm->&infinF(N,m)=1/N
limm->&infinF(0,m)=(N-1)/N
limm->&infinF(k,m)=0 for all other k
In words, this is that our initial condition has exactly 1 red ball, and in the long run we will either have all red balls (probability 1/N) or no red balls (otherwise). It is worth noting that we expect that there are no experiments that run forever (that would be lim
m->&infinF(k,m)=nonzero for k neither N nor 0), that have a nonzero probability. This is to be contrasted with the chance that the expectation value of the path length being infinity, which I have not ruled out yet. Of course, summing F(k,m) over k always gives 1.
Alright, now that we have that, we need to find the recursion relation on F. If we are entering the (m+1)
th step, the chance that we get to having k red balls is given by:
F(k,m+1)=[(k-1)/N]*[(N-(k-1))/(N-1)]F(k-1,m)
+[(N-(k+1))/N]*[(k+1)/(N-1)]F(k+1,m)
+[k/N*(k-1)/(N-1)+(N-k)/N*(N-k-1)/(N-1)]F(k,m)
that it, the chance we drew a red then non-red from k-1, plus the chance we drew a non-red then red from k+1 and the chance we drew two red or two non-red from k. This can be written as:
F(k,m+1)=1/(N(N-1))*
[(k-1)(N-k+1)F(k-1,m)+
(N-k-1)(k+1)F(k+1,m)-
2k(N-k)F(k,m)]
+F(k,m)
Why I separated out that F(k,m) term will be apparent later.
Let us now make a series of vectors, V
m with N-1 elements so that the k
th element of the vector V
m is given by F(k,m). Then the recursion relation gives us
Vm+1=MVm
and the matrix M is specified by the recursion relation. Note that V only keeps track of having 1 through N-1 red balls left, so information "leaks" out the ends of it and the sum of the elements of V is not conserved. It is also worth noting that V
m tends to the zero vector as m tends to infinity, no matter how the initial data for V looked (that point will turn out to be extremely important later, make sure you convince yourself of it now).
Ok, let us try to get a feel for the matrix M that iterates on V. The diagonal of M is the term that keeps F(k,m) at F(k,m+1), that is to say that the k
th element on the diagonal is 1-2k(N-k)/(N(N-1)). There are also off-diagonal terms adjacent to the diagonal, but everything else is zero. For notational convenience, I will denote N(N-1)=Z. For some large N, the first part of the matrix looks like:
[1-2(N-1)/Z, 2(N-2)/Z, 0, 0, 0, ....]
[(N-1)/Z, 1-2*(2(N-2))/Z, 3(N-3)/Z, 0, 0, ....]
[0, 2(N-2)/Z, 1-2*(3(N-3))/Z, 4(N-4)/Z, 0, ....]
....
The k
th diagonal element is 1-2*k(N-k)/Z, above and below the k
th diagonal element is k(N-k)/Z. Every column adds to 1, except for the first and last columns, which represent probability leaking out the ends.
Now that we have this, what the heck do we do with it. Well, we want to find the probability that we end on the m
th step, times m, summed over all m. That gives us the expectation value of what step we end on. The probability that we end on the m
th step is the last element of V
m-1 (the penultimate state) times (N-1)/N (the chance we draw a red ball) times 1/(N-1) (the chance we draw the final non-red ball). So we want
&Sigma m/N [0,0,0,....0,1]Vm-1
summed over m. Actually this is not quite right, since only 1/N of the experiments even end the way we want, we must also multiply by N. Anyway, we know that V
m-1=M
m-1V
0, and V
0 is given as the column vector [1,0,0,0,...] (which I will write as [1,0,0,0,...]
T, for lack of ability to write column vectors).
To put it together, we seek to find
&Sigma m [0,0,0,....0,1]Mm-1[1,0,0,0,...]T
summed over m=0 to &infin of course.
Much of that is just constants relative to the sum, and we really just need to evaluate
&Sigma m Mm-1
Now, if instead of a matrix M, we had a real number, x, we could easily use
d &Sigma xm/dx = &Sigma m xm-1
so we would like to say:
&Sigma m Mm-1=d &Sigma Mm/dM
Ok, so can we take the derivative with respect to a matrix? Well, its not at all uncommon to do that in physics, we often consider scalar function S(M) as define the derivative (dS/dM)
ij as the derivative of S with respect to the ij
th component of M. This means dS/dM is a matrix. Following this up, we would expect that if we have a matrix function T(M) with two component indices (such as M
2 or M
-1), then dT/dM would have four indices. If we define the derivative dT/dM to have two indices by using the definition
dT/dMij=&Sigma (d/dMik)Tkj
summing over k (as we do in matrix multiplication), then using
(d/dMij)Mkl=&delta ik&delta jl
it is easy to show that
dMn/dM=nMn-1
for all integers n (to get the inverses, you must use that d/dM (M
-1M)=0).
Alright, se we can now say with confidence that
&Sigma m Mm-1=d &Sigma Mm/dM
So now we just need a way to evaluate &Sigma M
m. Again, if this were a real number x, we would simply say that
&Sigma xm=1/(1-x)
(as long as |x|<1) so how can we do something similar for a matrix?
First of all, let us consider the p
th partial sum, &Sigma M
m summed m from 0 to p. It is easy to see that
(1-M)&Sigma Mm=1-Mp+1
so as long as we have (1-M) being invertible and lim
p->&infinM
p+1=0, then we can say that &Sigma M
m=(1-M)
-1Remember when I said that V
m tends to zero no matter what the initial data is? this fact can prove both of the statements that we need. A matrix is really just defined in terms of how it acts on vectors (this is an important point, understand it), thus if a matrix obeys the relationship lim
p->&infinM
pV=0 for all vectors V, then it means that lim
p->&infinM
p=0, since it really apparently is the zero matrix. We knew from before that this is true for our matrix M. I'll admit that I have not really proven it, but it is apparent from the physics of the problem (people who know me will understand that when I say something is "apparent from the physics of the problem", I really mean "I don't know how to prove this, but it is so obvious from the way things are set up, I feel no need to").
So we can now say with confidence that
(1-M)&Sigma Mm=1
when we take the sum to infinity. Now, can we invert the matrix (1-M) (which I will call W)? Well, first let us assume that W has no inverse. This means that W must have a null space, that is to say there is a nonzero vector U such that WU=0. This naturally implies that (1-M)U=0 so MU=U, U is an eigenvector with eigenvalue 1. But this means that M
pU=U for all p, and so that is even true in the limit of large p, but for large p we have M
p must vanish. This is a contradiction proving that W (=(1-M)) is invertible.
So, earlier, we had seen that the answer to the question we seek was given by
&Sigma m [0,0,0,....0,1]Mm-1[1,0,0,0,...]T
which we can now rewrite as
[0,0,0,....0,1]W-2[1,0,0,0,...]T
That is, we seek the inverse of the matrix (1-M) and we want the element in the last row and first column in the square of said matrix.
W looks like
[2*(N-1)/Z, 2(2-N)/Z, 0, 0, 0, 0,.....]
[(1-N)/Z, 2*2(N-2)/Z, 3(N-3)/Z, 0, 0, 0,.....]
[0, 2(N-2)/Z, 2*3(N-3)/Z, 4(N-4)/Z, 0, 0,.....]
.....
Z being N(N-1) as before. We can actually just factor out 1/Z now, and get
[2*(N-1), 2(2-N), 0, 0, 0, 0,.....]
[(1-N), 2*2(N-2), 3(3-N), 0, 0, 0,.....]
[0, 2(2-N), 2*3(N-3), 4(4-N), 0, 0,.....]
[0, 0, 3(3-N), 2*4(N-4), 5(5-N), 0,.....]
.....
as the expression for WZ.
Now, if we want the last element in W
-2, we want the last row and the first column of W
-1 and we need no other information about W
-1 except that it exists (and it does). Find a row vector that gives 1 when it acts on the last column of W and gives 0 for all other columns (you might want to write W on paper now, to get a feel for it). We find that the last row in the matrix W
-1 is [1,2,3,4,5...,N-1], you might want to try out W in a few simple cases of N=3,4,5 to convince yourself of this. We also need the first column in the matrix W
-1, that is, we need a column vector that gives 1 when it acts on the first row of W, and 0 when it acts on any other row. The vector is [Z/N, Z/2N, Z/3N, Z/4N, ....., Z/(N(N-1))]
T. Naturally, the last element of the first column and the first element of the last row agree.
Finally, we multiply this last row and first column to get the element of the last row and first column of W
-2 Z/N+Z/N+Z/N+..., and we have a total of N-1 terms, but Z is N(N-1).
So we have (N-1)^2, our final answer. I have a bit more to write about this puzzle, but this seems like quite enough as it is.