It's All Red

Time to post the solution to the balls-in-a-bag problem from last time. The solution gets pretty crazy, and I actually had to make up a tiny amount of new math (new to me, anyway) in order to justify the solution that Matt came to me with, but all in all it was a good adventure.

First of all, we are going to select a single colour of ball from the bag (let it be called "red") and only watch the progress of that colour. If red is ever eliminated we will declare the experiment a failure and try again; if red is ever the only colour left we ask how many steps it took. Naturally this means that only 1/N of our experiments even end in success, so I will have to account for that at the end, but we will cross that bridge when we come to it. Next, let us define a function F(k,m) as the probability that we are in a state with k red balls after m steps. Naturally F(1,0) = 1, F(k,0) = 0 for k ≠ 1, and the asymptotic behaviour of F in m is
lim_{m→∞} F(N,m) = 1/N
lim_{m→∞} F(0,m) = (N-1)/N
lim_{m→∞} F(k,m) = 0 for all other k

In words, this says that our initial condition has exactly 1 red ball, and in the long run we will either have all red balls (probability 1/N) or no red balls (otherwise). It is worth noting that we expect no experiments to run forever; that would mean lim_{m→∞} F(k,m) was nonzero for some k other than N or 0. This is to be contrasted with the possibility that the expectation value of the path length is infinite, which I have not ruled out yet. Of course, summing F(k,m) over k always gives 1.
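
If you want to check these asymptotics numerically, here is a minimal simulation sketch in Python. It assumes the process from last time's post (as the recursion below will confirm): each step draws two balls in order, repaints the second to match the first, and returns both to the bag.

import random

def run_once(N):
    # One experiment: N balls, each starting a distinct colour 0..N-1.
    # Each step: draw two balls in order, repaint the second to match
    # the first, and return both. Returns (winning colour, steps taken).
    balls = list(range(N))
    steps = 0
    while len(set(balls)) > 1:
        first, second = random.sample(range(N), 2)
        balls[second] = balls[first]
        steps += 1
    return balls[0], steps

N, trials = 5, 20000
red_wins = sum(1 for _ in range(trials) if run_once(N)[0] == 0)
print(red_wins / trials)   # should land near 1/N = 0.2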

Alright, now that we have that, we need to find the recursion relation on F. Recall that a step consists of drawing two balls in order, painting the second to match the first, and returning both to the bag. If we are entering the (m+1)th step, the chance that we get to having k red balls is given by:
F(k,m+1)=[(k-1)/N]*[(N-(k-1))/(N-1)]F(k-1,m)
+[(N-(k+1))/N]*[(k+1)/(N-1)]F(k+1,m)
+[k/N*(k-1)/(N-1)+(N-k)/N*(N-k-1)/(N-1)]F(k,m)

that is, the chance we drew a red then a non-red from k-1, plus the chance we drew a non-red then a red from k+1, plus the chance we drew two reds or two non-reds from k. This can be written as:
F(k,m+1)=1/(N(N-1))*
[(k-1)(N-k+1)F(k-1,m)+
(N-k-1)(k+1)F(k+1,m)-
2k(N-k)F(k,m)]
+F(k,m)

Why I separated out that F(k,m) term will be apparent later.
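
The recursion is also easy to iterate directly; here is a sketch that extends it to the absorbing states k = 0 and k = N (which only ever accumulate probability) and watches the asymptotics appear:

def step_distribution(F, N):
    # F[k] = probability of having k red balls; returns the
    # distribution one step later, per the recursion above.
    Z = N * (N - 1)
    G = [0.0] * (N + 1)
    for k in range(N + 1):
        G[k] += (1.0 - 2.0 * k * (N - k) / Z) * F[k]        # stayed at k
        if k >= 1:
            G[k] += (k - 1) * (N - k + 1) / Z * F[k - 1]    # red then non-red
        if k <= N - 1:
            G[k] += (k + 1) * (N - k - 1) / Z * F[k + 1]    # non-red then red
    return G

N = 5
F = [0.0] * (N + 1)
F[1] = 1.0                       # F(1,0) = 1
for m in range(5000):
    F = step_distribution(F, N)
print(F[N], F[0])                # tends to 1/N and (N-1)/N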

Let us now make a sequence of vectors V_m, each with N-1 elements, so that the kth element of V_m is given by F(k,m). Then the recursion relation gives us
V_{m+1} = M V_m

and the matrix M is specified by the recursion relation. Note that V only keeps track of having 1 through N-1 red balls left, so information "leaks" out the ends of it and the sum of the elements of V is not conserved. It is also worth noting that V_m tends to the zero vector as m tends to infinity, no matter what the initial data for V looked like (that point will turn out to be extremely important later; make sure you convince yourself of it now).

Ok, let us try to get a feel for the matrix M that iterates on V. The diagonal of M holds the term that carries F(k,m) into F(k,m+1); the kth element on the diagonal is 1 - 2k(N-k)/(N(N-1)). There are also off-diagonal terms adjacent to the diagonal, but everything else is zero. For notational convenience, I will denote N(N-1) = Z. For large N, the first few rows of the matrix look like:
[1-2(N-1)/Z, 2(N-2)/Z, 0, 0, 0, ....]
[(N-1)/Z, 1-2*(2(N-2))/Z, 3(N-3)/Z, 0, 0, ....]
[0, 2(N-2)/Z, 1-2*(3(N-3))/Z, 4(N-4)/Z, 0, ....]
....

The kth diagonal element is 1 - 2k(N-k)/Z, and the entries directly above and below the kth diagonal element (that is, the off-diagonal entries of the kth column) are each k(N-k)/Z. Every column adds to 1, except for the first and last columns, which represent probability leaking out the ends.
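
As a sanity check, here is a sketch that builds M with numpy and confirms the column sums (make_M will get reused in the later checks):

import numpy as np

def make_M(N):
    # The (N-1)x(N-1) matrix that iterates V; index k-1 holds state k.
    Z = N * (N - 1)
    M = np.zeros((N - 1, N - 1))
    for k in range(1, N):
        M[k - 1, k - 1] = 1 - 2 * k * (N - k) / Z
        if k > 1:
            M[k - 2, k - 1] = k * (N - k) / Z    # above the diagonal
        if k < N - 1:
            M[k, k - 1] = k * (N - k) / Z        # below the diagonal
    return M

print(make_M(6).sum(axis=0))   # all 1 except the first and last columns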

Now that we have this, what the heck do we do with it? Well, we want to find the probability that we end on the mth step, times m, summed over all m. That gives us the expectation value of what step we end on. The probability that we end on the mth step is the last element of V_{m-1} (the penultimate state) times (N-1)/N (the chance we draw a red ball) times 1/(N-1) (the chance we then draw the final non-red ball). So we want
Σ (m/N) [0,0,0,....,0,1] V_{m-1}

summed over m. Actually this is not quite right: since only 1/N of the experiments even end the way we want (we condition on success), we must also multiply by N. Anyway, we know that V_{m-1} = M^{m-1} V_0, and V_0 is given as the column vector [1,0,0,0,...] (which I will write as [1,0,0,0,...]^T, for lack of ability to write column vectors).

To put it together, we seek to find
Σ m [0,0,0,....,0,1] M^{m-1} [1,0,0,0,...]^T

summed over m = 1 to ∞, of course (the m = 0 term would carry a factor of m = 0 anyway).
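
Before doing anything clever, we can brute-force this sum by truncating it, just to have a target to aim at (this reuses make_M from the sketch above):

import numpy as np

N = 6
M = make_M(N)
V = np.zeros(N - 1); V[0] = 1.0    # V_0 = [1,0,...,0]^T
total = 0.0
for m in range(1, 5000):           # truncate the infinite sum
    total += m * V[-1]             # m * [0,...,0,1] M^(m-1) V_0
    V = M @ V
print(total, (N - 1) ** 2)         # both print 25 for N = 6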

Much of that is just constants relative to the sum, and we really just need to evaluate
Σ m M^{m-1}

Now, if instead of a matrix M, we had a real number, x, we could easily use
(d/dx) Σ x^m = Σ m x^{m-1}
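
(Worth keeping in mind for later: differentiating the geometric series Σ x^m = 1/(1-x) turns this into Σ m x^{m-1} = 1/(1-x)^2, so in the scalar world the sum we want is just 1/(1-x)^2.)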

so we would like to say:
Σ m M^{m-1} = d(Σ M^m)/dM

Ok, so can we take the derivative with respect to a matrix? Well, it's not at all uncommon to do that in physics: for a scalar function S(M), we define the derivative (dS/dM)_{ij} as the derivative of S with respect to the ijth component of M. This means dS/dM is a matrix. Following this up, we would expect that if we have a matrix function T(M) with two component indices (such as M^2 or M^{-1}), then dT/dM would have four indices. If we instead define the derivative dT/dM to have two indices by using the definition
(dT/dM)_{ij} = Σ (d/dM_{ik}) T_{kj}

summing over k (as we do in matrix multiplication), then using
(d/dM_{ij}) M_{kl} = δ_{ik} δ_{jl}

it is easy to show that
dM^n/dM = n M^{n-1}

for all integers n (to get the inverses, you must use that d/dM (M^{-1}M) = 0).

Alright, so we can now say with confidence that
Σ m M^{m-1} = d(Σ M^m)/dM

So now we just need a way to evaluate Σ Mm. Again, if this were a real number x, we would simply say that
Σ x^m = 1/(1-x)

(as long as |x| < 1), so how can we do something similar for a matrix? First of all, let us consider the pth partial sum, Σ M^m summed m from 0 to p. It is easy to see that
(1-M) Σ M^m = 1 - M^{p+1}

so as long as (1-M) is invertible and lim_{p→∞} M^{p+1} = 0, we can say that Σ M^m = (1-M)^{-1}.

Remember when I said that V_m tends to zero no matter what the initial data is? This fact can prove both of the statements that we need. A matrix is really just defined in terms of how it acts on vectors, so if a matrix obeys lim_{p→∞} M^p V = 0 for all vectors V, then lim_{p→∞} M^p = 0, since the limit evidently acts as the zero matrix. We knew from before that this is true for our matrix M. I'll admit that I have not really proven it, but it is apparent from the physics of the problem (people who know me will understand that when I say something is "apparent from the physics of the problem", I really mean "I don't know how to prove this, but it is so obvious from the way things are set up, I feel no need to").

So we can now say with confidence that
(1-M) Σ M^m = 1

when we take the sum to infinity. Now, can we invert the matrix (1-M) (which I will call W)? Well, first let us assume that W has no inverse. This means that W must have a null space, that is to say there is a nonzero vector U such that WU = 0. This naturally implies that (1-M)U = 0, so MU = U: U is an eigenvector with eigenvalue 1. But this means that M^p U = U for all p, so it remains true in the limit of large p, yet for large p we know M^p must vanish. This is a contradiction, proving that W (= 1-M) is invertible.
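
Both facts are easy to check numerically: every eigenvalue of M sits strictly inside the unit circle, so M^p really does die off and nothing has eigenvalue 1 (again reusing make_M from the earlier sketch):

import numpy as np

M = make_M(6)
print(np.abs(np.linalg.eigvals(M)).max())             # strictly less than 1

S = np.eye(5)                                         # running partial sum of Σ M^m
P = np.eye(5)
for _ in range(5000):
    P = P @ M
    S += P
print(np.allclose(S, np.linalg.inv(np.eye(5) - M)))   # True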

So, earlier, we had seen that the answer to the question we seek was given by
Σ m [0,0,0,....,0,1] M^{m-1} [1,0,0,0,...]^T

which, using d(Σ M^m)/dM = dW^{-1}/dM = W^{-2} (just as 1/(1-x) differentiates to 1/(1-x)^2 for real numbers), we can now rewrite as
[0,0,0,....,0,1] W^{-2} [1,0,0,0,...]^T

That is, we seek the inverse of the matrix (1-M), and we want the element in the last row and first column of the square of that inverse.
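
At this point the whole answer is a few lines of numpy to check (make_M from the earlier sketch; this is a sanity check, not part of the derivation):

import numpy as np

for N in range(3, 10):
    W = np.eye(N - 1) - make_M(N)
    W2inv = np.linalg.matrix_power(np.linalg.inv(W), 2)
    print(N, W2inv[-1, 0], (N - 1) ** 2)   # last row, first column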

W looks like
[2*(N-1)/Z, 2(2-N)/Z, 0, 0, 0, 0,.....]
[(1-N)/Z, 2*2(N-2)/Z, 3(3-N)/Z, 0, 0, 0,.....]
[0, 2(2-N)/Z, 2*3(N-3)/Z, 4(4-N)/Z, 0, 0,.....]
....

Z being N(N-1) as before. We can actually just factor out 1/Z now, and get
[2*(N-1), 2(2-N), 0, 0, 0, 0,.....]
[(1-N), 2*2(N-2), 3(3-N), 0, 0, 0,.....]
[0, 2(2-N), 2*3(N-3), 4(4-N), 0, 0,.....]
[0, 0, 3(3-N), 2*4(N-4), 5(5-N), 0,.....]
.....

as the expression for WZ.

Now, if we want the last-row, first-column element of W^{-2}, we need the last row and the first column of W^{-1}, and we need no other information about W^{-1} except that it exists (and it does). For the last row, find a row vector that gives 1 when it acts on the last column of W and gives 0 on all other columns (you might want to write W out on paper now, to get a feel for it). We find that the last row of W^{-1} is [1,2,3,4,5,...,N-1]; you might want to try out W in a few simple cases, N = 3, 4, 5, to convince yourself of this. We also need the first column of W^{-1}, that is, a column vector that gives 1 when the first row of W acts on it, and 0 when any other row does. That vector is [Z/N, Z/(2N), Z/(3N), Z/(4N), ....., Z/((N-1)N)]^T. Naturally, the last element of the first column and the first element of the last row agree.
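
Both claims check out numerically (make_M as above):

import numpy as np

N = 6
Z = N * (N - 1)
Winv = np.linalg.inv(np.eye(N - 1) - make_M(N))
print(Winv[-1, :])          # [1, 2, 3, ..., N-1]
print(Winv[:, 0] * N / Z)   # [1, 1/2, 1/3, ..., 1/(N-1)]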

Finally, we multiply this last row by this first column to get the last-row, first-column element of W^{-2}. Each of the N-1 terms is k · Z/(kN) = Z/N, so the total is (N-1) · Z/N, and Z is N(N-1).

So we have (N-1)^2, our final answer. I have a bit more to write about this puzzle, but this seems like quite enough as it is.

6 comments:

McAnerbot said...

That was a lot of work.

I told this problem to a friend, and he came back with:

How about this:
Negative binomial distribution.

P = 1/N probability long term 'victory' for each color
R = N-1 failure conditions

E(X) = R(1-p)/p = (N-1)^2

I feel that his treatment is incomplete... but totally produced the right answer.

Kory Stevens said...

I have no idea what you wrote there. I mean, I can see that if P = 1/N and R = N-1 then R(1-P)/P = (N-1)^2, but I do not see how those relate in any way; perhaps you know some secrets about the binomial distribution that I do not.

To summarize, I feel your post was incomplete, but totally produced the right answer.

McAnerbot said...

I worked it out by approaching it stochastically; my friend, without prompting, jumped to the negative binomial distribution and got the right answer, which was frustrating.

I also have been doing some thinking along the lines of "a given color has an (N-1)/N chance of not being eliminated first", and then telescoping that series to give a probability that a given color survives to the kth step. Then I realized that for that to be helpful, I would need to figure out some relationship between the number of steps each color elimination takes.

Which as previously discussed is hard.

Kory Stevens said...

Upon further consideration, this method is complete bunk. You seem to be talking about the number of trials it takes, not the number of steps (a trial being one experiment; you perform a trial by performing many steps until all the balls are the same colour). Really, perhaps I am not seeing what you are talking about (your second post totally cleared up exactly nothing), but as near as I can tell you basically just got the answer by multiplying stuff until it said (N-1)^2.

McAnerbot said...

*shrug* That's the funny part: I gave the problem to a friend with no background and he came straight up with the answer, which I (correctly) thought of as being right, but incorrectly acquired.

I agree with bunkness.

I'm still annoyed with the consideration of your 'reverse' problem.

I think the (N-1)^2 answer is too easily generated by random other formulas that involve things of the form ~1/N.

Kory Stevens said...

Ah ha! I figured out what you were saying (thanks, Wikipedia). The formula E(X) = R(1-P)/P answers: if you have an event that happens with probability P and you want R successes, on average how many failures happen along the way?

That is, if we rolled a d6, calling 1 a success, and were going to stop when we had 9 successes, on average how many failures do we have? The answer is 9(5/6)/(1/6) = 45.

The problem you seemed to solve is: if we run the experiment, with all balls turning red being a success, and we want N-1 successes (for some reason), on average how many failures will there be? It's (N-1)^2.