First of all, we are going to select a single colour of ball from the bag (let it be called "red") and only watch the progress of that colour. If red is ever eliminated we will declare the experiment a failure and try again; if red is ever the only colour left, we ask how many steps it took. Naturally this means that only 1/N of our experiments even end in success, so I will have to account for that at the end, but we will cross that bridge when we come to it. Next, let us define a function F(k,m) as the probability that we are in a state with k red balls after m steps. Naturally F(1,0) = 1 and F(k,0) = 0 for k ≠ 1, and the asymptotic behaviour of F in m is
\lim_{m\to\infty} F(N,m) = 1/N
\lim_{m\to\infty} F(0,m) = (N-1)/N
\lim_{m\to\infty} F(k,m) = 0 \quad \text{for all other } k
In words: our initial condition has exactly 1 red ball, and in the long run we will either have all red balls (probability 1/N) or no red balls (otherwise). It is worth noting that we expect no experiment to run forever; that would mean \lim_{m\to\infty} F(k,m) is nonzero for some k other than N or 0. This is to be contrasted with the possibility that the expectation value of the path length is infinite, which I have not ruled out yet. Of course, summing F(k,m) over k always gives 1.
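If you would like to see these limits without any of the machinery below, here is a quick Monte Carlo sketch in Python. It assumes the drawing rule used throughout this post: each step draws two distinct balls and repaints the second to match the first. The function name run_once and the particular constants are just mine for illustration.

```python
import random

def run_once(N, rng):
    """Track one colour ("red"), starting from a single red ball.
    Each step: draw two distinct balls, repaint the second to match the
    first.  Returns True if red eventually fills the bag."""
    k = 1                                  # current number of red balls
    while 0 < k < N:
        first_red = rng.random() < k / N
        # the second ball is drawn from the remaining N-1 balls
        remaining_red = (k - 1) if first_red else k
        second_red = rng.random() < remaining_red / (N - 1)
        if first_red and not second_red:
            k += 1                         # a non-red ball was painted red
        elif second_red and not first_red:
            k -= 1                         # a red ball was painted non-red
    return k == N

N, trials = 10, 100_000
rng = random.Random(1)
successes = sum(run_once(N, rng) for _ in range(trials))
print(successes / trials)   # ~0.1, i.e. 1/N; the failures are the other (N-1)/N
```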
Alright, now that we have that, we need to find the recursion relation on F. If we are entering the (m+1)th step, the chance that we get to having k red balls is given by:
F(k,m+1) = \frac{k-1}{N}\cdot\frac{N-(k-1)}{N-1}\,F(k-1,m)
\quad + \frac{N-(k+1)}{N}\cdot\frac{k+1}{N-1}\,F(k+1,m)
\quad + \left[\frac{k}{N}\cdot\frac{k-1}{N-1} + \frac{N-k}{N}\cdot\frac{N-k-1}{N-1}\right]F(k,m)
that is, the chance we drew a red then a non-red from k-1, plus the chance we drew a non-red then a red from k+1, plus the chance we drew two reds or two non-reds from k. This can be written as:
F(k,m+1) = \frac{1}{N(N-1)}\Big[(k-1)(N-k+1)\,F(k-1,m) + (N-k-1)(k+1)\,F(k+1,m) - 2k(N-k)\,F(k,m)\Big] + F(k,m)
Why I separated out that F(k,m) term will be apparent later.
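As a quick sanity check on the recursion, here is a small Python sketch (my own code, nothing canonical about it) that iterates F(k,m) directly, keeping the absorbing states k = 0 and k = N alongside the interior ones. It confirms the conservation of probability and the limits from before.

```python
import numpy as np

N = 10
Z = N * (N - 1)
F = np.zeros(N + 1)          # F[k] holds F(k, m); k runs over 0 .. N
F[1] = 1.0                   # F(1, 0) = 1, all others 0
for m in range(5000):
    G = F.copy()
    for k in range(1, N):
        p = k * (N - k) / Z  # chance of stepping up; same as stepping down
        G[k]     -= 2 * p * F[k]
        G[k + 1] += p * F[k]
        G[k - 1] += p * F[k]
    F = G
print(F.sum())               # stays exactly 1
print(F[N], F[0])            # -> 1/N and (N-1)/N, here 0.1 and 0.9
```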
Let us now make a sequence of vectors V_m, each with N-1 elements, so that the kth element of V_m is F(k,m). Then the recursion relation gives us
V_{m+1} = M\, V_m
and the matrix M is specified by the recursion relation. Note that V only keeps track of having 1 through N-1 red balls left, so information "leaks" out the ends of it and the sum of the elements of V is not conserved. It is also worth noting that V_m tends to the zero vector as m tends to infinity, no matter what the initial data for V was (that point will turn out to be extremely important later; make sure you convince yourself of it now).
Ok, let us try to get a feel for the matrix M that iterates V. The diagonal of M is the term that carries F(k,m) into F(k,m+1); the kth element on the diagonal is 1 - 2k(N-k)/(N(N-1)). There are also off-diagonal terms adjacent to the diagonal, but everything else is zero. For notational convenience, I will denote Z = N(N-1). For some large N, the upper-left corner of the matrix looks like:
M = \begin{pmatrix}
1 - 2(N-1)/Z & 2(N-2)/Z & 0 & 0 & \cdots \\
(N-1)/Z & 1 - 2\cdot 2(N-2)/Z & 3(N-3)/Z & 0 & \cdots \\
0 & 2(N-2)/Z & 1 - 2\cdot 3(N-3)/Z & 4(N-4)/Z & \cdots \\
\vdots & & & & \ddots
\end{pmatrix}
The kth diagonal element is 1 - 2k(N-k)/Z; immediately above and below the kth diagonal element is k(N-k)/Z. Every column adds to 1, except for the first and last columns, which represent probability leaking out the ends.
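Here is a short Python sketch (again my own code) that builds this matrix for a modest N and checks the two advertised properties: the column sums, and the fact that M^p tends to the zero matrix.

```python
import numpy as np

N = 10
Z = N * (N - 1)
M = np.zeros((N - 1, N - 1))
for k in range(1, N):                       # row k tracks F(k, m)
    M[k - 1, k - 1] = 1 - 2 * k * (N - k) / Z
    if k > 1:
        M[k - 1, k - 2] = (k - 1) * (N - (k - 1)) / Z
    if k < N - 1:
        M[k - 1, k] = (k + 1) * (N - (k + 1)) / Z
print(M.sum(axis=0))      # interior columns sum to 1; the first and last
                          # fall short by the probability leaking out the ends
print(abs(np.linalg.matrix_power(M, 5000)).max())   # M^p -> 0
```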
Now that we have this, what the heck do we do with it? Well, we want to find the probability that we end on the mth step, times m, summed over all m; that gives us the expectation value of what step we end on. The probability that we end on the mth step is the last element of V_{m-1} (the penultimate state) times (N-1)/N (the chance we draw a red ball) times 1/(N-1) (the chance we then draw the final non-red ball). So we want
\sum_m \frac{m}{N}\,[0,0,\ldots,0,1]\, V_{m-1}
Actually this is not quite right: since only 1/N of the experiments even end the way we want, we must also multiply by N. Anyway, we know that V_{m-1} = M^{m-1} V_0, and V_0 is given as the column vector [1,0,0,\ldots]^T (the T for transpose, for lack of ability to typeset column vectors).
To put it together, we seek to find
\sum_m m\,[0,0,\ldots,0,1]\, M^{m-1}\,[1,0,0,\ldots]^T
summed over m = 1 to ∞, of course (the m = 0 term would vanish anyway).
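Before doing anything clever, we can brute-force this sum by truncating it at a large m. A Python sketch (my code; the cutoff of 20000 is arbitrary but comfortably past convergence), rebuilding M in a more compact but equivalent way:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z              # c[k-1] = k(N-k)/Z fills both off-diagonals
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)

V = np.zeros(N - 1)
V[0] = 1.0                       # V_0 = [1, 0, 0, ...]^T
total = 0.0
for m in range(1, 20000):
    total += m * V[-1] / N * N   # the 1/N and the conditioning factor N cancel
    V = M @ V                    # V now holds V_m
print(total)                     # prints 81.0 for N = 10
```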
Much of that is just constants relative to the sum, and we really just need to evaluate
\sum_m m\, M^{m-1}
Now, if instead of a matrix M we had a real number x, we could easily use
\frac{d}{dx} \sum_m x^m = \sum_m m\, x^{m-1}
so we would like to say:
\sum_m m\, M^{m-1} = \frac{d}{dM} \sum_m M^m
Ok, so can we take the derivative with respect to a matrix? Well, it's not at all uncommon to do that in physics: for a scalar function S(M), we define the derivative (dS/dM)_{ij} as the derivative of S with respect to the ijth component of M. This means dS/dM is a matrix. Following this up, we would expect that if we have a matrix function T(M) with two component indices (such as M^2 or M^{-1}), then dT/dM would have four indices. If we define the derivative dT/dM to have two indices by using the definition
\left(\frac{dT}{dM}\right)_{ij} = \sum_k \frac{\partial}{\partial M_{ik}}\, T_{kj}
summing over k (as we do in matrix multiplication), then using
\frac{\partial M_{kl}}{\partial M_{ij}} = \delta_{ik}\,\delta_{jl}
it is easy to show that
\frac{dM^n}{dM} = n\, M^{n-1}
for all integers n (to get the inverses, you must use that \frac{d}{dM}\left(M^{-1}M\right) = 0).
Alright, so we can now say with confidence that
\sum_m m\, M^{m-1} = \frac{d}{dM} \sum_m M^m
(If the matrix derivative makes you nervous, the conclusion we actually need also follows directly from the series itself: since M commutes with its own powers, \sum_{m\ge 1} m\, M^{m-1} = \big(\sum_{m\ge 0} M^m\big)^2, exactly as for real numbers.)
So now we just need a way to evaluate \sum_m M^m. Again, if this were a real number x, we would simply say that
\sum_{m=0}^{\infty} x^m = \frac{1}{1-x}
(as long as |x| < 1), so how can we do something similar for a matrix? First of all, let us consider the pth partial sum, \sum_{m=0}^{p} M^m. It is easy to see that
(1 - M) \sum_{m=0}^{p} M^m = 1 - M^{p+1}
so as long as (1-M) is invertible and \lim_{p\to\infty} M^{p+1} = 0, we can say that \sum_{m=0}^{\infty} M^m = (1-M)^{-1}.
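Both of those conditions are dealt with below, but the conclusion is easy to check numerically. A Python sketch (my own code) comparing partial sums against the inverse:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)

I = np.eye(N - 1)
partial, term = np.zeros_like(M), I.copy()
for _ in range(20000):          # partial sum I + M + M^2 + ... + M^p
    partial += term
    term = term @ M
print(np.allclose(partial, np.linalg.inv(I - M)))   # True
```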
Remember when I said that V_m tends to zero no matter what the initial data is? This fact can prove both of the statements that we need. A matrix is really just defined in terms of how it acts on vectors, so if a matrix obeys \lim_{p\to\infty} M^p V = 0 for all vectors V, then \lim_{p\to\infty} M^p = 0, since in the limit it acts as the zero matrix on everything. We knew from before that this is true for our matrix M. I'll admit that I have not really proven it, but it is apparent from the physics of the problem (people who know me will understand that when I say something is "apparent from the physics of the problem", I really mean "I don't know how to prove this, but it is so obvious from the way things are set up, I feel no need to").
So we can now say with confidence that
(1 - M) \sum_{m=0}^{\infty} M^m = 1
when we take the sum to infinity. Now, can we invert the matrix (1-M) (which I will call W)? Well, first let us assume that W has no inverse. This means that W must have a null space, that is to say there is a nonzero vector U such that WU = 0. This naturally implies that (1-M)U = 0, so MU = U: U is an eigenvector with eigenvalue 1. But this means that M^p U = U for all p, so it holds even in the limit of large p, where M^p must vanish. This is a contradiction, proving that W = 1 - M is invertible.
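For the suspicious, the spectrum of M backs up both claims at once: every eigenvalue lies strictly inside the unit circle, so M^p tends to zero and no eigenvalue equals 1. A one-line check in Python (my code as before):

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)
print(np.max(np.abs(np.linalg.eigvals(M))))   # strictly below 1
```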
So, earlier, we had seen that the answer to the question we seek was given by
\sum_m m\,[0,0,\ldots,0,1]\, M^{m-1}\,[1,0,0,\ldots]^T
which we can now rewrite as
[0,0,\ldots,0,1]\, W^{-2}\,[1,0,0,\ldots]^T
That is, we seek the inverse of the matrix (1-M), and we want the element in the last row and first column of the square of that inverse.
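At this point the answer is a mechanical computation, and we could simply stop and let the computer do it. A Python sketch (my own code) evaluating that element of W^{-2}:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
M = np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1)
W = np.eye(N - 1) - M
Winv = np.linalg.inv(W)
print((Winv @ Winv)[-1, 0])   # 81.0 again, matching the truncated sum above
```

But it is more satisfying to finish by hand, so let us look at W itself.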
W looks like
W = \begin{pmatrix}
2(N-1)/Z & 2(2-N)/Z & 0 & 0 & \cdots \\
(1-N)/Z & 2\cdot 2(N-2)/Z & 3(3-N)/Z & 0 & \cdots \\
0 & 2(2-N)/Z & 2\cdot 3(N-3)/Z & 4(4-N)/Z & \cdots \\
\vdots & & & & \ddots
\end{pmatrix}
Z being N(N-1) as before. We can actually just factor out 1/Z now, and get
ZW = \begin{pmatrix}
2(N-1) & 2(2-N) & 0 & 0 & 0 & \cdots \\
1-N & 2\cdot 2(N-2) & 3(3-N) & 0 & 0 & \cdots \\
0 & 2(2-N) & 2\cdot 3(N-3) & 4(4-N) & 0 & \cdots \\
0 & 0 & 3(3-N) & 2\cdot 4(N-4) & 5(5-N) & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}
as the expression for ZW.
Now, if we want the element in the last row and first column of W^{-2}, we only need the last row and the first column of W^{-1}, and we need no other information about W^{-1} except that it exists (and it does). To get the last row, find a row vector that gives 1 when it acts on the last column of W and gives 0 for all other columns (you might want to write W out on paper now, to get a feel for it). We find that the last row of W^{-1} is [1,2,3,\ldots,N-1]; you might want to try out W in a few simple cases, N = 3, 4, 5, to convince yourself of this. We also need the first column of W^{-1}, that is, a column vector that gives 1 when it acts on the first row of W and 0 when it acts on any other row. That vector is [Z/N, Z/(2N), Z/(3N), \ldots, Z/((N-1)N)]^T. Naturally, the last element of the first column and the first element of the last row agree.
Finally, we multiply this last row by this first column to get the element in the last row and first column of W^{-2}:
\sum_{k=1}^{N-1} k\cdot\frac{Z}{kN} = \underbrace{Z/N + Z/N + \cdots + Z/N}_{N-1\ \text{terms}} = (N-1)\,\frac{Z}{N}
and Z is N(N-1).
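Here is one last Python sketch (my own code) checking the claimed row and column of W^{-1}, and their product:

```python
import numpy as np

N = 10
Z = N * (N - 1)
k = np.arange(1, N)
c = k * (N - k) / Z
W = np.eye(N - 1) - (np.diag(1 - 2 * c) + np.diag(c[1:], 1) + np.diag(c[:-1], -1))

row = np.arange(1, N, dtype=float)     # claimed last row of W^{-1}
col = Z / (N * np.arange(1, N))        # claimed first column of W^{-1}
print(np.allclose(row @ W, np.eye(N - 1)[-1]))   # row picks out the last column
print(np.allclose(W @ col, np.eye(N - 1)[0]))    # col picks out the first row
print(row @ col)                                 # 81.0, i.e. (N-1)^2 for N = 10
```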
So we have (N-1)^2, our final answer. I have a bit more to write about this puzzle, but this seems like quite enough as it is.