Walking In Puzzletown

Figure it's about time to finish off the discussion of the follow-up puzzle for the puzzletown problem, and of course, by discussion I mean my typing and you reading.

The basic solution was posted in the comments last time; it has to do with infinite family sizes. Consider each family measuring the variable #girls-#boys, which I will call x. At any given time, x is performing a random walk on the integers, increasing by 1 with probability 1/2 and decreasing by 1 with probability 1/2. The random walk starts at 0 and stops if it ever hits -1. On average, how many steps does the random walk run for?

Initially, I tried to let F(x) be the number of steps it takes to get from x to -1 on average, write down the recursion relation, and solve. This sort of fails, and the failure is related to the fact that F(x) is actually infinite everywhere in our problem, so you have started with something of a non-function.

In an attempt to make the function converge, let us consider a random walk on the integers between a and b, starting at some point and stopping when we ever hit either a or b. In effect, a=-1 and b is some large number where the family will "give up" if the girls outnumber the boys by at least b.

Let f(x) be the average number of steps it takes to get from x to either a or b (I initially defined this as f(a,x,b), but that was far too cumbersome). Clearly we have f(a)=0=f(b), and we also have the recursion relation:
f(x)=1/2 (f(x+1)+1)+1/2 (f(x-1)+1)

That is, half the time f(x) is 1 more than f(x-1) and the other half the time it is 1 more than f(x+1). This can be written as:
f(x)-f(x-1)=f(x+1)-f(x)+2

So, defining d(x) as the derivative function, d(x)=f(x+1)-f(x), we have
d(x+1)-d(x)=-2

So, it appears that the second derivative of f is -2, giving the impression f is a quadratic with leading coefficient -1. Anyway, let's see what it actually is. Our last equation tells us:
d(x)=d(a)-2(x-a)

I didn't have to use 'a' for my constant in that equation, I could have used any value, but 'a' made the most sense.

Next, f is the "integral" of d:
f(x)=f(a)+Σ d(n)
=Σ (d(a)-2(n-a))

The sum over n runs from a to x-1. You might expect it to run to x (an intuition you get from integrals), but you can confirm that running to x-1 is the correct expression, from d(x)=f(x+1)-f(x). Naturally, f(a)=0 is one of our boundary conditions.

Performing the sum, we get:
f(x)=(x-a)(d(a)+2a)-2Σ n

The sum Σ n from a to x-1 can be broken into two sums, from 1 to x-1 and from 1 to a-1, giving x(x-1)/2-a(a-1)/2, so that
f(x)=(x-a)(d(a)+2a)-x(x-1)+a(a-1)

So, indeed f is a quadratic with leading coefficient -1. We know that f(b) must be 0, so that:
0=(b-a)(d(a)+2a)-b(b-1)+a(a-1)
(d(a)+2a)(b-a)=b^2-a^2-(b-a)=(b+a-1)(b-a)
d(a)+2a=b+a-1

Might be worth noting that I assumed (b-a) was nonzero there, so this whole thing collapses if we don't have that.

Anyway, now we can see that f is given by
f(x)=(x-a)(b+a-1)-x(x-1)+a(a-1)
=x(b+a)-x^2-ab
=(b-x)(x-a)

Demonstrating the (well known) result that if you have a random walk between -a and b starting at 0, the average time it takes is ab.
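A quick sanity check of the closed form (a sketch in Python, not part of the derivation): confirm that f(x)=(b-x)(x-a) satisfies the boundary conditions and the recursion relation from above.

```python
# Check that f(x) = (b - x)(x - a) solves the recursion
#   f(x) = 1/2 (f(x+1) + 1) + 1/2 (f(x-1) + 1)
# with f(a) = f(b) = 0, for a few choices of endpoints.

def f(x, a, b):
    return (b - x) * (x - a)

for a, b in [(-1, 5), (-1, 100), (-3, 7)]:
    assert f(a, a, b) == 0 and f(b, a, b) == 0
    for x in range(a + 1, b):
        # multiplied through by 2 to stay in integer arithmetic
        assert 2 * f(x, a, b) == 2 + f(x + 1, a, b) + f(x - 1, a, b)

print("f(x) = (b-x)(x-a) satisfies the recursion and boundaries")
```

In particular, with a=-1 and b=100 it gives f(0)=100.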

Anyway, if we have a family that will stop having children once they have more boys than girls, but will give up if the girls ever outnumber the boys by 100, the average family size is 100. If instead they give up only when the girls outnumber by a million, then the average family size is a million. If they will never give up, then the average family size diverges to infinity.

This means that the average family size in our puzzle scenario is infinite. Had we worked out that the average family was size n, it would have (n+1)/2 boys and (n-1)/2 girls (every family has exactly one more boy than girl). And the fraction of children that are boys would be (n+1)/(2n). As n tends to infinity, this is 1/2.

Some might say that it is absurd to believe that the average family size is infinity: you can run this experiment with as many families as you like, and never once will you find that the average family size is infinity. What's more, you will never end the experiment with a town where the boys are exactly 1/2 of the children; they will always be more than 1/2 of them. This is quite correct, and actually raises an interesting point about probability.

When we say that something happens with probability 1/2, we do not mean it will happen exactly 1/2 the time. We really mean that if you were to perform the experiment n times, and you found that you got k occurrences of the event, in the limit as n goes to infinity, k/n would go to 1/2. For example, consider the first problem, when families stopped when they had 1 boy. If you ran this with some number of families, you probably would not get that exactly half the children were boys (they might be, but that would be a coincidence, really). You would get something reasonably close to 1/2, but probably not 1/2. All you can do is find the fraction of children that are boys when you have n families, and then take the limit as n goes to infinity. I will say with confidence that that will tend to 1/2 (subject to replicating the conditions of the problem, odds are that humans actually do not produce each gender equally likely).

In this problem you have something similar, if you run the experiment with any finite number of families, you will not get that boys are 1/2 the population, they will always be slightly greater than 1/2. But if you take more and more families for the experiment, you will find that the fraction of children that are boys gets closer and closer to 1/2. In the limit as the number of families goes to infinity, the fraction will tend to exactly 1/2. There is nothing wrong with a sequence of numbers that is strictly greater than 1/2 tending to 1/2, that happens all the time, and it is happening here.
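This convergence can be watched numerically. Here is a Monte Carlo sketch of my own; the cap is my own addition so that every run terminates (with a cap the fraction can land on either side of 1/2, but it still clusters tightly around it, and the average family size hovers near the cap, as computed above).

```python
import random

# Simulate capped families: stop when the boys outnumber the girls by 1,
# or give up when the girls outnumber the boys by `cap`.
def run_town(n_families, cap, rng):
    boys = children = 0
    for _ in range(n_families):
        x, size = 0, 0  # x is girls minus boys for this family
        while -1 < x < cap:
            x += 1 if rng.random() < 0.5 else -1
            size += 1
        children += size
        boys += (size - x) // 2  # boys - girls = -x and boys + girls = size
    return boys / children

rng = random.Random(0)  # fixed seed so the run is repeatable
frac = run_town(10000, 100, rng)
assert 0.49 < frac < 0.51
print(frac)
```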

I would also like to mention that this "n goes to infinity" interpretation of probability is also important when we talk about something happening with probability zero or probability one. If a mathematician says an event in an experiment will happen with probability zero, they do not mean that it cannot happen (though it might mean that), they mean that if you were to try the experiment n times and measure k occurrences of the event, that k/n will get arbitrarily small as n goes to infinity. So, for example, if you were to spend $1 per experiment and won $X every time the event occurred, no matter how large you set $X to, you will wind up losing money in the long run. They also call this probability zero because they have nothing else to call it. There is no positive number that is small enough to correctly represent the situation, and for most anything that matters in probability, we always have to clarify things by saying "after a large number of trials" anyway, so zero is the most correct number we can use to describe the probability of such events.

The Children In Puzzletown

OK, time for the simple solution to the puzzletown problem. It was an interesting problem to pose to my officemate, who immediately thought that there would be more boys than girls (every family has at least one boy, after all), but then he realized that since the parents stop as soon as they have one boy, they will never have two boys, but sometimes they will have multiple girls before having a boy. Naturally this led him to conclude that there would be more girls than boys. Let's see what the math gives us.

First, since every family has exactly one boy, we need only ask how many girls a family has on average. We could similarly ask what the average family size is. A family is size 1 with probability 1/2 (they had a boy right away), size 2 with probability 1/4 (girl then boy), size 3 with probability 1/8 (GGB), and size N with probability 1/2^N. The average family size is therefore
Σ N/2^N

N summed 1 to infinity.
A sum of the form
Σ N/x^(N+1)

is just
d/dx Σ -1/x^N
=d/dx Σ -(1/x)^N
=-d/dx [1/x * 1/(1-1/x)]
=d/dx 1/(1-x)
=1/(1-x)^2

(If my sum confused you, remember the sum is running 1 to infinity, not 0 to infinity). So the average family size is
2 Σ N/2^(N+1)
=2

So, an average family has 1 girl and 1 boy. Thus the ratio is 1:1.
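The sum above is easy to check numerically (a quick sketch; truncating at N=200, far past where the terms matter):

```python
# Check Σ 1/2^N = 1 (the size probabilities) and Σ N/2^N = 2 (average size),
# truncating the sums once the tail is far below double precision.
total_prob = sum(1 / 2**N for N in range(1, 200))
avg_size = sum(N / 2**N for N in range(1, 200))

assert abs(total_prob - 1) < 1e-12
assert abs(avg_size - 2) < 1e-12
print(avg_size)
```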

When this was posted on the xkcd forums, some people said that this answer is obvious: since each child is a boy or a girl with equal probability, no matter what criterion the parents use to decide when to stop having children, the ratio will be 1:1, simply by linearity of expectation. This argument is quite correct, and is quite similar to the logic from the card flipping problem some time ago.

Some people on the forums then refuted this method of arriving at an answer by posing the following problem:
Each family will have children until they have had more male children than female children, then they will stop having children. On a given birth, it is equally likely that the child will be a boy or a girl. What is the ratio of boys to girls among the children in puzzletown?

This is a slight difference, as now every family is guaranteed to have more boys than girls. Since it is true for each family, it must be true for the average of the families. One might ask "what if they keep having children and they never have more boys than girls?" However, this is not an issue: random walk theory guarantees that, given enough children, every family will at some point have more boys than girls.

The question here is: in this new problem, what happened to the linearity of expectation argument? Every family will have more boys than girls for certain, therefore the town will have more boys than girls. What went wrong?

The People In Puzzletown

Time for a new problem. This one is not very difficult, you should be able to solve it in a matter of minutes, but it has a pretty neat follow-up, so I want to post it. I first learned this on the forums at xkcd.
The parents in puzzletown all want to have boys. Each family will have children until they have had at least one male child, then they will stop having children. On a given birth, it is equally likely that the child will be a boy or a girl. Assuming a large number of families, what is the ratio of boys to girls among the children in puzzletown?

Forward Is Backward

Some time ago, I had presented Jon with the balls in a bag puzzle, and he had suggested the following reverse puzzle:
Consider a bag with N red balls. At each step, you select two balls from the bag and if they are the same colour, you repaint one of them a new unique colour. If they are different, then do nothing. Either way, place the two balls back in the bag and begin a new step. Given long enough, eventually all the balls in the bag will be of different colours, on average how many steps does it take?

It's clear that this is some sort of reverse of the original problem; however, it is not clear that this is actually the "best" reverse. One could also give the newly coloured balls "random colours from the list of N colours" instead of new unique ones, but it turns out that problem is much more awful.
Anyway, I'm going to just solve this reverse problem on the spot, it seems like the thing to do. Let F(k) be the average number of steps it takes to get from a state with k red balls to a state with one red ball. Note that knowing the number of red balls totally specifies a state, since all non-red balls are of unique colours. F(1)=0 trivially, and we seek F(N), the state with N red balls. The probability that we move from a state with k red balls to a state with k-1 red balls (I will call this P(k)) is the chance that we draw exactly two red balls from the bag, that is:
P(k)=k/N*(k-1)/(N-1)

Of course, if something happens with probability p, then on average we need 1/p tries to have it happen once (I proved this back in the post where I gave the solution to the prisoner problem). Naturally then, we have:
F(k)-F(k-1)=(N-1)N/(k-1)k

and since F(1)=0, we can just express F(m) as F(m)-F(m-1)+F(m-1)-F(m-2)+...+F(2)-F(1), to give
F(m)=Σ N(N-1)/k(k-1)

summing k from 2 to m.
Therefore, for F(N) we have:
N(N-1)Σ 1/k(k-1)
=N(N-1)Σ [1/(k-1)-1/k]

summed k from 2 to N. This series telescopes, the terms look like 1/1-1/2+1/2-1/3+1/3-1/4+...-1/(N-1)+1/(N-1)-1/N. The final result is then that:
F(N)=N(N-1)[1-1/N]
=N(N-1)[(N-1)/N]
=(N-1)^2

Weird.
Alright, so this (much easier) problem has the same answer as the "forward" ball problem. I guess in some sense it is the correct reversal of the initial problem, since it has the same answer. I would love to find a way to show that these two problems have the same answer (one that does not rely on the fact that they both give (N-1)^2), so that we can solve the harder one just by doing the easier one. Anyway, that's all I got here; if you can find a way to show that these problems are equivalent, feel free to give it in the comments.
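One concrete check I can offer (exact arithmetic, my own quick sketch): build F(N) from the step expectations and confirm the telescoped answer.

```python
from fractions import Fraction

# F(N) = Σ_{k=2..N} N(N-1)/(k(k-1)), which should telescope to (N-1)^2.
def reverse_steps(N):
    return sum(Fraction(N * (N - 1), k * (k - 1)) for k in range(2, N + 1))

for N in range(2, 30):
    assert reverse_steps(N) == (N - 1) ** 2

print("reverse problem: F(N) = (N-1)^2 for N = 2..29")
```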

It's All Red

Time to post the solution to the balls in a bag problem from last time. The solution actually gets pretty crazy, and I actually had to make up a tiny amount of new math (new to me, anyway) in order to justify the solution that Matt came to me with, but all in all it was a good adventure.

First of all, we are going to select a single colour of ball from the bag (let it be called "red") and only watch the progress of that colour. If red is ever eliminated we will declare the experiment a failure and try again, if red is ever the only colour left we ask how many steps it took. Naturally this means that only 1/N of our experiments even end in success, so I will have to account for that at the end, but we will cross that bridge when we come to it. Next, let us define a function F(k,m) as the probability that we are in a state with k red balls after m steps. Naturally F(1,0)=1, F(k,0)=0 for k ≠ 1, and the asymptotic behaviour of F in m is
lim_(m→∞) F(N,m)=1/N
lim_(m→∞) F(0,m)=(N-1)/N
lim_(m→∞) F(k,m)=0 for all other k

In words: our initial condition has exactly 1 red ball, and in the long run we will either have all red balls (probability 1/N) or no red balls (otherwise). It is worth noting that we expect no experiment to run forever; that would require lim_(m→∞) F(k,m) to be nonzero for some k other than N or 0. This is to be contrasted with the expectation value of the path length being infinite, which I have not ruled out yet. Of course, summing F(k,m) over k always gives 1.

Alright, now that we have that, we need to find the recursion relation on F. If we are entering the (m+1)th step, the chance that we get to having k red balls is given by:
F(k,m+1)=[(k-1)/N]*[(N-(k-1))/(N-1)]F(k-1,m)
+[(N-(k+1))/N]*[(k+1)/(N-1)]F(k+1,m)
+[k/N*(k-1)/(N-1)+(N-k)/N*(N-k-1)/(N-1)]F(k,m)

that is, the chance we drew a red then non-red from k-1, plus the chance we drew a non-red then red from k+1 and the chance we drew two red or two non-red from k. This can be written as:
F(k,m+1)=1/(N(N-1))*
[(k-1)(N-k+1)F(k-1,m)+
(N-k-1)(k+1)F(k+1,m)-
2k(N-k)F(k,m)]
+F(k,m)

Why I separated out that F(k,m) term will be apparent later.

Let us now make a series of vectors V_m with N-1 elements, so that the kth element of the vector V_m is given by F(k,m). Then the recursion relation gives us
V_(m+1)=MV_m

and the matrix M is specified by the recursion relation. Note that V only keeps track of having 1 through N-1 red balls left, so information "leaks" out the ends of it and the sum of the elements of V is not conserved. It is also worth noting that V_m tends to the zero vector as m tends to infinity, no matter how the initial data for V looked (that point will turn out to be extremely important later, make sure you convince yourself of it now).

Ok, let us try to get a feel for the matrix M that iterates on V. The diagonal of M is the term that keeps F(k,m) at F(k,m+1), that is to say that the kth element on the diagonal is 1-2k(N-k)/(N(N-1)). There are also off-diagonal terms adjacent to the diagonal, but everything else is zero. For notational convenience, I will denote N(N-1)=Z. For some large N, the first part of the matrix looks like:
[1-2(N-1)/Z, 2(N-2)/Z, 0, 0, 0, ....]
[(N-1)/Z, 1-2*(2(N-2))/Z, 3(N-3)/Z, 0, 0, ....]
[0, 2(N-2)/Z, 1-2*(3(N-3))/Z, 4(N-4)/Z, 0, ....]
....

The kth diagonal element is 1-2*k(N-k)/Z, above and below the kth diagonal element is k(N-k)/Z. Every column adds to 1, except for the first and last columns, which represent probability leaking out the ends.
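Before pressing on, the limiting behaviour claimed earlier can be sanity-checked by iterating the chain numerically (a sketch of my own; it tracks the full set of states k = 0..N rather than the truncated vector V):

```python
# Iterate the recursion for N = 5 and check that in the long run the
# probability of all-red tends to 1/N and of no-red to (N-1)/N.
N = 5
Z = N * (N - 1)
F = [0.0] * (N + 1)
F[1] = 1.0  # start with exactly one red ball

for _ in range(20000):
    G = [0.0] * (N + 1)
    for k in range(N + 1):
        step = k * (N - k) / Z  # chance of k -> k+1, same as k -> k-1
        G[k] += (1 - 2 * step) * F[k]
        if k < N:
            G[k + 1] += step * F[k]
        if k > 0:
            G[k - 1] += step * F[k]
    F = G

assert abs(F[N] - 1 / N) < 1e-9
assert abs(F[0] - (N - 1) / N) < 1e-9
assert abs(sum(F) - 1) < 1e-9  # total probability is conserved
print(F[N], F[0])
```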

Now that we have this, what the heck do we do with it? Well, we want to find the probability that we end on the mth step, times m, summed over all m. That gives us the expectation value of what step we end on. The probability that we end on the mth step is the last element of V_(m-1) (the penultimate state) times (N-1)/N (the chance we draw a red ball) times 1/(N-1) (the chance we draw the final non-red ball). So we want
Σ m/N [0,0,0,....0,1]V_(m-1)

summed over m. Actually this is not quite right: since only 1/N of the experiments even end the way we want, we must also multiply by N. Anyway, we know that V_(m-1)=M^(m-1)V_0, and V_0 is given as the column vector [1,0,0,0,...] (which I will write as [1,0,0,0,...]^T, for lack of ability to write column vectors).

To put it together, we seek to find
Σ m [0,0,0,....0,1]M^(m-1)[1,0,0,0,...]^T

summed over m=1 to ∞, of course (the m=0 term contributes nothing anyway).

Much of that is just constants relative to the sum, and we really just need to evaluate
Σ m M^(m-1)

Now, if instead of a matrix M, we had a real number, x, we could easily use
d(Σ x^m)/dx = Σ m x^(m-1)

so we would like to say:
Σ m M^(m-1) = d(Σ M^m)/dM

Ok, so can we take the derivative with respect to a matrix? Well, it's not at all uncommon to do that in physics: we often consider a scalar function S(M) and define the derivative (dS/dM)_ij as the derivative of S with respect to the ijth component of M. This means dS/dM is a matrix. Following this up, we would expect that if we have a matrix function T(M) with two component indices (such as M^2 or M^(-1)), then dT/dM would have four indices. If we define the derivative dT/dM to have two indices by using the definition
(dT/dM)_ij=Σ (d/dM_ik)T_kj

summing over k (as we do in matrix multiplication), then using
(d/dM_ij)M_kl=δ_ik δ_jl

it is easy to show that
dM^n/dM=nM^(n-1)

for all integers n (to get the inverses, you must use that d/dM (M^(-1)M)=0).

Alright, so we can now say with confidence that
Σ m M^(m-1) = d(Σ M^m)/dM

So now we just need a way to evaluate Σ Mm. Again, if this were a real number x, we would simply say that
Σ x^m=1/(1-x)

(as long as |x|<1) so how can we do something similar for a matrix? First of all, let us consider the pth partial sum, Σ M^m summed m from 0 to p. It is easy to see that
(1-M)Σ M^m=1-M^(p+1)

so as long as we have (1-M) being invertible and lim_(p→∞) M^(p+1)=0, then we can say that Σ M^m=(1-M)^(-1)

Remember when I said that V_m tends to zero no matter what the initial data is? This fact can prove both of the statements that we need. A matrix is really just defined in terms of how it acts on vectors; thus if a matrix obeys the relationship lim_(p→∞) M^p V=0 for all vectors V, then it means that lim_(p→∞) M^p=0, since it really apparently is the zero matrix. We knew from before that this is true for our matrix M. I'll admit that I have not really proven it, but it is apparent from the physics of the problem (people who know me will understand that when I say something is "apparent from the physics of the problem", I really mean "I don't know how to prove this, but it is so obvious from the way things are set up, I feel no need to").
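To see the Neumann series at work on something concrete, here is a toy check with a made-up 2x2 contraction (the matrix is invented purely for the demo; any M whose powers vanish would do):

```python
# Partial sums of 1 + M + M^2 + ... should converge to (1-M)^(-1)
# whenever M^p -> 0, just as in the scalar geometric series.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = [[0.5, 0.1], [0.2, 0.3]]
S = [[0.0, 0.0], [0.0, 0.0]]  # running partial sum
P = [[1.0, 0.0], [0.0, 1.0]]  # current power of M, starting at the identity

for _ in range(200):
    S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    P = mat_mul(P, M)

# exact inverse of (1 - M) via the 2x2 cofactor formula, for comparison
A = [[1.0 - M[0][0], -M[0][1]], [-M[1][0], 1.0 - M[1][1]]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

assert all(abs(S[i][j] - inv[i][j]) < 1e-9 for i in range(2) for j in range(2))
print("Neumann series matches (1-M)^(-1)")
```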

So we can now say with confidence that
(1-M)Σ M^m=1

when we take the sum to infinity. Now, can we invert the matrix (1-M) (which I will call W)? Well, first let us assume that W has no inverse. This means that W must have a null space, that is to say there is a nonzero vector U such that WU=0. This naturally implies that (1-M)U=0, so MU=U: U is an eigenvector with eigenvalue 1. But this means that M^p U=U for all p, and so that is even true in the limit of large p, but for large p we have that M^p must vanish. This is a contradiction, proving that W (=(1-M)) is invertible.

So, earlier, we had seen that the answer to the question we seek was given by
Σ m [0,0,0,....0,1]M^(m-1)[1,0,0,0,...]^T

which we can now rewrite as
[0,0,0,....0,1]W^(-2)[1,0,0,0,...]^T

That is, we seek the inverse of the matrix (1-M), and we want the element in the last row and first column of the square of said matrix.

W looks like
[2*(N-1)/Z, 2(2-N)/Z, 0, 0, 0, 0,.....]
[(1-N)/Z, 2*2(N-2)/Z, 3(3-N)/Z, 0, 0, 0,.....]
[0, 2(2-N)/Z, 2*3(N-3)/Z, 4(4-N)/Z, 0, 0,.....]
.....

Z being N(N-1) as before. We can actually just factor out 1/Z now, and get
[2*(N-1), 2(2-N), 0, 0, 0, 0,.....]
[(1-N), 2*2(N-2), 3(3-N), 0, 0, 0,.....]
[0, 2(2-N), 2*3(N-3), 4(4-N), 0, 0,.....]
[0, 0, 3(3-N), 2*4(N-4), 5(5-N), 0,.....]
.....

as the expression for WZ.

Now, if we want the last element in W^(-2), we want the last row and the first column of W^(-1), and we need no other information about W^(-1) except that it exists (and it does). Find a row vector that gives 1 when it acts on the last column of W and gives 0 for all other columns (you might want to write W on paper now, to get a feel for it). We find that the last row in the matrix W^(-1) is [1,2,3,4,5...,N-1]; you might want to try out W in a few simple cases of N=3,4,5 to convince yourself of this. We also need the first column in the matrix W^(-1), that is, we need a column vector that gives 1 when it acts on the first row of W, and 0 when it acts on any other row. The vector is [Z/N, Z/2N, Z/3N, Z/4N, ....., Z/(N(N-1))]^T. Naturally, the last element of the first column and the first element of the last row agree.

Finally, we multiply this last row and first column to get the element of the last row and first column of W^(-2): it is Z/N+Z/N+Z/N+..., a total of N-1 terms, and Z is N(N-1).

So we have (N-1)^2, our final answer. I have a bit more to write about this puzzle, but this seems like quite enough as it is.
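For the skeptical, all three claims (the last row, the first column, and their product) can be verified exactly for a specific N; here is a sketch in exact rational arithmetic (N=7 is an arbitrary choice):

```python
from fractions import Fraction

N = 7
Z = N * (N - 1)

# W = 1 - M as an (N-1)x(N-1) matrix; index i = 0..N-2 means k = i+1 red balls
def w_entry(i, j):
    k, l = i + 1, j + 1
    if k == l:
        return Fraction(2 * k * (N - k), Z)
    if abs(k - l) == 1:
        return Fraction(-l * (N - l), Z)  # off-diagonal entries of column l
    return Fraction(0)

W = [[w_entry(i, j) for j in range(N - 1)] for i in range(N - 1)]

row = [Fraction(k) for k in range(1, N)]         # claimed last row of W^(-1)
col = [Fraction(Z, k * N) for k in range(1, N)]  # claimed first column

for j in range(N - 1):  # row . W should be [0, ..., 0, 1]
    assert sum(row[i] * W[i][j] for i in range(N - 1)) == (1 if j == N - 2 else 0)
for i in range(N - 1):  # W . col should be [1, 0, ..., 0]
    assert sum(W[i][j] * col[j] for j in range(N - 1)) == (1 if i == 0 else 0)

answer = sum(r * c for r, c in zip(row, col))
assert answer == (N - 1) ** 2
print(answer)
```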

Balls In A Bag

I have been pretty consistent in posting just once a week, and that's mostly just been so I don't run out of blogging material, but Matt recently solved a problem that we have been working on for nearly 3 years, so I really wanted to post the solution as soon as I was able to work out the wrinkles. This is the problem as I first learned it on the forums at xkcd:
Consider you have a bag filled with N balls, each with a different colour. At each step you will randomly select a ball from the bag, look at its colour, and select another ball from the bag, and repaint the second ball to the colour of the first one. After doing that, place both balls back into the bag and begin a new step.
After doing this enough times, it will eventually happen that all the balls in the bag will have the same colour. On average, how many steps does it take for this to happen?

Doing the problem "properly" is very difficult, and I would only suggest it if you are crazy. However, solving it in the cases of N=2,3,4 is not particularly hard, and from that you will be able to guess what the answer should be.

Equilateral Roads

Time for the solution to the Houses on a Square problem, in spite of the fact that all (both, it seems) of my readers have already solved it.

Not much is gained by building up to the solution, so I'll just sort of blurt it out. I should use pictures here, but rather than figure out how pictures work on blogger, I'd prefer to just use words:
Place a node at the center top and center bottom of the square, connect the top node to each of the top two houses with roads, connect the bottom node to the bottom two houses with roads, and connect the top node and the bottom node to each other. I assume that you have a picture in your head that looks like a capital I. Now we are going to move the top and bottom nodes, and the edges that terminate on them will move with them. Move the top node down a distance x, and the bottom node up a distance x. Now it is simply a matter of finding the x which optimizes the solution.

The total length as a function of x is
L=1-2x+4√(1/4+x^2)

1-2x is the length of the center piece and √(1/4+x^2) is the length of one of the diagonals. Minimizing L(x), we find the derivative is
L'(x)= -2+4x/√(1/4+x^2)=0
x=1/(2√3)

Giving a value of L=1+√3. At this point we haven't exactly shown that this is the optimal solution, however, we have shown that it is the optimal solution of this form. Before I go into more about that, first let us consider the rectangular case.

Consider a rectangle, with side lengths A and B. We will assume that A>B for definiteness, and we will visualize that the longer side is the vertical one (I do this so that when I refer to "the top side" or something, we all have the same picture in mind). Trying the same form of solution as last time, we can see that we want our middle line to be vertical rather than horizontal; it will save more space that way. Letting x once again represent the amount that the top node is down from the top, the total road length is given by
L=A-2x+4√(B^2/4+x^2)

Extremizing this gives that x=B/(2√3), and L=A+B√3 (we can see that the length is minimized when A>B rather than B>A, as anticipated). Note that the ratio of x to B is the same in the rectangle case as it is in the square one. This means that the angle that the diagonal line hits the vertical line at is independent of the measurements of the rectangle. In fact, the vertex where the lines meet has angles entirely of 120 degrees. This is not surprising, because if we just had 3 houses we were trying to connect, we would place a point somewhere in the middle of them and have the lines connecting them to the point meet at 120 degrees. There is an exception if the triangle has an internal angle of greater than 120 degrees: then it is optimal to just draw straight lines (as the relevant point is outside of the triangle).
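Both results can be spot-checked numerically; here is a quick sketch that scans x on a fine grid (the function and its name are mine):

```python
import math

def road_length(x, A, B):
    # vertical middle piece plus the four diagonal spokes
    return A - 2 * x + 4 * math.sqrt(B**2 / 4 + x**2)

for A, B in [(1.0, 1.0), (3.0, 2.0), (5.0, 1.0)]:
    xs = [i * B / 20000 for i in range(10001)]  # scan 0 <= x <= B/2
    best = min(xs, key=lambda x: road_length(x, A, B))
    assert abs(best - B / (2 * math.sqrt(3))) < 1e-3
    assert abs(road_length(best, A, B) - (A + B * math.sqrt(3))) < 1e-6

print("minimum at x = B/(2 sqrt(3)), total length A + B sqrt(3)")
```

The A=B=1 case reproduces the square's 1+√3.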

Actually, this is a feature of any solution, all angles involved in connections are at least 120 degrees. If you draw a solution with an angle of less than that, you are better to add a point and draw 3 lines connecting to it. I found a paper that talks about this problem at http://www.denison.edu/academics/departments/mathcs/herring.pdf. The full proof is nontrivial, but one conclusion is that to connect N nodes together, you will never need more than N-2 extra nodes, so for the square there are no solutions that involve needing 3 extra nodes, 2 is optimal.

You can also see that for any regular polygon with 6 or more sides, the internal angles are 120 degrees or larger, so it is optimal to just connect along the edges. A pentagon is a bit funny though: since the internal angles are 108 degrees, you do need to find a solution with extra vertices on the inside. The actual solution looks sort of funny, and isn't hard to find after some thinking. Anyway, that's all I got on this problem.

Houses On A Square

Guess we once again find ourselves in the "new puzzle" part of the cycle. This particular one I heard a long time ago, can't recall when exactly, but I was recently reminded of it by James:
Consider 4 houses located at the corners of a square with side length 1. We must build roads so that one can drive from any house to any other house along the roads. What is the minimum total length of road that needs to be built?

Just to get you started through the first few layers of solution: you can easily solve the problem with road length 3 by connecting 3 of the houses along sides of the square, and you have connected them all. You can do better with a 2√2 solution by drawing in the diagonals of the square; this connects all the houses to each other by roads. As amazing as it might seem, you can do even better than that.

As a generalization of this problem, try solving it on a rectangle. You can further generalize it to other shapes, and even arbitrary distributions of points, but I wouldn't suggest it.

Triangular Hypermarbles

Due to some discussion in the solution to the breaking marbles puzzle, I have decided to make a post of the extension problem. Now let us suppose we have K marbles and N floors: what is the minimum number of moves it takes to find the critical floor, assuming you are minimizing the worst case scenario?

For some definiteness, we will assume that if you eliminate all the floors but the end one, you must still check the end one. This is so that, say, in the 2 marble 3 floor case, you can do it in 2 steps optimally by checking floor 2, then checking floor 3 if it does not break, and checking floor 1 if it does (specifically you cannot just assume that if it didn't break on floor 2 it will then break on 3, and that if it broke on floor 2 it can't break on floor 1). The reason for this "handling of the endpoints" is so that in the two marble case, you cannot do better than Triangle(M) floors with M steps. Triangle(M), the Mth triangle number, will be denoted T_M, and is given by M(M+1)/2.

Now, let's try the 3 marble case. Let us suppose we have N floors, and we have a solution that works in never more than 5 steps (I'll generalize 5 to M later). Then with our first move, we might as well go to floor 11. If we go any higher, then if it breaks we cannot solve the remaining case with over 10 floors with only 2 marbles and 4 steps (2 marbles, 4 steps, 10 floors is optimal), and nothing is to be gained by going any lower. Next, we move 7 more floors, since if it breaks we can solve through the 6 we skipped with 2 marbles and 3 steps. Next, we will move up another 4 floors, so that if it breaks, there are 3 floors unchecked which we can do with the remaining 2 steps and 2 marbles. Next up 2 more floors, because we will only have 1 step left (now steps are more limiting than marbles; one gets the impression that when you have at least as many steps as you have marbles, binary search will be best). And finally we get 1 last step.

The result is 11+7+4+2+1=25 (looks like powers of 2 at the end, neat); this is better written as 10+1+6+1+3+1+1+1+0+1, that is T_4+1+T_3+1+T_2+1+T_1+1+T_0+1, which is the same as P_4+4+1 (P_K is the Kth pyramidal number, the sum of the first K triangle numbers). The general statement, which is easy enough to verify by induction, is that with 3 marbles and M steps, you can do at most P_(M-1)+M floors. We will denote this as F(3,M), the maximum you can do with 3 marbles in M steps.

We already know that
F(1,M)=M
F(2,M)=T_M=M(M+1)/2

And now we know
F(3,M)=P_(M-1)+M
=(M-1)M(M+1)/6+M
=(M^2+5)M/6

The formula P_M=M(M+1)(M+2)/6 can be found either by induction or by realizing the connection between Pascal's triangle and the generalized triangle numbers (more on that in a moment).

Next we want to try 4 marbles. With M steps, we cannot go higher than F(3,M-1)+1 for our first step, and there is no sense in going lower. We then move up another F(3,M-2)+1 steps and so on until we are out of moves. The result is that
F(4,M)=Σ F(3,M-i)+M

The sum being i from 1 to M-1. This argument will hold quite generally, to get the result of
F(N,M)=Σ F(N-1,i)+M

i now being summed from 1 to M-1, or if we define F(N,0)=0 (the most number of floors you can do with N marbles and zero steps) we can start i from 0.

Before going further into F, I would like to consider another function. Let us define the "hypertriangular numbers" H(N,M) so that each one is a partial sum of the level before it. Specifically,
H(0,M)=1
H(N,M)=Σ H(N-1,i)

i being summed from 0 to M (note the upper bound: including i=M is what makes the Pascal relation below work out). These generalize the triangle numbers cleanly: H(1,M)=M+1, H(2,M)=T_(M+1), H(3,M)=P_(M+1). One can easily confirm that
H(N,M)=H(N-1,M)+H(N,M-1)
H(N,0)=1
H(0,M)=1

So these numbers basically form up Pascal's triangle, so that
H(N,M)=(N+M)!/(N!M!)

Note that in the definition, N and M were not symmetric, but in the end they are.
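A quick check of the recursion against the binomial closed form (a sketch; the inner sum runs i = 0 to M, which is what makes H(1,M) = M+1 rather than M):

```python
from math import comb

def H(N, M):
    if N == 0:
        return 1
    return sum(H(N - 1, i) for i in range(M + 1))  # i = 0..M

for N in range(6):
    for M in range(8):
        assert H(N, M) == comb(N + M, N)  # (N+M)!/(N! M!)

print("H(N,M) matches the binomial coefficient")
```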

I'll come back to that function later, but the connection between hypertriangle numbers and Pascal's triangle will be important in a bit. Recall F's recursion relation is:
F(0,M)=0
F(N,0)=0
F(N,M)=Σ F(N-1,i)+M

(i summed from 0 to M-1). One can start filling this in; it's slightly illustrative to do so. Labeling the x-axis as M value and the y-axis as N value, we get (apologies for formatting issues):
0 0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 8 9
0 1 3 6 10 15
0 1 3 7 14 25
0 1 3 7 15 30
0 1 3 7 15
0 1 3 7
0 1 3 7

and so on. Each number is the sum of the numbers in the row above it from 0 to M-1, plus the number M. This results in the first non-zero row coming up just as M, the second row coming up as triangle numbers, but then the third row comes up funny. Note that when N gets bigger than M, the numbers just turn into 2^M-1, somewhat demonstrating the fact that having more marbles than steps means the marbles breaking isn't an issue, and then you get the result of a binary search.
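
The table above is easy to regenerate; here's a quick Python sketch using the recursion (my own code, just for checking):

```python
from functools import lru_cache

# F(N,M): maximum floors coverable with N marbles in M steps,
# via F(N,M) = sum_{i=0}^{M-1} F(N-1,i) + M, with F(0,M)=F(N,0)=0.

@lru_cache(maxsize=None)
def F(n, m):
    if n == 0 or m == 0:
        return 0
    return sum(F(n - 1, i) for i in range(m)) + m

for n in range(6):  # rows are N values, columns are M values
    print([F(n, m) for m in range(8)])
```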

One can construct Pascal's triangle in a similar way, starting with 1's and letting each element be the sum of the row above it, from that one can prove the relation H(N,M)=H(N-1,M)+H(N,M-1) (I'm going to start calling this the Pascal relation). In this case, one can find by inspection of the graph that F will obey a similar linear relation:
F(N,M)=F(N-1,M)+F(N,M-1)-F(N-2,M-1)
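
Before leaning on this relation, it's worth checking it numerically (again a quick Python sketch of mine, with F defined by its recursion):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(n, m):  # the recursion from earlier in the post
    if n == 0 or m == 0:
        return 0
    return sum(F(n - 1, i) for i in range(m)) + m

# Check the semi-Pascal relation on a decent range of N and M.
for n in range(2, 12):
    for m in range(1, 12):
        assert F(n, m) == F(n - 1, m) + F(n, m - 1) - F(n - 2, m - 1)
print("semi-Pascal relation checks out")
```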

This can actually be proven from the recursion relation: subtracting the recursion at M-1 from the recursion at M gives F(N,M)-F(N,M-1)=F(N-1,M-1)+1, and writing that same equation with N replaced by N-1 and subtracting the two gives the relation above. You should confirm for yourself that it is true, it is somewhat pivotal. Anyway, we now have a semi-Pascal relation that F obeys, and now our relation is linear in F. At this point, I asked myself, "what would Matt do?" and got the only obvious answer, define the derivative function:
D(N,M)=F(N,M)-F(N-1,M)

I suppose it's the derivative with respect to N, but it's one nonetheless.

Using our semi-Pascal relation on F, we can find a similar one on D, rewrite the F one as:
F(N,M)-F(N-1,M)=F(N,M-1)-F(N-2,M-1)
=F(N,M-1)-F(N-1,M-1)+F(N-1,M-1)-F(N-2,M-1)
=> D(N,M)=D(N,M-1)+D(N-1,M-1)

This is almost a Pascal relation on D, but things are shifted over slightly, to fix this we will define a new function K according to
K(N,L)=D(N,N+L)
K(N,M-N)=D(N,M)

I'm really just defining L=M-N and considering a function of N,L rather than N,M (this is logical, since D sort of "breaks down" if N>M, that's where F stopped varying). Anyway, from D's relation, we can find that K obeys the relation:
K(N,M-N)=K(N,M-N-1)+K(N-1,M-N)
K(N,L)=K(N,L-1)+K(N-1,L)

Hurrah, K obeys the Pascal relation; now we need to find its boundary conditions. Trying to check K(0,L) immediately causes problems involving D(-1,L), so let's try K(1,L):
K(1,L)=D(1,L+1)=F(1,L+1)-F(0,L+1)
=L+1

Good, next let's check K(N,0):
K(N,0)=D(N,N)=F(N,N)-F(N-1,N)

Uh oh, this could put a stop to all of our fun right here. Fortunately, we can handle this: it turns out (one can inspect the chart) that F(N,N)-F(N-1,N)=1. We can prove this using the semi-Pascal relation for F, which shows that
F(N,N)=F(N-1,N)+F(N,N-1)-F(N-2,N-1)
F(N,N)-F(N-1,N)=F(N,N-1)-F(N-2,N-1)

Next, we must handle F(N,N-1) by considering the "physics" of the problem. Consider F(N,M) when N>M, so you have more marbles than you have steps remaining. In this case, the extra marbles are irrelevant, so you can just ignore their existence: F(N,M)=F(M,M) when N>M. This is also visible in the table of F. Thus we have F(N,N-1)=F(N-1,N-1), therefore
F(N,N)-F(N-1,N)=F(N-1,N-1)-F(N-2,N-1)

So apparently F(N,N)-F(N-1,N) is constant in N (the right hand side is the same expression with N replaced by N-1, so shifting N has no effect). Evaluating at N=1 gives us
F(N,N)-F(N-1,N)=F(1,1)-F(0,1)=1-0=1

Proving that K(N,0)=1.

Ok, current data on K:
K(N,L)=K(N,L-1)+K(N-1,L)
K(1,L)=L+1
K(N,0)=1

We can extend K to K(0,L) using its Pascal relation on N=1:
K(1,L)=K(1,L-1)+K(0,L)
L+1=L+K(0,L)

Demonstrating that it is consistent to take K(0,L)=1. So K has the same initial conditions as H (the hypertriangle function) and the same Pascal relation, so K=H must be the case.
K(N,M)=(N+M)!/(N!M!)

Now we can find D as D(N,M)=K(N,M-N):
D(N,M)=M!/(N!(M-N)!)

So it seems D is just M choose N, naturally if N>M then D(N,M)=0.

Finally we are ready to construct F. D was the "derivative" of F, so F is the "integral" of D. Specifically,
F(N,M)-F(0,M)=Σ D(i,M)

The sum running i from 1 to N. We know F(0,M)=0, and D(N,M) is zero if N>M, and so we have
F(N,M)=Σ M!/((M-i)! i!)

Summing i from 1 to min(N,M). Note that the summation starts at 1, where a typical summation over choose functions would start at 0.

So when N is less than M, F(N,M) is the sum of the first N+1 entries in the Mth row of Pascal's triangle, minus 1 (the minus 1 because our summation skips the initial C(M,0)=1). In words, one can consider a set of M elements and ask how many nonempty subsets with at most N elements there are. I'm not sure if there is a clean closed form for that.

Finally, if N>M, F(N,M)=F(M,M), the number of nonempty subsets of size at most M in a set of M elements. A set with M elements has 2^M subsets, and one of those is the empty set. So F(M,M)=2^M-1, as one would expect from a binary search solution.
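
The whole derivation can be double-checked in a few lines of Python (again just my own sanity check), comparing the closed form against the original recursion:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def F(n, m):  # the original recursion
    if n == 0 or m == 0:
        return 0
    return sum(F(n - 1, i) for i in range(m)) + m

def F_closed(n, m):  # the sum of binomials, i = 1..min(N,M)
    return sum(comb(m, i) for i in range(1, min(n, m) + 1))

for n in range(12):
    for m in range(12):
        assert F(n, m) == F_closed(n, m)
assert F_closed(7, 7) == 2**7 - 1  # the binary-search case N = M
print("closed form matches the recursion")
```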

This completes the solution for how many floors you can cover with N marbles in M steps. If somebody asked you the reverse (and more standard) problem of solving K floors with N marbles, you simply look for the minimal value of M such that F(N,M) ≥ K, and that gives you the worst case M.
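
In code, that reverse lookup is just a linear search over M using the closed form (a sketch, using the ≥ convention above):

```python
from math import comb

def F_closed(n, m):  # max floors with n marbles in m steps
    return sum(comb(m, i) for i in range(1, min(n, m) + 1))

def min_steps(n, k):
    """Smallest M with F(n,M) >= k, i.e. worst-case drops for k floors."""
    m = 0
    while F_closed(n, m) < k:
        m += 1
    return m

print(min_steps(2, 100))  # 14, the classic two-marble answer
print(min_steps(1, 100))  # 100, one marble forces floor-by-floor
```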

Invisible Ants

Time for the solution to the ants on a line problem. The solution is actually stupidly simple, it really just is about one realization: two pointlike ants bouncing off each other is identical to two pointlike ants passing through each other. At this point, you can easily see that the theoretical maximum comes from having an ant right at the left end initially walking to the right. No matter how the other ants are distributed, it will take exactly 100 s for the last ant to fall off (100 cm of stick at 1 cm/s).
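
The pass-through trick also makes the fall-off times trivial to compute: since collisions only swap labels, the multiset of exit times is what you'd get if every ant walked straight through. A Python sketch of my own (100 cm stick, 1 cm/s):

```python
import random

def last_fall_time(positions, directions):
    """Time until the stick is empty, via the pass-through
    equivalence: a right-mover at x exits at 100 - x, a left-mover
    at x exits at x; bouncing only permutes which ant is which."""
    return max((100 - x) if d > 0 else x
               for x, d in zip(positions, directions))

# A random configuration never beats 100 seconds...
random.seed(0)
pos = [random.uniform(0, 100) for _ in range(100)]
dirs = [random.choice([-1, 1]) for _ in range(100)]
assert last_fall_time(pos, dirs) <= 100

# ...and an ant at the left end walking right achieves exactly 100.
assert last_fall_time([0.0], [1]) == 100.0
```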

Pretty dumb puzzle, to be honest, but I thought it was a bit of a cute trick.

Ants On A Line

New puzzle time, though this one is a bit strange. I first heard it somewhere random on the blagosphere:
You have a one dimensional meterstick that you distribute 100 pointlike ants on, at random locations. Each ant will select randomly to begin walking right or left, and keeps walking in that direction until it bumps into another ant. If two ants bump into each other, they will each turn around and go the other direction. If an ant reaches the end of the meterstick, it will fall off. The ants have a constant speed of 1 cm/s. Given enough time, eventually all of the ants will fall off the stick, and for different initial distributions and choices of initial directions this will take a varying amount of time. What is the theoretical maximum length of time it will take for all the ants to fall off the stick?

So, find the distribution and set of left/right choices that maximizes the time the ants spend on the stick. Or the class of distributions, I suppose, no guarantee that it is unique.

Gambling Solution

Time to solve the gambling puzzle from last time. When I told this problem to Matt, his immediate reaction (which I found rather amusing) was to say "OK....hmm, what would Kory do?................. ah yes, consider a simpler case." That seems to be the right approach, so let's assume that the frungy tournament is best of 3, first team to 2 games wins.

We already know that we should not bet all $400 on Blue in the first game, because then Red might win the first game and suddenly we are out of money. Blue then might go on to win the tournament and we are in trouble. Taking the other extreme strategy of not betting on the first game at all, if Blue wins the game, we have no choice but to bet everything on Blue now (else if Blue wins again we are done). But now if Red wins we are out of money for the final game (which, if we are unlucky, Blue will win).

The next obvious thing is to take the middle ground and bet $200 on Blue in the first game. Now if Blue loses we have $200 left and Red is up 1-0. Since Red winning again is auto-victory for us anyway (the side bet pays out), we might as well bet everything we have left on Blue. If Blue wins now we have $400 and they are tied 1-1, and at that point it is simple. OK, instead suppose Blue wins the first game, so we have $600 and Blue is up 1-0. We need only bet $200 on Blue in the next game: if they win, we end with $800 (and Blue has won the series), and if they lose, we have $400 left and the teams are 1-1 again. Good, this is a full solution, and it is also nice to see that it is 100%, rather than just a probabilistic solution.

A few things to notice about that solution. First of all, we never ended with any excess money; in all strategies we came out with exactly $800. Another thing is that the result was "path independent": whether Blue won first or Red won first, once the series was tied 1-1 it did not matter how we got there, we had exactly $400. Both of these facts are general about the full solution (assuming it is a 100% solution, that is).

First of all, the "no excess money" result. Let us begin by supposing that we have a full strategy for how to bet in the games as the tournament progresses. Note that every game is a bet on a coin flip, therefore no matter what your planned strategy is you will have an expectation of $400 after each game. The side bet is an even chance split of $800 and $0, so it has an expectation of $400. Thus, at the end of the week, no matter your strategy, you have an expected final amount of money equal to $800. If there is any possible path through the game tree that ends with more than $800, there is also a path through the game tree with less than $800, so no solution that works 100% of the time can ever end with excess money.

Now for the "path independent" result. Suppose we have a betting strategy for the whole week. Further suppose there was a node in the game tree (for example, Red is up 3-1 on Blue) that depending on how you got there you will have different amounts of money (so for one path you have $x, and for the other you have $y). Assuming x>y then if we get there with $x, we can simply throw away $(x-y) to leave ourselves with $y and still have a planned out strategy for the rest of the game. This strategy will apparently have had excess money, which we know is impossible.

Alright, we are almost ready to make the full solution. The final step is that if your strategy knows how much money it needs at each step, you know how much to bet. So if you know you have $400 now, and if Blue wins you need to have $600 but if Red wins you need to have $200, then it is time to bet $200 on Blue. We will describe a strategy as a chart of the amount of money we need at each step in the game tree. Consider the tree for the three game case as follows (this is going to format like crap, but it will be readable):
AAA BBB
CCC DDD

OK, AAA represents the amount of money we start with (so, 400), BBB is the amount of money we have if Red is up 1-0, CCC is the amount of money we have if Blue is up 1-0 and DDD is the amount of money we have if the game is tied 1-1. So, my notation is that if Red wins we move a step right, if Blue wins we take a step down.

We know that if Blue wins two games we must have exactly 800, and if Red wins we must have exactly 0 (no excess money), so we can add those values in:
AAA BBB 0
CCC DDD 0
800 800

Now, we know that DDD must be exactly 400, there is no other amount of money it can be that will be consistent with this. But then we know that CCC is 600, and BBB is 200. Finally we get that AAA is 400, which is really just a consistency check.

You can now easily see the general answer, but let's just go through it anyway; for the full 7 game case we have:
AAA BBB CCC DDD 0
EEE FFF GGG HHH 0
KKK LLL NNN PPP 0
QQQ RRR SSS TTT 0
800 800 800 800

Great, we can now see that TTT is 400, PPP is 200, HHH is 100 and DDD is 50. Also we get that SSS is 600, RRR is 700 and QQQ is 750. Sort of a neat symmetry here:
AAA BBB CCC 050 0
EEE FFF GGG 100 0
KKK LLL NNN 200 0
750 700 600 400 0
800 800 800 800

It's simple enough to continue filling this all in now:
400 275 150 050 0
525 400 250 100 0
650 550 400 200 0
750 700 600 400 0
800 800 800 800

So, apparently the correct solution begins with placing a $125 bet on Blue in the first game.
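
The backward-filling of the chart is mechanical enough that a few lines of Python reproduce it (my own sketch; indices are Red's wins and Blue's wins):

```python
# needed[r][b]: bankroll required when Red has r wins and Blue has b.
# Boundary: Blue reaching 4 wins requires $800; Red reaching 4 wins
# requires $0 (the drunken side bet on Red pays out).

def bankroll_chart(wins_needed=4, target=800):
    n = wins_needed
    needed = [[0.0] * (n + 1) for _ in range(n + 1)]
    for r in range(n):
        needed[r][n] = target  # Blue wins the series
    for b in range(n + 1):
        needed[n][b] = 0.0     # Red wins the series
    for r in range(n - 1, -1, -1):
        for b in range(n - 1, -1, -1):
            # each game is a fair coin, so we need the average
            needed[r][b] = (needed[r + 1][b] + needed[r][b + 1]) / 2
    return needed

chart = bankroll_chart()
print(chart[0][0])                # 400.0, the consistency check
print(chart[0][1] - chart[0][0])  # 125.0, the first bet on Blue
```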

Gambling Puzzle

I suppose I could have titled this one "gambling problem", but that carries all sorts of other implications that we won't bother with here. Anyway, this puzzle I first learned on the forums at xkcd:
Alex has decided to start a widget company with his $800 savings. He has put in an order for a widget machine that will be delivered next week and will cost him $800, to be paid on delivery. However, when he went back to check on his money he found out that last night, in a drunken stupor, he placed a $400 bet on Red in the upcoming Red vs. Blue Frungy tournament.

A Frungy tournament is a seven game tournament, one game played each day, and the first team to win 4 games wins the tournament. Red and Blue are equally skilled teams, and each game is 50-50 between them (and there are no ties in Frungy). Alex decided to go to the betting office to simply bet his remaining $400 on Blue so that they would cancel and he would be guaranteed to get back his $800 at the end of the week, but he has found out that that part of the betting is closed, now he can only bet on the individual games each morning.

Each morning, Alex may place as much money as he likes on an even bet for either Red or Blue. The bet is resolved that evening, and he will receive any winnings before the next day. Find a betting strategy that guarantees that Alex will end the week with at least $800 so he can make the payment for his widget machine.

Assuming that all made sense through the "theme", you basically have to guarantee that if Blue wins 4 games, you have made at least $800 from your bets. If Red wins 4 games, then your side bet kicks in and you are safe no matter what.

Alex would just like to bet all $400 on Blue in the first game, but Blue might lose the first game and then go on to win the tournament. Find a strategy that maximizes the probability of ending the week with at least $800.

Triangular Marbles

Time to post the solution to the marble problem from last time. First of all, let's just try some naive strategies. When I introduced the problem, I had suggested the idea of just moving up the floors one at a time until the marble broke. Since we have two marbles, we can try to do better by moving up two floors at a time. We try floors 2,4,6,8,10,... and when the marble breaks on floor k, we simply use the other marble to try floor k-1 and confirm which was the critical floor. This strategy is much less wasteful than the last one (at least it makes use of the second marble), and the worst case scenario is if the critical floor, N, is 99 or 100, in which case we need to try all the even floors (50 of them) plus floor 99 for a total of 51 tries.

Alright, if two floors at a time improved our position, let's try three. Now we will test floors 3,6,9,12,... and when the marble breaks on floor k, we try k-2 and then k-1 to confirm. I suppose if we get to floor 99 and the marble doesn't break, we don't actually need to try floor 100, but whatever. The worst case scenario is if we go all the way to 99 and must do 97 and 98. In this case we did 3,6,9,12,...,99 for 33 floors, plus 97 and 98, for a total of 35 tries.

So taking larger steps seems to be better. We could try something silly like 25 floors at a time, and then the worst case is 27 tries (easy to confirm). It is pretty clear that you will get the optimal strategy of this form if you just do 10 at a time (the square root of 100), and the worst case is 19.

Now the next thing to notice is that if you were to make a bigger step at the beginning, you don't make your life any worse. In particular, if you start with floor 15 and then go 25,35,45,...,95, then go back, the worst case won't be N=100 anymore (that would take only 11 tries), the worst case will be N=94 or 95 for a total of 18 tries (it's one better than the last case because you basically skipped a set of 10).

This idea of skipping a bunch of floors at the beginning will only work so far, of course. If you try to move all the way up to 20, then if it breaks right away you must try all 20 initial floors. So we see that if the actual worst case scenario is that it takes M tries, then you cannot start on a floor higher than M (because if the critical floor N is M-1 you are in trouble). Next, you can move up another M-1 floors at most, because if you move up M more and N is 2M-1 then you will need M+1 total tries to find it. Clearly we must move up by M-1 floors to floor 2M-1. Next we can move up by M-2 floors at most. You can see what is happening here.

If the total number of floors is the Kth triangle number, then you can solve the problem most efficiently in K steps by moving up K floors then K-1 then K-2 and so on. The 13th triangle number is 91, so we cannot do more than 91 floors in 13 steps. Thus we can just treat our 100 floor building as a 105 floor building (the 14th triangle number) and use that solution.
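
One can confirm the 14-drop answer with a brute-force dynamic program that doesn't assume the triangle-number argument at all (a Python sketch of mine):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def drops(marbles, floors):
    """Minimal worst-case number of drops to locate the critical floor."""
    if floors == 0:
        return 0
    if marbles == 1:
        return floors  # one marble: forced to go one floor at a time
    best = floors
    for k in range(1, floors + 1):  # drop the first marble from floor k
        worst = 1 + max(drops(marbles - 1, k - 1),   # it breaks
                        drops(marbles, floors - k))  # it survives
        best = min(best, worst)
    return best

print(drops(2, 100))  # 14
print(drops(2, 105))  # still 14, since T_14 = 105
```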

Breaking Marbles

Time for a new puzzle, I'm not sure where I first learned this one, it was somewhere randomly on the internets:
Consider there is a building with 100 floors. You have a pair of identical marbles which you are going to drop from the windows of this building. There is a critical height: the marbles will break if they are dropped from that height or higher, and take no damage if they are dropped from any lesser height. The objective is to determine which floor that is using as few drops as possible. You may assume that a broken marble is useless, and that you are to minimize the worst case scenario.

To put this problem into "math", there is an integer X between 1 and 100. You may guess an integer and I will tell you if X is " ≥ " or " < " your guess. If I ever say " ≥ " twice, you cannot guess again and must tell me what X is. You are to minimize the worst case for the number of guesses.

A clear naive strategy is just to go up the floors, one at a time, and when the marble breaks you know exactly what floor it happened on. Of course, the worst case here will need 100 drops and you didn't even try to make use of the fact that you have two marbles (actually, that solution is optimal if you only have one marble).

Two Spheres In One

I suppose in recent posts involving the Axiom of Choice, I promised that I would give a proof of the Banach-Tarski paradox here. Not that anybody actually cares to read it, but the point of this blog is for me to feel smart while I rant about math, so I will do that.

First of all, we need to do a bit of group theory, because really all of the important math is just group theory. First, consider rotations of the three dimensional unit ball. These rotations form a non-commuting group, as you should know. Specifically, let A be a rotation of the sphere around the x-axis by 1 radian, and let B be a rotation of the sphere around the y-axis by 1 radian. A and B do not commute, and A^n and B^n are never equal to the identity for nonzero n (since a full rotation is 2π radians, and n can never equal 2kπ, as 2kπ is irrational for any nonzero integer k). It is also important to know that no string of A's and B's and their inverses (for example AB^3A^{-2}BA^2) will ever be equal to the identity, unless the inverses cancel exactly (so A^{-2}BB^{-1}AB^{-1}BA is the identity trivially, but ABA^{-1}B^{-1} will not be).

Alright, now let us consider S to be the set of all strings involving A, A^{-1}, B, B^{-1} with no term right next to its own inverse (so if there were to be any of those, cancel them). The empty string is also in S, to be the identity element. The strings in S can be arbitrarily long, though none of them are actually infinitely long. S clearly forms a group (the free group on two elements, to be specific).

Next, let us decompose S into 5 sets. S(A) will be elements of S starting with A, S(B) will be elements of S starting with B, S(A^{-1}) and S(B^{-1}) defined similarly, and S(e) has the last element, the empty string. Clearly S is the union of these 5 disjoint sets. Next, let us denote multiplication of an element x by a set P as the set you get when you multiply x by each element in P (so A times {A^{-1}, BA, A, B^{-1}} = {e, ABA, A^2, AB^{-1}} with e as the empty string).

Now, one can see that A times S(A^{-1}) gives you every element of S, except the ones that start with A (as S(A^{-1}) never has an element that starts A^{-1}A, but it will have plenty that start A^{-1}B, for example). Even the empty string is in A S(A^{-1}). So we can say that S is the union of the disjoint sets S(A) and A S(A^{-1}). Similarly, S is the union of the disjoint sets S(B) and B S(B^{-1}).

Something a bit weird has happened here, since now we see S = S(A) ∪ S(B) ∪ S(A^{-1}) ∪ S(B^{-1}) ∪ S(e) = S(A) ∪ A S(A^{-1}) = S(B) ∪ B S(B^{-1}). Technically, this isn't a problem, as all of those sets (except S(e)) are infinitely large anyway, so there isn't a problem with S being one-to-one with some of its own subsets. Anyway, enough group theory, back to the sphere.
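
These decompositions can be checked by brute force on words of bounded length (a Python sketch of my own, where 'a' and 'b' stand in for A^{-1} and B^{-1}):

```python
INV = {'A': 'a', 'a': 'A', 'B': 'b', 'b': 'B'}

def words(max_len):
    """All reduced words of length <= max_len over A, a, B, b."""
    out, frontier = [''], ['']
    for _ in range(max_len):
        frontier = [w + c for w in frontier for c in 'AaBb'
                    if not (w and INV[w[-1]] == c)]
        out.extend(frontier)
    return out

def mul(x, w):
    """Left-multiply the reduced word w by the letter x, cancelling."""
    return w[1:] if (w and w[0] == INV[x]) else x + w

N = 6
everything = set(words(N))
S_A = {w for w in everything if w.startswith('A')}
S_a = {w for w in everything if w.startswith('a')}

# A * S(A^-1) is every reduced word that does not start with A, the
# empty word included (up to length N-1 here, since we truncated).
shifted = {mul('A', w) for w in S_a}
truncated = {w for w in everything if len(w) <= N - 1}
assert truncated <= S_A | shifted
assert '' in shifted
```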

Now, consider a point on the surface of the sphere. Consider all the places you can reach from that point by using elements of S. There is an element that is A away, an element that is B away, an element that is ABA^{-3}B^2A away, and so on. These elements cannot cover the entire sphere (there are only countably many elements of S, and uncountably many places on the sphere to reach). We will consider an equivalence relation on the sphere (as we always do when we are about to use the axiom of choice): two points on (or in) the sphere will be equivalent if you can move one into the other just by using rotations in S. Now use the Axiom of Choice to select one element from each equivalence class and put them into a set V. So for each point on the unit ball (the solid sphere, that is), there is exactly one element of V that can reach that point with exactly one of the rotations in S, and no two elements of V can reach each other using rotations in S.

We will break the unit ball U up into 5 pieces. Places in U that can be reached from an element of V and a rotation in S(A) will be called U(A). Places in U that can be reached from an element of V and a rotation in S(A^{-1}) will be called U(A^{-1}). Similarly we define U(B) and U(B^{-1}). U(e) is just V itself. So U is the union of the 5 disjoint sets U(A), U(B), U(A^{-1}), U(B^{-1}), and U(e).

Now is where the "magic" happens. Since we know that A^{-1}S(A) is S(A) ∪ S(B) ∪ S(B^{-1}) ∪ S(e), then A^{-1} applied to U(A) gives us U(A) ∪ U(B) ∪ U(B^{-1}) ∪ U(e). Similarly, B^{-1} applied to U(B) gives us U(B) ∪ U(A) ∪ U(A^{-1}) ∪ U(e). Thus, U(B) and U(B^{-1}) can construct the entire unit ball, and U(A) and U(A^{-1}) can also give us the entire unit ball.

I suppose I have dropped off U(e), as well as the issue of the fixed points under rotations A and B (specifically, the x-axis and the y-axis). I'm not going to handle those properly, but the main point has already been made anyway.

It may seem very strange that you can rotate U(A) and get U(A) unioned with other sets, but this actually can happen without the Axiom of Choice. Consider the unit circle on the 2-plane. Starting with the point (1,0) on the x-axis, consider the set of points that you hit by rotating that point counter-clockwise 1 radian at a time. So you get a countably infinite set of points, as that list never crosses itself, but does not cover the whole circle (it never hits the point at -1 radian, or the point at π radians, for example). Now rotate that set clockwise by 1 radian. The point at 1 radian moves to the point at 0, the point at 6 radians moves to the point at 5 radians, and the initial point at zero radians moves to the point at -1 radians. This new set covers all the points in the old set, and has one extra point also, very strange (actually, this is how you can handle the set U(e), you basically "rotate it out of existence" by combining it with U(A) or something). Cardinality and measure are not an issue here, as the set is countable with measure zero both before and after the rotation.

Anyway, that's really it for the proof. You can also extend the proof to take any finite sized object, cut it into finitely many pieces, and reassemble them into any other finite object (the standard example is to rearrange a pea into the sun). Note that this does require three dimensions or more, as you need two non-commuting rotations.

Infinitely Powerful Logicians

Time to post the solution to my last problem involving the Axiom of Choice. First off, the mere existence of a solution to this problem should terrify you. The reason is that if only finitely many people are wrong, that means that there is a last person that is wrong. So, if you were to look at the guesses of the logicians, you would find that they have some right and some wrong for a while, but then suddenly there is a last person who is wrong and everybody after that is just magically right, in spite of the fact that those people had no special information. Somehow, they were able to use the infinite hats in front of them to deduce their own hat colour, even though it was independent.

It is also worth saying that in most problems like this, where everybody must guess simultaneously, you try to arrange the people's guesses so that when something goes wrong, it goes maximally wrong. This is needed because each person on their own has a fixed probability of being wrong, so you simply try to make those fixed probabilities non-independent so that you minimize the chance that things go wrong.

In this problem, each person has a 50% chance of being wrong at the time they make their guess. This is a bit of a problem, as that means that on average half of the people will be wrong and there is very little one can do about it. Fortunately, the Axiom of Choice destroys the usual notions of probability. It does this by creating sets with no well-defined length. Simply put, length can be defined in terms of probability: if you have a subset S of [0,1], and you were to pick an element of [0,1] at random, the probability that you would get an element of S would be L(S), the set's length. If the Axiom of Choice allows for sets with no length, it also allows for sets that we have no well-defined probability of picking an element of.

Alright, time to start constructing the solution. Consider the sequence of hats that the logicians are wearing to be an infinite binary sequence, starting from the back of the line (x_1, x_2, x_3, ...). Use that binary sequence to construct a real number, y, between zero and one, represented in binary as y=0.x_1x_2x_3...

Next, construct an equivalence relation on the reals between zero and one. We will consider two real numbers, a and b, to be equivalent if, when expressed in binary, a and b differ in only finitely many places. Now we use the axiom of choice to select one element from each equivalence class and call the set V. So, for all z between zero and one, there is exactly one element of V that differs from z in only finitely many places. Also, no two elements of V differ from each other in only finitely many places. Have the logicians agree on the set V and memorize it.

Now, each logician is to look at the hats in front of them, and so they know "most" of the binary sequence of the actual number they are in (calling that y). That is to say, logician number 6 can consider y=0.x_1x_2x_3x_4x_5x_6x_7x_8x_9.... The first 6 digits are unknown to him, but all the rest are known. Every single logician knows all but finitely many of the digits in the binary expansion of y. As a result, every logician is aware of what equivalence class y is in, and they all have agreed on an element of V (call it k) that differs from y in only finitely many places. Thus, if everybody guesses assuming that y actually is k, then only finitely many of them will be wrong.

Naturally, this requires that the logicians be capable of an infinite amount of work: they must find a choice function, then they must memorize all the elements in the set V (or just memorize the choice function, it's the same). Next they must examine all of the (infinite) hats in front of them to figure out what equivalence class they are in, to select an element of V.

But, if they can do all of that, they can accomplish the rather miraculous task of solving this problem.

It is also worth saying that they can solve this problem even if the hats are not restricted to being white or black. As long as they know the set of colours that the hats can come from, they can do the exact same thing. Instead of considering binary sequences, they simply consider ternary sequences, or hexadecimal, or whatever it takes to get as many digits as there are hat colours. If that doesn't make you think the Axiom of Choice is crazy, I don't know what will.

The Axiom Of Choice

So, when I started this blog, the intention was to do mostly logic puzzles with the occasional math rant. I suppose I have done a few mathematical rants, but they have always been about logic puzzles themselves, as opposed to just random rants about stupid things that exist in set theory. For those who just want a logic puzzle, you can skip to the end of this rant and read the one I am putting up this week, but I'm going to warn you, you aren't going to be happy with it. Anyway, back to stupid things in set theory.

By far, the stupidest thing that exists in set theory is the Axiom of Choice. Anybody who reads this blog is already familiar with it, but I might as well give my formal statement of it:
Given a collection of nonempty sets S, there exists a function f such that for all sets X in S, f(X) is an element of X.

Specifically, f is a function that chooses exactly one element from each set X in S, and f(X) tells you specifically which element the function has chosen.

This can sound innocent enough: all it says is that if you have a bunch of sets, it is possible to choose one element from each of those sets. The problem comes when the collection is infinitely big; it's not clear that there would be a "rule" to tell you exactly how to select exactly one element from each of those sets.

For example, consider that we are only using the natural numbers, {0,1,2,3,...}. Then if we are told that S contains only nonempty subsets of the naturals, it is easy to find a choice function: simply always choose the smallest element of X (as it turns out, the Axiom of Choice is trivially true in a universe that only has the natural numbers). However, now let us suppose that we are using the real numbers. S might be the set of all nonempty subsets of real numbers, and a function that would allow us to select exactly one element from each of these sets would be difficult to construct indeed. The Axiom of Choice is the postulate that there is such a function that would allow us to do just that, even though it gives no hint as to what such a function would look like.

How about some sort of middle ground, let us consider the integers, {....-3,-2,-1,0,1,2,3,.....}. If S contains nonempty subsets of the integers, you can still come up with a choice function without the Axiom of Choice. Specifically one can just use "If X contains any nonnegative integers, pick the smallest nonnegative one. If X contains only negative integers just use the largest negative integer." This rule will always pick out an element, so the Axiom of Choice is also not needed for the integers.
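
That rule is concrete enough to just write down (a trivial Python sketch, demoed on finite sets):

```python
def choose(X):
    """The explicit choice function for nonempty sets of integers:
    smallest nonnegative element if any, else largest negative."""
    nonneg = [x for x in X if x >= 0]
    if nonneg:
        return min(nonneg)
    return max(X)

print(choose({5, -3, 2}))    # 2
print(choose({-7, -2, -9}))  # -2
```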

This is actually as if we had rearranged the order of the integers to be {0,1,2,3,....-1,-2,-3,....}, lining up all the positive integers in correct order, and then saying that -1 is greater than all the positive ones and then moving 'up' with the negative integers. Now under this new ordering of the integers, the rule is simply to "pick the smallest one". One could imagine that if there was some way to line up the reals such that every subset had a smallest element that the Axiom of Choice would then be unnecessary.

This leads us to a pretty general statement about the Axiom of Choice. Specifically, we ("we" being basically every set theorist on the planet) will consider orderings on sets, a concept of "greater than". First, an ordering will be called a "partial order" if whenever a>b and b>c then a>c follows. This is a reasonable demand to make on something you would want to call a concept of "greater than". Next, a partial ordering will be called a "total order" if for all a and b in the set, exactly one of these is true: a>b, b>a, a=b. Note that a partial ordering did not demand this, partial orderings can have elements where two elements are incomparable. Most orderings that one deals with in math are total orderings (like the usual one on the reals). Finally a total ordering is called a "well order" if for any subset there is a smallest element. The only well ordering that one deals with normally is that on the naturals, but one can see that you can give the integers a well ordering as well.

We can see that if you can find a well ordering for a set, you get a choice function right away ("pick the smallest element") and the Axiom of Choice is not needed. The "Well-ordering theorem" is the statement that every set can be well ordered (and needs the Axiom of Choice to prove). So if the Well-ordering theorem were true on its own, the Axiom of Choice would come for free; since the Axiom of Choice in turn proves the Well-ordering theorem, the two are equivalent: one is true if and only if the other is.

Alright, that is all fun and good (and probably most of you stopped reading ages ago), but what is really the problem with the Axiom of Choice? I mean, it seems intuitive enough, and probably helps prove some things (like the Well-ordering theorem), what could go wrong?

One of the basic problems shows up with the Banach-Tarski paradox. You can go look it up if you like, and I'll give a proper proof of it in the next few weeks, but the paradox is that it is possible to take a solid sphere, divide the sphere up into a number of disjoint pieces and, using only translation and rotation of the pieces, reassemble them into two copies of the original sphere, each the size of the original sphere. It seems like it might be a trick using an infinite number of pieces or something, but you can actually do it using just 5 pieces.

One might be concerned about conservation of volume in the Banach-Tarski paradox, how you could cut up a sphere of fixed volume and make something with twice as much volume using just translation and rotation. The trick is that the pieces do not have any well defined volume, and so conservation goes out the window. Actually, it is convenient to show how to use the Axiom of Choice to construct an object with no well-defined volume.

We will constrain ourselves to the one dimensional real line here, and we will have a concept of length. The length of a set S will be denoted L(S). We will take adding a number to a set to mean that we construct a new set by adding that number to each element of the set, so {1,2,4}+5 is the set {6,7,9}. It is clear that we would want our length function to be translation invariant, so that L(S+x)=L(S) for any real number x. Next, if we take the union of disjoint sets (even countably many of them) we want the lengths to add, so that L(S ∪ P)=L(S)+L(P) if S and P are disjoint (this also means that if P is a subset of Q then L(P) ≤ L(Q)). Finally, we want the length of intervals to be what we expect them to be, that being L([a,b])=b-a for b>a. Note that there can be nonempty sets with zero length, such as a set consisting of a single point (or countably many points).

Alright, this is enough properties of length to prove that (with the Axiom of Choice) there exists a set to which this function cannot assign a length. Let us consider an equivalence relation on the real numbers: two real numbers will be said to be equivalent if they differ by a rational number (this means that for each number, the set of numbers equivalent to it is a shifted copy of the rationals). The set V will be constructed by choosing one element from each equivalence class that is also between 0 and 1 (this choosing is exactly where the Axiom of Choice gets used). So, every real number differs from exactly one element of this set by a rational number, and does not differ from any other element of this set by a rational number.

Next, list all the rational numbers between -1 and 1 and assign them each to a natural number, so that x_n enumerates the rationals between -1 and 1. It is clear that for every real number y between 0 and 1 there is an n such that y+x_n is an element of V, so that y is an element of V-x_n (and since the list contains -x_n whenever it contains x_n, the sets V-x_n are the same family as the sets V+x_n). Each V+x_n will also spill out past (0,1) and hit some reals between -1 and 2, but will not get all of them. We also know that V+x_n is disjoint from every V+x_m for n and m different, as V had only one number from each equivalence class.

Now, we can consider E to be the union of the disjoint sets V+x_n. It is clear that (0,1) is a subset of E and E is a subset of (-1,2). Thus 1 ≤ L(E) ≤ 3, but we also know that L(V+x_n)=L(V) by translation invariance. So if L(V) is zero, then E is a countable disjoint union of sets of length zero and has length zero, while if L(V) is nonzero, then E is a disjoint union of infinitely many sets of the same nonzero length and has infinite length. Either way, 1 ≤ L(E) ≤ 3 is contradicted, so L cannot assign V a length.

Thus, we cannot have a function L that consistently assigns length to every set that exists in a world with the Axiom of Choice. This sort of thing will come up again later when I show the proof of the Banach-Tarski paradox and have to deal with the issue of volume conservation.

For now, it's time for a logic puzzle (using the word "logic" loosely here). I first learned this one on the forums at xkcd:
You have a countably infinite number of logicians standing in a semi-infinite line. Each of them is wearing a hat which is either white or black. They can all see the (infinite number of) hats in front of them, none of the (finitely many) hats behind them and not their own hat and they all know their own position in the line. They must each simultaneously write down a guess about their own hat colour. They win the game if only finitely many of them are wrong, they lose if an infinite number of them guess wrong. Before the hats are assigned they may strategize. Using the Axiom of Choice, prove the existence of a strategy that guarantees their victory.

Numbers Are Crazy

Probably enough time has passed since my silly math problem. If you have spent some time working on it, you will see that you can basically create every number around 21 easily, but getting to 21 is a bit harder.

To start, we might as well try to make 21 by constructing 7x3; that's basically all 21 is, and getting there by any sort of addition is going to be pretty tricky. We already have a 7, so how can we make a 3 from {1,5,6}? Well, basically we can't. We do have a 6 though, so if one considers 21 to be 7x6/2, we just need to use the 5 and the 1 to make a 2 (also clearly impossible). The real trick is to use the grade school division fact that a/(b/c)=ac/b, so 21 is also 6/(2/7). We now need to use {1,5,7} to make 2/7, and that one is pretty easy to do. Of course, the solution is:
6 ÷ (1 - (5 ÷ 7))

The neat blind spot is that people tend to dismiss division outright when they are given this puzzle. There is no division you can do with these numbers that will not exit the integers, and people tend to assume that once you leave the integers you won't get back.

The double division is also the solution for the "{3,3,8,8} makes 24" puzzle, so I won't bother stating that solution specifically; it's pretty simple now.
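If you want to check both puzzles (or convince yourself nothing simpler was missed), a brute-force search with exact rational arithmetic is easy to write. This is my own sketch, not part of the original puzzles; it combines two values at a time, which covers every bracketing:

```python
from fractions import Fraction

def reachable(nums, target):
    """True if target can be made from all of nums using +, -, x, ÷
    with any bracketing (combine two values at a time, recurse)."""
    if len(nums) == 1:
        return nums[0] == target
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            rest = [nums[k] for k in range(len(nums)) if k != i and k != j]
            a, b = nums[i], nums[j]
            results = [a + b, a - b, a * b]
            if b != 0:
                results.append(a / b)  # skip division by zero
            for v in results:
                if reachable(rest + [v], target):
                    return True
    return False

print(reachable([Fraction(n) for n in (1, 5, 6, 7)], 21))  # True
print(reachable([Fraction(n) for n in (3, 3, 8, 8)], 24))  # True
```

Exact Fractions matter here: with floating point, 6 ÷ (1 - (5 ÷ 7)) may only land approximately on 21.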

Silly Math

Alright, I've once again run out of standard "find the optimal strategy" type problems, so it's time for a seemingly simple math problem. I can't recall when I first learned this one; I've known it since at least grade 9:
Using the numbers 1, 5, 6, and 7, each once and only once, and +, -, x, and ÷ as much as you like, construct a formula for the number 21. You may also use as many brackets as you like.

To be clear, there are no stupid tricks in this problem, like sliding the 1 and 5 together to make 15. The solution is exactly of the form _*_*_*_, where _ is replaced by the numbers from {1,5,6,7} each used once, and * is replaced by {+,-,x,÷} (and can use repeats). You also have as many brackets as you need to control order of operations.

I always find this problem rather interesting because typically this sort of problem is either trivial or impossible, but this one is neither. There is a funny blind spot people tend to have with this.

Also, if you are in the mood, try using 3,3,8,8 to make 24 with the same rules. This one will be much easier after you have solved the first one.

Two Is Not Enough

Alright, I'm going to be gone for a week soon, so I better post the solution to the latest hats in a line problem.

The solution starts with the standard thing of spending the ones you can afford to get wrong right off the bat and then having everybody else be correct. Surprisingly, you can use the two people in the back to tell the next three people their hat colours. Numbering the people from back to front, person 1 can see all the way to person 5. Person 1 will volunteer first if person 3's hat colour is black, and person 2 will volunteer first if person 3's hat colour is white (the other volunteers in the following round). Person 1 will make his guess equal to person 4's hat colour, and person 2 will make his guess equal to person 5's hat colour. In this way, after those two (wrong) guesses, persons 3, 4, and 5 all know their hat colours, and we cannot afford any more incorrect answers.

If you had tried solving this problem but did not notice a way to get three people to know their hat colours, you might want to stop reading and go try again, that first step is crucial.

Alright, now for the rest of the solution:
Call the current back people A, B, C, D, E, F, and G (so A, B, and C are the three who just learned their colours). A can see all the way to E. A and B will guess their hat colours next: A will guess first if D and E have the same colour hat, and B will guess first if D and E have different colours of hat.

Now D knows his own hat colour (he can see E's), and E knows his hat colour relative to D's. Next C and D will guess. C will guess first if F and G have the same hat colour, and D will guess first if F and G have different hat colours. When D makes his guess (either first or second), E will learn his hat colour. F knows his hat colour (he can see G's) and G knows his hat colour relative to F. The part of the solution in this paragraph is stable and can propagate all the way forward.

And at the end, when everybody knows their own hat colour, everybody can just volunteer and finish things off.
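The whole strategy can be sketched as a simulation. The encoding and names below are mine; in particular, the front-of-line boundary (which the argument above leaves implicit) is handled by letting the last signalling pair call the final person's colour directly, which they can do since he is within their sight range:

```python
import random

def simulate(hats):
    """hats[1..n] are 'B'/'W'; person i can see positions i+1..i+4.
    Returns how many people guess their own hat colour wrong."""
    n = len(hats) - 1  # hats[0] unused so people are numbered from 1
    # Persons 1 and 2 spend the two affordable wrong guesses: the order
    # they volunteer in encodes hats[3], and their stated guesses are
    # hats[4] and hats[5] respectively.
    wrong = (hats[1] != hats[4]) + (hats[2] != hats[5])
    known = {3: hats[3], 4: hats[4], 5: hats[5]}
    i = 3
    while i + 1 <= n:
        # Pair (i, i+1) guesses; by now both know their own colours.
        wrong += known[i] != hats[i]
        wrong += known[i + 1] != hats[i + 1]
        # Their volunteering order signals about i+3 and i+4: same or
        # different lets i+3 (who sees i+4) deduce his own colour, and
        # i+4 learns his relative to i+3 (resolved when i+3 guesses).
        # Person i+2 learns his own colour from i+1's truthful guess.
        if i + 3 <= n:
            known[i + 3] = hats[i + 3]
        if i + 2 <= n:
            known[i + 2] = hats[i + 2]
        i += 2
    return wrong

random.seed(1)
hats = [None] + [random.choice('BW') for _ in range(100)]
print(simulate(hats) <= 2)  # True: never more than two wrong
```

The only wrong guesses are the two sacrificial ones at the start; every later guess compares a known value to itself, which is the inductive claim made above.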

Slightly Less Myopic Hats In A Line

So, you can tell that I've been wandering through the xkcd forums for puzzles when I manage to make four posts in one month after having done about one post a month for so long. Anyway, here you go:
One hundred people are standing in a line, each of them wearing either a black hat or a white hat. They all can see the hats on the four people directly in front of them, no hats behind them and not their own hat. Each round, a man in a black suit will ask each person in the line if they would like to guess their own hat colour now, the answer they give is secret, nobody else in the line can hear. The man will then select one of the people who volunteered and ask their hat colour. That person is then removed from the line and a hole is left in the line. Everybody in the line is made aware of who was selected to guess their hat colour and what they guessed. The people may only guess "black" or "white". Then a new round starts with the man asking who would like to guess their hat colour now.

The players lose the game if ever more than two people get their hat colour wrong, or if during any round nobody volunteers to guess. The players win if everybody has finished guessing their hat colour and no more than two people are wrong. Before the hats are assigned the players may strategize, find a strategy that is certain to win assuming the man in black (who also assigned the hat colours) is an adversary.

It's the same as the other hats in a line problem (I basically copy-pasted it), but now holes will be left when people leave and everyone has a sight range of 4.

Just to be clear, if we number the people starting from the back, person 1 can see the hats on each of {2,3,4,5}. If person 3 now makes a guess and leaves the line, person 1 can now see the hats of {2,4,5}, but still cannot see the hat on 6.

Two Is Enough

Time for the solution to the hat problem from last time. There's not really much to the solution; as Mark pointed out in the comments, it's just about parity.

To start, we will number the people from 1 to 100, with the person at the back of the line numbered 1, so person N can see the hats of people N+1 and N+2. The first thing we need to do is have the two people at the back of the line learn their hat colours:
First, person 1 volunteers to guess, and guesses the hat colour of person 3. Next, person 2 volunteers to guess and guesses the hat colour of person 4.

The adversary will ensure that both of those guesses were wrong, so now we have a new problem where the back two people know their own hat colour and we cannot afford anybody to guess wrong. Fortunately, they can now see the hat colour of person 5. They use that to decide who guesses next:
If person 5 has a white hat, person 3 guesses what he knows his own hat to be. If person 5 has a black hat, person 4 guesses what he knows his own hat colour to be.

They are no longer transmitting information by what they guess, it is instead done by who makes the guess. Naturally, whatever happens, the line will shift back and you have the two people at the back of the line with full knowledge of their own hat colour. The person at the very back guesses if the next "unknown" hat is white, and the second person from the back guesses if it is black. The solution works by induction, until only two people are left in the line, when they both know their own hat colour and both of them volunteer to guess (letting the adversary helplessly decide how it ends).
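Here is the whole strategy as a quick simulation sketch (the encoding and names are mine), checking that no assignment of hats ever produces more than two wrong guesses:

```python
import random

def simulate(hats):
    """hats[1..n] are 'B'/'W'; the strategy above, returning the
    number of people who guess their own hat colour wrong."""
    n = len(hats) - 1  # hats[0] unused; people numbered from the back
    # Persons 1 and 2 make the two sacrificial guesses, stating the
    # colours of persons 3 and 4 (the adversary makes these wrong).
    wrong = (hats[1] != hats[3]) + (hats[2] != hats[4])
    known = {3: hats[3], 4: hats[4]}
    line = list(range(3, n + 1))
    while len(line) > 2:
        back, second, third = line[0], line[1], line[2]
        # Who volunteers encodes third's colour: back for white,
        # second for black. The guess itself is the guesser's own
        # (known) colour, so it is always correct.
        guesser = back if hats[third] == 'W' else second
        wrong += known[guesser] != hats[guesser]
        known[third] = hats[third]  # third decodes the signal
        line.remove(guesser)
    for p in line:  # the last two both know their colours and guess
        wrong += known[p] != hats[p]
    return wrong

random.seed(2)
hats = [None] + [random.choice('BW') for _ in range(100)]
print(simulate(hats))  # always at most 2
```

The invariant the loop maintains is exactly the inductive step above: after each guess, the two people at the back of the remaining line both know their own colours.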

If one states the problem differently, so that instead of a "win-loss" scenario there was some sort of bet based on how many you got wrong, this solution also has the advantage that a mistake by one of the players only results in one extra person being wrong, rather than an entire chain.

Actually come to think of it, that would be a better way to word the problem, to minimize the number of people who guess wrong rather than saying outright how many you can afford. When I post the harder version of this problem I might word it that way.

Myopic Hats In A Line

That counter thing on the right for my blog posts seems to suggest that it's been some time since I've had more than one post in a given month... I guess it's time to do something about that. Anyway, this puzzle is one I found on the forums at xkcd:
One hundred people are standing in a line, each of them wearing either a black hat or a white hat. They all can see the hats on the two people directly in front of them, no hats behind them and not their own hat. Each round, a man in a black suit will ask each person in the line if they would like to guess their own hat colour now, the answer they give is secret, nobody else in the line can hear. The man will then select one of the people who volunteered and ask their hat colour. That person is then removed from the line and the line is shifted back a step to fill the hole. Everybody in the line is made aware of who was selected to guess their hat colour and what they guessed. The people may only guess "black" or "white". Then a new round starts with the man asking who would like to guess their hat colour now.

The players lose the game if ever more than two people get their hat colour wrong, or if during any round nobody volunteers to guess. The players win if everybody has finished guessing their hat colour and no more than two people are wrong. Before the hats are assigned the players may strategize, find a strategy that is certain to win assuming the man in black (who also assigned the hat colours) is an adversary.

It's some sort of variation on "standard line hat rules"; the people are also short-sighted.

Math Is Hard

Apparently, blogging is also hard, given how often I actually do it. Anyway, time to finish off the second part of the tying strings problem.

First of all, it's interesting to wonder why the method of calculating the average was worded the way it was. It was "Specifically, consider one were to select an initial piece of string and ask 'what is the expected value of the length of the loop this piece of string will end up in?'". This is different from if it had been "Specifically, consider one were to randomly select a final loop from all of the loops and ask 'what is the expected length of this loop?'". The first wording gives a heavier weighting to longer loops, and will have a larger answer. It also gives an answer with an actual clean mathematical form, whereas the second gives a total mess (feel free to work it out sometime; maybe somebody smarter than me can point out a nice form of it that I just missed).

Anyway, let's start by figuring out a few simple cases. Suppose we have N strings, and G(N) is the answer we seek. It is pretty easy to see that G(1)=1. For N=2, consider where the first end of one of the strings ends up. With 1/3 chance, it ties to its own other end, and we have two loops of length 1 (so the chosen string sits in a loop of length 1); with 2/3 chance it does not, and we have 1 loop of length 2. So,
G(2) = (1/3)(1) + (2/3)(2)
= 1 + 2/3
= 5/3

The 1 + 2/3 line is the form that will establish the overall pattern: the final loop containing the first string (which is as good as any other string) has length at least 1, and 2/3 of the time it has an extra unit of length.

Next, N=3: the first end of the first string (which is as good as the random string we would choose at the end) has a 1/5 chance of tying to its own other end. It has a 4/5 chance of not doing so, and in that case a 1/3 chance of closing into a loop of length 2 and a 2/3 chance of closing into a loop of length 3. So we have
G(3) = (1/5)(1) + (4/5)(1/3)(2) + (4/5)(2/3)(3)
= 1 + 4/5 + (4/5)(2/3)
= 7/3

The 1 + 4/5 + (4/5)(2/3) form is again the overall pattern: the final loop containing the first string has length at least 1, has a 4/5 chance of being at least one unit longer, and a (4/5)(2/3) chance of being yet another unit longer.
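These small cases are easy to check by simulation. In this sketch (my own), tying the ends two at a time at random is replaced by drawing a uniformly random perfect matching of the 2N ends, which gives the same distribution; we then measure the loop containing string 0:

```python
import random

def marked_loop_length(n):
    """Tie the 2n string ends into a uniformly random perfect matching
    and return the length of the loop containing string 0.
    Ends 2s and 2s+1 belong to string s, so e ^ 1 crosses a string."""
    ends = list(range(2 * n))
    random.shuffle(ends)
    tie = {}
    for i in range(0, 2 * n, 2):
        a, b = ends[i], ends[i + 1]
        tie[a], tie[b] = b, a
    length, e = 0, 0
    while True:
        e ^= 1          # walk along the string to its other end
        length += 1
        e = tie[e]      # walk across the knot to the next string
        if e == 0:
            return length

random.seed(3)
trials = 100000
averages = {}
for n in (2, 3):
    averages[n] = sum(marked_loop_length(n) for _ in range(trials)) / trials
print(averages)  # close to 5/3 ≈ 1.667 for n=2 and 7/3 ≈ 2.333 for n=3
```

The walk alternates "cross the string" and "follow the knot", so it traces out exactly one loop and counts the strings in it.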

We can already guess the overall formula, G(N) = 1 + (2/3)(N-1), and if you check the next few terms you will find this still works. To prove it in general is quite the trick though. First, we can write down the general formula for G(N):
G(N)=1+Σ Π (2N-2m)/(2N-2m+1)

where the sum has k from 1 to N-1 and the product has m from 1 to k. You can fiddle with this a bit using the formula:
Π (m-N)=Γ (k+1-N)/ Γ (1-N)
(product over m going from 1 to k) (Γ being the Gamma function), or you can just shove it all into Maple at once. Basically you get a bunch of garbage whatever you do.

After spending some time simplifying that garbage, you can find
G(N) = 1+2/3 (N-1)-1/(3 √π ) Γ (1/2-N)/Γ (1-N)

So, we see the term we want, and then some Gamma function garbage. This extra term shows up for a few reasons. First of all, in the formula Π (m-N)=Γ (k+1-N)/ Γ (1-N), the math doesn't "know" whether k is greater than N (in which case the product needs to be zero), or even whether the things you are using are integers at all.

Anyway, in our case everything is an integer, so Γ (1/2-N) is simply a number one can find using the analytic continuation of the Gamma function, and Γ (1-N) is infinitely large, as Gamma has simple poles at every nonpositive integer. So the extra garbage term is just zero.
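Since everything here is an integer, one can also sidestep the Gamma function and just check the sum-of-products formula against the closed form with exact arithmetic (a quick sketch of mine):

```python
from fractions import Fraction

def G(n):
    """G(N) = 1 + sum over k=1..N-1 of the product over m=1..k
    of (2N-2m)/(2N-2m+1), evaluated exactly."""
    total = Fraction(1)
    prod = Fraction(1)
    for k in range(1, n):
        prod *= Fraction(2 * n - 2 * k, 2 * n - 2 * k + 1)
        total += prod
    return total

# The closed form 1 + (2/3)(N-1) matches term for term:
for n in range(1, 15):
    assert G(n) == 1 + Fraction(2, 3) * (n - 1)
print(G(2), G(3))  # 5/3 7/3
```

This is only a check of the first several terms, of course; it does not replace the Gamma function argument, it just confirms it.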

If one can find a way to solve this problem that doesn't involve the Gamma function, I would love to hear it.

And Then I Blogged Again

So, where did that month go anyway? Well, whatever, I hope the one or two people who still read this have had more than enough time to solve the tying strings problem.

The solution isn't particularly hard:
Consider we have N strings, and let f(N) be the solution we seek. Select an end of one of the strings at random (it does not matter which one), and select another end. With 1/(2N-1) probability, the second end is on the same string as the first end; when you tie them together you get 1 loop, plus the problem for N-1 strings. With (2N-2)/(2N-1) probability, the second end is on a different string, and when you tie them together you have simply made one string longer and are left with the N-1 string case.

So, from this argument, we get the recursion relation for f:
f(N) = (1/(2N-1))(f(N-1)+1) + ((2N-2)/(2N-1)) f(N-1)
f(N) = f(N-1) + 1/(2N-1)

f(1) is 1, of course. f(2) is 1+1/3, f(3) is 1+1/3+1/5. It is a pretty easy thing to prove that:
f(N)=Σ 1/(2k-1)

where k goes from 1 to N.
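The result is easy to sanity-check by simulation. This sketch (mine) pairs the 2N ends uniformly at random, which gives the same distribution as tying two random ends at a time, and counts the resulting loops:

```python
import random

def count_loops(n):
    """Pair the 2n string ends uniformly at random and count the loops.
    Ends 2s and 2s+1 belong to string s, so e ^ 1 crosses a string."""
    ends = list(range(2 * n))
    random.shuffle(ends)
    tie = {}
    for i in range(0, 2 * n, 2):
        a, b = ends[i], ends[i + 1]
        tie[a], tie[b] = b, a
    visited = [False] * (2 * n)
    loops = 0
    for start in range(2 * n):
        if visited[start]:
            continue
        loops += 1
        e = start
        while not visited[e]:
            visited[e] = True      # mark this end...
            visited[e ^ 1] = True  # ...and the other end of its string
            e = tie[e ^ 1]         # follow the knot onward
    return loops

random.seed(4)
n, trials = 10, 100000
avg = sum(count_loops(n) for _ in range(trials)) / trials
expected = sum(1 / (2 * k - 1) for k in range(1, n + 1))
print(avg, expected)  # both come out near 2.13 for n = 10
```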

I guess I'll post the next part of the problem now, in case I forget to blog for the next year:
How long is a loop on average? Specifically, consider one were to select an initial piece of string and ask "what is the expected value of the length of the loop this piece of string will end up in?".

Tying Strings

Man, I still blog, right? I think I had some feeling there was more to write about my last post, but I couldn't come up with anything so I just put it off. Now it seems apparent that that one is just plain done, so it's time to move on to something more interesting. So, new puzzle; I first learned this one on the forums at xkcd:
You have a bowl of N strings, each the same length. A string is a line segment with two indistinguishable ends, and all the strings are indistinguishable. You randomly select two ends of the strings from the bowl (they might be on the same string, they might not) and tie them together (fuse them, if you prefer). You repeat this process until there are no more ends left and you just have loops of string. On average, how many loops of string do you have?

There are other questions you can ask in this scenario, but to save myself blogging material I'll hold onto them for later.

Keep The Right One

Alright, time to finish off the stuff I've been writing about the two envelope paradox. The analysis last time showed that we had best assume that the smaller cheque is coming from a normalized distribution function P(x) with a finite expectation value (∫ xP(x)dx converges). Now, we must come up with a strategy on whether to keep the cheque or not, and this strategy can only be based on the value of the cheque that we see (known as A). Let us choose to accept the switch, when cheque A has value x, with probability S(x). S(x) must be a function from the positive reals to [0,1], but is otherwise arbitrary. Now, the expected gain from switching is E(B|A=x)-x (from last time), times Q(x)=(P(x)+P(x/2)/2)/2 (the distribution function of cheque A), times S(x) (the chance we actually take the switch), integrated over x. So, the expected gain is
∫ (E(B|A=x) - x) Q(x) S(x) dx
= 1/2 ∫ (x P(x) - (x/4) P(x/2)) S(x) dx
= 1/2 ∫ x P(x) S(x) dx - 1/2 ∫ (x/2) P(x/2) S(x) d(x/2)

The split into two integrals is valid this time as ∫ xP(x)dx converges and the bounded function S(x) is not going to make that any worse. Now, change variables x/2 -> x in the second integral and recombine to get
= 1/2 ∫ x P(x) (S(x) - S(2x)) dx

So we see that as long as S(x)-S(2x) is positive, we will have a positive expected gain for any distribution function P(x). Specifically you may choose any decreasing function to get this effect, like S(x)=1-tanh(x). In effect, this means that you are more likely to keep a larger valued cheque than a smaller cheque. This sort of solution also solves a similar game where I write down any two random numbers (from some probability distribution) on cheques and offer you a "keep or switch" scenario.

Anyway, one can also find that the optimal choice of S(x) is to set it to 1 whenever P(x)-1/4P(x/2)>0 and 0 whenever P(x)-1/4P(x/2)<0 (when P(x)-1/4P(x/2)=0, choose S(x) arbitrarily, it doesn't matter), however this assumes that you know the function P.
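To see the decreasing-S effect numerically, here is a Monte Carlo sketch; the exponential distribution for the smaller cheque and S(x) = 1 - tanh(x) are my own choices for illustration, not part of the argument above:

```python
import math
import random

def trial():
    x = random.expovariate(1.0)  # the smaller cheque (my choice of P)
    a, b = (x, 2 * x) if random.random() < 0.5 else (2 * x, x)
    # Switch with probability S(a) = 1 - tanh(a): the bigger the cheque
    # we see, the more likely we are to keep it.
    if random.random() < 1 - math.tanh(a):
        return b - a  # gain (or loss) from switching
    return 0.0        # kept the cheque: no gain

random.seed(5)
trials = 500000
gain = sum(trial() for _ in range(trials)) / trials
print(gain > 0)  # True: a strictly positive average gain
```

Running it with any other finite-expectation P and any strictly decreasing S should show the same sign, which is the content of the S(x) - S(2x) > 0 condition.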

Wanting Too Much

Alright, time to go into a bit of detail about the two envelope paradox. Much of this analysis I originally read in a paper by David Chalmers. Anyway, the first thing to realize is that this paradox can show up even if the amount of money in the first envelope is taken from a well defined probability distribution. That is, suppose the smaller cheque has value between x and x+dx with probability P(x)dx, and the larger cheque is exactly twice the smaller one. Now, we have to ask the following: given that we have opened an envelope with value between y and y+dy, what are the odds that it is the smaller cheque? Well, if it is the smaller cheque, then there was a P(y)dy chance of it being written down, but if it is the larger cheque, then there was a P(y/2)d(y/2)=1/2 P(y/2)dy chance of that happening. Thus the ratio of probabilities is 2P(y):P(y/2), though one might naively have expected P(y):P(y/2). In particular, if the smaller cheque is randomly chosen from 0 to 10 with uniform distribution, and you see a cheque of value 2, then it is twice as likely that the other cheque has 4 than that it has 1. This may seem nonintuitive, but it must occur in order to balance out the fact that if you see a cheque with value over 10, the other cheque must be smaller. Note that this actually makes the initial paradox somewhat worse.
Now we can see that if the distribution function for the smaller cheque is P(x), then the distribution function for a random cheque will be Q(x)=(P(x)+P(x/2)/2)/2. The relative ratio between the two terms was derived in the last paragraph, and the overall factor of 1/2 comes from normalizing Q(x), assuming P(x) was already properly normalized. Let the first cheque be given value A and the unknown cheque value B (B=2A or B=A/2 always). Let E() be the expectation value operator, so that E(B) denotes the expectation value of B, and E(B|A=x) denotes the expectation value of B given that A is the specific value x. The paradox will be at its strongest if E(B|A=x) is greater than x for all values of x (and the paradox would still exist if that were only true on average). Now let's calculate how much money we expect to gain by switching blindly, without even looking at the value of the first cheque, by integrating over the distribution function Q for the first cheque. The specific value of E(B|A=x) is (2xP(x)+x/2 P(x/2)/2)/(P(x)+P(x/2)/2), being (value if B is larger)(chance B is larger)+(value if B is smaller)(chance B is smaller) divided by normalization, so E(B|A=x) = x/2 (8P(x)+P(x/2))/(2P(x)+P(x/2)). Now, the expected gain for switching cheques blindly is:
∫ (E(B|A=x)-x) Q(x)dx
=∫ (x/2(8P(x)+P(x/2))/(2P(x)+P(x/2))-x)(2P(x)+P(x/2))/4 dx
=∫ (x(4P(x)+P(x/2)/2-2P(x)-P(x/2))/4) dx
=1/2 ∫ (xP(x)-x/4P(x/2)) dx

Next is the trick that I really appreciate about this entire thing: assuming that the integral ∫ xP(x) dx converges, we may break this up into two integrals to find
=1/2 ∫ xP(x)dx-1/2 ∫ x/2P(x/2)d(x/2)

Which is simply zero. Note that if P(x) were something like 1/x^2 from 1 to infinity, it would be a properly normalized distribution function, but ∫ xP(x) dx would diverge and the paradox would still arise. In order to avoid the paradox, the smaller cheque needs to have a finite expectation value; if the expectation is infinite, then E(B) being greater than E(A) isn't really a problem, as both are infinite.
More simply put, given infinite expectations, any finite value is disappointing.
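That the blind switch gains nothing (once ∫ xP(x)dx converges) is easy to watch numerically; here is a sketch using an exponential distribution for the smaller cheque (my choice, purely for illustration):

```python
import random

random.seed(6)
trials = 500000
total = 0.0
for _ in range(trials):
    x = random.expovariate(1.0)  # smaller cheque; finite expectation
    # Open one of the two envelopes at random...
    a, b = (x, 2 * x) if random.random() < 0.5 else (2 * x, x)
    total += b - a               # ...and always switch, blindly
print(abs(total / trials) < 0.05)  # True: average gain hovers near zero
```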
Anyway, back to the puzzle I posted last time: you may further assume that the distribution function the cheques are chosen from has a finite expectation value; there is still a strategy that works.