Cards and Birthdays

Time for an aside from puzzles that I got from "Games People Don't Play", and an aside from puzzles in general. This post is going to be about the birthday paradox and human intuition on probability.

You are probably familiar with the birthday paradox, but I will state it anyway, just to be clear. The paradox (as with most "paradoxes" it isn't really a paradox, but just something unexpected) is the answer to the question:
How many people do you need to have in a room before there is a greater than 50% chance that two of them have the same birthday?

If you ask this question of an average person, they will probably state something like 100 or so. If they have some basic math behind them, they might guess 365/2, so about 180. The actual answer is around 20 (i'll derive it later). The basic misunderstanding comes from the difference from the question posed and the following question:
How many people do you need to have in a room before there is a greater than 50% chance that one of them shares your birthday?

Note the difference, in the first question you just needed any two people to share a birthday, in the second question you needed somebody to land on a specific birthday. This results in a big difference since in the second case each person has a 1/365 chance of having your birthday, but in the first one each person has a k/365 chance of landing on somebodys birthday, where k is the number of people in the room before them.

I always say that humans have no good sense of probability, evolution never gave us a need to understand things like the Monty Hall scenario on an intuitive level. We also have a bad habit of noticing patterns where none exist, another byproduct of our evolutionary history. It is fortunate that we have developed the mathematical capabilities to handle probability for us, or science would never have gone anywhere.

The birthday paradox came up in a conversation I was having with my officemate the other day. He found a site that asked the question
What are the odds that you will ever see a shuffled deck of cards in the same order twice in your life?

Ignoring the fact that this question isn't really well-posed, it is something of an interesting question. The site basically gave an argument involving 52! (the number of ways to arrange the cards in the deck) and concluded the answer was "very small". Actually, I guess most of the post was on deriving 52! rather than the analysis of what to do with that number. My instinct was to agree with this conclusion, but then a part of my brain said "birthday paradox" and I had to wonder. The other part of me saying back "52! is like 1065" was a persuasive counterargument though. It basically depended on what the 20 was to the 365 in the birthday paradox... was it the logarithm? the root? just plain old linear with a coefficient? seems it is time to do the calculation.

Suppose we have a list of N outcomes of some experiment, all equally likely (N=365 for the birthday paradox, or N=52! for this card problem) and we check the experiment k times. What are the odds that we see the same result twice? Alternately, how large does k have to be for the probability we have seen the same result twice to be greater than P?

For the card situation lets assume that k is about 30000, this is an approximation of checking once a day for 80 years. So for approximation purposes, N>>k>>1. Now, what is the probability that none of our experiments ever get the same result? the chance that the first result is new is 1 (or N/N), the chance the second result is new is (N-1)/N, the chance the third one is new is (N-2)/N and so on up to the chance that the kth one is new is (N-k+1)/N. So the probability that we never see the same one twice is given by
p=N!/((N-k)!Nk)

We can use Stirling's approximation on this (since Stirling's approximation for n! is embarrassingly good by the time n is 10) to get
p=NNe-N√(2π N)/((N-k)N-ke-N+k√(2π (N-k))Nk)
=(1-k/N)k-N-1/2e-k

This is basically in a form that is ready to solve for the birthday paradox. Setting N=365 you find that for k=20, p=0.589 (remember p is the chance that none of them share a birthday) for k=21 p=0.556, for k=22 p=0.524 for k=23 p=0.492. So we see that at 23 people it is more than 50% that two of them share a birthday.

In the case of N=52! we need a few more approximations. I initially tried to use the fact that (1+x)m=1+mx for small x (since k/N is certainly small) but there is a problem with that. Writing out the next term it is actually (1+x)m=1+mx+m(m-1)x2/2, so if m is as big as x is small (specifically m2x2 is not negligible compared to mx) then this is not going to be valid. For our problem, x=k/N and m=k-N-1/2, so m2x2 is like k2 which is not small at all.

We can approximate the term (1-k/N)k-1/2 as 1-k(k-1/2)/N as long as the next term is small. The terms look like 1, k2/N and k4/N2, so the third term is small compared to the second as long as k2 is small compared to N. This is true for our case, but I'll come back to that later.

Given the approximation (1-k/N)k-1/2 = 1-k2/N, we have
p=(1-k/N)-Ne-k(1-k2/N)

We need to figure out what to do with (1-k/N)-N, which is a number very slightly smaller than 1 to a very large negative power, so its a bit unpredictable. If you remember your first year calculus course, you will see quickly what to do, the limit as m goes to infinity of (1+x/m)m is ex (this is sometimes a definition of the exponential function). So we may approximate (1-k/N)-N as ek, giving us
p=1-k2/N

Thus, the probability that you see the same event twice is approximately k2/N. For k=30000 and N=52! this is about 10-59, so we can confidently say it won't happen. Turns out the human intuition is correct in this case, the birthday paradox won't overcome crazy scenarios like 52!.

Asking the other question of "how large does k have to be to achieve some fixed P" like the birthday paradox's 50%, the answer is a bit funny. You find that to achieve a 50% chance of seeing the same result twice you get k=√(N/2), so it would appear that the birthday paradox goes as the square root on N, rather than the logarithm or something else. However, this is technically mistaken, since we needed k2 to be much smaller than N in our approximations, this answer is invalid. It is correct to say that the birthday paradox asymptotes to something similar to √N, since if it is much weaker than that (like logarithm or cube root or something) our approximations become good again and you get the answer of square root.

No comments: