Sending And Receiving

Alright, time for the solution to the Sam And Ray problem. First I'll do my usual thing of going through the thought process before getting to the solution, because I like hearing myself talk (or type, I suppose). (Edit after finishing this post: its crazy long and shows just how much I like hearing myself type)

Its pretty easy to see that right out you are down $2, and get to send a bit of information (I mean a technical "bit", not a small amount or something vague), and the real question is what do we do with this bit. We won't get very far trying to tell Ray what all the answers are ("We" is Sam, of course), because every bit costs us $2 and only makes $1. However, we can do something sort of neat, we can tell Ray what the most common one of the next three is. In that case we guarantee that Ray will get two of the three correct, and on the wrong one we can send out another bit. In the chance that there is no wrong one, we get three correct for only 1 bit of information. If we get two right and one wrong, we lose no money, and if we get 3 right at the cost of one bit, we gain $1. This gives us our starting strategy:
Sam tells Ray what the most common one of the next three will be, Ray guesses that for the next three and Sam guesses that when it will be correct. On the one that they get wrong Sam tells Ray what the most common one in the following block of three will be, and if there isn't one they get wrong, then they end off $1 ahead and quit.

Thats great if the numbers are assigned randomly, since a given block of three will all be the same with 1/4 probability, this will work within the first N blocks of three with probability 1-(3/4)N (proof left to the reader). So eventually, with probability 1, they will succeed if the list of numbers is infinitely long.

Of course, we might want a solution that is guaranteed to work in some finite amount of time, perhaps our players must specify beforehand how long they are going to play for. Or perhaps the numbers are being set by an all-knowing adversary, and we want a solution that works for all possible binary lists.

Alright, lets expand our solution. Rather than sending information about the most common one in the next three, lets do the next five. We will get 3 right and 2 wrong, costing $1 and our bit that we have sent, and we get to send 2 new bits. Of course, if we happen to get 4 right we gain $2 and still get to send a bit, and if we get 5 right we gain $4 at the cost of our bit, either of these situations is a gain.

So, using this, we can spend $1 and a bit to send 2 bits. This is much better than spending $2 to send a bit. Can we get any better with seven?

Sending information about seven bits, we get 4 right and 3 wrong, costing $2 and sending 3 bits at the cost of 1 bit. This is not any better than the five case, not really any worse overall, but perhaps we don't want to buy in bulk (actually, it comes out probabilistically better I think, since you are more likely to get 5 right and 2 wrong than 4 right and 1 wrong, but we already have a probabilistic solution anyway).

So, we have our information sending strategy:
Sam tells Ray the most common one in the next block of 5, Ray is to guess that for the entire block. On the ones they get two wrong, Sam will send out 1 bit of "useful information" and 1 bit telling the most common one in the next block of five. On the ones they get one wrong, Sam will send out a bit saying the most common one in the next block of five. On the ones they get all five right, Sam will just send additional data about the next block of five immediately after.

After Ray has reached the pre-agreed amount of data, the last bit Sam sends out does not say the next block of five, but is in fact the final bit of "useful information"

I realize that it is possible to handle the "get all five right" cases more elegantly, but those ones are just gravy anyway, and do not actually affect the solution.

This strategy gives Sam the ability to send Ray K bits of information at a cost of no more than $(K+1). Now what can we do with this information?

Clearly it is a bad idea to simply try to get a bunch right, as there is no way we can get K+1 answers correct with only sending K bits of information, but perhaps its is acceptable to get a few wrong in there, that might just give us enough wiggle room.

How about we try to get only one wrong in a list of N bits. Clearly, if getting one wrong is acceptable, then we can specify 3 bits using just 1 bit, as every three bit binary string is only one away from 000 and 111. This is a concept called Hamming distance, the number of bitflips needed to get from one binary string to another.

We seek to find a set S of binary strings of length N such that all binary strings of length N are within 1 Hamming distance from at least one member of S. (People familiar with Hamming codes can probably already see the solution now.) So, how do we do this? We want to be as efficient as possible, so we do not want to waste any elements of the set. If there were any binary string that was within distance 1 of two different members of S, we have wasted overlap.

An element of our set will cover N+1 elements (itself, and each of N individual bitflips), and we have 2N elements to cover. If we want exact coverage to be possible, we will have to demand that 2N/(N+1) be an integer, specifically there must exist an integer M such that N+1=2M. We see that S will then have 2N/(N+1) elements, so in terms of M the cardinality of S is |S|=2(2M-1-M) (if it exists, I haven't actually proven that yet). Lets look at a few simple cases first, cause thats how I roll.

For M=1, we have N=1 and |S|=1. Clearly all binary strings of length 1 are within distance 1 of either the string "0" or the string "1", so S does indeed exist here.

For M=2, we have N=3 and |S|=2. All binary strings of length 3 are within distance 1 of either 000 or 111, so using S={000, 111} gives our set.

For M=3, we have N=7 and |S|=16. Ok, S has gotten big pretty fast, before trying to find it, lets go back to Sam and Ray and make sure this will actually give us something.

If we have the set S for N=7, we could specify an element of it with 4 bits, costing us $5. This would allow us to specify a 7 bit string to within 1 bitflip. We would get 6 right and 1 wrong, making $4, thus we have lost $1. However, there is a ray of hope, we did get one of those wrong, so signal is still open. That signal could be used to specify the next list of 5 (as per our information sending strategy) and thus we can now successfully send out K bits of information at the cost of $K. Using our set S for N=7 again, we can specify an element of it using 4 bits costing us $4. We now get 6 right and 1 wrong again, losing us $4, so we have broken even. But once again, we still have signal, we can send a bit on the wrong one. Thus we finally get one right and win the game.

Not that it matters, but how many bits did it take to win? This took us 3 sets of 5 to specify 4 bits, then a set of 7 to use our set S, then another 3 sets of 5 to send out more information and then another set of 7. Also there is the first bit we send out, and the last one we get right, for 3x5+7+3x5+7+1+1=46 bits of information total. Actually a clever adversary might have us get some blocks of five totally correct to mess us up, but that just means we don't have to play for as long, so I think that doesn't make this any worse.

Anyway, before we move on to proving that S actually exists, lets try to see if larger sets can get us any further. Considering the case M=4, we have N=15 and |S|=211. Alright, I certainly hope we can find a general solution for how to find S now, because I am not in the mood to list out all of those.

We can specify an element of this S using 11 bits, costing $12 and we get 14 right and 1 wrong in a string of 15, making $12. Thus we break even and can send out signal on the wrong one, winning the game. The length of this strategy is 11x5+15+1+1=72. Oddly enough, the more convoluted strategy is shorter.

Alright, now to prove S exists, I will give the general method, and show with an example in the case N=7. N will be 2M-1, so each bit can be identified with a binary string of length M excluding the zero string. Note this can get a little confusing, so I will restate some of our notation here. We are attempting to construct S which contains binary strings of length N, while the n-th bit in a given string (n runs from 1 to N) is identified with a binary number that is, at most, M digits long.

We will call any bit that happens to have an n value that is a power of two (that is, its binary representation has exactly one 1) a "parity bit". Any bit with n value that is not exactly a power of two (and has more than one 1 in its binary representation) will be called a "data bit". Note that there are M parity bits and N-M data bits, and we know |S|=2(N-M).

Let the data bits run over all possible combinations, and let the parity bits be specified from the data bits as follows:
The parity bit corresponding to 2L will be given the value of the parity of all the bits whose n value contains a 1 in their 2L place.

Just to be specific, I should say that the parity of a list of bits is the value of XORing them together.

So, the first bit (with n=1=20) is a parity bit, it checks the parity of any data bit that has odd n value (the numbers with a 1 in the 20 place in their binary expansion).

The second bit (with n=2=21) is also a parity bit, it checks the parity of any data bit that have values such as 3,6,7,10,11,14,15...(the numbers with a 1 in the 21 place in their binary expansion).

The third bit is a data bit, its value runs over all possible values, as data bits do.

So, for the case N=7, the set S looks like
0000000
1101001
0101010
1000011
1001100
0100101
1100110
0001111
1110000
0011001
1011010
0110011
0111100
1010101
0010110
1111111

Note that bits 3,5,6,7 run over all possible values. Bit 1 checks the parity of bits 3,5,7. Bit 2 checks the parity of bits 3,6,7. Bit 4 checks the parity of bits 5,6,7.

Further note that every bit is being checked by a unique combination of at least two parity bits. Bit 5 is being checked by 4 and 1 (because 5 in binary is represented as 4+1) and only bit 5 is being checked by bits 4 and 1 alone. This idea will hold in general.

Now, we can specify an element of S uniquely just by giving its data bits (there are N-M data bits) and if we are given a binary string of length N, we can find an element of S that is different in no more than 1 spot through the following method: Find the element of S with the same data bits, it will differ in one or more parity bits. If it differs in one parity bit, we are done. If it differs in more than one parity bit, add of the n values of those parity bits to find a value j. You will find that if you flip data bit j, all of the parity bits will now be correct and you will have exactly an element of S. Thus you either differ in only one parity bit, or the incorrect parity bits specify what data bit is off from an element of S.

This gives us the general construction of the set S and completes the solution.

I am not sure if this is the most efficient solution, it might be possible to get a better solution using a Hamming distance of 2 for S rather than just 1. I suppose it is an interesting problem to try to construct those sets.

By the way, the elements of S are known as Hamming codes, and are very useful in coding theory.

Fun stuff

3 comments:

Anonymous said...

I got the probalistic coding system but coudln't think of a useful way to use the 'wasted' bits to send information. Although I thought of it, I figured it could be useful but my solutions works... in infinite time.


A friend of mine gave me this solution: on the first guess, Sam guesses a number who's binary representation is the arbitrarily long segment of the string starting at the 2nd bit.

Ray guesses the binary representation of Sam's number.
Sam does the correct string.

They quit after that or repeat the system for hilarious effect.

-!Bob

Anonymous said...

The problem never said Sam had to guess 1 or 0 but i feel it was implied.

-!Bob

Kory Stevens said...

Interesting point, I shall adjust the original problem so that Sam is unable to simply tell Ray what all the solutions are by guessing a single real number.