The Chow & Robbins Problem: Stop at h=5 t=3

Julian D. A. Wiseman

Contents: Introduction, Whether to stop at h=5 t=3, Failure to find a better estimate of the value of the game, Afterword.

Publication history: only here. Usual disclaimer and copyright terms apply.

• Introduction

1. The Chow & Robbins problem asks for the fair value of a game, in which tosses a coin, at least once, until choosing to stop, when one scores the proportion of tosses that are heads. An essay of late 2005, The expected value of s_n/n ≈ 0.79295350640…, improved the best known estimate of the value of the game to ≈11½ decimal places. But it did not address any outstanding questions about possible stopping states.

Luis A. Medina & Doron Zeilberger, in ‘An Experimental Mathematics Perspective on the Old, and still Open, Question of When To Stop?’, arXiv:0907.0032v1, ask:

If currently you have five heads and three tails, should you stop?

If you stop, you can definitely collect 5/8 = 0.625, whereas if you keep going, your expected gain is > 0.6235, but no one currently knows to prove that it would not eventually exceeds 5/8 (even though this seems very unlikely, judging by numerical heuristics).

The author provides strong evidence that at h=5 t=4 the value of the game is ≈ 0.580572488…, and hence the value from continuing from h=5 t=3 is the average of this and 6÷9, that average being ≈ 0.623619577, which is significantly less than 0.625. Further, it seems likely that the error in the estimate of EV_∞(h=5, t=4) is less than 10⁻¹⁰, so h=5 t=3 really is a stopping state. These further heuristics are consistent with the existing “numerical heuristics” cited by Medina & Zeilberger.

2. The previous code was constrained by memory. Code has been rewritten to remove this constraint. Extending the calculation by a factor of 4 has not improved the accuracy to which the value of the game is known. Current constraints definitely include CPU time, and also perhaps the accuracy of the hardware’s floating-point arithmetic.

• 1. Whether to stop at h=5 t=3

With five heads and three tails, should one stop (scoring ⅝ = 0.625), or should one continue? Rephrased, is 0.625 ≥ ½(EV_∞(6,3) + EV_∞(5,4))? For n as large as has been tested, h=5 t=3 is a stopping state, so it is safe to assume that, with an extra head, so at h=6 t=3, one should definitely stop, there scoring 6/(6+3) = ⅔ ≈ 0.666666. So the question is equivalent to asking whether EV_∞(5,4) ≤ ⁷⁄12 ≈ 0.583333.

The same technique previously used for the value of the whole game can be used to value the game at h=5 t=4. Results are output in the table that follows (new code available from jdawiseman.com/papers/easymath/chow_robbins.c; calculations run on a 1.83 GHz Intel Core 2 Duo processor under Mac OSX 10.6.4 using Xcode 3.2.3).

i	n = 2ⁱ	EV_n(h=5, t=4)	Diff EV	Ratio Diff EV	Estimated limit = EV + Diff / (¹⁄Ratio − 1)	Diff Estimated limit	Diff Estimated limit × 2ⁱ
5	32	0.576497675084966
6	64	0.578233093000500	0.001735417916
7	128	0.579196754382963	0.000963661382	0.55529067	0.58040003964231
8	256	0.579752779495916	0.000556025113	0.57699221	0.58051121035538	0.000111170713075	0.02845970
9	512	0.580078876181192	0.000326096685	0.58647834	0.58054136378755	0.000030153432162	0.01543856
10	1,024	0.580274733227757	0.000195857047	0.60061036	0.58056926708142	0.000027903293872	0.02857297
11	2,048	0.580393564822555	0.000118831595	0.60672617	0.58057689315882	0.000007626077403	0.01561821
12	4,096	0.580464868562560	0.000071303740	0.60004025	0.58057184211230	−0.000005051046521	−0.02068909
13	8,192	0.580507696666527	0.000042828104	0.60064316	0.58057211125895	0.000000269146652	0.00220485
14	16,384	0.580533458617586	0.000025761951	0.60151977	0.58057234717824	0.000000235919284	0.00386530
15	32,768	0.580548981018557	0.000015522401	0.60253204	0.58057251183093	0.000000164652695	0.00539534
16	65,536	0.580558328857386	0.000009347839	0.60221604	0.58057248080675	−0.000000031024181	−0.00203320
17	131,072	0.580563959800812	0.000005630943	0.60237917	0.58057249044815	0.000000009641397	0.00126372
18	262,144	0.580567350923834	0.000003391123	0.60222999	0.58057248513698	−0.000000005311168	−0.00139229
19	524,288	0.580569393522691	0.000002042599	0.60233700	0.58057248743098	0.000000002293999	0.00120272
20	1,048,576	0.580570624044048	0.000001230521	0.60242928	0.58057248862324	0.000000001192268	0.00125018
21	2,097,152	0.580571365285460	0.000000741241	0.60237997	0.58057248823938	−0.000000000383866	−0.00080503
22	4,194,304	0.580571811863659	0.000000446578	0.60247335	0.58057248867732	0.000000000437946	0.00183688
23	8,388,608	0.580572080882894	0.000000269019	0.60240118	0.58057248847340	−0.000000000203920	−0.00171060
24	16,777,216	0.580572242938665	0.000000162056	0.60239474	0.58057248846243	−0.000000000010969	−0.00018403
25	33,554,432	0.580572340562639	0.000000097624	0.60240973	0.58057248847781	0.000000000015373	0.00051584
26	67,108,864	0.580572399372356	0.000000058810	0.60241061	0.58057248847835	0.000000000000542	0.00003640
27	134,217,728	0.580572434800094	0.000000035428	0.60241300	0.58057248847924	0.000000000000888	0.00011919
28	268,435,456	0.580572456142390	0.000000021342	0.60241766	0.58057248848028	0.000000000001046	0.00028071

This table resembles those in the 2005 essay, columns having meanings as follows.

The first column contains i, a row counter,
and the second contains n = 2ⁱ, being max_t;
The third contains the computed value of EV_n(5,4), to fifteen decimal places. These increase with i, as expected.
The fourth column shows the increase between adjacent rows, these increases being positive, but decreasing.
The fifth column has the ratio between successive increases. This appears to be tending to a constant limit, unsurprisingly very close to the limit of the ratios for EV_n(0,0) shown in the second table.
If this ratio does indeed reach a constant limit, then the limiting value can be estimated as the current bound, plus the increase divided by one less than the reciprocal of the ratio, which is shown in the sixth column. This appears to be tending to 0.580572488…, the next digit probably being 4 or 5.
The seventh column contains the differences between the estimates of the limit, which are of generally decreasing magnitude but non-constant sign.
The eighth shows the same difference, × n. (Reminder: n = 2ⁱ.) The absolute size of the changes in the estimates of the limit are a small and decreasing proportion of ¹⁄n.

More on the eighth column. The absolute differences between consecutive estimates of the limit are a (mostly) decreasing proportion of 2⁻ⁱ. Let’s take them at their largest: assume that they are all positive, and that they are all equal to 2⁻ⁱ. This assumes to unity the coefficient which is already <10⁻³, and generally of decreasing magnitude, with non-constant sign. Then these assumed future changes, over all not-yet-computed rows, total only 2^(−n_max). Despite these overestimates of the possible error, this total is much smaller than the gap between ⁷⁄12 and the best estimate of the limit.

Number of ‘standard deviations’ to reach 7/12

As an alternative, let us assume that the changes in the estimate of the limit are happening pseudo-randomly, changes being centred on 0, and with standard deviation something the absolute value of the change in the estimate of the limit. (It doesn’t matter if this doesn’t hold for all rows: for some rows suffices.) Also assume that this will diminish by only a factor of 2 with each doubling of n, even though the eighth column suggests that the diminishment is slightly faster. Hence, summed over all future rows, the standard deviation of remaining distance of travel is of a similar order of magnitude to this most recent change. The chart on the right shows the distance to ⁷⁄12 as a multiple of this assumed σ. And behold! The distance is a huge and increasing number of standard deviations. (A one-billion standard deviation move happens, in a normal world, one time in ≈10^2×10¹⁸.) So, relative to the distance that needs to be travelled the changes in the estimated limit are just too small and diminishing too fast: it just isn’t happening.

Note that, for i ≥12, (⁷⁄12−Estimated limit) ≈ 0.002761, so this chart quickly becomes proportional to the reciprocal of the change in the estimated limit. In some sense, the whole point is the immobility of (⁷⁄12−Estimated limit) relative to the magnitude of changes in EV_n(5,4).

• 2. Failure to find a better estimate of the value of the game

When executing the 2005 version of the code, memory was a slighter tighter constraint than CPU time. This problem was fixable, as then acknowledged:

It is possible to rewrite the program such that the memory usage is linear in the widest gap between the red and grey lines (rather than being linear in n), albeit at the price of a loss of readability. This recoding has not yet been attempted.

This problem has been resolved in the new code version, reducing memory usage by a large factor. The following table shows results for n = 2ⁱ.

i	n = 2ⁱ	EV_n(0,0)	Diff EV	Ratio Diff EV	Estimated limit = EV + Diff / (¹⁄Ratio − 1)	Approx max j	Approx t at which j maximal
1	2	0.770833333333333
2	4	0.779947916666666	0.009114583333
3	8	0.785247657412574	0.005299740746	0.58145727	0.79261028151309
4	16	0.788190898971545	0.002943241559	0.55535576	0.79186697529802
5	32	0.790023224424476	0.001832325453	0.62255354	0.79304542974422
6	64	0.791196842160634	0.001173617736	0.64050725	0.79328787366908
7	128	0.791897537079360	0.000700694919	0.59703845	0.79293570515041
8	256	0.792318224291505	0.000420687212	0.60038570	0.79295027021847
9	512	0.792570438324007	0.000252214033	0.59952864	0.79294801722604
10	1,024	0.792722493847599	0.000152055524	0.60288289	0.79295333676356
11	2,048	0.792814414356913	0.000091920509	0.60451937	0.79295492118365
12	4,096	0.792869752675300	0.000055338318	0.60202363	0.79295346361236
13	8,192	0.792903056616781	0.000033303941	0.60182424	0.79295339398384
14	16,384	0.792923109615955	0.000020052999	0.60212090	0.79295345634652
15	32,768	0.792935195956992	0.000012086341	0.60271987	0.79295353233307
16	65,536	0.792942476379465	0.000007280422	0.60236778	0.79295350539518
17	131,072	0.792946862428220	0.000004386049	0.60244426	0.79295350891743
18	262,144	0.792949504079600	0.000002641651	0.60228500	0.79295350449952
19	524,288	0.792951095314407	0.000001591235	0.60236367	0.79295350581352
20	1,048,576	0.792952053932079	0.000000958618	0.60243634	0.79295350654503
21	2,097,152	0.792952631391198	0.000000577459	0.60238731	0.79295350624769
22	4,194,304	0.792952979294206	0.000000347903	0.60247210	0.79295350655746
23	8,388,608	0.792953188872027	0.000000209578	0.60240302	0.79295350640540	25689
24	16,777,216	0.792953315120851	0.000000126249	0.60239592	0.79295350639599	35349	3×10⁵
25	33,554,432	0.792953391174430	0.000000076054	0.60241019	0.79295350640739	48571	6×10⁵
26	67,108,864	0.792953436989929	0.000000045815	0.60241083	0.79295350640770	66624	14×10⁵
27	134,217,728	0.792953464589786	0.000000027600	0.60241310	0.79295350640835	91212	30×10⁵
28	268,435,456	0.792953481216427	0.000000016627	0.60241767	0.79295350640915	124599	646×10⁴

The first six columns have meaning similar to those in the h=5 t=3 table.

For each t, calculations are performed only over ‘useful’ h. The seventh column shows, for some n, an approximation to the largest number of useful h, that is, the largest value of j.
The eighth column shows, again approximately, the t with this maximal j.

These columns might be of interest to those attempting further optimisations of the code.

Alas the results have not improved the estimate of the expected value of the whole game. The author also suspects that the accuracy of the C double might be insufficient, or nearly so.

• Conclusion

With five heads and three tails, stop, thereby scoring 0.625. Choosing to continue lowers the expected payoff to 0.62361957757…, with the next digit probably being close to ‘3’.

Julian D. A. Wiseman
September 2010

Afterword

In January 2012 this page was cited (albeit with a punctuation problem in the URL): Olle Häggström and Johan Wästlund, ‘Rigorous computer analysis of the Chow-Robbins game’, arXiv:1201.0626v1 leading to PDF.

Main index

Top

About J. D. A. Wiseman