Probability experiment

A Trillion Coin Flips Later, π Showed Up Anyway

a place where curiosity wins and productivity negotiates
“At some point this stopped being an estimate of pi and became a side quest to witness a 400-billion-flip sequence.” — Jason

Stewart Labs exists for exactly this kind of problem: a question too unnecessary to justify, too interesting to ignore, and just mathematical enough to quietly take over a week.

The idea started innocently. Flip a fair coin until the number of heads first becomes greater than the number of tails. That block of flips is one sequence. For each completed sequence, compute the fraction of heads in that sequence. Average those fractions. Multiply by four. The suspicion — admittedly a slightly ridiculous one — was that something like π might appear.

At first it looked like a Pi Day hallucination. Then it kept happening. Then the Catalan numbers showed up. Then the sequences became so absurdly long that the real experiment stopped being “does π appear?” and started becoming “how big can one of these monsters get before the laptop gives up?”

The setup

Each sequence follows one rule: stop the first time heads > tails. A one-flip sequence of H ends immediately. A sequence like THH ends on the third flip, because that is the first moment heads finally pull ahead. For a completed sequence of length 2n+1, the number of valid arrangements follows the Catalan numbers: 1, 1, 2, 5, 14, ....

That was the first clue that this was not just a cute numerical coincidence. The sequences are really lattice paths in disguise: constrained walks that refuse to let heads get ahead until the very last step.

Total batches
2,073
Completed sequences
4,340,000
Total flips
1,380,465,057,770
Final π estimate
3.141365387924
Absolute error
0.000227266
Batch std. dev.
0.027755

The final run ended with an estimate of 3.141365387924, which is only 0.000227266 away from π. Not bad for an estimator that behaves, at times, like it was designed specifically to waste CPU time.

Cumulative pi estimate chart
The cumulative estimate does eventually drift toward π, but it does so slowly and noisily.

The real story: monster sequences

The most fascinating part was not π. It was the tail.

Most sequences are tiny. Many end immediately. Some are modest. And then, every so often, one sequence simply keeps going. Those ultra-long excursions are the reason this estimator converges so painfully slowly. A gigantic sequence contributes a fraction that is basically just 0.5, but it may consume billions or even hundreds of billions of flips before it finally does.

Longest sequence batch
2,050
Longest sequence length
429,989,326,389
Heads
214,994,663,195
Tails
214,994,663,194

A single completed sequence lasted 429,989,326,389 flips before heads finally edged ahead by one. After all that drama, its contribution to the average was essentially just 0.500000000001.

Record progression of longest sequence chart
The record longest sequence climbed from ridiculous to fully unreasonable.

Why the batches mattered

A fixed number of flips turned out to be a bad way to run the experiment, because one unfinished monster sequence could swallow huge amounts of computation and contribute nothing yet. Batching by completed sequences made the experiment manageable and also exposed another truth: even after thousands of batches, the batch-level estimates remain noisy.

In other words, the global average behaves; the local batches absolutely do not.

Batch-level pi estimates chart
Individual batches wander all over the place. The miracle is not that any one batch is good. The miracle is that the cumulative average survives them.

What I learned

First, independently rediscovering an idea still counts as thinking. I did eventually learn that Jim Propp Estimating π with a Coin had already written up essentially this exact experiment, proving that the expected value really does produce π after the multiply-by-four step. That did not diminish the fun. It confirmed that the intuition was sound, the Catalan clue was real, and the absurd convergence behavior was not an illusion.

Second, this is a terrible way to compute π. It is slow, noisy, and apparently eager to generate random-walk monsters with the attitude of small planets. But as a mathematical toy, it is wonderful. It starts with coin flips and ends with combinatorics, stopping times, Catalan paths, and a stubborn appearance by one of the most famous constants in mathematics.

Finally, this is exactly the kind of thing Stewart Labs is for: not polished research, not practical engineering, just the quiet satisfaction of following an unnecessary question all the way to the point where it becomes interesting.

One trillion flips later, the verdict is simple: as a practical estimator, terrible; as an experiment, absolutely worth it.

← Back to Stewart Labs homepage