Probability experiment
A Trillion Coin Flips Later, π Showed Up Anyway
Stewart Labs exists for exactly this kind of problem: a question too unnecessary to justify, too interesting to ignore, and just mathematical enough to quietly take over a week.
The idea started innocently. Flip a fair coin until the number of heads first becomes greater than the number of tails. That block of flips is one sequence. For each completed sequence, compute the fraction of heads in that sequence. Average those fractions. Multiply by four. The suspicion — admittedly a slightly ridiculous one — was that something like π might appear.
At first it looked like a Pi Day hallucination. Then it kept happening. Then the Catalan numbers showed up. Then the sequences became so absurdly long that the real experiment stopped being “does π appear?” and started becoming “how big can one of these monsters get before the laptop gives up?”
The setup
Each sequence follows one rule: stop the first time heads > tails. A one-flip sequence of H ends immediately. A sequence like THH ends on the third flip, because that is the first moment heads finally pull ahead. For a completed sequence of length 2n+1, the number of valid arrangements follows the Catalan numbers: 1, 1, 2, 5, 14, ....
That was the first clue that this was not just a cute numerical coincidence. The sequences are really lattice paths in disguise: constrained walks that refuse to let heads get ahead until the very last step.
The final run ended with an estimate of 3.141365387924, which is only 0.000227266 away from π. Not bad for an estimator that behaves, at times, like it was designed specifically to waste CPU time.
The real story: monster sequences
The most fascinating part was not π. It was the tail.
Most sequences are tiny. Many end immediately. Some are modest. And then, every so often, one sequence simply keeps going. Those ultra-long excursions are the reason this estimator converges so painfully slowly. A gigantic sequence contributes a fraction that is basically just 0.5, but it may consume billions or even hundreds of billions of flips before it finally does.
A single completed sequence lasted 429,989,326,389 flips before heads finally edged ahead by one. After all that drama, its contribution to the average was essentially just 0.500000000001.
Why the batches mattered
A fixed number of flips turned out to be a bad way to run the experiment, because one unfinished monster sequence could swallow huge amounts of computation and contribute nothing yet. Batching by completed sequences made the experiment manageable and also exposed another truth: even after thousands of batches, the batch-level estimates remain noisy.
In other words, the global average behaves; the local batches absolutely do not.
What I learned
First, independently rediscovering an idea still counts as thinking. I did eventually learn that Jim Propp Estimating π with a Coin had already written up essentially this exact experiment, proving that the expected value really does produce π after the multiply-by-four step. That did not diminish the fun. It confirmed that the intuition was sound, the Catalan clue was real, and the absurd convergence behavior was not an illusion.
Second, this is a terrible way to compute π. It is slow, noisy, and apparently eager to generate random-walk monsters with the attitude of small planets. But as a mathematical toy, it is wonderful. It starts with coin flips and ends with combinatorics, stopping times, Catalan paths, and a stubborn appearance by one of the most famous constants in mathematics.
Finally, this is exactly the kind of thing Stewart Labs is for: not polished research, not practical engineering, just the quiet satisfaction of following an unnecessary question all the way to the point where it becomes interesting.
One trillion flips later, the verdict is simple: as a practical estimator, terrible; as an experiment, absolutely worth it.