Does the Book Obey Benford's Law?
I extracted 1,245 numbers from the manuscript and checked whether the book follows the very law it explains. The answer is complicated—and revealing.
In Chapter 7 of The Math Inside the Machine, I tell the story of a curious pattern hiding in worn library pages. In 1881, the astronomer Simon Newcomb noticed that logarithm tables at the U.S. Naval Observatory showed uneven wear. The early pages—those for numbers starting with 1—were grimy and soft. The pages for 8s and 9s looked almost new.
Newcomb worked out a formula: the probability that a number’s first digit is d equals log₁₀(1 + 1/d). For digit 1, that’s about 30.1%. For digit 9, just 4.6%. He published a two-page note in the American Journal of Mathematics and was promptly forgotten.
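The formula fits in one line of Python. A quick sanity check (my sketch, not code from the book) shows the nine probabilities telescope to exactly 1:

```python
import math

def benford(d: int) -> float:
    """Benford's Law: probability that a number's leading digit is d."""
    return math.log10(1 + 1 / d)

probs = {d: benford(d) for d in range(1, 10)}
print(f"{probs[1]:.3f}")  # 0.301 -- digit 1
print(f"{probs[9]:.3f}")  # 0.046 -- digit 9
# The probabilities sum to exactly 1:
# log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1.
```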
Fifty-seven years later, a physicist named Frank Benford rediscovered the same pattern at General Electric. But Benford didn’t stop at intuition. He collected over 20,000 numbers from 20 different datasets—river surface areas, city populations, baseball statistics, numbers pulled from newspaper front pages—and showed that the pattern held everywhere. The pattern has carried his name ever since.
So I asked the obvious question: does the book itself obey the law it describes?
The Experiment
I wrote a script to extract every number from the manuscript’s LaTeX source—all fourteen chapters, from the prologue through the epilogue. Years, measurements, constants, counts, probabilities. Everything that qualifies as a number with a leading digit between 1 and 9.
The manuscript contains 1,245 qualifying numbers. When you plot their leading digits against Benford’s prediction, the overall shape is right: digit 1 dominates, and the distribution falls off toward 9. But the fit isn’t clean.
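The real script has to cope with LaTeX markup, but the core of the tally fits in a few lines. Here is a simplified stand-in (the regex and function name are mine, not the book's actual code):

```python
import re
from collections import Counter

def leading_digit_counts(text: str) -> Counter:
    """Tally the leading nonzero digit of every number found in the text."""
    counts = Counter()
    for token in re.findall(r'\d[\d,.]*', text):
        digits = token.lstrip('0.,')  # skip leading zeros in decimals like 0.046
        if digits:
            counts[int(digits[0])] += 1
    return counts

sample = "In 1881 Newcomb noticed the wear; in 1938 Benford collected 20,229 numbers."
print(leading_digit_counts(sample))  # → Counter({1: 2, 2: 1})
```

Run over the full manuscript, a tally like this is what produces the leading-digit distribution compared against Benford's prediction.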
Digit 1 accounts for 44% of all leading digits—far above the 30% Benford predicts. The book has too many 1s.
Why the Book Cheats
The culprit is history.
A book about mathematics and its origins is, inevitably, a book about when things happened. Newton’s plague year: 1665. Leibniz’s notation: 1684. Benford’s own paper: 1938. Shannon’s information theory: 1948. The Pentium bug: 1994. ResNet: 2015. The manuscript contains 231 years, and almost all of them start with 1.
Strip out the years and the distribution tightens. Digit 1 drops from 44% to 40%. Digits 3 through 8 snap closer to their Benford targets. But even without years, the book still runs hot on 1s—because a book about mathematics naturally gravitates toward round powers of 10 (100, 1,000, 10,000), toward the constant 1 itself, and toward probabilities and percentages that cluster near small values.
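The year-stripping step can be sketched with a crude filter. The 1000–2099 range test below is an illustrative heuristic of my own, not necessarily the rule used for the book's numbers:

```python
import re
from collections import Counter

# Illustrative heuristic: treat any bare four-digit number 1000-2099 as a year.
YEAR = re.compile(r'^(1[0-9]{3}|20[0-9]{2})$')

def leading_digits(text: str, skip_years: bool = False) -> Counter:
    """Tally leading digits, optionally ignoring four-digit year-like numbers."""
    counts = Counter()
    for token in re.findall(r'[1-9][\d,]*', text):  # crude tokenizer, fine for a demo
        plain = token.replace(',', '')
        if skip_years and YEAR.match(plain):
            continue
        counts[int(plain[0])] += 1
    return counts

sample = "Newton 1665, Leibniz 1684, Benford 1938, Shannon 1948, ResNet 2015; 231 years, 44 percent"
with_years = leading_digits(sample)
without_years = leading_digits(sample, skip_years=True)
```

On this sample, digit 1 dominates with years included and vanishes once they are filtered out, which is the effect in miniature.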
This is exactly what Chapter 7 warns about. Benford’s Law emerges from data that spans many orders of magnitude through organic, multiplicative processes. River basins and city populations obey it because they grow through accumulation and compound change. But a curated set of numbers chosen by an author to illustrate mathematical concepts? That’s a designed collection, not an organic one. The book’s numbers carry the fingerprint of editorial choice.
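You can watch an organic, multiplicative process drift toward Benford's distribution with a short simulation. The growth model here is an arbitrary illustration (compound random proportional growth), not data from the book:

```python
import math
import random

random.seed(0)

def grown_quantity(steps: int = 25) -> float:
    """One 'organically grown' value: compound random proportional growth."""
    value = 1.0
    for _ in range(steps):
        value *= random.uniform(1.0, 2.0)  # a random proportional change each step
    return value

def leading_digit(x: float) -> int:
    """Leading digit of x, via the fractional part of log10(x)."""
    return int(10 ** (math.log10(x) % 1))

samples = [grown_quantity() for _ in range(20_000)]
share_of_ones = sum(leading_digit(x) == 1 for x in samples) / len(samples)
print(f"digit 1 share: {share_of_ones:.3f}")  # lands close to Benford's 0.301
```

The multiplicative steps spread the values across orders of magnitude, which is exactly the condition under which Benford's Law emerges; a curated list of illustrative numbers has no such mechanism behind it.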
The Deeper Connection
The chapter doesn’t end with Benford. It follows the logarithm from worn library pages to Claude Shannon’s 1948 paper at Bell Labs, where he asked a question that changed everything: how do you measure information?
Shannon’s answer was the logarithm again. The information content of an event with probability p is -log(p). Likely events carry little information. Unlikely events carry a lot. He called it surprisal.
That same formula—negative log of a probability—is the loss function used to train every modern language model. When a transformer predicts the next word and the actual word arrives, the training signal is -log(p), where p is the probability the model assigned to the correct answer. High confidence in the right word? Low loss. Confident and wrong? The logarithm sends the penalty toward infinity.
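In code, surprisal and the per-token training loss are the same one-liner. A minimal sketch (natural log gives nats; Shannon's original base 2 gives bits):

```python
import math

def surprisal(p: float) -> float:
    """Shannon's surprisal: information content of an event with probability p."""
    return -math.log(p)  # natural log -> nats; use math.log2 for bits

# The same formula is the per-token cross-entropy loss for a language model:
# p is the probability the model assigned to the token that actually arrived.
print(surprisal(0.95))   # confident and right: tiny loss
print(surprisal(0.001))  # confident and wrong: the penalty soars
```

As p approaches 0 the loss grows without bound, which is the "penalty toward infinity" described above; at p = 1 the loss is exactly zero.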
The logarithm that makes 1 the most common leading digit in nature is the same logarithm that teaches a language model how wrong it is. Benford’s Law and cross-entropy loss are different faces of the same mathematical truth: proportional changes matter more than absolute ones, and the logarithm is how you measure them.
If you want the full story—Newcomb’s worn pages, Benford’s 20,000 numbers, Shannon’s unicycle rides down the halls of Bell Labs, and how it all connects to the machine you’re probably reading this on—it’s in Chapter 7.