Forman: Where do baseball stats come from anyway?

From SABR member Sean Forman at LinkedIn on March 6, 2015:

Paul Erdős was one of the greatest mathematicians of the last century (famous enough to originate Erdős Numbers), along with being one of its most eccentric. He loved beautiful, simple proofs and talked about God having a book of proofs with all of the best and most direct proofs.

I left math nine years ago, so I probably wouldn’t understand any proofs in THE BOOK, but I would love to see God’s Baseball Encyclopedia. Close observers of our sites have probably noticed through the years that numbers, even as fundamental as strikeouts, runs batted in and hits, can change from time to time. As fans, reporters or pros, we want the stats to be THE STATS, but even if you exclude uncertainty related to scoring decisions and the like, historical statistics are just estimates (usually very, very good estimates) of what happened on the field.

This past week we had another such case, where Heinie Zimmerman received an updated RBI total and with it the 1912 NL Triple Crown. Cases such as this demonstrate how and why these changes occur.

There is a common misperception that the leagues have always kept meticulous records and that these records have been carefully maintained up to the present day. This is why I get angry e-mails about Ty Cobb’s hit total or Babe Ruth’s RBI total. But as Alan Schwarz pointed out in his very good book, The Numbers Game, the truth is far different.

Contemporary accountings of historical seasons (say pre-1970) are rife with errors. Before 1905 or so, the leagues were more concerned with staying in business than recording their history, so the job fell to the Chadwicks, Reaches and Spaldings, who produced annual guides to the baseball season. These books were quality efforts, but they were working with limited source material and of course contain errors.

Read the full article here:

Originally published: March 6, 2015. Last Updated: March 6, 2015.