Mains: The basics of historical baseball data analysis

From SABR member Rob Mains at Baseball Prospectus on November 26, 2019:

I’ll assume some of you can’t wait to escape the Thanksgiving dinner table. You probably have a book to read, or episodes of Mrs. Maisel to watch, or a friend’s apartment to go to. But maybe you don’t. Maybe your heart is set on doing historical baseball data analysis.

I do a lot of it here. Our sport lends itself to it: baseball has tallied numbers since its inception. We can look at those numbers to identify trends. For example, here’s one: Strikeouts by year.

Having 149 years of data is, to be sure, pretty remarkable. But that chart incorporates a lot of noise. I’m going to show you how I filter it out.

Until the American League began in 1901, the number of teams bounced around a lot. So when I do my research, I concentrate on, at most, the years beginning in 1901. (Sorry, 19th century aficionados.) That’s what I’ll be doing for the rest of this article. Additionally, the Federal League in 1914-15 was considered major league, causing a spike. Strikes, especially the long ones in 1981 and 1994, caused dips.
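As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the data is keyed by season year; the function and constant names are hypothetical, and the flagged years are only the ones named above, not a complete list of irregular seasons.

```python
# Hypothetical helper names; the years come straight from the paragraph above.
FIRST_SEASON = 1901                  # first American League season
FEDERAL_LEAGUE_YEARS = {1914, 1915}  # third major league inflates totals
LONG_STRIKE_YEARS = {1981, 1994}     # long strikes deflate totals


def season_flags(year):
    """Return the caveats for a season, or None if it predates 1901."""
    if year < FIRST_SEASON:
        return None
    flags = []
    if year in FEDERAL_LEAGUE_YEARS:
        flags.append("federal_league")
    if year in LONG_STRIKE_YEARS:
        flags.append("strike")
    return flags


def filter_seasons(years):
    """Keep seasons from 1901 on, each paired with its list of caveats."""
    return {year: season_flags(year) for year in years if year >= FIRST_SEASON}
```

Flagging the unusual seasons rather than dropping them keeps the choice of how to treat them (exclude, adjust, or annotate) with whoever draws the chart.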

And that’s not all. For 57 years in the last century, the standard schedule was 154 games. That rose to 162 in the American League in 1961 and in the National League in 1962. And, of course, the number of teams expanded: 16 from 1901 through 1960, then 18 (starting in 1961), 20 (1962), 24 (1969), 26 (1977), 28 (1993), and, starting in 1998, 30.
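One way to put seasons on a common footing is to divide raw totals by the number of teams, or by team-games actually played. The Python sketch below is an assumption about how that could be done, not necessarily the method used in the full article. The team counts are the ones listed above; the function names are hypothetical, and the caller supplies the totals and team-game counts from their own data source.

```python
# (first_year, number_of_teams) pairs taken from the paragraph above.
# The Federal League teams of 1914-15 are deliberately not included here.
TEAM_COUNT_STARTS = [
    (1901, 16),
    (1961, 18),
    (1962, 20),
    (1969, 24),
    (1977, 26),
    (1993, 28),
    (1998, 30),
]


def teams_in(year):
    """Return the number of AL and NL teams in a given season."""
    if year < TEAM_COUNT_STARTS[0][0]:
        raise ValueError("team counts here start in 1901")
    count = TEAM_COUNT_STARTS[0][1]
    for start_year, n_teams in TEAM_COUNT_STARTS:
        if year >= start_year:
            count = n_teams
    return count


def per_team(total, year):
    """Scale a league-wide season total by the number of teams that year."""
    return total / teams_in(year)


def per_team_game(total, team_games):
    """Scale a season total by team-games actually played.

    Using games played rather than games scheduled also absorbs the
    154-versus-162 schedule change and strike-shortened seasons.
    """
    if team_games <= 0:
        raise ValueError("team_games must be positive")
    return total / team_games
```

Dividing by team-games played handles expansion, schedule length, and shortened seasons in a single step, which is why it is often more convenient than adjusting for each factor separately.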

Read the full article here:

This page was last updated November 26, 2019 at 3:51 pm MST.