Bain: Park-neutral home runs

From SABR member Derek Bain on February 6, 2017:

“The ways in which the ballparks alter the game and therefore the statistics of the players who play there are so massive that it is impossible to perceive the abilities of the players accurately without constantly adjusting the lens.” – Bill James, The Bill James Baseball Abstract 1983

The outcomes of batted balls in baseball are affected by many factors, including field dimensions (distances from home plate to the outfield fences), altitude, wind, and weather. To compensate for these effects and allow comparison of individual player-seasons across time, it is necessary to compute park factors. Once park factors are determined, we can essentially place all players on an equal (“neutral”) playing field. Although park effects can influence all batted balls and other events in any given contest, the ensuing analysis focuses solely on home runs.
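As a rough illustration of the idea, a home run park factor is often expressed as the home run rate per batted ball in a team's home games divided by the rate in its road games, scaled so that 100 is neutral. The Python sketch below assumes that form. The function name, the tuple layout, and the sample teams are hypothetical, and it is not the R code from the book or from this study.

```python
from collections import defaultdict


def home_run_park_factors(batted_balls):
    """Return {team: park factor} from (home_team, away_team, is_hr) tuples.

    Assumption: every tuple is one batted ball, and both teams' batted
    balls in a game count toward that park's rate.
    """
    home = defaultdict(lambda: [0, 0])  # team -> [home runs, batted balls] in its home games
    away = defaultdict(lambda: [0, 0])  # team -> [home runs, batted balls] in its road games
    for home_team, away_team, is_hr in batted_balls:
        for tally in (home[home_team], away[away_team]):
            tally[0] += int(is_hr)
            tally[1] += 1

    factors = {}
    for team in list(home):
        hr_home, bb_home = home[team]
        hr_away, bb_away = away[team]
        if bb_away == 0 or hr_away == 0:
            continue  # no usable road sample, so no ratio can be formed
        factors[team] = round(100 * (hr_home / bb_home) / (hr_away / bb_away), 1)
    return factors


# Hypothetical two-team league: 6 HR per 100 batted balls in AAA's park,
# 3 HR per 100 batted balls in BBB's park.
sample = (
    [("AAA", "BBB", True)] * 6 + [("AAA", "BBB", False)] * 94
    + [("BBB", "AAA", True)] * 3 + [("BBB", "AAA", False)] * 97
)
factors = home_run_park_factors(sample)
print(factors)
```

A value above 100 suggests a park that inflates home runs relative to the team's road games, and a value below 100 suggests one that suppresses them.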

Section 11.7 of “Analyzing Baseball Data with R” by Max Marchi and Jim Albert outlines a method for calculating park factors using event data from Retrosheet. I extended the R code to process the data for every player-season since 1930, with the following caveat:

“Event files for most seasons prior to 1974 are each missing a few games. For a list of the games that are missing (although some of them are in the event files with some innings reconstructed based on a partial game account and box score) see the Most Wanted List… Note that some games in our files have “99” for missing plays that were outs.” – Retrosheet Event Files

To account for the potential for missing data prior to 1974, I added a column to my data tables that indicates the difference between actual balls in play (BIP) for every batter and the number of balls in play listed in the Retrosheet file. The differences are recorded as BIPdiff and are generally greater prior to 1950.
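The BIPdiff column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `official_bip` and `retrosheet_bip`, the sign convention (official minus Retrosheet), and the sample rows are all hypothetical, and the actual tables were built in R.

```python
def add_bip_diff(rows):
    """Return copies of the player-season rows with a BIPdiff field added.

    Assumption: BIPdiff = official balls in play minus the balls in play
    found in the event file, so a positive value means missing events.
    """
    return [
        {**row, "BIPdiff": row["official_bip"] - row["retrosheet_bip"]}
        for row in rows
    ]


# Hypothetical player-seasons, not real data.
seasons = [
    {"batter": "batter_a", "year": 1935, "official_bip": 480, "retrosheet_bip": 471},
    {"batter": "batter_b", "year": 1975, "official_bip": 510, "retrosheet_bip": 510},
]
with_diff = add_bip_diff(seasons)
for row in with_diff:
    print(row["batter"], row["year"], row["BIPdiff"])
```

A nonzero BIPdiff flags a player-season whose event data is incomplete, which makes it easy to filter or footnote those seasons later.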

Originally published: February 7, 2017. Last Updated: February 7, 2017.