Ben-Porat: Data impossible: the minor-league strike zone

From Eli Ben-Porat at The Hardball Times on April 23, 2018:

Our mission, which we have chosen to accept, is to delve into new areas of baseball research and examine data sets that have yet to be explored. Today and tomorrow we will explore an old but never-used data set, the minor-league strike zone, to assess its quality. We’ll also leverage it to learn more about Aaron Nola, predict ground ball pitchers and predict home run hitters.

We baseball fans have been spoiled for many years now by the excellent and public PITCHf/x and StatCast data sets. As a sports fan and as a data analyst who is always on the hunt for new, cool data sets to play with, I’ve become a bigger fan of baseball, at the expense of the other major sports, because of these great public data.

Recently, I was curious to see whether it was possible to turn the minor league GameDay strike zone data into actual usable data. MLB’s GameDay XML files detail every pitch thrown in affiliated ball, going all the way back to 2008. These pitches are often hand-coded on a screen to mark their location, either in the strike zone or, for batted balls, in the field of play. Needless to say, there are a lot of errors in these data, especially in the lower leagues and the farther back in time you go.


Originally published: April 27, 2018. Last Updated: April 27, 2018.