# Dzeng: Using predictive modeling to identify the variables in a batter’s swing decision

From Richard Dzeng at Baseball Prospectus on October 10, 2019:

Pitcher throws, batter swings. The fundamental unit of baseball events can be distilled into an interaction between the pitcher and the batter. These days, it seems most popular to optimize for end results such as home runs. While home runs are easy to quantify because of their direct effect on the score, they’re generally difficult to predict because they are rare events. Out of about 717,000 pitches, only 2.2 percent result in good contact.

On the other hand, a logical place to start is at the beginning, when the batter decides to swing. Baseball can be broken down into a sequence of conditional probabilities. The first event is the pitcher throwing the ball with a given set of physical characteristics. Given that the pitcher throws the ball with certain physical characteristics, what is the probability that the batter will swing at the ball? If the batter swings, what’s the probability of hitting the ball? If he hits the ball, what are the probabilities of hit locations? Given the probability of hit locations, what’s the probability of an array of ball-in-play outcomes? And so on, until you reach the end of the game. Each piece of the probability chain can be modeled and optimized separately to produce more accurate predictions at each step and so improve the overall prediction.
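As a rough illustration of that chain, the probability of a full sequence of events is the product of each step's conditional probability, where every step is conditioned on everything before it. The sketch below is not the article's model. The function name `chain_probability`, the step labels, and the numbers are all hypothetical placeholders chosen to show the arithmetic.

```python
from math import prod


def chain_probability(conditionals):
    """Multiply a sequence of (label, probability) pairs.

    Each probability is assumed to be conditioned on all earlier
    events in the list, so the product is the joint probability
    of the whole sequence given the pitch.
    """
    for label, p in conditionals:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {p}")
    return prod(p for _, p in conditionals)


# Placeholder values for illustration only; they are NOT estimates
# from the article or from any real pitch data.
steps = [
    ("P(swing | pitch)", 0.5),
    ("P(contact | swing, pitch)", 0.8),
    ("P(in play | contact, swing, pitch)", 0.5),
]

joint = chain_probability(steps)
print(f"P(swing, contact, in play | pitch) = {joint:.3f}")
```

In a real pipeline, each entry would presumably come from its own fitted model rather than a constant, which is what lets each link be improved separately.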

Secondarily, predicting if and when a batter will swing can be used to evaluate a batter’s physical capabilities and decision making. The decisions that a pitcher and batter make form a game in which the batter needs to be able to predict the location of the ball and combine it with the knowledge of their o