SABR Baseball Research Journal, Fall 2025

Hitter and Catcher Adaptation in Major League Baseball

This article was written by Charles Steinhardt and Zach Borowiak

This article was published in the Fall 2025 Baseball Research Journal


INTRODUCTION

In the early eras of baseball, the most important pitching statistic was wins, and the best pitchers were considered those who won the most. However, much of what determines the winning pitcher, most notably run support, is out of that pitcher’s control. Over time, an improved understanding of pitching has led to increasingly accurate measures of pitcher performance and skill.

The earliest improvements focused on runs allowed, and then on ERA, so that the pitcher was not punished for fielding errors. Approximately 25 years ago, it was discovered that much of what had been attributed to pitching was really fielding. This led to new methods such as defense-independent [sic] pitching statistics (DIPS) and fielding independent pitching (FIP), which evaluate pitchers based only on strikeouts, walks, and home runs, the outcomes under their direct control. Thus, not only the luck of run support but also fielding and BABIP luck were now removed, since they were found not to be predictive.

Even these outcomes, however, continue to have a luck component. Although home run rate is a significant component of FIP, there is considerable luck involved in giving up a home run. Most bad pitches, even to excellent hitters, do not result in home runs. And sometimes a hitter manages to hit a home run on an excellent pitch. Presumably this sort of hitter luck is not predictive of future pitcher outcomes.

For the past decade, MLB has measured and released pitch trajectories and related information through its Statcast database. Based on these data, methods have been developed to remove the luck of the hitter in a manner similar to the way that fielding luck was removed. Instead of looking at whether a given pitch was actually hit for a home run, one can look at the pitch trajectory and ask how often a similar pitch would be hit for a home run. That is, if one assumes that throwing a pitch on a specific trajectory is a repeatable skill (an assertion strongly supported by Statcast tracking), then pitchers can be evaluated on whether the set of pitches they throw should typically lead to good outcomes. Several versions of this pitch scoring have been developed in the past few years, such as Pitching+.1

In this work, a similar set of techniques is used as a starting point to consider the value and skill of hitters. Section 2 describes a pitch score which, in a manner similar to previous work, estimates the result that an average MLB hitter would obtain against that particular pitch. Section 3 then defines a hitter score as the difference between actual hitter outcomes (using an expected wOBA based on trajectory for balls in play to minimize the effects of defensive luck) and the expected outcomes for an average hitter facing the same set of pitches. Section 4 uses these hitter scores to identify the origin of the previously documented effect that pitchers become less effective each time through the order. Section 5 presents evidence that hitters guess, so that pitch calling and sequencing represent a potentially important catcher skill. Thus, one can also define a catcher score, where a good catcher produces weaker hitter outcomes than would be expected given the pitch trajectory and hitter skill. Section 6 shows that catcher defense is several times more important than previously measured, likely due primarily to hitter adjustments to the expanded strike zones produced by catchers with good framing ability. The implications of these results are discussed in Section 7.

PITCH SCORES IN MLB

The pitch scores used here, and the other metrics derived from them, were developed independently using a procedure similar to those used for previous metrics such as Pitching+. The pitch score model is derived from over 3.5 million pitches in MLB Statcast from 2019 to 2024.

First, each pitch is given a pitch score based on its true outcome, expressed as the difference in run expectancy before and after the pitch. To reduce the effect of defensive luck, any ball in play is scored using the Statcast expected wOBA (weighted on-base average), derived from a model trained on launch angle, exit velocity, and direction, rather than the actual wOBA. For a pitch not put into play, the value of a given count is the league-average wOBA across all plate appearances passing through that count. For example, using a table of these count values, if a 1–1 pitch (pre-pitch expected wOBA 0.313) is taken for a ball, the new expected wOBA is 0.059 higher, so with a wOBA scale of 1.27 the pitch score would be -0.046 runs.2 If the ball is instead put into play on a trajectory with a Statcast expected wOBA of 0.201, then the pitch result wOBA is 0.112 lower than expected pre-pitch, so the pitch score would be +0.088 runs. The actual number of outs and set of baserunners were found to have negligible impact on the relationship between pitch trajectory and pitch scores, so they are ignored here.
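
The worked example above can be restated as a short calculation. This is a minimal sketch that assumes a single league-wide wOBA scale of 1.27, the value used in the example, and treats the count values as plain inputs. The function name pitch_score is a hypothetical label, not the authors' code.

```python
# Sketch of the count-based pitch score; positive values favor the pitcher.
# WOBA_SCALE is the 1.27 factor from the worked example (assumed to apply
# league-wide to every pitch).
WOBA_SCALE = 1.27


def pitch_score(pre_pitch_woba: float, post_pitch_woba: float,
                woba_scale: float = WOBA_SCALE) -> float:
    """Runs saved by the pitcher on a single pitch.

    pre_pitch_woba: league-average wOBA of PAs passing through the count
        before the pitch.
    post_pitch_woba: the same value for the new count if the pitch is not
        put into play, or the Statcast expected wOBA of the batted ball if
        it is.
    """
    return (pre_pitch_woba - post_pitch_woba) / woba_scale


# 1-1 count (0.313) taken for a ball: the new count is worth 0.059 more.
ball = pitch_score(0.313, 0.313 + 0.059)
# Same count, ball put into play with an expected wOBA of 0.201.
in_play = pitch_score(0.313, 0.201)
print(f"{ball:+.3f} {in_play:+.3f}")  # -0.046 +0.088
```

The sign convention makes a taken ball cost the pitcher runs and weak contact save them, which matches the two cases in the text.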

These pitches were divided into a training set and a testing set, and XGBoost, a gradient-boosted decision-tree machine learning library, was used to create a pitch score predictor.3 The model predicts a pitch score from the following Statcast data: count (balls, strikes); location (plate_x and a rescaled zone_z relative to the hitter’s strike zone height); pitcher and hitter handedness; reported pitch type; and trajectory information (pitch velocity, spin rate, spin axis, and horizontal and vertical break). As with Pitching+, these are divided into separate stuff and location scores, which are then combined to produce a total pitch score.
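
The input list above can be sketched as a feature-preparation step. Column names follow the public Baseball Savant CSV export. The zone_z rescaling formula, the split into stuff and location groups, and the helper names are assumptions, because the article does not spell them out. Each resulting feature set could then be passed to its own regressor, such as xgboost.XGBRegressor.

```python
# Hypothetical feature preparation for a pitch score model. Column names
# follow the Baseball Savant CSV export; the rescaling and the split into
# stuff and location features are assumptions for illustration.
CONTEXT = ["balls", "strikes", "p_throws", "stand"]
STUFF = ["pitch_type", "release_speed", "release_spin_rate", "spin_axis",
         "pfx_x", "pfx_z"]
LOCATION = ["plate_x", "zone_z"]


def rescaled_zone_z(plate_z: float, sz_bot: float, sz_top: float) -> float:
    """Pitch height relative to the hitter's zone: 0 at bottom, 1 at top."""
    return (plate_z - sz_bot) / (sz_top - sz_bot)


def split_features(pitch: dict) -> tuple[dict, dict]:
    """Return (stuff_features, location_features) for one pitch.

    Release point columns (release_pos_x, release_pos_z, ...) are ignored,
    mirroring the article's choice to avoid fitting to individual pitchers.
    """
    row = dict(pitch)
    row["zone_z"] = rescaled_zone_z(row["plate_z"], row["sz_bot"],
                                    row["sz_top"])
    stuff = {k: row[k] for k in CONTEXT + STUFF}
    location = {k: row[k] for k in CONTEXT + LOCATION}
    return stuff, location


# Made-up pitch record for illustration.
example = {"balls": 1, "strikes": 1, "p_throws": "R", "stand": "L",
           "pitch_type": "SL", "release_speed": 86.0,
           "release_spin_rate": 2500, "spin_axis": 90, "pfx_x": 0.4,
           "pfx_z": 0.1, "plate_x": -0.5, "plate_z": 2.5,
           "sz_bot": 1.5, "sz_top": 3.5, "release_pos_x": -2.0}
stuff, location = split_features(example)
print(location["zone_z"])  # 0.5
```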

RELEASE POINTS AND OVERFITTING

The most significant difference in methodology between this and previous metrics is that release point information is not explicitly used. The model does retain limited release point information, since the release point can in principle be inferred from the final pitch location and the trajectory, but the exact release point is somewhat obscured.

It is established that including the release point can reduce the variance between model and outcome. That is, throwing an identical pitch from a different release point will truly produce a different expected outcome. For that matter, release points for individual pitches by the same pitcher should not be considered independently, since pitch “tunneling” has been shown to improve pitcher outcomes.4

In MLB data, there are few enough pitchers that in many cases a release point essentially identifies the pitcher. That is, most sliders from Jacob deGrom’s release point in 2019 were in fact thrown by deGrom, one of the most effective pitchers in 2019. Thus, a model which, based on the training sample, simply assigns any slider from that release point a high pitch score would do well on the 2019 test sample. However, if an ineffective pitcher adjusted their release point to match deGrom’s while making no other changes, they would still be ineffective, and the model would make poor predictions. Thus, for this work, release point information was omitted to avoid overfitting. With a larger pool of pitchers drawn from data of the same calibration quality, such as pitchers from minor league and international play, including release point would likely improve the model for the same reason that it is important in Stuff+.

OVERVIEW

The primary results are similar to those found with previous models, including those with a release point component. For example, the most effective pitches are, as might be expected, those near the edges of the strike zone (Figure 1A). This is true even though when the ball is put in play, hitters are successful against pitches along a broad stripe (Figure 1B) that includes not just the center of the strike zone, but two corners as well. This stripe runs at approximately a fixed distance from the hitter’s shoulders, suggesting that it corresponds to locations that would line up with the barrel of the bat along a natural swing path.

Figure 1A. Average pitch score as a function of location for all pitches thrown to RHB in 2024, from the catcher’s point of view and scaled to the height of the strike zone. Combining all pitches thrown vs. RHB, pitch scores are the highest on the edges of the strike zone, as might be expected.

Figure 1B. Average expected wOBA for pitches producing contact. Hitters produce strong results when putting the ball in play not just in the middle of the zone, but along a stripe at approximately a fixed distance from the batter’s shoulders. As might be expected, higher pitch scores are associated with the locations more likely to produce swings while avoiding high-wOBA contact.

However, for the best and worst pitchers, the stuff score is typically more important than location. Thus, a pitcher with strong stuff can have positive pitch scores for nearly all locations (Figure 2). These conclusions are similar to previous results using comparable techniques. The remaining sections will use these pitch scores to produce novel results by considering the effects of hitters and catchers.

Figure 2. Overall average pitch score for Devin Williams in 2023 as a function of location. Even pitches in the center of the strike zone, with significantly negative location scores, produced positive pitch scores due to the excellent stuff score produced by a combination of velocity and movement.

HITTER SCORES

The pitch scores defined in the previous section measure the typical value of a pitch against an average major league hitter. A hitter score can therefore be calculated as the difference between that hitter’s outcomes and the outcomes that would be expected given the actual pitches they faced. This metric accounts for the luck of facing particularly good or poor pitches over the course of the season.

As with pitch scores, for a pitch not put into play, the outcome used is the difference between expected wOBA before and after that pitch, expressed in runs. The hitter score is then the difference between that change and the change expected from the pitch score, defined so that a positive number corresponds to more expected runs for the hitter. To further remove luck from the results, when a ball is put in play, the Statcast expected wOBA based on trajectory is used rather than the actual wOBA resulting from the ball in play, since the latter depends upon fielding.
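
A minimal sketch of the per-pitch hitter score follows. It assumes the same 1.27 wOBA scale as the pitch score example and takes the model's predicted pitch score as an input. The sign convention (pitch score positive for the pitcher, hitter score positive for the hitter) is inferred from the text, and the function names are hypothetical.

```python
# Sketch of the per-pitch hitter score; positive values mean the hitter
# produced more runs than an average hitter would against the same pitch.
WOBA_SCALE = 1.27  # assumed wOBA-to-runs factor, as in the pitch score example


def hitter_score(pre_pitch_woba: float, post_pitch_woba: float,
                 predicted_pitch_score: float,
                 woba_scale: float = WOBA_SCALE) -> float:
    """Runs gained by the hitter relative to an average hitter.

    post_pitch_woba is the new count value, or the Statcast expected wOBA
    for a ball in play. predicted_pitch_score is the model's pitch score
    (runs saved by the pitcher), so an average hitter is expected to gain
    -predicted_pitch_score runs on this pitch.
    """
    actual_runs = (post_pitch_woba - pre_pitch_woba) / woba_scale
    expected_runs = -predicted_pitch_score
    return actual_runs - expected_runs


def per_100_pitches(scores: list[float]) -> float:
    """Mean hitter score expressed in runs per 100 pitches."""
    return 100 * sum(scores) / len(scores)


# A pitch the model rates +0.010 for the pitcher, hit with an expected wOBA
# of 0.201 from a 1-1 count: the hitter did worse than an average hitter.
print(round(hitter_score(0.313, 0.201, 0.010), 4))  # -0.0782
```

When the actual result exactly matches the model's expectation, the hitter score is zero. The per-100-pitch helper matches the units used in the tables that follow.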

As the sabermetric understanding of pitcher evaluation has improved, the profile of pitchers considered most effective has shifted: those who lead in traditional metrics such as wins or ERA are often not those who rank highest by more advanced metrics such as FIP, which in turn can differ from those with the highest pitch scores because of hitter luck. The hitters with the highest hitter scores, however, are far more recognizable (Table 1). This suggests that over the course of a full season, the quality of pitches faced does not vary significantly across major league baseball, and that the trajectory-based expected wOBA is a good predictor of actual wOBA.

Table 1. Top Hitter Scores for MLB Hitters with a Minimum of 1,000 Pitches Faced in 2024

A closer examination of hitter scores reveals the different ways that hitters create value. For example, Shohei Ohtani’s value in 2024 predominantly came from exceptional results throughout the center of the strike zone (Figure 3). These are the locations that typically lead to the best contact, so a high hitter score here is typical of a power hitter.

Figure 3. Average hitter score as a function of location for Shohei Ohtani in 2024. Most of Ohtani’s value came from exceptional results throughout the center of the strike zone, even though he slightly underperformed the average major league hitter on pitches near the edges of the zone. This is a typical profile for a power hitter, since these are the locations that typically lead to the best contact.

Exceptional plate discipline, on the other hand, creates value near the edges of the strike zone. Most of Lars Nootbaar’s value in 2024 came from those pitches (Figure 4A), and he underperformed the average MLB hitter on pitches closer to the center of the zone. Pitchers and catchers were likely aware of these tendencies when facing Nootbaar, since he primarily saw pitches in the locations where he was weakest relative to MLB hitters (Figure 4B).

Figure 4A. Average hitter score as a function of location for Lars Nootbaar in 2024. Most of Nootbaar’s value came from exceptional results around the edges of the strike zone. This is a typical profile for hitters described as having an exceptional eye.

Figure 4B. The most common locations for pitches thrown to Nootbaar that year were in the locations he most underperformed when compared with major league hitters. This likely indicates that pitchers had a gameplan informed by Nootbaar’s profile, either through techniques like those described here or through more traditional advanced scouting.

In summary, the hitter score provides a solid and intuitive measure of hitter skill that generally reaffirms what existing metrics have already found. It aligns well with established rankings of the best hitters, and a breakdown by pitch type or location helps to build a profile of which skills an individual hitter uses to produce value. Teams likely already exploit this kind of information when calling pitches, as the pitches thrown to Nootbaar suggest. Still, the consistency between hitter score and real-world behavior helps to confirm that it can be a useful measure of hitter skill.

TIMES THROUGH ORDER PENALTY

This hitter score can now be applied to investigate other phenomena. It is well established that a pitcher’s effectiveness drops off progressively as he faces the batting order multiple times. Teams have been willing to adjust strategy to avoid this effect, notably the Rays pulling Blake Snell after just 73 highly effective pitches in Game 6 of the 2020 World Series. Although this “times through the order” (TTO) effect can be easily measured, its origins have been heavily debated.5

Tango, Lichtman, and Dolphin (2007) demonstrated the effect, finding that hitters gain 8–9 points of wOBA in each successive plate appearance (PA).6 They attributed this primarily to hitter learning rather than to pitcher fatigue. Lichtman (2013) reached a similar conclusion in a more detailed analysis.7 Under this theory, there should be a sharp difference between results against the 9th or 18th hitter faced and the 10th or 19th, who are seeing that pitcher an additional time. Indeed, such a discontinuity is found in observed results.

However, nearly all appearances in which a single pitcher goes through the order several times are by starting pitchers, so the 10th and 19th batters faced also mark a return to the top of the lineup. These are the best hitters, which is why the first inning (the only inning in which the top of the order is guaranteed to bat) produces more runs than later innings.8 A more sophisticated statistical approach instead argues that this difference in hitter quality and other confounding variables are responsible for the apparent discontinuity when going through the order an additional time.9 In that analysis, pitchers exhibit a continuous decline throughout the course of a game, indicating that pitcher fatigue rather than hitter learning is responsible.

With the pitch score and hitter score metrics, it is now possible to measure both effects independently. Pitcher fatigue results in lower pitch scores over the course of an outing. There is a survivor bias for long outings, since typically only effective pitchers are allowed to face, e.g., 25 hitters, although this is somewhat mitigated because managers determine when to bring in a reliever by outcome and game situation rather than pitch score. Further, the first few pitches thrown by a starter are substantially worse than the remainder of their pitches, indicating a brief adjustment period. However, when restricting the study to outings of at least 90 pitches thrown, a small but clear decline in pitch score over the remainder of an outing is observed (Figure 5).
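
The restriction to long outings described above can be sketched as a simple filter-and-average step. Each pitch record is assumed to carry hypothetical fields for the outing, the batter-faced number within that outing, and the pitch score. The 90-pitch threshold comes from the text, and everything else is illustrative.

```python
from collections import defaultdict


def pitch_score_by_batters_faced(pitches, min_pitches=90):
    """Average pitch score (runs per 100 pitches) by batter-faced number.

    pitches: list of dicts with hypothetical keys 'outing_id',
    'batter_number' (1 = first hitter the pitcher faced in that outing)
    and 'pitch_score'. Outings shorter than min_pitches are dropped.
    """
    outing_size = defaultdict(int)
    for p in pitches:
        outing_size[p["outing_id"]] += 1

    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in pitches:
        if outing_size[p["outing_id"]] < min_pitches:
            continue  # short outing: excluded from the fatigue curve
        totals[p["batter_number"]] += p["pitch_score"]
        counts[p["batter_number"]] += 1
    return {n: 100 * totals[n] / counts[n] for n in sorted(counts)}
```

Because the threshold is applied to whole outings, every point on the curve comes from the same set of long appearances. This keeps the curve from mixing short, possibly ineffective outings into the early batters.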

Figure 5. (top) Average pitch (blue) and hitter (red) scores for 2024 major league hitters as a function of the number of hitters faced by that pitcher during that appearance; and (bottom) the same data grouped by position within the lineup. Pitch scores exhibit a small decline over the course of an outing following a brief initial adjustment period. However, the far larger effect comes from hitters, with the top of the order producing higher hitter scores than the bottom. The top of the order also shows a strong TTO effect, indicating that hitter learning is more important than pitcher fatigue. However, the bottom of the order shows minimal change in hitter scores, indicating that those hitters do not make the same sort of successful in-game adjustments.

However, the hitter effects are both more significant and more complex. During the first time through the lineup, hitter scores are very slightly negative at the beginning of an outing, then drop for the lower half of the lineup (Figure 5). By the second time through the order, the top of the lineup now produces hitter scores approximately 0.5 runs per 100 pitches better than the first time. However, the bottom of the lineup, which had the weakest hitter scores in the first PA, does not improve. The same pattern is repeated the third time through the order, with the top of the lineup performing even better than during their second PA, but the bottom of the lineup producing similar hitter scores in all three PA.
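
The comparison above can be sketched as grouping hitter scores by time through the order and lineup position. Each pitch is assumed to carry hypothetical fields for the batter-faced number within the pitcher's outing, the lineup slot, and the hitter score. Treating slots 1 to 4 as the top of the order is an arbitrary illustrative choice, since the article does not state how it divides the lineup.

```python
from collections import defaultdict


def times_through_order(batter_number: int) -> int:
    """1 for the 1st-9th batters a pitcher faces, 2 for 10th-18th, etc."""
    return (batter_number - 1) // 9 + 1


def hitter_score_by_tto(pitches, top_slots=frozenset(range(1, 5))):
    """Mean hitter score (runs per 100 pitches) by (TTO, lineup group).

    pitches: dicts with hypothetical keys 'batter_number' (within the
    pitcher's outing), 'lineup_slot' (1-9) and 'hitter_score'.
    top_slots is an illustrative definition of the top of the order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in pitches:
        group = "top" if p["lineup_slot"] in top_slots else "bottom"
        key = (times_through_order(p["batter_number"]), group)
        totals[key] += p["hitter_score"]
        counts[key] += 1
    return {k: 100 * totals[k] / counts[k] for k in sorted(counts)}
```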

The natural conclusion is that although pitcher fatigue does produce a gradual and measurable decline in pitch quality, it is the hitters, and specifically the best hitters, who drive the majority of the observed effect. Familiarity with a pitcher produces a substantial advantage over the course of a game, but only for those hitters skilled enough to make the right adjustments. Even for the bottom of the lineup, which shows little to no improvement in hitter score, pitcher fatigue still produces a smaller improvement in outcomes from one PA to the next through the decline in pitch quality.

This complex behavior, in which pitchers tire continuously but only some hitters improve, is likely responsible for the conflicting results of previous studies trying to determine whether pitchers or hitters drive the TTO effect. A further confounding factor is the difference in lineup qualities; most of the hitters in a very strong lineup might be good enough to make in-game adjustments, while in a weak lineup there might be only a couple such hitters. Further, although the best hitters are generally placed near the top of the order, managerial preferences, roster constraints, and occasional misjudgments about player quality mean this is not always the case, further obscuring the effect in previous studies.

PITCH SEQUENCING AND HITTER GUESSING

Another application of the idea of hitter scores comes from thinking about how pitches are selected. Most pitchers throw between two and five different pitches over the course of an outing, and it has long been thought that a clever catcher can improve their pitcher’s performance by calling for pitches in the right location and in the right order. The more that a hitter is surprised, the worse their performance should be.

If true, this is antithetical to the idea that one can simply assign every pitch a score. After all, it would mean that no pitch can be properly evaluated in isolation. The same pitch in a different context should be projected to have different effectiveness.

On the other hand, if part of the TTO effect is that hitters learn from the previous pitches thrown, then hitters should also become more effective over the course of a single at-bat. If a pitch is thrown several times in a row, the hitter score should continue to increase each time the hitter sees that pitch.

To test this hypothesis, hitter scores are considered for several situations (see Table 2):

  • The first pitch of a plate appearance (365,279 pitches in 2023–24 for an average hitter score of -0.153/100 pitches)
  • A non-first pitch which occurs for the first time in that plate appearance (428,972 pitches for an average score of -0.070/100)
  • A non-first pitch which has been thrown earlier in the same plate appearance (631,389 pitches for an average score of 0.099/100)
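
The groups above can be assigned by walking through the pitch sequence of each plate appearance. This sketch assumes that "the same pitch" means the same Statcast pitch type code, which is an interpretation the article does not state. It also splits the third group by whether the pitch repeats the immediately preceding one. The label names are hypothetical.

```python
def classify_pitches(pitch_types):
    """Label each pitch of one plate appearance by its sequencing context.

    pitch_types: pitch type codes in the order thrown, e.g. ["FF", "SL"].
    Labels (illustrative names):
      'first'       - first pitch of the PA
      'new'         - non-first pitch whose type has not appeared yet
      'repeat_back' - same type as the immediately preceding pitch
      'repeat_gap'  - type seen earlier in the PA, but not on the last pitch
    'repeat_back' and 'repeat_gap' together form the third group above.
    """
    labels = []
    seen = set()
    for i, pt in enumerate(pitch_types):
        if i == 0:
            labels.append("first")
        elif pt not in seen:
            labels.append("new")
        elif pt == pitch_types[i - 1]:
            labels.append("repeat_back")
        else:
            labels.append("repeat_gap")
        seen.add(pt)
    return labels


print(classify_pitches(["FF", "FF", "SL", "FF", "SL", "SL"]))
# ['first', 'repeat_back', 'new', 'repeat_gap', 'repeat_gap', 'repeat_back']
```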

Table 2. Hitter Scores for Various Groups of Pitches Within an At-Bat

As expected, hitters do appear to learn: subsequent instances of the same pitch result in a higher hitter score. However, additional segmentation reveals more complex behavior:

  • A non-first pitch which is identical to the previous pitch (389,915 pitches for an average hitter score of -0.140/100)
  • A non-first pitch that occurred previously but is different from the previous pitch (241,474 pitches for an average hitter score of 0.485/100)
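
The two splits can be checked against each other by pooling group means, weighted by pitch counts. This small sketch uses only the counts and averages quoted above. It reproduces the 0.099/100 figure for all repeated pitches, and it suggests that the 0.130/100 value used in the following discussion is the pooled average of non-first pitches that differ from the previous pitch.

```python
def pooled_mean(groups):
    """Pitch-weighted mean of (n_pitches, mean_score) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


# Counts and hitter scores (runs per 100 pitches) quoted above, 2023-24.
back_to_back = (389_915, -0.140)
repeat_with_gap = (241_474, 0.485)
first_time_new = (428_972, -0.070)

repeated_any = pooled_mean([back_to_back, repeat_with_gap])
differs_from_previous = pooled_mean([first_time_new, repeat_with_gap])
print(round(repeated_any, 3), round(differs_from_previous, 3))  # 0.099 0.13
```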

Surprisingly, the results show the opposite of what we might expect from the hypothesis that hitters learn. Throwing the same pitch type back to back would seemingly provide the best learning environment, because the hitter has just seen the exact same pitch from the same pitcher. However, this actually results in a lower hitter score than any other non-first pitch, so showing a new pitch is less effective than immediately repeating the previous one!

It seems clear that hitters do indeed learn, because they have a higher hitter score when seeing the same pitch multiple times with a gap in between. Thus, perhaps the most likely explanation is that hitters include an element of guessing in their approach. This idea of batter learning was first supported by David W. Smith in his article in the SABR Baseball Research Journal in 2005.10 Coaching typically emphasizes that hitters should be adjusting rather than guessing, and a “guess hitter” is a description typically seen as pejorative. However, the evidence seems to indicate that within an at-bat, the best hitters in the world do include an element of guessing when facing the best pitchers in the world. This is supported by Ted Williams, perhaps the greatest hitter in the history of baseball, who endorsed guessing in his book.11

Further, hitters on average appear not to guess often enough that they will see the same pitch twice in a row. Despite having just seen the exact same pitch, hitters produce their lowest hitter scores among non-first pitches when a pitch is immediately repeated. Thus, there would seem to be significant room for hitter improvement with a better approach to guessing pitches. If the roughly 195,000 consecutively repeated pitches per season produced the same hitter score as non-first pitches that differ from the previous pitch (0.130/100), this would be worth approximately 1.8 wins/team/season. If they produced the same hitter score as when that pitch is seen later in the at-bat with a gap in between (0.485/100), a time when hitters are likely guessing correctly more often than average, it would be worth around 4.1 wins/team/season. The extreme version of successful guessing would be a pitcher who is tipping their pitches, which would be worth so much to the hitter that even the best pitcher in baseball would struggle to survive at the major league level while tipping pitches.
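
The win values above can be reproduced with simple arithmetic. This sketch assumes that the 389,915 back-to-back repeats from 2023–24 split evenly into about 195,000 per season, that there are 30 teams, and that roughly 10 runs equal one win. The article does not state its runs-per-win conversion, so that figure is an assumption.

```python
# Assumptions (not stated in the article): the 2023-24 count is split
# evenly across two seasons, 30 teams, and roughly 10 runs per win.
REPEATED_PITCHES_2023_24 = 389_915
SEASONS = 2
TEAMS = 30
RUNS_PER_WIN = 10.0


def wins_per_team(score_now, score_target,
                  n_pitches=REPEATED_PITCHES_2023_24 / SEASONS):
    """Wins per team per season if n_pitches moved from score_now to
    score_target (both hitter scores in runs per 100 pitches)."""
    runs = (score_target - score_now) / 100 * n_pitches
    return runs / TEAMS / RUNS_PER_WIN


print(round(wins_per_team(-0.140, 0.130), 1))  # 1.8
print(round(wins_per_team(-0.140, 0.485), 1))  # 4.1
```

The second case corresponds to about 40 runs per team per season, in line with the figure used later when discussing pitch calling.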

PITCHER LEARNING

The pitch quality also changes between these different situations. On the first pitch of a plate appearance, the average pitch score is -0.342/100, so both pitchers and hitters perform poorly on the first pitch of a PA. A possible explanation is that hitters are often conditioned to take a strike as part of their approach, producing weak hitter scores for otherwise hittable pitches. In response, pitchers adjust by being more willing to throw the ball towards the center of the zone, due to the increased odds of the hitter not swinging.

When the same pitch is immediately repeated, the average pitch score increases to 0.254/100, and when it is repeated with a gap, the average becomes 0.356/100. By contrast, the first time a new pitch is thrown in the PA, the average pitch score is -0.342/100, identical to the first pitch of the PA. Thus, pitchers also appear to adjust over the course of throwing to the same hitter. However, a confounding effect is that most pitchers have a range of pitch quality in their arsenal and are more likely to throw their best pitches multiple times in a PA than their worst.

FINDING THE FULL VALUE OF CATCHER FRAMING

In the previous section, it was shown based on changes in hitter scores in various scenarios within a plate appearance that hitters appear to guess. A consequence is that even as baseball moves towards an automated strike zone and framing becomes less useful, there might still be considerable value to a strong defensive catcher (or bench coach). If hitters indeed guess, then pitch calling can be quite valuable.

This can be tested by considering the impact of catchers on the hitter scores of the pitches they receive. If a catcher succeeds in surprising a hitter, the hitter score should be lower than that hitter would typically achieve on a pitch with a similar trajectory. On the other hand, if the hitter guesses correctly, they should have a higher hitter score than on similar pitches. So, one can define a catcher score as the difference between the hitter scores those hitters would typically produce and the actual hitter scores on the pitches the catcher receives.
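
One way to compute this catcher score is sketched below. It assumes pitch records with hypothetical catcher, hitter, and hitter_score fields, and it uses each hitter's mean hitter score over the sample as that hitter's expected value. The article does not specify how typical hitter performance is estimated, so that choice is an assumption. Restricting the records to road pitches first would give the road-only version discussed below.

```python
from collections import defaultdict


def catcher_scores(pitches):
    """Total catcher score in runs for each catcher.

    pitches: dicts with hypothetical keys 'catcher', 'hitter' and
    'hitter_score' (runs above an average hitter on that pitch).
    A hitter's expected score on a pitch is taken as that hitter's mean
    hitter score over all pitches in the sample (a simplification).
    Positive catcher scores mean hitters did worse than they usually do.
    """
    pitches = list(pitches)
    hitter_total = defaultdict(float)
    hitter_n = defaultdict(int)
    for p in pitches:
        hitter_total[p["hitter"]] += p["hitter_score"]
        hitter_n[p["hitter"]] += 1
    hitter_mean = {h: hitter_total[h] / hitter_n[h] for h in hitter_total}

    totals = defaultdict(float)
    for p in pitches:
        expected = hitter_mean[p["hitter"]]
        totals[p["catcher"]] += expected - p["hitter_score"]
    return dict(totals)
```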

According to MLB’s Statcast data on Baseball Savant, catcher framing value ranged from +16 to -10 runs in 2024. However, catcher scores overall ranged from +61 to -50 runs, a spread over four times larger (Table 3). If correct, this means that the difference between the best and worst catchers was over 11 wins in 2024. Including both catchers on a roster produces an even larger spread. So, these catcher scores indicate that the best catchers are far more valuable than previously believed.

Table 3. Highest and Lowest Catcher Scores in 2024 for Catchers With At Least 5,000 Pitches Received

A possible confounding variable is that this larger spread could be park-induced. If a park has a particularly poor hitter backdrop, all hitter scores in that park might be lower than average, leading to a higher catcher score. However, the same trend exists when only considering pitches caught on the road (Table 3), so it appears that catchers indeed influence hitter scores to approximately this degree.

SOURCES OF ADDITIONAL CATCHER VALUE

Two potential explanations for this increased catcher value appear most probable. The first is pitch calling. Since it was shown above that even major league hitters guess, it stands to reason that surprising a hitter should lead to a lower hitter score. If successfully surprising hitters is worth up to 40 runs/team/year, it is possible that the difference between the best and worst pitch callers could be several times as large.

The other possibility is that the added value comes from hitter adjustments. As a pitch moves away from the center of the strike zone, hitter outcomes decline (Figure 1B). Thus, hitters try to avoid ‘expanding the zone’ by swinging at pitches that would otherwise be balls. However, in most situations (and particularly deeper in the count), swinging at a pitch that would otherwise be called a strike is a better outcome than taking it. So, if a catcher with good framing skills can consistently establish that a pitch an inch outside the rulebook strike zone will be called a strike, hitters will adapt by swinging at those pitches at a higher rate than they otherwise would. The best framing catchers do indeed induce more swings, particularly outside of the strike zone, than the worst framing catchers (Figure 6).

Figure 6. (left) Swing fraction as a function of location for hitters against all catchers; (right) ratio of swings induced by catchers with the best framing ability to swings induced by catchers with the worst framing ability in 2024. Hitters were more likely to swing at pitches outside of the rulebook strike zone against catchers who were able to frame those pitches into strikes more often.

Thus, in addition to the value of the strikes directly added via framing, those same pitch locations produce better outcomes for the pitcher when hitters swing, both when the ball is put into play and when the swing misses. Since these pitches are not called by the umpire, they are not included in the direct framing value, but they are captured by the catcher score.

RELATIVE CONTRIBUTIONS TO CATCHER VALUE

These two effects can potentially be separated by location. The extra swings induced by good framing catchers occur predominantly outside of the rulebook zone, as do the additional called strikes by definition. Thus, catcher score on pitches outside the rulebook zone is attributed here to framing, while catcher score on pitches inside the rulebook zone can instead be attributed to pitch calling. The resulting framing runs for each catcher correlate strongly between home and road results, as would be expected for framing skill and effects induced by framing. However, because pitch scores do not include any sort of park adjustment, calling scores correlate more weakly between home and road. Thus, only road pitches are used, and the true catcher value should be approximately double the value on the road.
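
The location-based split can be sketched as follows. Each pitch is assumed to already carry a hypothetical per-pitch catcher score, plus flags for whether it was inside the rulebook zone and whether it was caught on the road. Deciding zone membership from plate_x, plate_z, sz_top, and sz_bot is left out of this illustration.

```python
def split_catcher_runs(pitches, double_road=True):
    """Split one catcher's road catcher score into framing and calling runs.

    pitches: dicts with hypothetical keys 'in_rulebook_zone' (bool),
    'is_road' (bool) and 'catcher_score' (runs, positive favors the
    catcher). Following the text, pitches outside the rulebook zone are
    credited to framing and pitches inside it to pitch calling; only road
    pitches are used, and totals are optionally doubled to approximate a
    full season.
    """
    framing = calling = 0.0
    for p in pitches:
        if not p["is_road"]:
            continue  # home pitches are skipped (no park adjustment)
        if p["in_rulebook_zone"]:
            calling += p["catcher_score"]
        else:
            framing += p["catcher_score"]
    factor = 2.0 if double_road else 1.0
    return {"framing": factor * framing, "calling": factor * calling}
```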

The results for the best and worst defensive catchers in 2024 are shown in Table 4. For catchers with at least 5,000 road pitches received, the standard deviation of framing value is 9 runs, slightly more than double the spread previously measured for catcher framing on road pitches.

Table 4. Highest and Lowest Total Catcher Runs for Pitches Caught on the Road in 2024 (minimum 5,000 Pitches)

Although most of the difference comes from additional swings induced outside the strike zone, the framing runs shown here are not directly comparable to previous measurements. Those metrics assign a constant value to each extra strike, regardless of count. Here, by contrast, framing a pitch on a 3–2 count produces a much higher catcher score than framing one on a 1–0 count,12 because the wOBA difference between a ball and a strike is far larger on 3–2. In 2024, this effect was particularly strong for Austin Hedges and Korey Lee, who induced incorrect calls from umpires predominantly in high-leverage counts. Similarly, the metric used here is based on the exact location of each pitch and gives more credit for framing a pitch that was less likely to be called a strike. The Baseball Savant version, by contrast, groups pitches into broader zones and uses the overall fraction of called strikes within each zone.
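The two differences from earlier framing metrics, count dependence and exact location, can be illustrated with a per-pitch credit. The sketch below is hypothetical and is not the catcher score used in this article. It credits a called strike according to how unlikely the call was at that location, then scales that credit by a count-specific value for a strike minus a ball. The numbers in `STRIKE_MINUS_BALL` are placeholders chosen only to show that a framed 3–2 strike is worth more than a framed 1–0 strike. The text uses count-specific wOBA differences estimated from data.

```python
# Placeholder run values of a called strike minus a called ball, from the
# defense's side, keyed by (balls, strikes). Illustrative numbers only.
STRIKE_MINUS_BALL = {
    (1, 0): 0.10,
    (3, 2): 0.45,
}


def framing_credit(p_called_strike: float, count: tuple, called_strike: bool) -> float:
    """Catcher credit, in runs, for one taken pitch relative to expectation.

    p_called_strike is the probability that a pitch at this exact location is
    called a strike. A strike at an unlikely location earns more; a ball at a
    likely strike location costs more. Averaged over calls, the credit is zero.
    """
    delta = STRIKE_MINUS_BALL[count]
    if called_strike:
        return (1.0 - p_called_strike) * delta
    return -p_called_strike * delta


edge = 0.2  # a pitch just off the plate, called a strike 20% of the time
print(round(framing_credit(edge, (1, 0), True), 3))  # 0.08
print(round(framing_credit(edge, (3, 2), True), 3))  # 0.36
```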

Highest and lowest total catcher runs for pitches caught on the road in 2024 (minimum 5,000 pitches). Presumably the total 2024 value for each catcher was approximately twice their road value. Framing runs are calculated from pitches outside the rulebook zone, including additional called strikes, swinging strikes, and quality of contact. Pitch calling runs are calculated from pitches inside the zone, including both the adjusted swing rate and the quality of contact. Framing runs correlate strongly between home and road, but because park factors are not included in pitch score, calling value does not.

A lack of correlation between framing and calling runs supports the proposed separation. Pitch calling does not require any of the physical skills typical of a good framing catcher. Although a catcher with a strong defensive focus might have both skills, they should correlate only very weakly. Indeed, total catcher runs inside and outside the rulebook zone for catchers with at least 5,000 road pitches caught in 2024 show no clear correlation (Pearson r = -0.072, p = 0.48). In other words, a correlation of at least this magnitude would arise about half the time by chance even if the two skills were entirely unrelated. However, the two effects have comparable importance, with calling producing a standard deviation of 11 runs for road pitches. As with framing, the full value should be approximately double the value found for road pitches.

The full pitch calling effect has been assigned to catchers, but in practice pitch calling is a team effort involving the pitcher, manager, and scouts. Separating these contributions might be possible with sufficient data on players who have played for multiple teams and managers, but such data do not yet exist for the PitchCom era.

In summary,

  • As has been well explored in previous studies, the extra called strikes due to catcher framing are worth up to approximately 15 runs/year for the best catchers in MLB.
  • Hitter adjustments (swinging at additional pitches near the edges of, or entirely outside, the rulebook strike zone) can be worth an additional 15–20 runs/year, bringing the total framing value to nearly 40 runs/year for the best catchers.
  • Pitch calling might be worth up to another 25–30 runs/year for the best catchers. Pitch calling appears to be only weakly correlated, if at all, with pitch framing.

DISCUSSION

In this work, pitch scores are used as a starting point to examine the effects of hitter and catcher skill. Three new effects are revealed by this analysis: (1) that the times through the order (TTO) effect is primarily due to learning from only the best hitters, rather than all of them; (2) that hitters guess what pitch is coming next as part of their approach; and (3) that pitch calling and hitter adjustments are each worth as much as the previously identified direct value of the additional called strikes induced by catcher framing.

At the major league level, teams are constantly competing to find new sources of value that allow them to use limited budgets more efficiently. All three of these effects could be exploited to gain a competitive advantage under current rules.

EXPLOITING THE TTO EFFECT WITH OPENERS

The main TTO result here is that the penalty is driven primarily by learning among hitters at the top of the batting order, rather than among all hitters. Since pitcher effectiveness declines each successive time the top of the order is faced, it makes sense to avoid letting a pitcher see the top of the lineup for a third (or even second) time when possible. The aggressive hooks that have become more common, especially in the playoffs, are therefore well supported by the data. Of course, this strategy comes at the cost of shorter outings from starters, leading to greater bullpen fatigue, which is hard to sustain over a 162-game season.

However, there may be room to extend starters further with a small tactical adjustment. The penalty for facing the bottom of the order a third time is minimal; if pitchers were limited to those hitters, they could continue pitching until fatigue forces a change. One could therefore imagine using an opener to handle the top of the order, then bringing in the starter/follower to face the bottom. In this manner, a starter who would normally be pulled after 18 batters to avoid the worst TTO effects could instead face the bottom of the order a third time, effectively adding about an extra inning per outing. This, in turn, would produce a lower bullpen workload, with several well-studied benefits.
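The batter arithmetic behind the opener idea can be sketched as a walk through the lineup. This is a hypothetical illustration under several simplifying assumptions:

  • Slots 1–4 are treated as the top of the order.
  • The opener faces exactly the first four hitters.
  • A pitcher is pulled just before any top-of-order hitter would face him a third time.
  • Bottom-of-order hitters impose no limit, and there are no pinch hitters.

The function name and parameters are illustrative, not taken from the article.

```python
from typing import Optional


def batters_before_hook(first_slot: int, top_slots: int = 4, max_top_looks: int = 2,
                        lineup_size: int = 9, fatigue_cap: Optional[int] = None) -> int:
    """Count batters a pitcher faces before being pulled.

    The pitcher enters with lineup slot `first_slot` (1-indexed) due up and is
    pulled just before any top-of-order hitter (slot <= top_slots) would face
    him more than `max_top_looks` times. Bottom-of-order hitters are not
    limited; `fatigue_cap` optionally caps total batters faced.
    """
    looks = [0] * (lineup_size + 1)  # looks[s]: PAs lineup slot s has had against him
    slot, faced = first_slot, 0
    while True:
        if slot <= top_slots and looks[slot] >= max_top_looks:
            return faced
        if fatigue_cap is not None and faced >= fatigue_cap:
            return faced
        looks[slot] += 1
        faced += 1
        slot = slot % lineup_size + 1  # cycle through slots 1..lineup_size


print(batters_before_hook(first_slot=1))  # 18: conventional start
print(batters_before_hook(first_slot=5))  # 23: follower entering after the opener faces slots 1-4
```

Under these assumptions, the follower faces five more batters than a conventional starter, roughly the extra inning suggested above. The bottom of the order sees him a third time, but the top does not.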

VALUING CATCHER FIELDING

MLB teams with strong analytics departments are well aware of the value of catcher framing. For example, Austin Hedges, an excellent framing catcher coming off 2022 and 2023 seasons worth an MLB-worst combined -43 runs of offense according to FanGraphs, was still given a $4 million contract by Cleveland in 2024. After another -14 offensive runs in just 146 PA in 2024, he was re-signed for an additional $4 million in 2025. Clearly, Cleveland judges the excellent defense Hedges provides to be worth more than the cost of having one of the worst hitters in Major League Baseball in its lineup.

However, if the value of catcher framing is 2–3 times greater than previously believed, an excellent defender such as Hedges is a considerable bargain. A catcher with both excellent framing and excellent calling skills would be even more valuable. Such a combination should be rare, though: if the two skills are uncorrelated, elite levels of both will coincide only by chance. According to catcher score, Jake Rogers saved 61 runs in 2024, equivalent to the offensive contribution that earned Juan Soto a $765 million contract. Cal Raleigh, an excellent framing catcher with an above-average bat, could plausibly be the most valuable player in baseball with improved pitch calling.

However, a player like Rogers, or even Raleigh, at the same age as Soto could surely be signed far more cheaply as a free agent, providing considerable excess value. Further, for a team already winning well over half their games, saving runs improves a Pythagorean win estimate more than scoring the same number of extra runs. For example, scoring another 50 runs would have improved the 2024 Dodgers by an expected 4.4 wins, but preventing an additional 50 runs would have been worth 5.8 wins.
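The Pythagorean comparison can be checked directly. The sketch assumes the classic exponent of 2. It also assumes 2024 Dodgers totals of 842 runs scored and 686 allowed. These inputs are our assumption, but they reproduce the 4.4- and 5.8-win figures quoted above. With exponent 2, the marginal win value of preventing a run, relative to scoring one, works out to RS/RA. That ratio exceeds 1 for any team that outscores its opponents.

```python
def pythag_wins(runs_scored: float, runs_allowed: float,
                games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from the Pythagorean formula RS^k / (RS^k + RA^k)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return games * rs / (rs + ra)


RS, RA = 842, 686  # assumed 2024 Dodgers runs scored and allowed
base = pythag_wins(RS, RA)
gain_from_scoring = pythag_wins(RS + 50, RA) - base
gain_from_preventing = pythag_wins(RS, RA - 50) - base
print(f"{gain_from_scoring:.1f} {gain_from_preventing:.1f}")  # 4.4 5.8
```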

PITCH CALLING

Finally, pitch calling presents a unique opportunity because pitches can be called from the dugout. Thus, instead of signing a catcher with excellent pitch calling in addition to their other major league-level skills, a team can sign a bench coach at a much lower salary and without counting towards collectively bargained player salary thresholds. For that matter, since guessing is part of the approach of most hitters, the same bench coach could also provide improved guesses to their hitters. Bailey Freeman pointed out an analogous form of off-field value when arguing that the Royals gained several wins from an exceptionally strong replay review team.13

In the future, it appears that Major League Baseball may transition toward an automated ball-strike (ABS) system. If so, catcher framing would disappear, and presumably the hitter adjustments to it would disappear as well. However, pitch calling, which appears to be more valuable than either catcher framing or the hitter adjustments it induces, would retain its importance.

CHARLES STEINHARDT is a professor of physics at the University of Missouri, where his research focuses on the evolution of galaxies and the application of machine learning to a wide range of scientific problems. A lifelong Red Sox fan, he has carried his love of the game across the globe, playing amateur baseball during academic posts in countries including Japan and Denmark.

ZACH BOROWIAK is a physics/astronomy and statistics student at the University of Missouri who specializes in data-driven research. He applies machine learning and advanced data analytics to projects in both astrophysics, studying galaxy evolution, and baseball, examining player performance and evaluation.

Acknowledgments

The authors would like to thank Daniel Mack and Kevin Seats for helpful comments.

Notes

1. Owen McGrattan, “Stuff+, Location+, and Pitching+ Primer,” FanGraphs Library, March 10, 2023, accessed at https://library.fangraphs.com/pitching/stuff-location-and-pitching-primer/ on October 1, 2025.

2. Namiki, “Expected Pitch Value,” FanGraphs Community Research, May 26, 2021, accessed at https://community.fangraphs.com/expected-pitch-value/ on October 1, 2025.

3. Tianqi Chen and Carlos Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York: ACM, 2016), 785–94. https://doi.org/10.1145/2939672.2939785.

4. Dan Blewett, “Pitch Tunneling: Is It Real? And How Do Pitchers Actually Pitch?,” The Hardball Times, June 16, 2017, accessed at https://tht.fangraphs.com/pitch-tunneling-is-it-real-and-how-do-pitchers-actually-pitch/ on October 1, 2025.

5. Ethan Moore, “Pitch Quality 4: Solving the Times Through the Order Penalty,” Prospects 365, July 16, 2020, accessed at https://prospects365.com/2020/07/16/pitch-quality-4-solving-the-times-through-the-order-penalty/ on October 1, 2025.

6. Tom Tango, Mitchel Lichtman, and Andrew Dolphin, The Book: Playing the Percentages in Baseball (Potomac Books, 2007).

7. Mitchel Lichtman, “Everything You Always Wanted to Know About the Times Through the Order Penalty,” Baseball Prospectus, November 5, 2013, accessed at https://www.baseballprospectus.com/news/article/22156/ on October 1, 2025.

8. Jacob Peterson, “All Innings Are Not Created Equal: How Run-Scoring Varies By Inning,” Beyond the Box Score, July 3, 2011, accessed at https://www.beyondtheboxscore.com/2011/7/3/2255959/all-innings-are-not-created-equal-how-run-scoring-varies-by-inning on October 1, 2025.

9. Ryan S. Brill, Sameer K. Deshpande, and Abraham J. Wyner, “A Bayesian Analysis of the Time Through the Order Penalty in Baseball,” Journal of Quantitative Analysis in Sports 19, no. 4 (2023): 245–262. https://doi.org/10.1515/jqas-2022-0116.

10. David W. Smith, “Do Batters Learn During a Game?,” SABR Baseball Research Journal, Vol. 34 (2005), accessed at https://sabr.org/journal/article/do-batters-learn-during-a-game/ on October 1, 2025.

11. Ted Williams and John Underwood, The Science of Hitting (Simon & Schuster, 1986).

12. Dan Meyer, “Dynamic Run Value of Throwing a Strike (Instead of a Ball),” The Hardball Times, May 6, 2015, accessed at https://tht.fangraphs.com/dynamic-run-value-of-throwing-a-strike-instead-of-a-ball/ on October 1, 2025.

13. Foolish Baseball, “Umpires Hate Him! The Replay Review MVP,” YouTube.com, August 26, 2023, accessed at https://www.youtube.com/watch?v=RmeP77TEAU8 on October 1, 2025.

© 2025 SABR. All Rights Reserved.