Umpire Analytics
This article was written by Brian M. Mills
This article was published in The SABR Book of Umpires and Umpiring
Rule 9.02 of the official MLB rulebook states, “Any umpire’s decision which involves judgment, such as, but not limited to, whether a batted ball is fair or foul, whether a pitch is a strike or a ball, or whether a runner is safe or out, is final. No player, manager, coach, or substitute shall object to any such judgment decisions.” The importance of this authority is rooted in ensuring that the game proceeds smoothly and with integrity. MLB requires that umpires do not abuse their power and perform up to expectations as guided by the rules.
To ensure that rules are followed, the league monitors the behavior and performance of its umpires, who hold the integrity of the game in their hands. Yet precise measurements of performance, particularly for ball and strike calls, were not readily available until recent technological innovations. The most important innovation was the introduction of ball-tracking technology to evaluate the called strike zone, but more recent additions include instant replay and tracking of umpire movements around the field. Various sabermetricians have used this data to apply to umpires the quantitative concepts that they have applied to player evaluation for years. The result has been a new and exciting understanding of the game of baseball and the role umpires play in its outcomes.
This article will start by giving a general overview of the evolution of technology used to monitor umpires and how this has been useful in developing an understanding of umpire behavior. Most readers are likely familiar with the use of sabermetrics to evaluate players and teams, and player evaluation has been a part of the game since its inception. Much of this work on teams and players is known to a wide audience, but analysis of umpires has only recently become a topic of public conversation due to the availability of new data and technology. This article will therefore take the reader through many of the approaches and findings of umpire analytics experts over the past 10 years to evaluate the performance and biases among umpires in MLB.
As the reader may well know, the history of umpires is one of resistance to technological change and to what is often seen as threatening oversight. However, recent advancements have encouraged the effective use of these technologies not only to monitor performance, but also to reduce bias and help train a new crop of high-performing umpires seemingly more open to receiving feedback on their work. The monitoring of umpires, however, is not the only proposed use of this technology. Using relatively simple analytics, paired with the influx of data on strike zones, many have proposed replacing the umpire with a machine to call balls and strikes. The possibility of replacement is uncertain, but we can be sure that analytics provide us with useful context in which to make more educated decisions about these types of policies.
2. The Evolution of Umpire Performance Monitoring Technology
Major League Baseball and its umpires have had a rather tumultuous history of labor strife, with performance standards, incentives, and salary considerations playing a central role in disagreements during the 1990s when the leagues were still separate (O’Neill, 1990; Chass, 1999). Umpires generally voiced interest in both autonomy and a seat at the table on rule and monitoring changes, but the commissioner’s office issued directives in the 1990s related to strike-zone uniformity without much consultation with their umpires (Chass, 1999). The imposition of actual monitoring of umpire calls first became visible when the commissioner’s office started its attempt at securing centralized control over umpires to create uniformity in the strike zone across the two leagues (Callan, 2012). While MLB refers to a single organization comprising the teams in both the American League and National League, these leagues operated as separate legal entities prior to 2000. There was a considerable effort to merge into a single group in the 1990s, and this merger had an important influence on the development of technology and umpire monitoring that would follow.
Specifically, the official strike zone had recently been amended in the rulebook in 1996, but Sandy Alderson, executive vice president for baseball operations at the time, had issued a memo calling for uniform enforcement of a zone slightly different from the newly described version (Weber, 2009). Most importantly, as part of this directive, Alderson asked team officials to manually chart pitches for umpires working their games (Callan, 2012), resulting in the first well-known direct evaluation of strike zones by the league. Manual charting, of course, did not provide substantial opportunities for in-depth statistical analysis of umpire performance. But it did give some level of monitoring and accountability to the arbiters of the strike zone as the commissioner’s office looked to gain centralized control. This would set the stage for future disagreement and opportunities for more in-depth evaluation of umpire performance by baseball leadership and the public.
Around this same time, the commissioner’s office had been experimenting with video simulations and other strategies to train and monitor umpires, largely driven by former minor-league umpire Philip Janssen and his research program (Weber, 2009).1 The simulations — adapted from training of fighter pilots — provided umpires with opportunities to practice making calls on different types of pitches or areas of difficulty in the zone. This was the league’s first real deep dive into using advanced technology to assist its umpires. But the endeavor did not get fully off the ground, with significant resistance from the umpires’ union and limited funding from the commissioner’s office.2
The league’s attempt at gaining full control over umpires came to a boiling point in 1999 when the leader of the umpires’ union, Richie Phillips, orchestrated a failed mass-resignation strategy. With a majority of major-league umpires sending resignation letters to the commissioner’s office to gain leverage in negotiations for a new collective-bargaining agreement, the league unexpectedly accepted the resignations, quickly leading to a divide in the union and its ultimate collapse (Weber, 2009; Cyphers, 2012). This opened up an opportunity for MLB to gain leverage in negotiations, move to the league-controlled officiating we see today, and introduce new technology to monitor and evaluate umpire performance.
In 2001 the system now colloquially known as QuesTec was installed in a few major-league ballparks to track the accuracy of ball and strike calls on a trial basis. This expanded to 13 ballparks in subsequent seasons (Karegeannes, 2004). The system used video cameras mounted around stadiums to track pitches from the pitcher’s hand to the plate, and identify where the ball crossed the front of the plate. The introduction of the system was not initially agreed upon by the umpires union, resulting in more unrest between the league and the umpires. Use of the system was ultimately ratified in 2004 during a new session of collective bargaining with a newer umpires union.
Umpires were sent a CD-ROM of the plotted data from their game(s) that included every pitch location and umpire call. However, in some cases, umpires did not receive the feedback at all, and there was little recourse for not paying attention to the discs (Karegeannes, 2004). And while the system could have been useful in improving umpires’ performance, it is unclear whether many umpires used this information, as skepticism remained among them over whether the system would be useful in improving performance. For example, Jerry Crawford noted in an interview on Real Sports that he threw the discs in the trash immediately after receiving them (Frankel, 2016). And umpires were not the only ones unimpressed with the QuesTec monitoring. Players also had concerns over the accuracy and consistency of the system. In 2003, while walking off the field, pitcher Curt Schilling sought out and smashed a QuesTec camera (AP, 2003).
It should be noted that QuesTec did have issues with certain types of pitches and identifying a zone consistent with what umpires actually called (or what the league expected them to call). Given the possibility of error, umpires had various complaints about the technology relating both to its accuracy and its use to make employment decisions. The league ultimately gave umpires some leeway with the accuracy measurements, and agreed that any hiring or firing decisions would not be based solely on the technology (Chass, 2004). This system was used through 2008, when it was replaced by a new system from commercial provider Sportvision.
At about the time QuesTec was initially introduced in 2001, ESPN was working with Sportvision to add a strike-zone visual (known as K-Zone) on its television broadcasts for each pitch thrown and called by umpires (Gueziec, 2002). The two systems ultimately meant umpires were suddenly being judged not only by the league and its QuesTec technology, but also by every fan watching a game broadcast on ESPN. Unfortunately for the umpires, as with QuesTec the visual on broadcasts left substantial room for error, perhaps resulting in more fan backlash than was warranted (Fast, 2011a). While the error rate was relatively low using the camera technology, there are questions as to the proper placement of the top and bottom of the strike-zone boundaries and other problems. Umpires seemed to have good reason for loathing the additional scrutiny they faced every game.
As a new collective-bargaining agreement came together in 2009, it was clear that MLB would be shifting away from the proprietary QuesTec in favor of a new camera system from Sportvision known as PITCHf/x.3 With PITCHf/x, umpires would receive direct and immediate feedback after each game through detailed reports accessible online at their convenience, avoiding some of the issues that plagued the QuesTec system early on. The new system, called Zone Evaluation, also used cameras to track ball movement and project the location of the ball as it crossed the front of the plate. This projection allowed both a two-dimensional and a three-dimensional representation of the strike zone. While presumably more accurate, the PITCHf/x system had some error in its data, measuring the location of the pitch within about a half-inch in either direction. Because of this error margin, umpires are allowed some leeway, though these adjustments are largely part of internal development, and it is unclear whether the zone used for evaluation matches the rulebook definition exactly. It is important to note that any performance measure an analyst reports with this data may be slightly misleading without fully accounting for the error margin. And MLB is rather cagey about sharing the way in which it evaluates its umpires, or the strike-zone definition used to do so.
Luckily for sports analysts, PITCHf/x data had been available since mid-2007. By 2008 and beyond, anyone with the ability to scrape data from the internet had access to the location of every regular-season MLB pitch and the associated umpire call. This marked the beginning of a new era in analytics of baseball, and the start of analytics of umpires. Perhaps to the chagrin of umpires, this also meant that smart analysts had a way of publicly revealing relatively precise grades of ball-strike calls. As early work would show, umpires sometimes missed ball-strike calls in ways that revealed fallibility as arbiters of the truth, with notable biases in the patterns of these missed calls. However, analysts have shown that umpires have improved their accuracy since the system was put in place (Mills, 2016b).
Even more recently, a newer system developed by the Danish company Trackman has taken over data production across MLB. Trackman is a Doppler radar system that more precisely locates the ball and follows it on its entire path, rather than projecting its path to the plate (Nathan, Kensrud, Smith, & Lang, 2014). This information is combined with player and umpire movement tracking systems to form what is now well known as Statcast. As the league fully transitions to this new system, it has started releasing some of the new information available from the radar system, including batted-ball trajectory and exit velocity, as well as pitch spin rate and axis (Willman, 2016). Most of this information is more useful for player analysis, but it has its place in understanding the effects of umpires on game play. Analysts have taken hold of much of this data and developed a new understanding of the work of baseball’s umpires.
Interestingly, umpire performance itself is only one aspect of this work. There are various topics related to analysis of umpires, most of which are related to the strike zone, and include performance changes, effects on the game, biases, psychological phenomena, and (in the future) positioning. The next section will detail the data available to the analyst and recent work on how umpires perform, how they have improved, and how this can affect the game. Section 4 will detail a large literature on the performance of umpires, while Section 5 discusses biases and other psychological phenomena revealed through analyzing umpire data. Lastly, Section 6 will briefly discuss future directions of umpire analytics and the potential for replacement of home-plate umpires with machines.
3. Data Availability and Strike Zone Definitions
3.1 Data Availability, Methods, and Tools
At its core, umpire analytics has been about the strike zone, the shortcomings of the way it is called by umpires, and the impact this has on the game. This interest calls for an in-depth understanding of not only what the data can tell us, but what it cannot. One of the largest drivers of the development of umpire analytics has been the availability of data to the public through the PITCHf/x system. This availability has shaped the focus of umpire analytics on the strike zone, given that information about positioning and video replay of individual calls elsewhere on the field is relatively limited to the outside analyst.
At the initial release of PITCHf/x data, only a few researchers had the technical ability to scrape the data and manipulate it on their own computers. However, analyst Mike Fast opened the floodgates by publicly releasing his Perl and SQL code to download data directly from the MLB Gameday website into a personal database.4 Later, other engineers and computer-savvy sabermetricians developed online databases that allow point-and-click access to large chunks of PITCHf/x and Statcast data. This data is now much more accessible, allowing for significant growth in the number of analysts evaluating umpire behavior.
Brooks Baseball (brooksbaseball.net), managed by Dan Brooks, was a pioneering site that allows close inspection of game data and visualization of ball and strike calls by umpires in each game. The most obvious benefit of this site is that it provides various options with which visitors can view and understand data on preprogrammed strike-zone maps. But there is also an option to download data at the game-pitcher level for use in Excel and statistical applications. For those who want large files of raw data for themselves, Daren Willman’s Baseball Savant — now directly partnered with MLB — has been an excellent source of data downloads of individual pitch data.5 The data can be downloaded using various queries, and includes umpire assignments within the data, and other information from Statcast like velocity of each hit, the exit angle, and other relevant variables.
Other applications have been developed for more experienced statistical programmers, such as the pitchRx package in the open-source programming language R, developed by Carson Sievert (2014). This package gives analysts the ability to download data directly into the R program from the MLB Gameday website and includes functions that make beautifully designed visuals of the strike zone. Other analysts are developing their own packages in R as well, such as Bill Petti’s baseballr package that continues to be developed and improved (Petti, 2016).
While access to and organization of the data is a (sometimes long) process, it is only the first step in the analysis process. Although various quantitative techniques can be performed relatively simply in Excel, much of the impactful work in umpire analytics has used more advanced statistical and visualization tools. One of the most popular programs for doing so is the open-source statistical program known as R. Others have used Matlab and similar powerful programs to perform their sophisticated analyses. While this article will not get into the details of statistical methods and programming, it is clear that nonparametric statistical methods such as kernel density estimation and generalized additive models are becoming standard for measuring the strike zone and how it is impacted by the decisions of its arbiters.
Because PITCHf/x data begins as raw two-dimensional spatial coordinates of the pitch location, nonparametric methods can help to interpret locational properties of strike calls. Generalized additive models (Wood, 2006) have proved to be among the more useful methods in modeling the strike zone called by the umpire. These models use nonparametric regression to measure the probability of a strike call, given its two-dimensional location (and other factors). The most useful outcome of these models is the ability to visualize the strike zone using contour and heat maps. Heat maps are a color representation of the probability of a strike call, presented in a two-dimensional figure. Contours can help to visually identify a square or ellipsoidal boundary at which a strike and a ball call are equally likely to occur (or any other probability of a strike or ball). These models can then be used to compare strike zones of individual umpires or understand how the strike zone changes across different game situations, a key feature for estimating biases in umpire calls in a later section. Later, we’ll take a look at the types of visuals that can be created from these methods. But first, it is important to define what we actually mean when referring to the strike zone itself.
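To make the idea of a smoothed strike-probability surface concrete, here is a minimal sketch in Python. It does not fit a generalized additive model. Instead it swaps in a simpler nonparametric stand-in, a Gaussian kernel (Nadaraya-Watson) average of nearby calls, which produces the same kind of surface that contour and heat maps are drawn from. The function names, the 0.25-foot bandwidth, and the toy pitches are all assumptions made for illustration and are not taken from any of the cited work.

```python
import math


def strike_probability(pitches, px, pz, bandwidth=0.25):
    """Kernel-weighted share of called strikes near location (px, pz).

    pitches: iterable of (x_ft, z_ft, called_strike) with called_strike 0 or 1.
    bandwidth: Gaussian kernel width in feet (an assumed, untuned value).
    """
    weighted_strikes = 0.0
    total_weight = 0.0
    for x, z, called_strike in pitches:
        dist_sq = (x - px) ** 2 + (z - pz) ** 2
        weight = math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        weighted_strikes += weight * called_strike
        total_weight += weight
    if total_weight == 0.0:
        return float("nan")  # no pitches close enough to say anything
    return weighted_strikes / total_weight


def probability_grid(pitches, xs, zs, bandwidth=0.25):
    """Evaluate the smoother on a grid; rows follow zs, columns follow xs."""
    return [[strike_probability(pitches, x, z, bandwidth) for x in xs] for z in zs]


# Toy data (made up): called strikes near the middle, called balls around the edges.
toy_pitches = [
    (0.0, 2.5, 1), (0.3, 2.4, 1), (-0.4, 2.8, 1), (0.2, 3.0, 1), (-0.2, 2.0, 1),
    (1.5, 2.5, 0), (-1.6, 2.6, 0), (0.1, 4.5, 0), (0.0, 0.6, 0), (1.4, 4.2, 0),
]
p_middle = strike_probability(toy_pitches, 0.0, 2.5)
p_outside = strike_probability(toy_pitches, 1.5, 2.5)
```

A grid from probability_grid can be handed to any plotting tool to draw a heat map, and the 0.5 contour of that grid plays the role of the 50 percent boundary described above. A real analysis would fit the surface with a proper model and far more data.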
3.2 Strike Zone Measurement and Definitions
As of 1996, the rulebook states that, “The STRIKE ZONE is that area over home plate, the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the knee cap. The Strike Zone shall be determined from the batter’s stance as the batter is prepared to swing at a pitched ball.” (MLB, 2010). However, most fans would agree that whatever the umpires are calling behind the plate, it does not look like the rectangle that K-Zone shows on television, or that would be expected from the description by MLB.
Early revelations that the called strike zone did not represent the rulebook zone brought about an important implication related to analysis of accuracy: The “correct” strike zone is a somewhat fluid concept. Hale (2007a) therefore suggests analyzing umpire accuracy and performance with great care (Josh Kalk [2009] and Mike Fast [2010; 2011d] echo this concern). This work set the stage for a deeper understanding of behavior of umpires relative to expectations of the league and general behavior of their peers, rather than a fixed rectangular strike zone.
Indeed, analysis of the data tells us that the shape of the strike zone called by umpires is more circular or ellipsoidal. And early work from John Walsh (2007a; 2007b) and Jonathan Hale (2007a) showed that its size and shape vary considerably depending on the umpire. Consistent with what fans and players suspected, it was clear that there were umpires with tighter strike zones than others.
There were three core points from this work that redefined how analysts might approach estimating strike zone accuracy: 1) the strike zone called by umpires tended to extend beyond the edges of the plate, 2) left-handed batters tend to have a strike zone shifted to the outside,6 and 3) the top and bottom of the strike zone are not consistently called strikes, particularly at the corners. The apparent flexibility in the zone relative to the rulebook definition therefore opens up questions regarding the variation in performance of umpires across the league, whether they can improve and change their behavior, whether they have clear biases for or against certain players or in certain situations, and whether this impacts gameplay and fan interest in the sport.
In addition to the issues related to the called zone, there were data problems in measuring the top and bottom of the zone, particularly in earlier PITCHf/x data. These boundaries were manually drawn in by operators, and varied considerably depending on the at-bat and the person making the determination (Walsh, 2007a; Mills, 2011). For example, Walsh (2007a) finds that the total height of the strike zone varied by as much as nine inches from game to game for Derek Jeter in 2007 PITCHf/x data. Identifying the top and bottom of strike zones should therefore be done with care — particularly given the bias in measurement that came along with miscalibration in certain stadiums — but is made difficult without reliable information about each batter’s stance in the data. Analysts have come up with ways to circumvent this issue in the data with a reasonably low loss in accuracy of the strike zone definition.
Mills (2016a; 2016b) uses a fixed strike zone to avoid these issues, using the average height of MLB players and setting the top and bottom of the zone congruent with anthropometric data from NASA (2000).7 However, it is important to note that these fixed zone measurements will likely underestimate the accuracy of umpires, since they do not adjust for the height and stance of each batter at the plate. Roegele (2013) and Mills (2014) do attempt to adjust the strike zone by height, but there tend to be relatively small gains in understanding of accuracy by doing so at the aggregate level. Nevertheless, Roegele (2013) does find that taller players see both an upward shift and an upward expansion of the strike zone as might be expected, and more research is needed to further understand the best ways to integrate batter stances into umpire evaluation.
An additional complication with strike-zone measurement is that accuracy measures are generally tabulated using the location of the ball when it crosses the front of home plate using coordinates directly provided in the data. Yet, there is the question of whether the zone should be measured in two or three dimensions. And technically, if any part of the ball crosses over any part of the plate within the height boundaries of the zone, then it should be called a strike (Rule 2.00, A STRIKE (b) [MLB.com, 2007]). The limitations here are obvious: the ball does not have to cross the front of the plate to be a strike, and if only a small piece of the ball crosses through the zone, it is technically a strike. It is physically possible that the ball crosses the back corners of the plate or drops within the top border of the strike zone toward the back of the plate, while not having done so when it crosses the rectangular two-dimensional plane at the front of the plate.
The issue related to having only a portion of the ball pass through the two-dimensional plane is relatively straightforward to fix. Since PITCHf/x data reports the location of the center of the ball as it passes through the plane, analysts can simply add the radius of the ball to identify additional pitches that should be called strikes. Aaron Baggett (2015) covers some of the intricacies of accounting for the radius of the ball (its diameter is between 2.86 and 2.94 inches) when calculating the width of the strike zone.8 Figure 1 shows a bird’s-eye view of the plate with the added radius of the ball on either side to make the strike zone slightly wider when measuring from PITCHf/x data.
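The radius adjustment can be sketched in a few lines of Python. This assumes the horizontal coordinate is measured in feet from the center of the plate and uses 2.9 inches as a midpoint of the diameter range quoted above. The constant and function names are hypothetical.

```python
PLATE_WIDTH_IN = 17.0     # width of home plate in inches
BALL_DIAMETER_IN = 2.9    # assumed midpoint of the 2.86 to 2.94 inch range
INCHES_PER_FOOT = 12.0


def horizontal_half_width_ft(include_ball_radius=True):
    """Half-width of the zone in feet, optionally widened by one ball radius."""
    half_width_in = PLATE_WIDTH_IN / 2.0
    if include_ball_radius:
        half_width_in += BALL_DIAMETER_IN / 2.0
    return half_width_in / INCHES_PER_FOOT


def crosses_plate(px_ft, include_ball_radius=True):
    """True if a ball centered px_ft from the middle of the plate touches the zone."""
    return abs(px_ft) <= horizontal_half_width_ft(include_ball_radius)


# A ball centered 9 inches (0.75 ft) off the middle: its center misses the plate,
# but its edge still clips the zone once the radius is added.
clips_with_radius = crosses_plate(0.75)
clips_without_radius = crosses_plate(0.75, include_ball_radius=False)
```

Under these assumptions the half-width grows from about 0.71 feet to about 0.83 feet, so pitches whose centers sit just off the plate are counted as true strikes.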
The three-dimensional representation is a bit more involved, using information from PITCHf/x on the trajectory of each pitch, rather than the provided coordinates. Recent work has attempted to evaluate the impact of accounting for these additional strikes (Lang, 2015; Mata, 2015). By representing the zone using a pentagonal (the shape of the plate) volume, rather than a rectangular plane, analysts allow for the movement of the pitch to enter the strike zone at different points as it crosses the plate. Specifically, the movement and path of the pitch can affect the likelihood that a pitch passes through the pentagonal volume while missing the plane at the front of the plate (back-door curveballs, for example). With respect to pitch height, Lang (2015) notes that a pitch can drop as much as two inches from the front to the back of the plate, leaving plenty of room for additional strike calls on pitches that were otherwise measured as balls. Both Lang (2015) and Mata (2015) show that strategies using physics of ball flight can actually improve upon our understanding of how umpires see the pitch and call it in the three-dimensional context.
To optimize the three-dimensional zone measurement as called by umpires, Mata (2015) creates an algorithm that improves upon the three-dimensional rulebook and two-dimensional plane representations.9 Specifically, the back triangle of the plate is not especially helpful in improving our predictions of strike calls in the three-dimensional context. However, using the rectangular portion of the plate, along with height information, improves upon two-dimensional descriptions of umpire behaviors in the strike zone. Lang (2015) further informs us about the role of time in the three-dimensional context: The longer the path of a pitch spends inside this zone, the more likely it is that the umpire will call it a strike. We can presume that the longer the time a pitch is in the zone, the more time an umpire has to determine that it did, in fact, pass through that zone.
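The difference between the front-plane check and a volume check, along with the time-in-zone idea, can be illustrated with a rough Python sketch. It is not the algorithm of Mata (2015) or Lang (2015). It assumes a constant-acceleration trajectory, a y-axis measured in feet from the back tip of the plate toward the pitcher, a rectangular portion of the plate 8.5 inches deep, and the center of the ball only (no radius adjustment). The names and the toy pitch are hypothetical.

```python
import math

FRONT_OF_PLATE_FT = 17.0 / 12.0  # assumed: y is measured from the back tip of the plate
BACK_OF_RECT_FT = 8.5 / 12.0     # assumed: rectangular portion is 8.5 inches deep
HALF_WIDTH_FT = 8.5 / 12.0       # half of the 17-inch plate width


def time_at_y(y_target, y0, vy0, ay):
    """Time at which the ball first reaches y_target (vy0 < 0, toward the plate)."""
    if ay == 0.0:
        return (y_target - y0) / vy0
    disc = vy0 ** 2 - 2.0 * ay * (y0 - y_target)
    return (-vy0 - math.sqrt(disc)) / ay


def position(t, p0, v0, a):
    """(x, y, z) at time t under constant acceleration."""
    return tuple(p + v * t + 0.5 * acc * t * t for p, v, acc in zip(p0, v0, a))


def zone_summary(p0, v0, a, bottom, top, samples=200):
    """Sample the path over the rectangular part of the plate and test each point."""
    t_front = time_at_y(FRONT_OF_PLATE_FT, p0[1], v0[1], a[1])
    t_back = time_at_y(BACK_OF_RECT_FT, p0[1], v0[1], a[1])
    hits = []
    for i in range(samples + 1):
        t = t_front + (t_back - t_front) * i / samples
        x, _, z = position(t, p0, v0, a)
        hits.append(abs(x) <= HALF_WIDTH_FT and bottom <= z <= top)
    return {
        "strike_2d": hits[0],        # in the zone at the front plane
        "strike_3d": any(hits),      # in the zone anywhere over the rectangle
        "seconds_in_zone": (t_back - t_front) * sum(hits) / len(hits),
    }


# Toy pitch (made up): slightly above a 3.5 ft top edge at the front plane,
# dropping under gravity into the zone before the back of the rectangle.
summary = zone_summary(
    p0=(0.0, 50.0, 6.15), v0=(0.0, -120.0, 0.0), a=(0.0, 0.0, -32.0),
    bottom=1.5, top=3.5,
)
```

The toy pitch is a ball on the two-dimensional plane but a strike in the volume, and the seconds_in_zone value is a crude version of the time-in-zone quantity: a sampled approximation rather than an exact integral.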
Nevertheless, the two-dimensional nature of the projection offers a useful glimpse into how umpires are performing generally. As Lang (2015) notes, pitches that do not cross the front of the plate are very unlikely to be called strikes in the first place, and Mata (2015) adds that two-dimensional plane representation gives a better approximation of the umpire-called strike zone than the true rulebook zone in three dimensions (rather than the simulated optimized zone). Therefore, in ranking umpire accuracy, the few pitches that do cross through the three-dimensional zone — but not the two-dimensional one — are unlikely to substantially affect our measures of the relative performance of umpires.10
The remainder of this article will therefore focus on the two-dimensional representation, particularly given that the bulk of the work analyzing the zone uses the coordinates of pitches provided by the PITCHf/x data. But it is important to note that future study of the differences in these two definitions is a fruitful area for expansion of umpire analytics work, particularly in the context of introducing automation behind the plate.
For exposition, Figure 2 presents various visualizations of the left-handed batter two-dimensional strike zone. The left panel presents raw 2014 PITCHf/x locational data in a scatterplot, colored by whether the pitch was called a ball (yellow triangles) or a strike (red circles). The center panel presents output from a generalized additive model estimating the boundary at which strikes are called 50 percent of the time. In other words, inside the ellipse, strikes are called at least half the time. This boundary approximates the equal blending of yellow triangles and red circles in the scatterplot, which is where there is the most uncertainty in the strike zone. Finally, the right panel presents a heat map to visualize various strike probabilities using the same model that was used to produce the contour visual. As pitches move toward the center of the strike zone, red (dark color) is more prominent, indicating higher probabilities of strike calls in the heat map. The opposite is true as we move away from the center of the zone, with lighter and lighter yellow as the probability of a called strike approaches zero and the probability of a called ball approaches one. These types of visuals, shown from the umpire’s view, will be used throughout this article to present some of the most interesting findings by analysts about the behavior and performance of umpires.
4. The Analytics of Umpire Performance
4.1 Who Are the Best Home-Plate Umpires?
As a primer on accuracy rates in general, I’ll begin by presenting some basic statistics on the accuracy of major-league umpires from the 2015 season. The data come from Baseball Savant’s Statcast Search tool, and include all pitches called by umpires during the season. Given that pitchouts and intentional balls do not require substantial judgment from the umpire, these are removed from the data, leaving us with just over 365,000 pitches to work with.11
I use the fixed strike zone definition created using the NASA information from Mills (2016b) to determine accuracy rates for the league and for individual umpires. This, of course, assumes that most umpires see a similar distribution of batter heights and pitch locations across a given season, an assumption required for simplicity of the exposition. I also note that the definition used here is more stringent than the one used by MLB. Specifically, umpires are given some cushion around the edge of the strike zone in their Zone Evaluation grades, a practice that began with the implementation of QuesTec. The accuracy rates reported here are, therefore, likely to be underestimated relative to MLB standards.
For our purposes, “true strikes” will be defined as pitches that cross through the two-dimensional strike zone plane at the front of the plate (about 29.4 percent of the called pitch data), while “true balls” will be those that do not cross this plane (about 70.6 percent of the called pitch data). “Correct strikes” will consist of those true strikes that the umpire correctly calls a strike. Finally, “correct balls” will consist of true balls that the umpire correctly calls a ball. These simple classifications are pictured below in Figure 3. These can then be combined to find overall accuracy rates among umpires, as well as accuracy rates specific to pitches within zone and outside of the zone, respectively.
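These classifications translate directly into a small tabulation. The Python sketch below assumes each called pitch has already been reduced to two booleans, whether it crossed the fixed zone and whether the umpire called it a strike. The function name and the ten toy pitches are made up for illustration.

```python
def accuracy_rates(pitches):
    """pitches: iterable of (in_zone, called_strike) boolean pairs."""
    true_strikes = correct_strikes = true_balls = correct_balls = 0
    for in_zone, called_strike in pitches:
        if in_zone:
            true_strikes += 1
            correct_strikes += int(called_strike)
        else:
            true_balls += 1
            correct_balls += int(not called_strike)

    def rate(numerator, denominator):
        # Avoid dividing by zero when a category is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "correct_strike_rate": rate(correct_strikes, true_strikes),
        "correct_ball_rate": rate(correct_balls, true_balls),
        "accuracy": rate(correct_strikes + correct_balls, true_strikes + true_balls),
    }


# Toy data (made up): 3 true strikes with one missed, 7 true balls with one missed.
toy_calls = (
    [(True, True), (True, True), (True, False)]
    + [(False, False)] * 6
    + [(False, True)]
)
rates = accuracy_rates(toy_calls)
```

With the toy data, two of three true strikes and six of seven true balls are called correctly, for eight correct calls out of ten overall.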
Using these definitions, we see that umpires correctly called 87.8 percent of true strikes and 90.7 percent of all true balls in 2015 (the correct strike rate and correct ball rate, respectively). The overall rate of correct calls was 89.9 percent. However, there is considerable variation in the correct call rates among umpires. Figure 4 below shows the distribution of correct strike and correct ball rates across umpires in 2015. While most umpires center around the mean accuracy rate, there are some extremes in the distribution and considerable variation in favorability toward batters or pitchers (Fast, 2011c). As an additional exposition, Table 1 then ranks the 10 highest and 10 lowest overall accuracy rates among all 87 major-league umpires with at least 1,000 ball-strike calls in 2015. Interestingly, additional work shows that umpires’ accuracy varies most up in the zone (Lindholm, 2014), though they all call strikes less often in the top and bottom corners (Figure 2).
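As a quick consistency check on the league-wide figures, the overall accuracy rate should be the share-weighted average of the correct strike rate and the correct ball rate. The short Python sketch below uses only the rounded percentages quoted in the text, so it can match the reported overall rate only to within rounding. The variable names are arbitrary.

```python
TRUE_STRIKE_SHARE = 0.294    # share of called pitches that were true strikes
TRUE_BALL_SHARE = 0.706      # share of called pitches that were true balls
CORRECT_STRIKE_RATE = 0.878  # rounded league correct strike rate, 2015
CORRECT_BALL_RATE = 0.907    # rounded league correct ball rate, 2015

# Weight each rate by how often that kind of pitch occurs.
overall_accuracy = (
    TRUE_STRIKE_SHARE * CORRECT_STRIKE_RATE
    + TRUE_BALL_SHARE * CORRECT_BALL_RATE
)
```

The result lands just under 0.899, in line with the reported 89.9 percent once rounding of the inputs is taken into account.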
Umpire | First Year | Total Calls | Correct Strike % | Correct Ball % | Accuracy %
1. Chris Conroy | 2010 | 4,410 | 88.85 | 92.72 | 91.59
2. Toby Basner | 2012 | 3,067 | 90.73 | 91.77 | 91.46
3. Lance Barksdale | 2000 | 4,164 | 88.25 | 92.56 | 91.28
4. Will Little | 2013 | 3,298 | 92.41 | 90.71 | 91.21
5. Al Porter | 2010 | 4,543 | 90.04 | 91.67 | 91.20
6. Tom Woodring | 2014 | 1,933 | 84.38 | 93.96 | 91.10
7. Phil Cuzzi | 1991 | 4,555 | 91.41 | 90.95 | 91.09
8. Adam Hamari | 2013 | 5,390 | 92.14 | 90.58 | 91.06
9. Quinn Wolcott | 2013 | 3,603 | 90.02 | 91.47 | 91.06
10. Ryan Blakney | 2015 | 3,226 | 87.25 | 92.58 | 91.01
78. Bruce Dreckman | 1996 | 3,201 | 86.38 | 89.56 | 88.60
79. Adrian Johnson | 2006 | 4,501 | 87.05 | 89.22 | 88.60
80. Gerry Davis | 1982 | 4,460 | 83.51 | 90.77 | 88.59
81. Angel Hernandez | 1991 | 4,955 | 90.21 | 87.93 | 88.56
82. Tom Hallion | 1985 | 3,995 | 83.97 | 90.52 | 88.54
83. Jerry Layne | 1989 | 1,481 | 85.75 | 89.65 | 88.52
84. Ed Hickox | 1990 | 4,027 | 84.89 | 89.99 | 88.45
85. Mike DiMuro | 1997 | 2,497 | 84.66 | 89.81 | 88.31
86. Paul Nauert | 1995 | 3,804 | 85.28 | 89.13 | 88.07
87. Tim Welke | 1983 | 4,344 | 85.15 | 88.51 | 87.52
Note: Minimum of 1,000 pitches called in 2015.
Let’s put these numbers in perspective. Each game includes approximately 150 umpire-called pitches, meaning that even the best umpire misses about 12 to 13 pitches each game, presumably evenly against each of the two competing teams in the long run.12 The least accurate umpire makes an additional six incorrect calls per game. And it is important to note that this accuracy rate has improved since the introduction of Zone Evaluation in 2009.13 The net accuracy rates in Table 1 are considerably higher than they were just six years ago. It’s no wonder we see players argue balls and strikes so often.
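The per-game arithmetic above, and the link between the two component rates and the overall rate, can be checked in a few lines. This sketch uses only figures quoted in this section (about 150 called pitches per game, the 29.4/70.6 percent split of true strikes and true balls, and the 2015 rates); the variable and function names are illustrative.

```python
CALLED_PITCHES_PER_GAME = 150  # approximate figure used in the text

def missed_calls_per_game(accuracy_pct):
    """Expected incorrect calls per game for a given accuracy percentage."""
    return CALLED_PITCHES_PER_GAME * (1 - accuracy_pct / 100)

best = missed_calls_per_game(91.59)   # top of Table 1
worst = missed_calls_per_game(87.52)  # bottom of Table 1
extra = worst - best                  # additional misses by the least accurate umpire

# Overall accuracy is the share-weighted average of the two component rates.
overall = 0.294 * 87.8 + 0.706 * 90.7

print(f"{best:.1f} {worst:.1f} {extra:.1f} {overall:.1f}")
```

The weighted average comes out just under 89.9 percent; the small gap relative to the reported overall rate is what one would expect from rounding in the published inputs.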
Explicit accuracy, however, is only one measure of performance for umpires. There are unresolved questions as to whether accuracy itself or consistency of calls is most representative of performance of an umpire. For example, it is entirely possible for one umpire to be more consistent than another but less accurate. To picture this, assume an umpire’s zone center is shifted outside by six inches, but he calls every pitch inside some given boundary a strike, and every pitch outside that boundary a ball. This umpire would be perfectly consistent: there is no uncertainty over the calls he makes. But he would be very inaccurate: many of these calls would be well beyond the outside corner of the plate. On the other hand, picture a second umpire who has his strike zone centered perfectly around the center of the rulebook strike zone, but with “fuzzy” edges (the blending of red dots and yellow triangles we saw in Figure 2). This umpire’s calls become less predictable as pitches approach the edges of the zone, so he is less consistent than the first umpire despite being more accurate.
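The two hypothetical umpires can be made concrete with a one-dimensional sketch that looks only at horizontal location. Everything here is an assumption for illustration: pitches are spread evenly across a 40-inch band, the fuzzy umpire's edges follow a logistic curve with a one-inch scale, and "predictability" is measured as the average probability of the more likely call at each location. None of this is Bonney's method or any analyst's published model.

```python
import math

HALF_PLATE = 8.5  # half the 17-inch width of home plate, in inches

def in_zone(x):
    """True if horizontal location x (inches from center) is over the plate."""
    return abs(x) <= HALF_PLATE

def shifted_umpire(x, shift=6.0):
    """Strike-call probability: a hard-edged zone shifted six inches outside."""
    return 1.0 if abs(x - shift) <= HALF_PLATE else 0.0

def fuzzy_umpire(x, scale=1.0):
    """Strike-call probability: a centered zone with logistic (fuzzy) edges."""
    return 1.0 / (1.0 + math.exp((abs(x) - HALF_PLATE) / scale))

def evaluate(umpire, xs):
    """Return (expected accuracy, predictability) over pitch locations xs."""
    acc = pred = 0.0
    for x in xs:
        p = umpire(x)
        acc += p if in_zone(x) else 1.0 - p   # chance the call is correct
        pred += max(p, 1.0 - p)               # chance of the more likely call
    return acc / len(xs), pred / len(xs)

# Pitches spread evenly from 20 inches inside to 20 inches outside.
xs = [-20 + 0.01 * i for i in range(4001)]
acc_shifted, pred_shifted = evaluate(shifted_umpire, xs)
acc_fuzzy, pred_fuzzy = evaluate(fuzzy_umpire, xs)
```

Under these assumptions the shifted umpire is fully predictable but right only about 70 percent of the time, while the fuzzy umpire is more accurate and less predictable, which is the trade-off described above.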
In practice, it is unlikely that umpires differ to these extremes, but some players have noted that they would prefer predictability of strike calls over accuracy, making it easier to adjust their expectations (Cyphers, 2012). Peter Bonney takes our understanding of umpire performance in this direction with his article in the 2016 Hardball Times Baseball Annual (2016) using a machine learning algorithm.14 Bonney notes that consistency itself is a skill among umpires that can be separated from simplistic binary accuracy classifications. Given this skill, the question remains as to whether MLB should be encouraging its umpires to be more accurate, more consistent, or both. Some combination of excellent accuracy and consistency (in academic research we call these validity and reliability) would probably be optimal, assuming umpires cannot be perfect.
However, the league also needs to be cognizant of other factors related to umpire variability. Specifically, there is the question of uniformity: If all umpires are equally inaccurate and internally consistent, is this better than higher accuracy but less uniformity across umpires? Players may prefer the former, as they would not have to change their expectations of strike probabilities in each game officiated by a different home-plate umpire. Interestingly, there is evidence that umpires have continually become more uniform in their strike calls, particularly since MLB gained centralized control (Mills, 2015a), indicating that there is some higher level of predictability when players come to the plate. But the variation is still nontrivial in size, and it is unclear as to whether MLB incentivizes accuracy over consistency and uniformity in the training of umpires that will always be fallible. And there remain questions on just how good umpires can get. Analysts have begun evaluating whether the new technological systems have, in fact, improved accuracy over time.
4.2 Umpire Performance and Improvements
As noted earlier, the implementation of QuesTec and its successor, Zone Evaluation, was part of a long-term effort by MLB to enforce appropriate ball-strike calls based on the stated rules. Further, there has been a clear push to not only monitor the performance of umpires, but provide feedback and training through more precise measurement of accuracy over time. This raises the question as to whether the evaluation and training effort has been successful. With the release of PITCHf/x data, the answer to this question has been addressed in recent work both academically and among analysts writing for more popular sabermetric outlets.
Recent academic work has looked closely at the changes in performance among umpires during the Zone Evaluation era (Mills, 2016b), showing that, under a fixed definition of the strike zone based on the MLB rules, umpire accuracy increased by 3.65 percentage points from 2008 (85.35 percent correct) to 2014 (89.0 percent correct). Data in the previous section confirm that this path continued through at least 2015, with the overall rate approximately 89.9 percent. Other analysts (Davis and Lopez, 2015) have confirmed the improvement in accuracy, including some familiar names from Table 1 toward the top and bottom of the rankings.
One of the more interesting realizations from this work is that the accuracy rates in the strike zone are largely uncorrelated with postseason assignment, despite their supposed merit-based determination (Lindbergh, 2014). Although the strike zone is not the only aspect of evaluation for umpires when these assignments are made, it is surely the most salient measure of performance, hinting that postseason assignment might have more to do with seniority than merit. This raises the question as to why the monitoring would be successful: If there is no apparent reward or punishment for being a better (worse) ball-strike caller, then why have umpires bothered to try to improve their accuracy? Perhaps umpires are motivated professionals who take pride in their performance. Many people work in jobs that are rewarding in their own right, and calling balls and strikes well may be no different. But there are of course always some actors who need the extrinsic motivation. And theories from economics, psychology, and management generally predict that the simple act of being observed or monitored can reduce mistakes and biases, though there are limits to the success of these systems.
On the other hand, there is some evidence that umpires with less job security might be getting more calls right. Mills (2016b) finds that while all umpires have tended to improve their accuracy, newer umpires have outpaced their older, more experienced counterparts. This pattern is easily seen in Table 1, with eight of the 10 most accurate umpires making their debut in 2010 or later, and only one of the least accurate 10 making their debut after 1997 (Adrian Johnson in 2006). While sabermetricians do not have legal access to medical examinations of umpires’ eyes,15 it seems reasonable to expect that younger umpires — likely with higher average visual acuity — would perform better overall at the ball-strike-calling task. And these younger umpires are often competing for permanent contracts, introducing an additional incentive to perform at a high level.
Despite the noted improvement in overall accuracy, there are important caveats to the measured accuracy improvements. On an episode of Real Sports, economist Tobias Moskowitz noted that accuracy at the edges of the strike zone is considerably worse than the overall accuracy rate reported here and in many sabermetric studies. After all, pitches directly down the middle or well outside the strike zone are easy to judge even by a non-expert, and umpires call nearly all of these pitches correctly. This means the error comes from the smaller number of pitches at the edge of the strike zone, where uncertainty over the location rises. Moskowitz estimates that within two inches of the edge of the strike zone umpires make fewer than 70 percent of calls correctly. Aggregate accuracy rates computed over all pitches therefore likely hide some of this information.
On the other hand, the accuracy increase found in Mills (2016b) and Davis and Lopez (2015) largely comes from higher strike rates in the bottom of the strike zone, between 18 and 21 inches (Roegele, 2013; Mills, 2016a). In previous seasons, umpires were especially poor at calling low pitches in the zone as strikes. This has left an imbalanced improvement in accuracy on balls and strikes. Umpires have improved their correct strike rate from approximately 78.7 percent to 87.8 percent in 2015, while correct ball rates improved only from 88.4 percent to 90.7 percent. In other words, the improvement is coming from expanding the strike zone in places where umpires almost never called strikes before. While a general increase in accuracy could point to neutral impacts on the game — and higher levels of consistency — the relative changes in these two rates could result in other effects on the game, like run scoring. As umpires call more strikes low in the zone, hitters fall behind more often in the count and are put at a disadvantage in the at-bat, likely resulting in worse offensive outcomes. The findings on the changes in the strike zone have, therefore, led to closer inspection of these effects in the analytics space to identify contributors to the decline in offense that began in the 2000s after the Steroid Era.
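The imbalance between the two improvements is easy to quantify from the rates quoted above. This short sketch uses only those four percentages; the variable names are illustrative.

```python
# Rates quoted in the text, in percent: (earlier rate, 2015 rate)
correct_strike = (78.7, 87.8)
correct_ball = (88.4, 90.7)

# Gains in percentage points for each type of call.
strike_gain = correct_strike[1] - correct_strike[0]
ball_gain = correct_ball[1] - correct_ball[0]

# How many times larger the strike-side gain is than the ball-side gain.
ratio = strike_gain / ball_gain
```

The gain on true strikes is about 9.1 percentage points against about 2.3 on true balls, roughly a four-to-one difference.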
4.3 How Umpires Affect the Game
Given that umpires have been known to change their behavior — and as we will see in the next section, have various biases in strike calling — there are likely to be effects on the scoring environment in the game. Indeed, one of the primary complaints from MLB nearing the 1999 labor dispute was that strike zones were becoming too small and may have contributed to the large increases in scoring during the Steroid Era. Recent work has therefore shown substantial interest in estimating the influence of umpires on the run-scoring environment. If umpires have been a large contributor to inflated (or deflated) run scoring at certain points over the past 30 years, then this could weaken evidence that the Steroid Era scoring outburst was actually driven by steroids alone. This sort of revelation could presumably curb judgment related to certain players and their use of steroids prior to enforcement of any official policy on the matter.
Official enforcement of steroid policies at the MLB level began in 2006 through the Joint Drug Prevention and Treatment Program,16 but it turns out that this was not the beginning of the reduced scoring we see in today’s game. In fact, work by Rader and Winkle (2008) and Mills (2016b) showed that called-strike rates increased dramatically shortly after the installation of QuesTec, and run scoring decreased precipitously at the same time. From 2000 through 2002, MLB scoring decreased by 10 percent while called strike rates increased by 5 percent. While we do not have access to the QuesTec data itself, this was strong preliminary evidence that the so-called Steroid Era actually started to wane in 2001,17 right about when we saw strike-zone enforcement become more uniform across the major leagues.18
More recently, PITCHf/x data have helped analysts understand the continued dramatic decrease in run scoring since 2008. Let’s return to the extension of the zone downward from the previous section. In 2008 a pitch at the border of the hollow beneath the knee would have been called a strike approximately 25 percent of the time. By 2015, that number was more than 60 percent. Although umpires have reduced the number of outside pitches they call strikes, the change in the bottom of the zone has resulted in a net increase in the size of the strike zone plane.19
For example, Mills (2016a) measured the strike zone as that area inside the boundary at which a pitch is equally likely to be called a ball or a strike. The area within this elliptical shape grew by 34 square inches for right-handed batters and 30 square inches for left-handed batters between 2008 and 2014. Roegele (2014a; 2014b) presents similar results. The effect is approximately equivalent to the size of six or seven baseballs lined up side-by-side across the bottom of the zone. Figure 5 shows the strike zone in 2008 compared with 2014 empirically derived from Mills (2015a) using a generalized additive model. The downward extension continued in 2015. As with Figure 2, the strike-zone border is defined as the point at which a ball and a strike are equally likely to be called.
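Measuring the zone as the area inside the 50 percent contour can be sketched numerically once some called-strike probability surface is available. The surface below is a hypothetical stand-in (a smooth ellipse with semi-axes of 10 and 12 inches), not the generalized additive model fit reported in the cited work; the function names and the grid resolution are likewise assumptions.

```python
import math

def strike_prob(x, z, a=10.0, b=12.0, sharpness=4.0):
    """Hypothetical called-strike probability at horizontal x, vertical z.

    Equals 0.5 exactly on the ellipse (x/a)^2 + (z/b)^2 = 1 (inches,
    measured from the zone center) and falls off smoothly outside it.
    """
    r = math.sqrt((x / a) ** 2 + (z / b) ** 2)
    return 1.0 / (1.0 + math.exp(sharpness * (r - 1.0)))

def zone_area(prob, half_width=20.0, step=0.1):
    """Area in square inches of the region where prob(x, z) >= 0.5."""
    n = int(round(2 * half_width / step))
    cells = 0
    for i in range(n):
        x = -half_width + (i + 0.5) * step      # cell midpoint, horizontal
        for j in range(n):
            z = -half_width + (j + 0.5) * step  # cell midpoint, vertical
            if prob(x, z) >= 0.5:
                cells += 1
    return cells * step * step

area = zone_area(strike_prob)
```

For this stand-in surface the grid estimate should land close to the exact ellipse area, pi times 10 times 12. Comparing two seasons then amounts to fitting a surface to each season's calls and subtracting the two areas.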
Given a larger zone, and especially one where the increase comes in the low portion of the zone, any astute baseball fan would expect impacts on batters’ ability to get hits and teams’ chances to score runs. Indeed, both Mills (2016a) and Roegele (2014a), as well as Ben Lindbergh (2014), show that this increased strike-zone area has accounted for approximately 20 to 40 percent of the decrease in run scoring across MLB from 2009 through 2014. In 2008, two teams scored a combined average of 9.32 runs per nine innings. However, that number had decreased to 8.13 by 2014, a reduction of nearly 1.2 runs per game.20 Of that 1.2 runs, estimates reveal that between 0.3 and 0.5 runs per game are directly attributable to umpire ball-strike calls. In other words, there are negative effects on the expected runs scored as umpires call more and more strikes. Throughout the game, this will put batters at a marginally higher disadvantage. For example, these results suggest that batters move from a 1-and-1 count to a 1-and-2 count instead of a 2-and-1 count on a pitch 18 inches off the ground more often than they did before. Given that batters perform much worse in 1-and-2 counts, the result ends up being a net decrease in offense.
What’s more, if batters and pitchers change their behavior strategically across seasons in response to umpire changes, then the regression method used in the paper may actually be conservative. Interestingly, however, Mills (2015a) shows that while umpires had an even larger effect on the reduction in walks across the league (as much as 71 percent of the reduction in walks per nine innings pitched), the effect on strikeouts was almost completely negligible at 3 to 9 percent of the overall increase in strikeouts per nine innings pitched.21 In other words, high-velocity pitchers seem to be having an impact on some batter outcomes independent of these umpire changes.
Much of the work on umpire strike zone changes — and its apparent impact on scoring in baseball — got the attention of MLB, as reported by Jeff Passan (2015). As of 2015, there has been talk of enforcing new changes to the strike zone in an attempt to increase scoring back to levels more preferred by fans: a specific instance of where analytics have made clear contributions to policy considerations. Interestingly, scoring increased dramatically at the end of 2015 and through 2016, largely from an increase in home runs. But no official announcements were made regarding a change to the strike zone. This intrigued analysts, and Mills (2016c), Sullivan (2016), and Roegele (2015a; 2016) returned to the data to investigate changes in the zone. Each of these analysts found some evidence of small changes to the strike zone.
Figure 6 shows the differences in called strikes prior to and after the 2015 All-Star Game, when the changes began. As you can see, umpires have become less likely to call pitches low and away to both right- and left-handed batters, but show some increase in the propensity to call strikes up in the zone and inside to righties. Yet, the reduction in strike rate on pitches most difficult to hit — very low and away — could mean umpires exerted a new influence on offensive output.
Further, Mills (2016c) shows that pitchers and batters recognized this small change and changed their own behavior accordingly. However, after a comprehensive analysis of the effect of these changes on hitters’ exit-velocity and home-run rates — a clear precursor to run scoring — there is scant evidence that these more recent changes have accounted for any of the sharp uptick in scoring since the 2015 All-Star break. In the coming seasons it will be important to keep an analytic eye on ball-strike calls to see if there is a continued trend toward a smaller zone, and how MLB might balance this with its attempt to reduce the length of games.
5. The Analytics of Umpire Biases
Some of the most interesting findings from analyzing umpires are related to biases in strike-zone calls. These biases generally fall under four different areas: 1) home-field advantage, 2) count-based and recent call biases, 3) superstar biases, and 4) framing and visual influences. A few individual works cover various biases found across these categories (Mills, 2014; Roegele, 2013; Turkenkopf, 2008). However, with revealed biases come various competing interpretations depending on the analyst or academic investigator. This section takes each area in turn, describing its contribution to the analytics of umpire performance in the strike zone and the various ways to interpret the findings.
5.1 Home-Field Advantage
Home-field advantage is a well-documented phenomenon across various sports. MLB home teams won approximately 53 percent of the time in 2016, and only five of 30 teams had a higher winning percentage on the road than at home.22 Contributors to this disparity may include fatigue and travel effects, team choices to rest top players on the road, spillover from excitement from the home crowd, comfort with the home park and its playing surfaces, and of course bias by umpires. Thanks to the availability of data, this last influence has been tested by various analysts.
Economist Tobias Moskowitz and writer L. Jon Wertheim were among the first to document ball-strike-call bias in favor of the home team in their popular book Scorecasting (2011). The work in Scorecasting begins by first evaluating strike-rate data from the QuesTec era. The authors find that there are more called strikes or balls called in favor of the home team, particularly in situations that are crucial to the outcome of the game. They continue by comparing the home-field advantage effect in stadiums with and without QuesTec, revealing that when umpires were monitored by the QuesTec system, the difference in ball-strike rates actually reversed: the strike-rate difference turned in favor of the visitors. In other words, when umpires were being closely watched, they no longer showed bias in favor of the home team, and perhaps overcompensated to avoid doing so. By counting up all the extra favorable calls that home teams get, and multiplying those by the changes in expected runs scored when transitioning from one ball-strike count to the next, Moskowitz and Wertheim propose that this umpire bias results in about 7.3 additional runs for the home team over the course of the season. They note that this accounts for more than two-thirds of the entire home-field advantage observed in the data.
John Walsh, in the 2011 Hardball Times Baseball Annual, also finds umpire contributions to home-field advantage, calculating the effect as only one-third of the total run differential. And shortly after the release of Scorecasting, Jesse-Douglas Mathewson, writing for the sabermetric website Beyond the Box Score, finds results consistent with umpire ball-strike-call bias, too (2010a). However, there was even more disagreement on the size of the impact. Mathewson estimated that umpires account for only about 16 percent of the effect. It is important to note, however, that neither Mathewson nor Walsh breaks the data down for parks with and without QuesTec monitoring, and in fact both use PITCHf/x data from seasons in which these data were used to directly monitor umpires. So, assuming monitoring works well, it may simply be that much of the effect found in the QuesTec years (specific to non-QuesTec parks) is no longer apparent in the Zone Evaluation era. However, Mills (2014) uses a larger sample of data from 2008 through 2010, and still finds evidence of home-team bias even in the face of Zone Evaluation monitoring. Controlling for location and a host of other factors, the odds of a strike call on a pitch thrown by a home pitcher were about 7 percent higher than for a visiting pitcher, a nontrivial effect. Taken together, there is clear evidence for a home-field advantage effect.
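Because the 7 percent figure refers to odds rather than probability, it helps to translate it for a single pitch. The sketch below applies an odds ratio of 1.07 to a baseline strike probability; the 50 percent baseline (a true coin-flip pitch for a visiting pitcher) and the function name are assumptions chosen for illustration.

```python
def shift_probability(baseline_prob, odds_ratio):
    """Apply an odds ratio to a baseline probability; return the new probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical borderline pitch: 50 percent strike chance for a visiting pitcher.
home_prob = shift_probability(0.50, 1.07)
```

On such a pitch the home pitcher's strike probability rises to roughly 51.7 percent. The shift in probability is smaller for pitches that are already near-certain balls or strikes, since the same odds ratio moves extreme probabilities less.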
Yet, while many fans may malign the idea of umpire bias and home bias as a net negative to the integrity of the game, there is the possibility that the league has little incentive to enforce more equal calls. Specifically, fans tend to enjoy seeing winning teams when they play at home. This gives the league an incentive to have teams win more at home than on the road to maximize their revenues through increased fan interest (a proposition that echoes Price, Remer, and Stone [2012] in their work studying the National Basketball Association). This wouldn’t necessarily affect the outcome of the season: all teams play 81 home and 81 away games so — leaving aside unbalanced scheduling — the bias should even out in the long run. Given this, any decrease in the propensity for home bias (if it exists) since the implementation of Zone Evaluation may not actually be optimal for MLB, despite its positive impact on the accuracy rates of umpires. But the impact on revenues is likely to be rather small relative to any loss in confidence from fans in the integrity of calls made by umpires down on the field. Still, there are other apparent biases that are not likely to play out this way.
5.2 The Ball-Strike Count and Make-Up Calls
Anyone familiar with baseball has heard of umpires widening their zone to speed up a game when a pitcher is struggling to hit the plate. As with home bias, PITCHf/x data has given us the tools to test whether umpires in fact behave in this way. Jonathan Hale (2009) and Dave Allen (2009a) were two of the first analysts to find the strike-zone variation across ball-strike counts. It turns out that umpires do in fact widen their zone when the batter is ahead in the count, and shrink it considerably when the pitcher is ahead. Ultimately, this behavior tends to extend at-bats longer than they otherwise would go. Experts have noted that the behavior is consistent with known behavioral biases, such as impact aversion or compassion and inequality aversion (Green & Daniels, 2015; Walsh, 2010; Mills, 2014). Impact aversion implies that umpires would prefer to “let the players play” than determine the outcome of an at-bat. Alternatively, umpires might have interest in making batters and pitchers more equal in the ball-strike count in some act of compassion.
Figure 7 shows the considerable differences in the size of the zone in 0-and-2 counts, where the pitcher is well ahead, and 3-and-0 counts when the batter is greatly advantaged. The area of the strike zone — measured as the 50 percent contour from before — is decreased by 117.4 square inches for left-handed batters and 124.7 square inches for right-handed batters when going from the 3-and-0 count to the 0-and-2 count. Ultimately, this can give players who tend to get behind in the count an undue performance advantage. This difference is pictured below in Figure 7.
While the idea that umpires act compassionately when calling balls and strikes may be intriguing, it doesn’t seem consistent at first glance with their stated job definition: to get the call right. But as it turns out, it may be possible to reconcile this behavior with the goal of accuracy. Analysts have shown, using Bayes’ Theorem, that the zone-size change behavior may be rational from an accuracy-maximizing standpoint (Moore, 2009). Green and Daniels (2015) present a similar proposition.
Bayes Theorem rears its head in many fun applications of statistics, most notably in the Monty Hall problem featured in the 2008 movie 21.23 Specifically, Bayes tells us that we can more accurately assess the probability of an event by integrating prior knowledge about the likelihood that event takes place. So, if umpires know that in 0-and-2 counts the pitcher is more likely to throw a pitch outside of the strike zone — as pitchers are wont to do — then they know that calling a ball on pitches over which they’re most uncertain may actually be their best bet. After all, determining whether a projectile traveling at 100 miles per hour crosses through an imaginary three-dimensional box is an inherently difficult and uncertain task. Even subconscious heuristics might be helpful in making the correct call.
Molyneux (2016) breaks down the data to show that the accuracy-rate data on pitches inside and outside of the strike zone are consistent with this Bayesian umpire hypothesis. In other words, umpires change their decision rule — whether to “lean” ball or “lean” strike — based on their prior knowledge about the probability that a pitch will be in the strike zone in the first place. Umpires have a lot of experience with this, and likely have a well-defined prior expectation for these probabilities. By “biasing” themselves, they can (admittedly counterintuitively) increase their accuracy rates.
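The Bayesian umpire hypothesis can be illustrated with the standard form of Bayes' rule. In this sketch the prior probabilities that a pitch is in the zone (30 percent at 0-and-2, 60 percent at 3-and-0) are hypothetical numbers, as are the function and variable names; the cited studies estimate such quantities from data.

```python
def posterior_in_zone(prior_in_zone, likelihood_ratio):
    """P(pitch in zone | what the umpire saw), via Bayes' rule.

    likelihood_ratio = P(observation | in zone) / P(observation | out of zone).
    """
    numerator = prior_in_zone * likelihood_ratio
    return numerator / (numerator + (1.0 - prior_in_zone))

# Hypothetical priors: pitchers throw in the zone less often at 0-and-2.
PRIORS = {"0-2": 0.30, "3-0": 0.60}

# A truly ambiguous look at the pitch carries no information (ratio of 1),
# so the best guess falls back on the prior for that count.
calls = {
    count: "strike" if posterior_in_zone(prior, 1.0) >= 0.5 else "ball"
    for count, prior in PRIORS.items()
}
```

With an uninformative look at the pitch, the accuracy-maximizing call is a ball at 0-and-2 and a strike at 3-and-0, which matches the direction of the count effect described above. A clearer view of the pitch (a likelihood ratio far from 1) would override the prior.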
The existence of makeup calls, however, would imply at least some level of compassion among umpires wanting to “make it right” after making what they know to have been an incorrect call. This behavior would not necessarily be consistent with Bayes. Fans certainly assume that umpires do this, yet much of this was subjective prior to data availability. But with this more recent data, there is some evidence for this phenomenon, with the size of the effect depending on the egregiousness of the previous incorrect call (Moskowitz & Wertheim, 2011). The jury is still out on the precise reason for the changes across ball-strike counts, but the influence of partially unconscious biases or knowledge continues to be especially interesting to academics.
One of these biases is known as the Gambler’s Fallacy, defined by an individual’s decision being negatively correlated with their past decisions. For example, imagine a roulette player. While at the casino, red has come up eight times in a row, and the gambler says, “It has to come up black this time,” and puts all of her money on black. This is an irrational act: black is no more likely to come up than it has on any other spin of the wheel. This is a good way to lose all of your money at the casino, but most people can succumb to this type of problematic rationalization in many instances. It turns out that umpires tend to have negatively correlated ball and strike calls as well, though this effect was relatively small (Chen, Moskowitz, & Shue, 2016). In other words, if an umpire made a strike call on a previous pitch (holding constant the ball-strike count and the pitch’s location), then he is more likely to call the next one a ball. Similarly, if he called a ball on a previous pitch, the next one is more likely to be called a strike. While this doesn’t mean the umpire is going to be losing thousands of dollars, it could mean that game outcomes are altered by these fallacies. And game outcomes lead to significant salary implications for players. The question is then: If umpires are subject to such biases, could they be subject to other biases that privilege certain players over others and cost some players millions?
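Negative correlation between consecutive calls can be checked by comparing the strike rate after a called strike with the strike rate after a called ball. The sketch below does this for a made-up sequence; the function name and the data are hypothetical, and the toy sequence exaggerates the effect. A real analysis would also hold constant pitch location and ball-strike count, as the cited study does.

```python
def conditional_strike_rates(calls):
    """Strike rate on the current pitch, split by the previous call.

    `calls` is a sequence of "S" (called strike) and "B" (called ball).
    """
    after = {"S": [0, 0], "B": [0, 0]}  # previous call -> [strikes, total]
    for prev, cur in zip(calls, calls[1:]):
        after[prev][1] += 1
        if cur == "S":
            after[prev][0] += 1
    return {prev: s / n for prev, (s, n) in after.items() if n}

# Hypothetical sequence of calls on comparable pitches.
rates = conditional_strike_rates("SBSBBSSBSB")
```

In this toy sequence a strike follows a called strike one time in five, but follows a called ball three times in four, the pattern of negative autocorrelation associated with the Gambler's Fallacy.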
5.3 Race and Status Bias
While home bias, inequality aversion, or compassion tend to be relatively harmless in practice, other subconscious biases can be a bit more problematic. While these biases could have negative impacts on individual players, they are not necessarily insidious in nature. For example, Mathewson (2011) showed that strike zones can vary considerably across pitchers, costing (or gaining) their teams precious runs, inviting the question as to whether there are identifiable causes of these discrepancies. In the years since, various analysts and academics have been particularly interested in umpire biases that could ultimately favor some players over others, and why, with important impacts in the labor market for baseball players.
One of the more prominent investigations into umpire bias came from academic economists interested in racial discrimination (Parsons, Sulaeman, Yates, & Hamermesh, 2011). This work made the claim that umpires were more likely to call strikes for pitchers when their races were the same. Interestingly, they note that this bias is reduced when umpires are monitored either by QuesTec or by larger crowds at the game, a finding consistent with bias reduction found after monitoring in previous work. However, other researchers expanded the inquiry with larger and more comprehensive data sets, finding very little evidence of overt or implicit bias among umpires (Tainsky, Mills, & Winfree, 2015; Hamrick & Rasp, 2015). These authors note that much of the influence of the measured effect seemed to stem from very few minority umpires in the data set. Ultimately, if any bias does exist, it seems to be rather small as a whole.
Additional biases have been shown to be related to player star power and experience. Mathewson’s (2011) work pointed toward the existence of favorable calls for higher-caliber and well-known pitchers. Indeed, Mills (2014) expands the investigation, estimating the impact of player age (experience) and performance (prominence) on the ball-strike calls of umpires. This work revealed evidence that younger batters got more strike calls, and younger pitchers more ball calls, after controlling for location. Higher-performing or better known players saw similar outcomes. Specifically, for each additional win above replacement, a pitcher was 3 percent more likely to receive a strike call on a given pitch.24 And for each additional year of age, a pitcher was 2 percent more likely to receive a strike call on a given pitch. Each of these estimates controls for location, pitch type, ball-strike count, and other factors. Further, higher-performing teams, as a whole, were also found to receive favorable ball-strike calls for their pitchers.
Kim and King (2014) find similar results, noting that if a player was an All-Star in the previous season, he was more likely to receive favorable calls from the umpire. The results are reflective of a phenomenon known as the Matthew Effect, or the propensity for people to privilege high-status individuals external to the quality of the outcomes produced. These sorts of biases could enhance disparities in contracts for high-performing and low-performing players and make it more difficult for rookies to make a name for themselves. Although the bias is relatively small, it makes excellent rookie campaigns all the more impressive: Mike Trout was likely receiving fewer favorable calls than his teammate Albert Pujols when he posted a 10.5 WAR rookie season.
5.4 Other Strike Call Influences
Despite the interesting results related to psychological biases found in past work, it can be difficult to disentangle these from other characteristics of umpires’ tasks behind the plate. For example, with star power and experience come higher levels of game play. Mariano Rivera and Greg Maddux were known to get the benefit of the doubt on many calls, but they also have another characteristic in common: They were both control artists, rarely missing their intended spot in the strike zone (Hale, 2007b; Allen, 2009b). Rivera was able to hit his spot like few others, painting corners with precision that is visible in the data. While data are not available for Maddux in his prime, he was known as a control artist as well. By hitting the catcher’s glove without movement, these pitchers probably made it easier for the umpire to lean toward calling any given pitch a strike, using the still glove as a heuristic for their calls. It would be particularly interesting to assess how glove movement affects umpires’ calls, but the data that has been collected is not publicly available.25 Without this information, however, various analysts have found ways to measure catcher framing as a skill influencing the probability that umpires call strikes on borderline pitches (Fast, 2011f; Pavlidis & Brooks, 2014; Judge, Pavlidis, & Brooks, 2015; Carleton, 2016b; Carleton, 2016c).
Pitch type and velocity also tend to affect the probability of strike calls. For example, fastballs are much more likely to be called strikes than off-speed pitches like curveballs and sliders. Yet the velocity of a pitch actually reduces the likelihood that it will be called a strike, holding location constant (Mills, 2014; Mathewson, 2010b). Despite the competing effects, this is not especially surprising, given the task at hand. Pitches traveling faster or with more movement will be more difficult to judge, and therefore might not look as much like a strike as something coming straight through the strike zone at a slightly slower speed. A portion of the discrepancy with respect to velocity could be related to higher-velocity pitchers not being control artists like Maddux and Rivera. Interestingly, there is little work investigating the accuracy of these calls, which might be more helpful in understanding how umpire accuracy keeps up with the ever-increasing average velocity of pitches in the major leagues and how these characteristics affect the estimates of umpire biases discussed here.
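The idea of comparing velocities while holding location constant can be illustrated with a small sketch. The cited studies rely on regression models; the Python below swaps in a much simpler stratified comparison, and every record, the two location bins, and the 93 mph cutoff are made-up assumptions used only for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (location_bin, velocity_mph, called_strike).
# A real analysis would use called pitches from PITCHf/x; these are invented.
pitches = [
    ("edge", 88.0, True), ("edge", 89.5, True), ("edge", 90.0, False),
    ("edge", 96.0, False), ("edge", 97.5, True), ("edge", 98.0, False),
    ("heart", 88.5, True), ("heart", 90.0, True),
    ("heart", 96.5, True), ("heart", 97.0, True),
]

def strike_rate_by_velocity(records, cutoff_mph=93.0):
    """Return {location_bin: {"slow": rate, "fast": rate}} of called strikes.

    The cutoff separating "slow" from "fast" is an arbitrary assumption.
    """
    counts = defaultdict(lambda: {"slow": [0, 0], "fast": [0, 0]})
    for loc, velo, strike in records:
        group = "fast" if velo >= cutoff_mph else "slow"
        counts[loc][group][0] += int(strike)  # called strikes
        counts[loc][group][1] += 1            # all called pitches
    rates = {}
    for loc, groups in counts.items():
        rates[loc] = {g: (s / n if n else None) for g, (s, n) in groups.items()}
    return rates

rates = strike_rate_by_velocity(pitches)
for loc in sorted(rates):
    print(loc, rates[loc])
```

Comparing rates only within a location bin is what keeps a velocity difference from being confused with a location difference. With real data, finer bins or a smooth model of location would be needed.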
6. The Future of Umpire Analytics and the Automated Strike Zone
While this article has largely focused on analytics of the strike zone, analysis of other umpire behaviors could be just as interesting. As noted earlier, data availability limits these investigations, but the recent implementation of instant replay provides useful data on overturned calls and on biases that may result (such as home-field advantage) when replay is used. It is possible to evaluate umpires on the rate at which their replayed calls are overturned. By pairing this sort of analysis with video and expertise, along with more years of data, it may become possible to develop an understanding of how well umpires make out and safe calls at the bases, or judge shoestring catches and fair or foul balls. It will be interesting to identify whether performance on these types of calls is correlated in any way with performance in the strike zone. A lack of (or negative) correlation in these skill sets could reveal why certain umpires are assigned to postseason games despite showing relatively low accuracy rates in the strike zone.
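As a rough sketch of the overturn-rate idea, the Python below computes each umpire's share of reviewed calls that were overturned, along with a Wilson score interval to show how uncertain that share is when few calls have been reviewed. The umpire labels and counts are hypothetical, and the choice of the Wilson interval is an assumption of this sketch rather than a method taken from the cited work.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials == 0:
        raise ValueError("need at least one reviewed call")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Hypothetical counts: umpire -> (calls overturned, calls reviewed)
reviews = {"Umpire A": (12, 30), "Umpire B": (3, 8)}

for name, (overturned, reviewed) in reviews.items():
    low, high = wilson_interval(overturned, reviewed)
    rate = overturned / reviewed
    print(f"{name}: {rate:.3f} overturned, interval {low:.3f} to {high:.3f}")
```

Because only challenged calls are reviewed, a rate like this describes performance on a selected set of close plays, not overall accuracy.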
Additionally, the move to Statcast has allowed MLB to track everyone on the field during a game. Umpires’ positioning and movement are especially important in evaluation, and these data could support new analyses of umpires’ ability to be in the right place at the right time. In fact, lessons currently being learned from Statcast data about outfielder routes could be applied in the umpire context to assess not only speed, but also efficiency of movement. Of course, without publicly available data, much of this analysis will remain unseen by most of us.
We should also be careful in analyzing performance as outsiders, particularly in instances when we do not fully understand the expectations of umpires. In revealing various mistakes and biases among umpires, analysts and fans have begun to ask why the league does not just replace umpires with the machines that monitor them (Shultz, 2015; Frankel, 2016). In the pursuit of perfection in ball-strike calls, this would probably be the best choice. And it would mostly resolve the consistency and uniformity issues that can plague balls and strikes. Umpires may scoff at this idea, but human umpires would not necessarily have to be replaced. They’re still needed to call out and safe at home, balks, fair and foul balls, and maybe for players to argue with when they don’t agree with Robo Ump. In this case, there would be just as many umpires as before and nobody would be out of a job. Umpire-union resistance would therefore seem misguided.
There are, of course, important caveats related to robot umpires that often go overlooked. Fans should remember that PITCHf/x and its successor, Trackman, are still fallible, and have their own problems with consistency across ballparks or operators drawing in the top and bottom of the strike zone for each batter. That doesn’t mean automation would make things worse, but it would not necessarily make things better. And the accuracy rates reported in this article could be underestimated due to these errors. What would a player do if a machine were miscalibrated and continued to call balls on pitches down the middle? What if the calibration is only slightly off and goes unseen? Can we rely on internal consistency in place of accuracy and uniformity? And where would all the fun we get from watching players arguing balls and strikes go?
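The calibration worry can be made concrete with a minimal sketch of an automated call. The Python below assumes a flat, two-dimensional zone at the front of the plate, uses the 17-inch plate width and the 18.2-inch and 41-inch zone limits adopted in this article for an average batter, and treats the ball radius (about 1.45 inches) and the one-inch tracking offset as illustrative assumptions. The function name and the pitch locations are hypothetical.

```python
PLATE_HALF_WIDTH_IN = 17.0 / 2  # home plate is 17 inches wide
BALL_RADIUS_IN = 1.45           # approximate radius; assumption for illustration
ZONE_BOTTOM_IN = 18.2           # average-batter zone limits used in this article
ZONE_TOP_IN = 41.0

def robo_call(px_in, pz_in, offset_x_in=0.0, offset_z_in=0.0):
    """Call a pitch from the measured location of the ball's center.

    px_in: horizontal distance from the center of the plate, in inches.
    pz_in: height above the ground, in inches.
    offset_x_in, offset_z_in: hypothetical systematic tracking error.
    A pitch is a strike if any part of the ball touches the zone.
    """
    x = px_in + offset_x_in
    z = pz_in + offset_z_in
    in_width = abs(x) <= PLATE_HALF_WIDTH_IN + BALL_RADIUS_IN
    in_height = ZONE_BOTTOM_IN - BALL_RADIUS_IN <= z <= ZONE_TOP_IN + BALL_RADIUS_IN
    return "strike" if in_width and in_height else "ball"

# Made-up locations (inches): down the middle, on the outside edge, well outside
pitches = [(0.0, 30.0), (9.5, 30.0), (12.0, 30.0)]
for px, pz in pitches:
    print((px, pz), robo_call(px, pz), robo_call(px, pz, offset_x_in=1.0))
```

In this toy setup, a one-inch horizontal offset leaves the pitch down the middle and the pitch well outside unchanged but flips the edge pitch from strike to ball. That is the kind of small, hard-to-see error the questions above have in mind.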
But perhaps most importantly, the net impact on run scoring and game strategy is unclear with an automated strike zone. The league would certainly need to change its official definition and measurement if robot umpires took over for calling balls and strikes. Currently, umpires do not call strikes in the top and bottom corners very often, while a PITCHf/x robot would ring up a batter on all of them under the current definition. Although some of the zone growth would be offset by removing outside pitches previously called strikes, the increased strikes in the zone would come in areas where batters tend to get poor results. This could reduce run scoring substantially. On the other hand, if batters know what will and will not be called a strike, they may be able to focus more closely on the areas in which they should swing and make better contact on average. The net effect is, therefore, ambiguous.26 Substantial research needs to take place before making a move like this. Lucky for us, umpire analytics experts are probably already working to understand this issue in more detail.
BRIAN M. MILLS is an Assistant Professor at the University of Florida. His research encompasses sports analytics and managerial sports economics, with a focus in labor and personnel economics and industrial organization. Brian is also the creator of a course on pitching analytics at DataCamp called “Exploring Pitch Data with R.” He holds a PhD and MA in Sport Management, an MA in Economics, and an MA in Statistics from the University of Michigan. Prior to his time in Ann Arbor, Brian earned a BA in Psychology from St. Mary’s College of Maryland where he played Division III baseball. He still holds out hope of being the inspiration for a sequel to The Rookie once he gets his elbow put back together.
Sources
Allen, D. (2009a). Does the umpire know the count? Baseball Analysts. Retrieved January 7, 2017, from: baseballanalysts.com/archives/2009/04/the_effect_of_t.php.
Allen, D. (2009b). Mariano Rivera: Another appreciation. Baseball Analysts. Retrieved January 8, 2017, from: baseballanalysts.com/archives/2009/10/mariano_rivera.php.
Associated Press (2003). Schilling calls QuesTec system a joke. ESPN Baseball. Retrieved December 31, 2016, from: a.espncdn.com/mlb/news/2003/0525/1558965.html.
Baggett, A. (2015). Conceptualizing the MLB strike zone using PITCHf/x data. Exploring Baseball Data with R. Retrieved January 8, 2017, from: baseballwithr.wordpress.com/2015/02/17/conceptualizing-the-mlb-strike-zone-using-pitchfx-data/.
Bonney, P. (2016). Who watches the watchers? Introducing umpire consistency score. In The Hardball Times Baseball Annual. Joe Distelheim, Jason Linden, Greg Simons, & Paul Swydan (eds.), Fangraphs & The Hardball Times.
Brooks, D. Brooks Baseball: PITCHf/x tool. Retrieved December 19, 2016, from: brooksbaseball.net/pfxVB/pfx.php.
Callan, M. (2012). Called out: The forgotten baseball umpires strike of 1999. The Classical. Retrieved December 19, 2016, from: theclassical.org/articles/called-out-the-forgotten-baseball-umpires-strike-of-1999.
Carleton, R. (2016a). Baseball therapy: The knee. Baseball Prospectus. Retrieved January 11, 2017, from: baseballprospectus.com/article.php?articleid=29358.
Carleton, R. (2016b). Baseball therapy: The dark side of pitch framing? Baseball Prospectus. Retrieved January 11, 2017, from: baseballprospectus.com/article.php?articleid=28350.
Carleton, R. (2016c). Baseball therapy: Framing the at-bat. Baseball Prospectus. Retrieved January 11, 2017, from: baseballprospectus.com/article.php?articleid=29292.
Chass, M. (1999). Umpires giveth and taketh. New York Times. Retrieved December 19, 2016, from: nytimes.com/1999/03/10/sports/baseball-umpires-giveth-and-taketh.html.
Chass, M. (2004). Baseball and umpires settle grading dispute. New York Times. Retrieved December 31, 2016, from: query.nytimes.com/gst/fullpage.html?res=9E06E6DB1E30F937A15751C1A9629C8B63.
Chen, D.L., Moskowitz, T. J., & Shue, K. (2016). Decision making under the gambler’s fallacy: Evidence from asylum judges, loan officers, and baseball umpires. Quarterly Journal of Economics. DOI: 10.1093/qje/qjw017.
Cyphers, L. (2012). Players and umps think QuesTec stinks, they don’t know the half. ESPN. Retrieved December 19, 2016, from: espn.com/espn/magazine/archives/news/story?page=magazine-20030804-article18.
Davis, N. & Lopez, M. (2015). Umpires are less blind than they used to be. FiveThirtyEight. Retrieved December 22, 2016, from: fivethirtyeight.com/features/umpires-are-less-blind-than-they-used-to-be/.
Fast, M. (2010). The internet cried a little when you wrote that on it. The Hardball Times. Retrieved December 20, 2016, from: hardballtimes.com/the-internet-cried-a-little-when-you-wrote-that-on-it/.
Fast, M. (2011a). Spinning yarn: How accurate is PitchTrax? Baseball Prospectus. Retrieved January 8, 2017, from: baseballprospectus.com/article.php?articleid=13109.
Fast, M. (2011b). Spinning yarn: Home plate umpire positioning. Baseball Prospectus. Retrieved January 8, 2017, from: baseballprospectus.com/article.php?articleid=14951.
Fast, M. (2011c). NLCS umpire charts and data. Baseball Prospectus. Retrieved January 8, 2017, from: baseballprospectus.com/article.php?articleid=15269.
Fast, M. (2011d). Spinning yarn: The real strike zone Part 1. Baseball Prospectus. Retrieved December 20, 2016, from: baseballprospectus.com/article.php?articleid=12965.
Fast, M. (2011e). Spinning yarn: The real strike zone Part 2. Baseball Prospectus. Retrieved December 20, 2016, from: baseballprospectus.com/article.php?articleid=14098.
Fast, M. (2011f). Spinning yarn: Removing the mask encore presentation. Baseball Prospectus. Retrieved January 8, 2017, from: baseballprospectus.com/article.php?articleid=15093.
Frankel, J. (2016). Interview with Jerry Crawford. Real Sports with Bryant Gumbel, Episode 234. September 27, 2016.
Green, E.A. & Daniels, D.P. (2015). Impact aversion and arbitrator decisions. SSRN Working Paper, January 19, 2015: papers.ssrn.com/sol3/papers.cfm?abstract_id=2391558.
Guziec, A. (2002). Tracking pitches for broadcast television. IEEE Computer, 35, 38-43.
Hale, J. (2007a). A zone of their own. The Hardball Times. Retrieved December 20, 2016, from: hardballtimes.com/a-zone-of-their-own/.
Hale, J. (2007b). A gentle massage. The Hardball Times. Retrieved January 8, 2017, from: bjays.wordpress.com/archives/a-gentle-massage/.
Hale, J. (2009). Strikeouts are fascist (walks, too). The Mockingbird. Retrieved January 7, 2017, from: bjays.wordpress.com/2009/01/03/strikeouts-are-fascist-walks-too/.
Hamrick, J. & Rasp, J. (2015). The connection between race and called strikes and balls. Journal of Sports Economics, 16, 714-734.
Janssen to speak at IHCC banquet. (2007). Retrieved January 11, 2017, from: dailyiowegian.com/janssen-to-speak-at-ihcc-banquet/article_1ef5cce5-78c2-5e24-b4f1-64ac3a8d1605.html.
Judge, J., Pavlidis, H., & Brooks, D. (2015). Moving beyond WOWY: A mixed approach to measuring catcher framing. The Hardball Times. Retrieved January 8, 2017, from: baseballprospectus.com/article.php?articleid=25514.
Kalk, J. That was a strike? The Hardball Times. Retrieved December 20, 2016, from: hardballtimes.com/that-was-a-strike/.
Karegeannes, J. (2004). Confessions of a QuesTec operator: How the system works, how it can be improved. Baseball Prospectus. Retrieved December 19, 2016, from: baseballprospectus.com/article.php?articleid=3326.
Kim, J.W. & King, B.G. (2014). Seeing stars: Matthew effects and status bias in Major League Baseball umpiring. Management Science, 60, 2619-2644.
Lang, E. (2015). Analyzing the strike zone as a three-dimensional volume. The Hardball Times. Retrieved December 22, 2016, from: hardballtimes.com/analyzing-the-strike-zone-as-a-three-dimensional-volume/.
Lindbergh, B. (2014). Rung up: Are postseason umpires actually baseball’s most accurate? Grantland. Retrieved December 22, 2016, from: grantland.com/the-triangle/postseason-umpires-mlb-accurate-joe-west/.
Lindholm, S. (2014). How well do umpires call balls and strikes? Beyond the Box Score. Retrieved January 8, 2017, from: beyondtheboxscore.com/2014/1/27/5341676/how-well-do-umpires-call-balls-and-strikes.
MLB.com. (2007). 2.00 Definitions of terms. Retrieved January 11, 2017, from: mlb.mlb.com/mlb/downloads/y2007/02_definitions_of_terms.pdf.
Mata, M. (2015). On the nature of the strike zone in two and three dimensions. The Hardball Times. Retrieved December 22, 2016, from: hardballtimes.com/on-the-nature-of-the-strike-zone-in-two-and-three-dimensions/.
Mathewson, J.D. (2010a). Benefit of the doubt: Odd patterns in umpire compensation. Beyond the Box Score. Retrieved January 7, 2017, from: beyondtheboxscore.com/2010/12/24/1892898/benefit-of-the-doubt-odd-patterns-in-umpire-compensation.
Mathewson, J.D. (2010b). Benefit of the doubt: How pitch speed and movement affect the zone. Beyond the Box Score. Retrieved January 8, 2017, from: beyondtheboxscore.com/2010/12/15/1877296/benefit-of-the-doubt-how-pitch-speed-and-movement-affect-the-zone.
Mathewson, J.D. (2011). Benefit of the doubt: Mo and the wide zone. Beyond the Box Score. Retrieved January 8, 2017, from: beyondtheboxscore.com/2011/2/9/1970784/benefit-of-the-doubt-mo-and-the-wide-zone.
Mills, B.M. (2011). Data quality in Pitch f/x. The Prince of Slides. Retrieved December 22, 2016, from: princeofslides.blogspot.com/2011/03/data-quality.html.
Mills, B.M. (2014). Social pressure at the plate: Inequality aversion, status, and mere exposure. Managerial and Decision Economics, 35, 387-403.
Mills, B.M. (2015a). Expert workers, performance standards, and on-the-job training: Evaluating Major League Baseball Umpires. SSRN Working Paper. Retrieved December 31, 2016, from: papers.ssrn.com/sol3/papers.cfm?abstract_id=2478447.
Mills, B.M. (2015b). Measuring strike zone contour areas. Exploring Baseball Data with R. Retrieved January 8, 2017, from: baseballwithr.wordpress.com/2015/05/12/measuring-strike-zone-contour-areas/.
Mills, B.M. (2016a). Policy changes in Major League Baseball: Improved agent behavior and ancillary productivity outcomes. Economic Inquiry. DOI: 10.1111/ecin.12396.
Mills, B.M. (2016b). Technological innovations in monitoring and evaluation: Evidence of performance impacts among Major League Baseball umpires. Labour Economics. DOI: dx.doi.org/10.1016/j.labeco.2016.10.004.
Mills, B.M. (2016c). Are the umpires at it again? The Hardball Times. Retrieved December 31, 2016, from: hardballtimes.com/are-the-umpires-at-it-again/.
Molyneux, G. (2016). Umpires aren’t compassionate, they’re Bayesian. Baseball Prospectus. Retrieved January 7, 2017, from: baseballprospectus.com/article.php?articleid=28513.
Moore, C. (2009). Bayesian umpires. Baseball Analysts. Retrieved January 7, 2017, from: baseballanalysts.com/archives/2009/12/bayesian_umpire.php.
Moskowitz, T.J. & Wertheim, L.J. (2011). Scorecasting. New York: Crown Archetype.
NASA. (2000). Human integration design handbook. Retrieved February 4, 2014, from: msis.jsc.nasa.gov/sections/section03.htm.
Nathan, A., Kensrud, J., Smith, L., & Lang, E. (2014). Testing TrackMan. Retrieved January 11, 2017, from: baseballprospectus.com/article.php?articleid=23202#commentMessage.
O’Neill, D. (1990). Umpires are victimized by lockout, too. Chicago Tribune. Retrieved March 20, 2014, from: articles.chicagotribune.com/1990-03-18/sports/9001230580_1_umpires-spring-training-lockout-dave-phillips.
Parsons, C.A., Sulaeman, J., Yates, M.C., & Hamermesh, D.S. (2011). Strike three: Discrimination, incentives, and evaluation. American Economic Review, 101, 1410-1435.
Passan, J. (2015). MLB could alter strike zone as response to declining offense. Yahoo! Sports. Retrieved December 31, 2016, from: sports.yahoo.com/news/sources--mlb-could-alter-strike-zone-as-response-to-declining-offense-232940947.html.
Pavlidis, H. & Brooks, D. (2014). Framing and blocking pitches: A regressed, probabilistic model. Baseball Prospectus. Retrieved January 8, 2017, from: baseballprospectus.com/article.php?articleid=22934.
Petti, B. (2016). Developing the baseballr package for R. The Hardball Times. Retrieved December 20, 2016, from: hardballtimes.com/developing-the-baseballr-package-for-r/.
Price, J., Remer, M., & Stone, D.F. (2012). Subperfect game: Profitable biases of NBA referees. Journal of Economics and Management Strategy, 21, 271-300.
Rader, B.G. & Winkle, K.J. (2008). Baseball’s great hitting barrage of the 1990s (and beyond) reexamined. NINE: A Journal of Baseball History and Culture, 17, 70-96.
Roegele, J. (2014a). The strike zone during the PITCHf/x era. The Hardball Times. Retrieved December 31, 2016, from: hardballtimes.com/the-strike-zone-during-the-pitchfx-era/.
Roegele, J. (2014b). The strike zone expansion is out of control. The Hardball Times. Retrieved December 31, 2016, from: hardballtimes.com/the-strike-zone-expansion-is-out-of-control/.
Roegele, J. (2015a). The expanded strike zone: It’s baaaack. The Hardball Times. Retrieved December 31, 2016, from: hardballtimes.com/the-expanded-strike-zone-its-baaaack/.
Roegele, J. (2015b). The commissioner speaks: Imagining a redefined strike zone. The Hardball Times. Retrieved December 31, 2016, from: hardballtimes.com/the-commissioner-speaks-imagining-a-redefined-strike-zone/.
Roegele, J. (2016). The 2016 strike zone. The Hardball Times. Retrieved December 31, 2016, from: hardballtimes.com/the-2016-strike-zone/.
Sievert, C. (2014). Taming PITCHf/x data with XML2R and pitchRx. The R Journal, 6, 5-19.
Steiner, N. (2009). Measuring the umpire’s effect on the game. The Hardball Times. Retrieved December 31, 2016, from: hardballtimes.com/tht-live/measuring-the-umpires-effect-on-the-game/.
Tainsky, S., Mills, B.M., & Winfree, J.A. (2016). Further examination of potential discrimination among MLB umpires. Journal of Sports Economics, 16, 353-374.
Tango, T. (2015). Evaluating the effectiveness of an umpire … effectively. Tangotiger Blog. Retrieved January 8, 2017, from: tangotiger.com/index.php/site/article/evaluating-the-effectiveness-of-an-umpire-effectively.
Turkenkopf, D. (2008). A strike is a strike, right? Beyond the Box Score. Retrieved January 8, 2017, from: beyondtheboxscore.com/2008/4/24/459913/a-strike-is-a-strike-right.
Walsh, J. (2007a). Strike zone: Fact vs. fiction. The Hardball Times. Retrieved December 20, 2016, from: hardballtimes.com/strike-zone-fact-vs-fiction/.
Walsh, J. (2007b). The eye of the umpire. The Hardball Times. Retrieved December 20, 2016, from: hardballtimes.com/the-eye-of-the-umpire/.
Walsh, J. (2010). The compassionate umpire. The Hardball Times. Retrieved January 7, 2017, from: hardballtimes.com/the-compassionate-umpire/.
Walsh, J. (2011). That was a strike? In The Hardball Times Baseball Annual 2011. Joe Distelheim, Bryan Tsao, Jeremiah Oshan, & Carolina Bolado Hale (eds.), (Chicago: ACTA Sports).
Weber, B. (2009). As They See ’Em: A Fan’s Travels in the Land of Umpires. New York: Scribner.
Weinstock, J. (2012). Which umpire has the largest strike zone. The Hardball Times. Retrieved January 8, 2017, from: hardballtimes.com/which-umpire-has-the-largest-strikezone/.
What is WAR? (2017). Retrieved January 11, 2017, from: fangraphs.com/library/misc/war/.
Willman, D. (2016). Baseball Savant: Statcast search. Retrieved December 19, 2016, from: baseballsavant.mlb.com/statcast_search.
Wood, S. (2006). Generalized additive models: An introduction with R. Boca Raton, Florida: Chapman Hall, Taylor & Francis Group, LLP.
Notes
1 Janssen has a Ph.D. in Adult Education and looked to use these lessons to develop successful training programs for umpires (“Janssen to Speak,” 2007).
2 Janssen did ultimately end up working with MLB — and later with World Umpires Association and minor-league umpires — in developing other evaluation and training programs for a number of years.
3 Data from this system had been publicly available since 2007, but not used for official evaluation of umpires.
4 Fast was hired by the Houston Astros shortly thereafter, and has served as the team’s director of research and development as of 2015.
5 A previous website, developed by Joe Lefkowitz, also allowed point-and-click downloads of raw data, but it was taken down after he was hired within MLB.
6 Mike Fast (2011b) notes that some of this may be due to umpire positioning behind the plate.
7 Pairing this information with the stated rulebook strike zone, an average batter height of 73.5 inches sets the bottom of the zone (just below the hollow of the knee) at 18.2 inches and the top of the zone at 41 inches. This will be the preferred strike-zone definition for the data presentations in this article, and is relatively consistent with suggestions by PITCHf/x expert Mike Fast (2011e). However, as Carleton (2016a) notes, knee shapes could result in varying zones, even for batters of the same height, so any measurement should be treated with caution.
8 The bird’s-eye view of the plate in Figure 1 was created using Baggett’s code from Github (github.com/aaronbaggett/baseball_blog).
9 The R code used to create these analyses is provided at pastebin.com/zMar7LUQ and pastebin.com/caZxe9y3.
10 Though it is certainly possible that they slightly underestimate performance as a whole.
11 Approximately 700,000 pitches were thrown in 2015 and recorded in the data with locational information.
12 As it turns out, there may be subconscious biases that influence the distribution of these incorrect calls such that they are not evenly distributed across players and teams. I’ll address this in the following section.
13 The relative missed ball and strike call rates also tell us about the size of individual umpire strike zones, as exhibited in Weinstock (2012). Mills (2015b) gives a comprehensive overview of how to analyze strike zone surface area in R.
14 MLBAM’s senior database architect, Tom Tango (2015), also suggests this approach.
15 Analyzing umpires hardly seems worth committing a HIPAA violation.
16 However, MLB did begin implementing drug tests in 2003 with no explicit punishment for a failed test.
17 This makes the emergence of Barry Bonds as the all-time single-season home-run leader in 2001 all the more impressive.
18 Interestingly, minor-league scoring also decreased dramatically from 1999 to 2002, though the large drop started in 2000, rather than 2001, as it did in MLB. However, decreases in MLB scoring have outpaced minor-league scoring since 2006.
19 Both Mills (2016a; 2016b; 2016c) and Roegele (2014a; 2014b; 2015a; 2016) have ensured that these changes are well-documented.
20 Scoring had peaked in 2000 at 10.28 combined runs per game.
21 Much earlier work by Steiner (2009) also found that pitchers’ walk rates could individually be affected by umpire mistakes on ball calls.
22 This varies from year to year. For example, in 2002 home-team win percentage was 0.542, and in 2009 it was 0.549.
23 It should be noted that Bayes’ Theorem has much more important applications than baseball and game shows, particularly in medicine, and the Monty Hall problem was known well before the release of this movie.
24 Wins Above Replacement (WAR) is a measure of player productivity developed by sabermetricians. While there are varying methods to reach the WAR value for a given player, the definition of the concept is relatively standard. WAR is defined as the number of additional wins a player provides to his team relative to a player available to replace him, such as a Triple-A player who could be moved up to the MLB roster in the event of injury. WAR can be negative if a player performs below what would be expected of a Triple-A player, while 10 wins represents an unusually excellent season. As an example, in 2016 Mike Trout had the highest MLB WAR (as measured by Fangraphs [“What is WAR?” 2017]) at 9.4, while Carlos Correa produced 4.9 wins, and Denard Span produced 1.4 wins.
25 Sportvision built technology called COMMANDf/x with data back through the 2010 season. Mike Fast (2011f) attempts to analyze glove movement impacts by pairing video and PITCHf/x data.
26 Though this author notes for the record that he suspects run scoring would decrease substantially without a redefinition or some other under-the-hood tweak to the game.