This article was written by Dick Cramer
This article was published in 2002 Baseball Research Journal
A most surprising discovery about baseball was reported several years ago by Voros McCracken on various Web sites. Despite their individual efforts, major league pitchers seem to have almost identical abilities to prevent base hits. Of course, they differ greatly in how often they yield strikeouts, walks, and home runs. There are also large and consistent variations in the “ground-ball-yielding” tendencies of pitchers. But once a batted ball is put into play, no matter whether in the air or on the ground, the frequency of base hits resulting is essentially the same for a Jimmy Anderson as for a Randy Johnson.
This well-confirmed fact is all the more surprising when it is remembered that a pitcher is supported by eight other fielders, whose only jobs are to convert as many batted balls as possible into outs while minimizing advancement of any baserunners. Surely there are differences in fielding skill, even though it has proven very difficult to measure these differences and to assess their values to their teams. For example, from its very first recording date during 1981 spring training, STATS has emphasized trying to directly and systematically gather observational data that would allow them to “rate fielders.” But the Zone Ratings that have resulted are little more convincing than Range Factors.
Could it be that the small differences that do exist in “batting average per batted ball in play” (BABIP from now on, as suggested to me by Rob Neyer), among pitchers and among teams, are more strongly affected by the fielders’ skills than by the pitchers’ skill? This research report describes four studies to address this question. In summary, three of the four results obtained agree in strongly suggesting, “Yes. Fielders collectively seem to have a much greater effect on BABIP than do the pitchers”, while the fourth result is not inconsistent with this statement.
Many of the following studies share a methodological point of view. Effects are real if they tend to correctly predict future effects. Effects that are not persistent are the results of random chance, or luck — in this case, the at-’em screamers or wind-blown bloopers, which actually seldom even out in 162 games. To decide whether or not an effect is predictive, we may ask whether an effect that shows up in a particular year, say, 1997, is observed again in the next year, 1998. And we look at all the examples we can find, for the small effects that haven’t been established in 162 games may become significant over a decade of major league play. In baseball analysis, this general “persistency” approach seems to have first been applied to “clutch hitting” in the early 1970s, when only two seasons of relevant data existed.
Of course, this point of view is also that of most scientists and statisticians in addressing many practical questions, such as “Is this new drug better than that old one?” A widely used yardstick when statisticians compare two sets of numbers (such as 1997 vs. 1998) is r, the Pearson correlation coefficient, whose magnitude varies from 0.0 (no relationship between these sets) to 1.0 (there is an exact relationship) and whose sign will be either positive (the numbers tend to vary in the same way) or negative (the relationship is consistent but backward; for example, the relation between ERA and winning percentage among pitchers). It is also useful to square r (r x r, or r²), because the resulting r² expresses the proportion of the differences among one set of numbers that can be predicted by knowing the other set.
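As a minimal sketch of the yardstick just described, the following Python computes r and its square for two paired sets of numbers. The function name `pearson_r` and the sample values are illustrative assumptions, not data from this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical "year 1" and "year 2" values for five pitchers (made up).
year1 = [12.0, -5.0, 3.0, -8.0, 0.0]
year2 = [4.0, -1.0, -2.0, -6.0, 5.0]

r = pearson_r(year1, year2)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

Squaring is what turns a modest-looking r into a much smaller share of predictable variation; an r of .162, for instance, squares to about .026.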
A second underlying point of view here, conceptually the same as par runs in Total Baseball, is a focus on team or player performance above or below the league average. As Pete Palmer was the first to stress, teams win or lose games not, say, because their team batting average is .270, but because their team batting average is better or worse than the other teams’ batting averages. Here we will be considering only the number of hits yielded, above or below the “expected” or league average value. Expressed as a formula, for a pitcher- or team-season, this value is calculated as:
Hits Prevented = Lg. BABIP x (3 x IP – K + H – HR) + HR – H
where Lg. BABIP = (Lg.H – Lg.HR) / (3 x Lg.IP – Lg.K + Lg.H – Lg.HR)
Hits Prevented will be positive whenever the defense is more effective than average (whether because of superiority in fielding, in pitching, in home park effect, or in luck) and will be negative for weaker than average performance. The sum of Hits Prevented over all the pitchers or teams in a league will be zero. (This formula differs a bit from McCracken’s, but not in any way that affects the conclusions.)
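The Hits Prevented calculation can be sketched in a few lines of Python. The function names and the league and pitcher totals below are hypothetical, chosen only to show that the values sum to zero across a league; innings must be entered in true thirds (200 1/3 as 200.333..., not the box-score 200.1).

```python
def balls_in_play(ip, k, h, hr):
    """Batted balls in play: outs not made by strikeout, plus non-HR hits."""
    return 3 * ip - k + h - hr

def league_babip(lg_h, lg_hr, lg_ip, lg_k):
    """League batting average per batted ball in play."""
    return (lg_h - lg_hr) / balls_in_play(lg_ip, lg_k, lg_h, lg_hr)

def hits_prevented(lg_babip, ip, k, h, hr):
    """Expected non-HR hits at the league rate minus actual non-HR hits."""
    return lg_babip * balls_in_play(ip, k, h, hr) + hr - h

# Hypothetical two-"pitcher" league (totals are made up, not real data).
pitchers = [
    {"ip": 12000, "k": 9000, "h": 12000, "hr": 1500},
    {"ip": 8000, "k": 5000, "h": 9000, "hr": 1000},
]
lg = {key: sum(p[key] for p in pitchers) for key in ("ip", "k", "h", "hr")}
babip = league_babip(lg["h"], lg["hr"], lg["ip"], lg["k"])

values = [hits_prevented(babip, **p) for p in pitchers]
print(values, sum(values))  # the individual values offset each other
```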
Finally, the scope of these studies was the 12 seasons from 1990 to 2001, including the strike season of 1994. These seasons provided 11 consecutive-season comparisons.
To begin with, the question of whether the pitchers themselves have any effect on BABIP was reexamined. There were 945 instances in which an individual pitcher worked a total of 200 innings in consecutive seasons, all for the same club (the 945 were identified by hand, almost certainly yielding an undercount, but an unbiased one). The r of Hits Prevented over these 945 paired pitcher-seasons is .162, so pitchers do have a statistically significant effect on BABIP. However, as a practical matter, that effect is very small. Squaring r yields the predictability of Hits Prevented in year 2, given the result in year 1, as a value of .026 or 2.6%.
Hits Prevented can be compared for consecutive team-seasons as well as for consecutive individual pitcher-seasons. There are exactly 308 such comparisons for these 12 consecutive seasons. The r over these 308 paired team-seasons is .369, considerably higher than for pitcher-seasons, especially when r is squared to yield a 13.6% predictability. So the persistence of Hits Prevented from one season to the next is about five times greater for teams than for individual pitchers. I have shown directly that this increased team persistence is not caused by the individual pitchers. And turnover in pitching staffs is rather high anyway. It seems reasonable to attribute the much higher persistence of team Hits Prevented to a less variable influence on BABIP, the skill of the fielders.
Park effects also play an important role. For example, removing the most extreme park effect, the Rockies’ eight comparisons, yields r = .323, or a 10.5% predictability, for the remaining 300 cases. Also, the consecutive team-season correlation for park effects has a relatively large r of .535 (307 cases, excluding the Astros move).
Then could the park effect account for all of the season-to-season persistence of team BABIP? The STATS Major League Handbook has presented complete home/away team statistics for the last decade. So these calculations were repeated for 1992-2001 away data only, excluding 1995, which for various reasons was not available. The r for the resulting 199 away-team comparisons of BABIP persistence was .300, a 9.0% predictability.
Summarizing, from the r² values for all these various correlations, there is an overall team BABIP season-to-season persistence of 13.6%. If the smaller set of away-game BABIP data, with the park-neutral persistence of 9.0%, is considered sufficiently representative, then the average home park effect on BABIP persistency becomes 4.6% (the difference). The individual pitcher season-to-season persistence of BABIP is 2.6%, an overestimation of the “pure pitching” effect since the much larger park and fielder effects that must affect individual pitcher BABIP persistence were ignored. The only other persistent entity appears to be the fielders, so they are left responsible for the remaining 6.4% (9.0% minus 2.6%) of persistence in BABIP, the largest single factor.
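The bookkeeping in this summary can be retraced with a short Python sketch. It simply squares the three reported r values and takes the differences used in the text; subtracting squared correlations in this way follows the article's own accounting and is not offered as a formal variance decomposition. The variable names are illustrative.

```python
def predictability(r):
    """Share of year-2 variation predictable from year 1 (r squared)."""
    return r * r

team_all = predictability(0.369)   # all team-season pairs
team_away = predictability(0.300)  # away games only (park-neutral)
pitcher = predictability(0.162)    # same-team individual pitchers

park_share = team_all - team_away     # attributed to the home park
fielder_share = team_away - pitcher   # left over for the fielders
random_share = 1.0 - team_all         # not persistent at all

for label, share in [
    ("Team, all games", team_all),
    ("Team, away games", team_away),
    ("Individual pitchers", pitcher),
    ("Home park (difference)", park_share),
    ("Fielders (difference)", fielder_share),
    ("Not persistent", random_share),
]:
    print(f"{label:<26s}{100 * share:5.1f}%")
```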
Perhaps the most important point to note about BABIP is that 86.4% is not persistent at all. For the most part, differences among teams in their hits yielded, per ball in play, appear to be random variations, in “lucky bounces” and “at-’em balls.” Pete Palmer reports that this conclusion is also expected on the basis of statistical theory. League BABIP rates are currently about 0.290 (in other words, the league-average batter currently hits about .290 when he puts the ball in play, excluding home runs). However, the BABIP off an individual pitcher in a season will randomly vary, just as the number of heads in 100 actual coin flips will usually not be exactly 50. Pete Palmer has recently calculated that the actual historical variations in BABIP for individual pitchers behave indistinguishably from variations in coin flips (for the same distributions of sample sizes). (However, just to avoid any possible confusion, year-to-year persistence is relatively much greater in individual batting statistics, or in walks, strikeouts, and home runs off pitchers, or in traditional fielding statistics. This is again in accord with statistical theory, because for all these other statistics the variations among individual players’ totals are much larger than the theoretically expected random variations.)
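To give a feel for the size of this coin-flip variation, here is a rough Python sketch using the ordinary binomial formula. The 600 balls in play is a hypothetical round figure for a full-season starter, and treating each ball in play as an independent trial at the league rate is an assumption of the sketch, not a result from Palmer's calculation.

```python
import math

def babip_standard_deviation(p, balls_in_play):
    """Binomial standard deviation of an observed rate over n trials."""
    return math.sqrt(p * (1.0 - p) / balls_in_play)

league_rate = 0.290  # approximate league BABIP quoted in the text
n = 600              # hypothetical balls in play for one pitcher-season

sd_rate = babip_standard_deviation(league_rate, n)
sd_hits = sd_rate * n  # the same spread expressed in hits

print(f"one standard deviation: {sd_rate:.4f} in BABIP, {sd_hits:.1f} hits")
```

Under these assumptions, chance alone moves a pitcher's BABIP by nearly twenty points in either direction in a typical season, which is comparable to the differences the article is trying to explain.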
If most of the variations in BABIP among teams and especially pitchers are actually random fluctuations, who if anyone should be held accountable? From the simple accounting balance viewpoint of the classical box score, the traditional practice of charging the hits to the pitchers and only the errors to the fielders will not be easily improved. However, from the point of view of the baseball analyst, who is mainly trying to better understand how individual players help teams win and lose games, to say nothing of the point of view of all the spectators, it is difficult to believe that fielders differ only in their frequencies of errors, double plays, and passed balls. And fielders are indeed found to have a greater effect than pitchers or parks on team Hits Prevented, here on the basis of year-to-year persistence arguments. The following is the same question, examined from a different perspective.
There already exist direct if incomplete statistical measurements of fielding skill: errors, double plays per opportunity, and passed balls. We now have a candidate indicator for another aspect of fielding skill, Hits Prevented. If in fact Hits Prevented also indirectly reflects fielding skill, then team superiority in Hits Prevented should correlate with team superiority in errors, double plays, and passed balls. And indeed there is such a positive correlation. For the 338 team comparisons in this study, the r value between conventional and unconventional fielding skill is .273 (see the Appendix for how conventional fielding skill was summarized).
There are of course two more important team skills, batting and pitching. Note that for this purpose team pitching skill is being calculated with all pitchers assigned the same league average BABIP (see the Appendix for this too). Here is a “correlation matrix” of r values showing how all these four skills are related to each other, for these 338 team comparisons.
[Correlation matrix: rows for Conventional fielding (CF), Hits Prevented (HP), and Pitching skill (P); the cell values did not survive in this copy.]
Although the .273 between conventional fielding and Hits Prevented is modest, it is the largest association in the table. Furthermore, note the insignificant (and negative!) correlation between Hits Prevented and pitching skill. If Hits Prevented instead reflects mostly pitching skill, as all of us believed until very recently, this is a very surprising result. Hits Prevented correlates with conventional fielding skills more strongly than any other pairing of the four skills. The second largest correlation is between offensive skill and Hits Prevented, which can easily be interpreted as a tendency for good hitters also to be good fielders, but is very hard to understand if Hits Prevented are the pitchers’ responsibility.
Park effects must also affect these measures of team skills — i.e., as an important cause of the weak negative relation shown between batting and pitching skills.
The approaches used to obtain Result 1 and Result 2 can be combined by forming the correlation matrix for 308 consecutive team-season comparisons among the four team skills. (Results 2 and 3 are actually independent, despite any contrary impression, as there is very little tendency for consecutive team-season differences to correlate with the values forming those differences. The actual r’s range from .0006 to .070 for the four team skills.)
Here is the outcome:
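[Correlation matrix of consecutive-season changes: rows for Conventional fielding (CF), Hits Prevented (HP), and Pitching skill (P); the cell values did not survive in this copy.]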
The same tendencies exist as in individual season skills, and to a somewhat greater extent. Consecutive season changes in team Hits Prevented strongly follow changes in conventional fielding skill but are unrelated to changes in (BABIP constant) pitching skill.
If differences in BABIP are determined more by the fielders than by the pitchers, then pitchers who change teams should have a relatively small season-to-season persistence in their Hits Prevented (difference between actual hits allowed and those with constant BABIP). There were 348 pitchers in this 1990-2001 sample who over consecutive seasons appeared with more than one team. The r value for season-to-season correlation in this group’s Hits Prevented was .154. Although this value is less than the r value of .162 for the 945 pitchers who worked for only one team, and thus the change is in the expected direction, the decrease is too small to have statistical significance.
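One common way to check a statement like this is Fisher's z transformation, sketched below in Python. This is a standard textbook test swapped in for illustration; the article does not say which test was actually used, and treating every paired pitcher-season as an independent observation is an assumption.

```python
import math

def fisher_z(r):
    """Fisher transformation of a correlation coefficient."""
    return math.atanh(r)

def z_versus_zero(r, n):
    """Approximate z statistic for testing whether r differs from zero."""
    return fisher_z(r) * math.sqrt(n - 3)

def z_for_difference(r1, n1, r2, n2):
    """Approximate z statistic for the difference of two independent r's."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error

same_team = z_versus_zero(0.162, 945)      # pitchers who stayed put
changed_team = z_versus_zero(0.154, 348)   # pitchers who changed teams
difference = z_for_difference(0.162, 945, 0.154, 348)

print(f"same team:    z = {same_team:.2f}")
print(f"changed team: z = {changed_team:.2f}")
print(f"difference:   z = {difference:.2f}")
```

With the conventional cutoff of about 1.96, each correlation is distinguishable from zero on its own, while the gap between them is far inside the range expected by chance.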
A less objective but much more interesting way to continue looking at this question is to consider individually some of the largest, less likely to be chance, season-to-season changes in team Hits Prevented. Are changes in the fielders a reasonable cause? In the following discussion, the numbers shown are not Hits Prevented, but the extra (fewer) runs that these extra (fewer) hits are expected to create.
By far the best Hits Prevented team during the entire 1990-2001 era was the 2001 Mariners, at +128 runs, while the worst was the 2001 Indians, at -79 (fielding skill prevailing when they met in the postseason!). The second and third best teams were the surprising second-place 1998 Reds (+91) and 1991 White Sox (+84), the fourth and fifth the 1990 and 1997 Athletics (+77 and +70), and the sixth another record setter, the 1998 Yankees (+60). The second and third worst teams, the 1993 and 1999 Rockies (-75 and -72), were strongly influenced by an unfavorable park effect. The fourth worst was the 2001 Rangers (-71), the fifth worst was the 1990 Braves (-62), and the sixth the 1997 Rockies (-60).
The second largest change in Hits Prevented runs was a +87 run improvement by the 1991 Braves over the 1990 team. My recollection of a concerted effort to improve the defense and “support the young pitching” is supported by changes at every field position except catcher: Lemke over Treadway (2b); Belliard over Blauser (ss); Pendleton over Presley (3b); Otis Nixon over Dale Murphy (cf); Justice over Lonnie Smith (rf); Bream for Justice (1b). The pitching staff was mostly unchanged.
An even bigger improvement came over the last two years in Seattle, where a gain of +67 runs from 1999 to 2000 (fifth largest) was followed by the greatest year-to-year improvement in the 1990-2001 era, +90 runs from 2000 to 2001. The major 1999-2000 changes were Cameron for Griffey (cf), David Bell for Russ Davis (3b), and Olerud for Segui (1b). The 116 wins resulted after McLemore was replaced by Boone (2b), Buhner by Suzuki (rf), and A-Rod by Guillen (ss). Again, the pitchers were mostly the same, although some of this improvement is the park change.
Large negative changes seem as much the result of injuries as conscious decision making. For example, the second worst change of -91 runs in Tampa Bay’s second year (1998 to 1999) saw many fewer games played by defensive stalwarts Cairo, Stocker, Boggs, and McCracken but only one intentional change, in left field. A slightly larger decline (-92) occurred in the turmoil of the Brewers’ last two AL seasons (1996 to 1997), notable being departures by Jaha and Vaughn and arrivals by Gerald Williams and Burnitz.
In summary, several independent analyses of the available data converge in suggesting that the fielders have much more influence on opponents’ hits per batted ball in play (BABIP) than do the pitchers, in current major league play. BABIP season-to-season persistency is much greater for teams than for individual pitchers. And even more important, team hits allowed per batted ball correlate positively with other measures of fielding skill but negatively with other measures of pitching skill. Home park effects are also significant. While any other major influences on hits per batted ball remain either unknown or, much more likely, random and nonexistent, there seems quite enough justification to assign the total of team BABIP variation to team fielding rather than individual pitching.
Of course, team fielding is the summation of the individual fielding performances that we all most want to understand. So these findings about BABIP offer hope for significantly better evaluations of individual fielders. Analysts of individual fielding performances have always been confounded by the arithmetic of the defense. There will be three putouts per inning, regardless of how many baserunners and runs occur in between those putouts. So the better one’s teammates field, the fewer one’s own chances to record those individual putouts and assists that we can objectively count and compare.
However, with BABIP obviously a measure of those baserunners between the putouts, if BABIP differences may be attributed mostly to fielders, then the arithmetic of baseball rules need no longer dominate our numerical comparisons of individual fielders. Although the next details will no doubt be debated thoroughly, they seem clear in principle, and creative analyses like those by Bill James in Win Shares and by groups like the Baseball Prospectus have already made substantial progress in this direction.
DICK CRAMER is best known as founder of STATS, Inc., and creator of much of its software. He is a past VP and board member of SABR. For the last twenty years his day gig has been Chief Scientific Officer of Tripos, and currently on most Santa Fe Saturday afternoons he can be found playing Dixieland jazz at Evangelo’s.
Here is an example of the spreadsheet formula used in this work to calculate conventional fielding par runs, here for the 2000 Angels:
0.7 x (112 – AM5) + 1 x (AN5 – 162.9 x ((AA5 + AD5 + AE5) / 2192.4)) + 0.25 x (10.7 – AO5)
It is the sum of three terms, which from left to right account for errors, double plays, and passed balls. The spreadsheet cell references are to specific Angel team totals. The constants are either the league average for a team or the run value attributed to an error, DP, or PB.
Thus the average AL 2000 team made 112 errors, while the Angels made 134 (referenced by AM5). Those 134 – 112 = 22 extra errors are assumed to have cost the Angels about 15 runs on defense (0.7 x 22). Passed balls are handled in exactly the same way. Expected double plays are weighted by the number of opportunities, approximated by the number of Angel opponents reaching base by hit (AA5), walk (AD5), or HBP (AE5). The league average for opponents reaching base in the 2000 AL is 2192.4, as shown.
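The spreadsheet formula can be restated as a small Python function, shown here as a sketch. The constants are the 2000 AL values quoted above; the parameter names are illustrative, and apart from the Angels' 134 errors, the inputs in the example are set to league-average values purely to isolate the error term (they are not the Angels' real totals).

```python
def conventional_fielding_par_runs(
    errors, double_plays, passed_balls, opp_hits, opp_walks, opp_hbp,
    lg_errors=112.0, lg_double_plays=162.9, lg_on_base=2192.4,
    lg_passed_balls=10.7,
    runs_per_error=0.7, runs_per_dp=1.0, runs_per_pb=0.25,
):
    """Runs saved (+) or lost (-) versus an average team, per the appendix."""
    error_runs = runs_per_error * (lg_errors - errors)
    # Expected DPs scale with opponents reaching base by hit, walk, or HBP.
    opportunities = opp_hits + opp_walks + opp_hbp
    expected_dp = lg_double_plays * opportunities / lg_on_base
    dp_runs = runs_per_dp * (double_plays - expected_dp)
    pb_runs = runs_per_pb * (lg_passed_balls - passed_balls)
    return error_runs + dp_runs + pb_runs

# Error term only: 134 errors, everything else pinned at league average.
angels_error_effect = conventional_fielding_par_runs(
    errors=134, double_plays=162.9, passed_balls=10.7,
    opp_hits=2192.4, opp_walks=0, opp_hbp=0,
)
print(f"{angels_error_effect:.1f}")  # about -15 runs, as in the text
```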
Here is an example of the much more intricate pair of formulae used to convert Hits Prevented into par runs allowed, applicable either to Pedro Martinez or the Boston Red Sox.
Expected runs allowed =
11950 – 1.068601 x (29710 – (AA77 + 3 x AB77)) x (30881.75 – (AA77 + AD77 + AE77 + (AG77 + AH77) / 4)) / (82045 – (Z77 x 3 + AA77))
Par pitching runs = (11950 / 20141) x Z77 x 1.0072 – “Expected runs allowed”
The first formula expresses two basic ideas:
In the spirit of the “component ERA” that Bill James has promulgated, expected runs allowed are calculated using a runs-created-type formula for pitchers.
Primarily to make the total of individual pitchers runs allowed very nearly equal to the team runs allowed (i.e., so that the “whole equals the sum of its parts”), every runs created (RC) formula I use is for the league after omitting the contributions of the player or the team. The RC by the player or team then becomes the difference between the actual RC by the league and the RC calculated as just described (after the pitcher or team is omitted).
So the first formula expresses the difference between 11,950, the runs actually scored off AL pitchers in 2000, and three multiplied terms divided by a fourth. As a detailed example, the second term is an approximation of total bases yielded (hits yielded + 3 x HR yielded), where hits yielded can be either actual or calculated by the league average of BABIP. Its 29,710 value is the “total bases yielded” for the league (calculated using actual hits) and the “- (AA77 + 3 x AB77)” then removes the pitcher or team “total bases yielded” from this league “total bases yielded”.
The third term is an “on-base allowed,” expressing hits, walks, HBP, and one quarter of the total of WPs and balks in a similar way. The fourth divisor term is an “at-bat,” approximated as (3 x IP + hits).
The first constant term (1.068601) is the ratio of the actual league runs allowed to those that are calculated for the league without subtracting any team or player; in this instance, 11950 / (29710 x 30881.75 / 82045).
The par pitching runs formula is much simpler, again the difference of two terms, with the first being the number of runs an average pitcher would yield in the same number of innings (Z77) in a neutral park (1.0072 being the 2000 Fenway correction) and the second the output of the first formula.
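The two formulae can be sketched in Python as follows. The league constants are the 2000 AL figures quoted above; reading 20141 as league innings pitched (so that 11950 / 20141 is league runs per inning) is an interpretation, as is the mapping of spreadsheet columns to the named parameters. The pitcher line in the example is hypothetical.

```python
LG_RUNS = 11950.0         # runs scored off AL pitchers, 2000
LG_TOTAL_BASES = 29710.0  # league hits + 3 x HR
LG_ON_BASE = 30881.75     # league hits + BB + HBP + (WP + balks) / 4
LG_AT_BATS = 82045.0      # league 3 x IP + hits
LG_INNINGS = 20141.0      # assumed to be league innings pitched

def runs_created(total_bases, on_base, at_bats):
    """Basic runs-created form: total bases times on-base over at-bats."""
    return total_bases * on_base / at_bats

# Scales the calculated league total to the actual one (about 1.068601).
SCALE = LG_RUNS / runs_created(LG_TOTAL_BASES, LG_ON_BASE, LG_AT_BATS)

def expected_runs_allowed(ip, hits, hr, bb, hbp, wp, balks):
    """League runs minus runs calculated for the league without this pitcher."""
    total_bases = hits + 3 * hr
    on_base = hits + bb + hbp + (wp + balks) / 4
    at_bats = 3 * ip + hits
    league_without = SCALE * runs_created(
        LG_TOTAL_BASES - total_bases,
        LG_ON_BASE - on_base,
        LG_AT_BATS - at_bats,
    )
    return LG_RUNS - league_without

def par_pitching_runs(ip, expected_runs, park_factor=1.0):
    """Runs an average pitcher would allow in these innings, minus expected."""
    return LG_RUNS / LG_INNINGS * ip * park_factor - expected_runs

# Hypothetical pitcher line (not a real player's totals).
expected = expected_runs_allowed(ip=200, hits=200, hr=20, bb=60,
                                 hbp=5, wp=4, balks=0)
print(f"expected runs allowed: {expected:.1f}")
print(f"par pitching runs:     {par_pitching_runs(200, expected, 1.0072):.1f}")
```

Passing hits calculated at the league BABIP rate instead of actual hits, and differencing the two outputs, would give the Hits Prevented runs described in the final paragraph.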
The par runs resulting from Hits Prevented (used to obtain Results 2 and 3) is the difference in runs allowed (or par runs allowed, same thing) calculated as above either (1) with actual hits allowed vs (2) with hits calculated at the league average BABIP rate.