This article was written by Pete Palmer
This article was published in the Spring 2016 Baseball Research Journal
After the 1970 season, two brothers, Eldon and Harlan Mills, unveiled a new approach to baseball statistics: Player Win Averages. Eldon was a retired Air Force colonel and an expert in computer programming and data processing, while Harlan was a professor and mathematics consultant to IBM. They developed a model for calculating win probability as a function of inning, score, and base-out situation, and then measured the change for each at-bat during the 1969 season for every batter and pitcher. They paid Elias Sports Bureau to enter the data on punch cards and then tallied the results. The model started with 0 points for each team, and at the end one team had 1,000 and the other minus 1,000, so 1,000 points equaled one win. For each player they added up all the plus points and all the minus points, with the player win average being plus points divided by the sum of both (each taken as a positive total), so .500 was average. Willie McCovey (.677) and Mike Epstein (.641) led their respective leagues. They did not publish a copy of the win probability table, but did show a play-by-play for the 1969 World Series, which contained many values.
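The ratio described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Mills brothers' program: the function name, the handling of a player with no point changes, and the sample point changes are all assumptions made for the example.

```python
def player_win_average(point_changes):
    """Plus points divided by the sum of plus and minus points.

    point_changes: per-play changes in win points credited to one player
    (positive when the play helped his team, negative when it hurt).
    """
    plus = sum(p for p in point_changes if p > 0)
    minus = sum(-p for p in point_changes if p < 0)  # stored as a positive total
    total = plus + minus
    if total == 0:
        return 0.5  # assumption: a player with no swings is treated as average
    return plus / total


# Hypothetical point changes for one batter (not real 1969 data).
example_changes = [120, -45, 30, -60, 200, -25]
pwa = player_win_average(example_changes)  # 350 plus points, 130 minus points
```

A player whose plus and minus points are equal comes out at exactly .500, which is why that figure marks an average player.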
Unfortunately, the idea did not catch on. They calculated data for 1970, but did not publish, although they did send me a copy of the results. I shared the data with Dick Cramer, who joined with me to produce an article in the 1974 Baseball Research Journal on Batter Run Average, which was basically on-base average times slugging average. Since those data were independent of game situation, you could compare them to player win averages. Players with higher win averages than expected therefore might be considered clutch hitters. However, in his famous article in the 1977 Baseball Research Journal (“Do Clutch Hitters Exist?”), Cramer showed there was no correlation from one year to the next. Dick and I gave a presentation at the 2008 SABR national convention, which was also published in the BRJ that year, which supported this conclusion over a much larger sample — about 1000 players from 1957 through 2007.
Earnshaw Cook introduced me to Dick Trueman, a professor at Cal State Northridge in the 1960s. Dick had gotten copies of the raw figures from Mills and done quite a bit of work on them. He discovered that a key relief pitcher can actually have about double the effect of a starter on game outcomes because of the situations he faces. Trueman also presented a paper on clutch hitting at the ORSA/TIMS Meeting in 1977. His conclusion was that it might exist, but more data were needed. Trueman gave me a tape of the raw data, which I sent to Dave Smith of Retrosheet who converted them to floppies. I recently turned a copy over to Sean Lahman to have it made available to SABR members. (I also believe Retrosheet used it to reconstruct play-by-play data missing from games 1969–70.)
In order to compare normal stats to player win averages, I punched up all the batting and pitching data for 1969. This turned out to be fairly easy, which started me on creating my baseball database. I kept up with each year and worked backwards, so by 1984 when The Hidden Game of Baseball (co-authored by me and John Thorn) came out, I had done to 1925. By then I also had a full basic register based on my work on the old A.S. Barnes baseball encyclopedia, to which I had added at-bats, hits, walks, total bases, innings pitched, and earned run average for all players. Later I added all the other categories back to 1871, which was used for Total Baseball with Thorn and the ESPN Baseball Encyclopedia with Gary Gillette.
I created a win probability program in the 1970s. It actually wasn’t that hard. All you do is start with two outs in the last of the ninth and play the game backwards. I had earlier created a table of run-scoring distributions from 0 to 12 versus the 24 base-out situations. So, in the last of the ninth, one run behind, with two outs and a runner on third, the win probability is simply half the probability of scoring one run from that situation (which ties the game and sends it to extra innings) plus the probability of scoring more than one. When you get back to the bases empty, none out situation, you now have ending points for each score difference for the top of the ninth. The run distribution table was created from a game simulation I wrote, using probabilities of each event, plus various transitional frequencies for going from first to third on a single, advancing on an out, being picked off, getting thrown out stretching, etc. Very little play-by-play data existed in those days. Most of my calculations for these figures came from play-by-play of 34 World Series games shown in the baseball guide from 1956 through 1960. Of course now, thanks to Bill James, Gary Gillette, Dave Smith, and others, we have play-by-play of games back to 1946, a full 70 years, plus a great many before that date.
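The last-of-the-ninth step can be sketched as follows. This is a minimal illustration, not the original program: the function name and the run distribution are invented for the example, and an extra-inning game is assumed to be a 50-50 proposition, as the "half the probability" rule implies.

```python
def win_prob_last_of_ninth(run_dist, deficit, extra_inning_wp=0.5):
    """Home team's win probability while batting in the last of the ninth.

    run_dist: probabilities of scoring 0, 1, 2, ... more runs in the rest
              of the inning from the current base-out situation.
    deficit:  runs the home team trails by (0 means the game is tied).
    """
    if deficit < 0:
        raise ValueError("home team already leads; the game is over")
    # Scoring exactly `deficit` runs leaves a tie and sends the game to extra innings.
    p_tie = run_dist[deficit] if deficit < len(run_dist) else 0.0
    # Scoring more than `deficit` runs wins the game outright.
    p_win = sum(run_dist[deficit + 1:])
    return extra_inning_wp * p_tie + p_win


# Made-up distribution for two outs and a runner on third:
# P(0 runs), P(1), P(2), P(3), P(4 or more). Not taken from any real table.
runner_on_third_two_out = [0.74, 0.16, 0.06, 0.03, 0.01]
wp = win_prob_last_of_ninth(runner_on_third_two_out, deficit=1)
```

With these invented numbers the result is half of 0.16 plus 0.10, or 0.18. Repeating the same calculation for every base-out situation and score difference, and then feeding the bases-empty, none-out values into the previous half-inning, is what "playing the game backwards" amounts to.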
So now I can create the run distribution table from real data, but the win probability table still needs to be done by a program because many of the cells simply don’t come up often enough in a season to produce reliable results. I made up a run distribution table and the derived win probability table for each league each year for 1946 through 2015.
The original win probability tables developed by the Mills brothers came from thousands of computer-simulated games starting at the 24 different base-out situations (empty, none out through full, two outs). In my model, I used plus-or-minus 7 runs, which would be 24 x 18 x 15 or 6,480 different calculations (24 base-out situations, 18 half-innings, and 15 score differences). Using the run-distribution data, I was able to calculate the win probability for all 6,480 cases in one short computer run. The problem with using simulations is that even after a thousand runs, the margin of error is still three percent. This is calculated by finding the square root of one half times one half times a thousand, or 16, which is the standard deviation (sigma). The margin of error is plus or minus 2 times sigma, or about 32 wins in a thousand games. The true answer should be in that range 95 percent of the time, assuming that you have a perfectly random sample. If you have a sample of 1,000 games and get 530 wins, that means the true value should be between 500 and 560, which is a pretty big range. Using ten thousand simulations reduces the margin by the square root of ten, to about one percent, which would mean the true answer should be between 52 and 54 percent. Very few people understand what the margin of error means. When you see a poll of a thousand people and the answer comes out 53 percent, the usual interpretation is that since 50 percent is within the margin of error, the result is too close to call. Actually, since the true answer would be between 50 and 56 percent 95 percent of the time, the probability that the true answer is in the majority is actually 97.5 percent.
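The margin-of-error arithmetic can be checked with a short Python sketch. The function names are invented for the example, and the second function assumes a normal approximation to the binomial, which is an assumption of this sketch rather than something stated above.

```python
import math


def margin_of_error(n_games, p=0.5, n_sigma=2.0):
    """Two-sigma margin of error, as a fraction of the games simulated."""
    sigma_wins = math.sqrt(p * (1.0 - p) * n_games)  # about 16 wins for 1,000 games
    return n_sigma * sigma_wins / n_games


def prob_true_value_above_half(observed_share, n_games):
    """Normal-approximation chance that the true share exceeds 50 percent."""
    sigma_share = math.sqrt(0.25 * n_games) / n_games
    z = (observed_share - 0.5) / sigma_share
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


moe_1k = margin_of_error(1_000)     # a little over 3 percent
moe_10k = margin_of_error(10_000)   # 1 percent
p_majority = prob_true_value_above_half(0.53, 1_000)
```

Without rounding sigma to 16, the chance that a 53 percent result reflects a true majority comes out near 97 percent, in line with the 97.5 percent figure obtained from the rounded two-sigma margin.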
The win probability table is useful in the calculation of leverage, a concept that Tom Tango (www.tangotiger.net) and I invented independently. For each cell in the table, you calculate the average change for the various events (single, double, triple, home run, walk, or out), each weighted by the appropriate frequency. Then add up the absolute value of each for the total. I called mine “stress,” but Tom’s name (leverage) was much better. There are usually about 700 cases a year (about one every three or four games) where the leverage is 200 points or more (one-fifth of a win). Almost all are in the ninth inning or later with the team one run behind or tied. The highest level is around 400 points, depending on the league and year. It is two out, last of the ninth, one run behind. The win probability is around 28 percent. If the batter makes an out, you lose 280 points. If he drives in a run, you gain 360 points, and if he drives in two, you gain 720 points. Starting pitchers average around thirty-five leverage points per plate appearance, but key relievers can get up to seventy. Mop-up men can go as low as twenty. This shows one of the drawbacks of player win averages. A few key appearances can swamp out hundreds of ordinary ones. For example, in 2015, Mike Trout’s top ten leverage situations averaged 162 points, which was about equal to the sum of his lowest two hundred appearances.
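A minimal sketch of the leverage calculation for a single cell follows. The point swings are the ones quoted above for the two-out, last-of-the-ninth, one-run-behind case, but the three-way grouping of outcomes and the frequencies attached to them are invented purely for illustration, as is the function name.

```python
def leverage(event_freqs, point_changes):
    """Frequency-weighted sum of the absolute win-point change of each event."""
    return sum(freq * abs(point_changes[event])
               for event, freq in event_freqs.items())


# Point swings quoted in the text for the highest-leverage situation.
point_changes = {"out": -280, "drives_in_one": 360, "drives_in_two": 720}

# Hypothetical frequencies (they must sum to 1); not taken from any real table.
event_freqs = {"out": 0.68, "drives_in_one": 0.12, "drives_in_two": 0.20}

top_leverage = leverage(event_freqs, point_changes)
```

With these made-up frequencies the cell comes out a little under 400 points, the same order of magnitude as the highest level cited above. A real table would weight every event by its observed league frequency.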
My method of calculating player win averages was slightly different from the Mills method. I started with each team at 500 points, or 50 percent win probability, and added or subtracted the change after each at-bat to the batter and pitcher. Thus at the end of the game, one team gains 500 points and the other team loses 500. The Mills brothers used 1,000 points for each win, which did not affect the value of their average, since both numerator and denominator were doubled. But I used the sum of points, not the ratio, so a player with 5,000 points over the season would have contributed five wins more than average. The split between batting and pitching for the winning team in a particular game could vary, but its total would always be plus 500, and the losing team’s would be minus 500. In a poorly pitched game, the winning team might have 900 batting points and the losing team 400 batting points. Pitching points are the negative of the other team’s batting points, so successful defense shows up as negative offense points for the opponent. A balanced game would be 250 batting points and 250 pitching points (or minus 250 batting points for the other team).
There are a few more adjustments to be made. Even though the win tables were calculated from the league batting stats, the sum of all points for the year might not be quite zero. The league total usually comes out off by about 0.1 points per appearance (one ten-thousandth of a win). A much bigger correction has to do with the designated hitter rule. The league average for the AL from 1973 to 1996 contained very little pitcher hitting (slightly more since interleague play started). Thus the AL players were being compared to a higher standard than the NL. In order to handle this, I took an average of batting by pitchers and non-pitchers. Pitchers were about 9 points lower than average per at-bat through 1960 and have crept up to 12 points lower today. Since so few AL pitchers have batted since 1972, I used the NL value for the AL in the DH era. This resulted in a correction of about 0.7 points per appearance for NL batters and AL batters before 1973. Thus I subtracted a bit less than half a win from each batter who played a full season. The correction was zero for the AL until 1997, when it became around 0.05.
For the ballpark correction, I took the sum of all batters, visitor and home, for each park and compared it to the same players on the road (pitchers excluded). About half the teams were within plus or minus one point per appearance. However, no surprise, Colorado dominated the largest corrections, with 11 of the top 13 spots. The largest was 6.36 points per appearance in 1995. This would mean a regular player would have two wins subtracted from his season record for approximately 300 appearances at home. It is strange that from 2003 through 2009 the adjustment was only about two points, but in recent years it has gone up again and 2012, 2014, and 2015 are in the top ten. There is no such domination on the low end, although Houston does have three of the ten bottom teams. The lowest was the Mets in 1988 with minus 4.29. I took appearances in each park for each player and applied the corrections for that team and year. Thus the fact that the NL West plays a lot more games in Colorado than other divisions is properly taken care of. The park correction is not entirely fair because players are affected differently. Power hitters can take advantage of a short fence better than singles hitters. Asymmetrical parks can benefit batters based on handedness, although not consistently. Lefties Carl Yastrzemski and Wade Boggs were able to take advantage of Fenway Park, while Ted Williams had just a normal home advantage. Joe DiMaggio was hurt by Yankee Stadium, but so was Lou Gehrig.
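The per-park bookkeeping can be sketched like this. The function name and the park labels are hypothetical, and the only figures used are the ones quoted above (6.36 points per appearance for Colorado in 1995, about 300 home appearances, and 1,000 points to a win).

```python
def park_correction_points(appearances_by_park, points_per_appearance):
    """Total points to subtract from a player's season record.

    appearances_by_park:   plate appearances the player had in each park.
    points_per_appearance: correction for each park in that season;
                           parks not listed are treated as neutral (0.0).
    """
    return sum(n * points_per_appearance.get(park, 0.0)
               for park, n in appearances_by_park.items())


# A regular with about 300 home appearances in Colorado in 1995.
factors_1995 = {"COL": 6.36}           # label is hypothetical, value from the text
home_appearances = {"COL": 300}

points_subtracted = park_correction_points(home_appearances, factors_1995)
wins_subtracted = points_subtracted / 1000.0   # 1,000 points equal one win
```

This reproduces the roughly two wins mentioned above. Adding a player's road appearances to the same dictionary, park by park, handles the unbalanced schedule automatically.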
The batters shown in Table 1 are rated by player wins over average. I included all players with 2,800 or more at-bats. Barry Bonds wins by three touchdowns.
Table 1: Batters
The 120.3 represents 120,300 player win points. RW is the increase in wins based on runs above average, independent of game score or inning, which was 1,232 runs. For example, a single with one out sending the runner to third will always be worth about six tenths of a run (or .06 wins), regardless of the game situation. But if it occurs in the last of the ninth with the score tied, it could be worth about 200 win points, or .20 wins. The difference between the two for 1,156 players since 1946 with 2,800 or more at-bats was random, supporting the theory that clutch hitting doesn’t exist. I ran a simulation of 18 identical batters, with their performance dictated by a random number generator, through 750 games (about 3,000 at-bats), repeated it 200 times, and measured the difference between the wins above average calculated by player win average and by runs from average. The standard deviation (sigma) was 2.2 wins, meaning 95 percent of the time the difference would be between plus and minus two sigma, or 4.4 wins. Since this number is proportional to the square root of the number of at-bats, a player with 12,000 at-bats would be within plus or minus 8.8 wins. Five percent of the sample (or 58 players) should be beyond the two-sigma limit. There were actually 47. There should have been three beyond the three-sigma limit, and there were actually five.
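The scaling of the random spread with playing time, and the expected number of players outside the limits, can be reproduced with a few lines of Python. The function name is made up for the example; the tail fractions are the rounded ones used above (5 percent beyond two sigma, 0.27 percent beyond three sigma).

```python
import math


def two_sigma_limit(at_bats, base_sigma=2.2, base_at_bats=3000):
    """Two-sigma limit in wins, scaled by the square root of at-bats."""
    return 2.0 * base_sigma * math.sqrt(at_bats / base_at_bats)


limit_3000 = two_sigma_limit(3_000)     # 4.4 wins
limit_12000 = two_sigma_limit(12_000)   # four times the at-bats, twice the limit

players = 1156
expected_beyond_two_sigma = round(players * 0.05)      # rounded 5 percent tail
expected_beyond_three_sigma = round(players * 0.0027)  # rounded 0.27 percent tail
```

The observed counts of 47 and five sit close to these expectations of 58 and three, which is what a purely random difference would produce.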
Ted Williams and Stan Musial show figures only since 1946. Musial probably would have about 15 more wins, plus another five for missing 1945 in the military. Ted would have about 30 more wins from his actual play, plus another 35 for the almost five seasons missed in the air force, which would have moved him to the top. Joe DiMaggio had only 20 wins from 1946, but had 34 before and could have had another 13 during the war, which would have put him on the list below, which shows everyone with 50 or more. Willie Mays missed two years in military service, but it was at the start of his career, so it is difficult to give it a number, perhaps six to twelve wins.
Player win average is a batting statistic, so there is no accounting of fielding. The original Mills method charged an error to the fielder, not the pitcher, and gave the batter and pitcher an out on the play. This does not allow credit for fielding range. Infielders make more errors, but also have more difficult plays. I gave the batter credit for reaching on an error and charged the pitcher. A shortstop who hits like a typical first baseman can be worth a couple of wins a year more than an average first sacker. This is because his team will have a first baseman in the lineup, where the other team will need a shortstop. Shortstops and catchers average about minus ten runs batting per season compared to average non-pitchers, second basemen around minus five, third basemen and center fielders zero, corner outfielders five and designated hitters and first basemen ten. In addition, a fielder can get a swing of another win or two based on his fielding being better or worse than average.
Pitchers were done in the same manner. As shown in Table 2, Roger Clemens and Greg Maddux came out on top. I listed the leverage to show that most starting pitchers have a value of about 35. This means the average change in win probability is only 35 points or 3.5 percent. The difference between wins calculated by ordinary run difference and by actual win probability is small, as was the case with the batters. There was one notable exception, Mariano Rivera, who had a leverage figure about double that of the starters. He was the only reliever to have 36 or more career wins. This was because relievers can be used in more crucial game situations, so their effect is stronger. By runs alone, he had only 35 wins, but taking the game situation into account, he produced 54 wins. Bob Feller had 20 wins from 1946, but had 21 before and could have had another 19 in the four years he missed in the military, which would put him at 60.
Table 2: Starting Pitchers
I charted relievers only, 3,000 or more batters faced, discounting any season where they started at least 10 games, or started more than they relieved. The leverage values are typically 50 to 70 as shown in Table 3.
Table 3: Relievers
Rivera is really in a class by himself. And this doesn’t even include his playoff games. He pitched 141 innings and had an earned run average of 0.70 there and then added nine innings with no earned runs in All-Star games for good measure. In his final All-Star appearance in 2013, he went out to pitch the eighth inning. When he got to the mound, he found himself alone on the field, as all the other players stayed in the dugout. He received a well-deserved two-minute standing ovation in honor of his remarkable career.
One of Bob Davids’ favorite subjects was pitcher hitting. He wrote a book about it, Great Hitting Pitchers, originally published by SABR in 1979. I rated pitchers against the average of all pitchers, which meant adding about ten points per appearance to their total.
Table 4: Pitchers hitting
PETE PALMER is the co-author with John Thorn of “The Hidden Game of Baseball” and co-editor with Gary Gillette of the “Barnes and Noble ESPN Baseball Encyclopedia” (five editions). Pete worked as a consultant to Sports Information Center, the official statisticians for the American League 1976–87. Pete introduced on-base average as an official statistic for the American League in 1979 and invented on-base plus slugging (OPS), now universally used as a good measure of batting strength. He won the SABR Bob Davids Award in 1989 and was selected as a charter member of the Henry Chadwick Award in 2010. Pete also edited (with Thorn) seven editions of “Total Baseball.” He previously edited four editions of the “Barnes Official Encyclopedia of Baseball” (1974–79). A member of SABR since 1973, Pete is also the editor of “Who’s Who in Baseball.”