Calculating Skill and Luck in Major League Baseball

This article was written by Pete Palmer

This article was published in the Spring 2017 Baseball Research Journal


One of my favorite topics is the contributions of skill and luck in baseball. I recently ran one thousand simulations of a 162-game schedule like the one currently used in the majors (two leagues of 15 teams, three divisions each, with interleague play), where every team was equal and each game was decided by a coin flip (a random number generated by the computer). Of course you would not expect each team to win exactly 81 games, but you would expect the win totals to form a bell-shaped curve centered at 81 wins.

This is of course what happened.

The width of the distribution is characterized by its standard deviation, defined here as the square root of the average squared difference between each team’s win total and 81. You would expect about two-thirds of the teams to fall within plus or minus one standard deviation of 81 wins, and about 95 percent to fall within two.

There is a formula for what this number should be under the binomial distribution, which governs repeated trials with only two possible outcomes, like heads or tails, or in this case wins or losses (for a full season it is closely approximated by the familiar bell curve). The formula is simply the square root of the win probability times the loss probability times the number of trials, or games. For a 162-game season, this would be the square root of ½ times ½ times 162, which is 6.36. In my simulation, of course, any one of the 1,000 seasons I ran would not come out at exactly that value, but the numbers should be close. It turns out the average season value was 6.35, with a range of plus or minus 0.9. If I combined the data into ten-season groups, the range dropped to 0.3, as expected, reduced by the square root of 10.
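
To make the setup concrete, here is a minimal Python sketch of the coin-flip experiment described above (not the program actually used for this article); it simplifies the schedule by ignoring the pairing of opponents, which has little effect on the spread of wins.

    import random
    import statistics

    # Coin-flip seasons: 30 equal teams, 162 games each, every game a 50/50 toss.
    def simulate_season(teams=30, games=162):
        # Each team's games are independent coin flips; a real schedule pairs
        # teams so one club's win is another's loss, a detail omitted here.
        return [sum(random.random() < 0.5 for _ in range(games)) for _ in range(teams)]

    def season_sigma(wins, mean=81):
        # Standard deviation of wins around the 81-win average, as defined in the text.
        return (sum((w - mean) ** 2 for w in wins) / len(wins)) ** 0.5

    sigmas = [season_sigma(simulate_season()) for _ in range(1000)]
    print(round(statistics.mean(sigmas), 2))   # comes out near sqrt(0.5 * 0.5 * 162) = 6.36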

If you assume luck and skill are independent, with the luck factor for a season being the 6.36 wins just calculated and the total variation being what actually happened on the field, then you can calculate the skill factor. It is simply the square root of the total squared minus the luck squared. I took the data by decades from 1871 to the present. The Union Association (1884) and the Federal League (1914-15) were excluded.
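
As a quick worked example of that calculation, using the 2001-2010 row of Table 1 below, the arithmetic is just a subtraction of variances:

    # Skill spread from total and luck spreads, assuming the two are independent.
    def skill_sd(total_sd, luck_sd):
        return (total_sd ** 2 - luck_sd ** 2) ** 0.5

    print(round(skill_sd(11.74, 6.36), 2))   # 9.87 wins, matching the 2001-2010 row of Table 1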

 

Table 1

Years Teams Total Luck Skill
1871-1880 86 10.59 3.68 9.93
1881-1890 164 15.4 5.36 14.44
1891-1900 121 15.84 5.84 14.72
1901-1910 160 16.49 6.07 15.33
1911-1920 160 14.41 6.1 13.06
1921-1930 160 13.96 6.19 12.51
1931-1940 160 14.99 6.18 13.66
1941-1950 160 14.39 6.19 12.99
1951-1960 160 13.46 6.2 11.95
1961-1970 206 12.85 6.35 11.17
1971-1980 248 11.63 6.34 9.75
1981-1990 260 9.98 6.25 7.78
1991-2000 282 10.51 6.23 8.46
2001-2010 300 11.74 6.36 9.87
2011-2016 180 10.95 6.36 8.91

A variation in the luck factor of 0.3 wins or so would result in a change in skill of about 0.2.

What this shows is that the skill factor, which was around 13 wins for 1901–1950, has been reduced significantly and has been around 9 since 1971. This is a team skill factor, not a player skill factor, so basically the teams have become more evenly matched. It also shows that over the course of a full season, the skill and luck factors are now almost equal. If you assume the 9 wins (a .055 winning percentage) would hold for a schedule of any length, then you need about 81 games before the skill factor and luck factor are equal. In 81 games, the skill factor would be 81 x .055, or 4.5 wins, while the luck factor (the square root of 81/4) would also be 4.5.

Next I modified my simulation to use teams that were not all equal. I gave the teams true winning percentages with a standard deviation equal to the 9 wins derived from the previous table. I ran 1,000 seasons, each with a different set of expected team winning percentages, and noted the number of teams within each win range as well as the expected wins for the teams that actually won games in that range. I derived a simple formula for the probability of one team beating another: the difference in the two teams’ overall win probabilities plus one half.
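
A sketch of this unequal-teams version, under the same simplifications as before, might look like the following; the round-by-round pairing scheme and the Gaussian spread of true winning percentages are my assumptions, with the head-to-head formula taken from the text.

    import random

    def simulate_unequal_season(teams=30, games=162, skill_sd_wins=9.0):
        # True winning percentages drawn with a standard deviation of 9 wins (.0556).
        true_pct = [random.gauss(0.5, skill_sd_wins / games) for _ in range(teams)]
        wins = [0] * teams
        for _ in range(games):                            # one game per team per round
            order = random.sample(range(teams), teams)
            for a, b in zip(order[::2], order[1::2]):
                p_a = 0.5 + (true_pct[a] - true_pct[b])   # head-to-head formula from the text
                if random.random() < p_a:
                    wins[a] += 1
                else:
                    wins[b] += 1
        return true_pct, wins

    true_pct, wins = simulate_unequal_season()
    expected = [p * 162 for p in true_pct]                # expected wins, to be binned as in Table 2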

The results show the actual range is quite a bit broader than the expected one. From the previous table we would expect the actual distribution to have a standard deviation of around 12, which it did. In Table 2, Exp is the number of teams whose expected wins fell in each range over all the simulated seasons, and Act is the number whose actual wins did. The important column is E/A, the expected number of wins for teams that actually won games in that range. For example, teams that actually won 59 to 61 games were expected to win 67.4 games, about 7.4 wins more than they actually won. Only 8 percent of the teams that won 59-61 games won more games than expected. Looking at the 89-91 range, those teams were expected to win only 86.5 games, and 70 percent of them won more games than expected.

 

Table 2

Wins Exp Act E/A Dif More
41-43 0 4 58.0 -16.0 0.000
44-46 0 19 58.3 -13.3 0.000
47-49 0 40 60.7 -12.7 0.000
50-52 0 96 62.5 -11.5 0.000
53-55 66 194 63.9 -9.9 0.026
56-58 127 332 65.2 -8.2 0.054
59-61 267 551 67.4 -7.4 0.078
62-64 526 919 69.5 -6.5 0.103
65-67 988 1333 71.6 -5.6 0.131
68-70 1681 1847 73.5 -4.5 0.178
71-73 2333 2298 75.4 -3.4 0.237
74-76 3207 2773 77.1 -2.1 0.309
77-79 3918 3055 79.2 -1.2 0.366
80-82 3963 3091 81.1 -0.1 0.466
83-85 3789 3080 82.7 1.3 0.559
86-88 3140 2780 84.9 2.1 0.619
89-91 2356 2293 86.5 3.5 0.706
92-94 1653 1751 88.5 4.5 0.763
95-97 1011 1359 90.4 5.6 0.815
98-100 512 928 92.5 6.5 0.867
101-103 261 589 94.1 7.9 0.912
104-106 135 304 96.5 8.5 0.928
107-109 67 194 98.0 10.0 0.954
110-112 0 96 100.1 10.9 1.000
113-115 0 42 103.3 10.7 1.000
116-118 0 21 101.9 15.1 1.000
119-121 0 6 106.8 13.2 1.000
122-124 0 1 109.0 14.0 1.000

 

I then looked at actual data showing the change in wins in the following year. I used all teams from 1901 to date and found, not surprisingly, that teams tended to move back toward .500. In fact the change was almost identical to the change found in the simulation above. Thus I believe that the so-called regression to the mean is simply due to luck. Of course, we don’t know the true win probabilities of the actual teams, but it does seem likely that they are similar to the simulation, and a real-life 90-win team is probably an 86.5-win team that got lucky. Wins were normalized to 162 games to account for seasons of varying length. Diff1 is the difference between wins this year and next. Diff2 is the difference between expected and actual team wins from the previous table. The table below is based on a much smaller sample, so the numbers are less uniform.
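
The bookkeeping for that comparison is simple; a sketch of it, assuming a dictionary of season records keyed by year and team (the data source and field layout here are hypothetical), is below.

    # records: dict mapping (year, team) -> (wins, games_played)
    def normalized_wins(wins, games):
        return wins * 162.0 / games                  # normalize to a 162-game schedule

    def next_year_changes(records):
        changes = []
        for (year, team), (w, g) in records.items():
            nxt = records.get((year + 1, team))
            if nxt:
                this_year = normalized_wins(w, g)
                changes.append((this_year, normalized_wins(*nxt) - this_year))
        return changes   # (this-year wins, change next year) pairs, ready to bin as in Table 3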

 

Table 3

Wins Teams Next Diff1 Diff2
38-40 2 66.5 27.5  
41-43 4 60.5 18.5 16.0
44-46 8 61.0 16.0 13.3
47-49 10 58.2 10.2 12.7
50-52 15 65.1 14.1 11.5
53-55 40 65.5 11.5 9.9
56-58 47 65.5 8.5 8.2
59-61 64 67.5 7.5 7.4
62-64 85 70.8 7.8 6.5
65-67 130 71.4 5.4 5.6
68-70 125 73.0 4.0 4.5
71-73 155 77.1 5.1 3.4
74-76 204 78.8 3.8 2.1
77-79 192 80.7 2.7 1.2
80-82 168 80.6 -0.4 0.1
83-85 204 82.4 -1.6 -1.3
86-88 221 84.1 -2.9 -2.1
89-91 177 86.6 -3.4 -3.5
92-94 159 87.4 -5.6 -4.5
95-97 148 89.8 -6.2 -5.6
98-100 105 90.7 -8.3 -6.5
101-103 68 94.5 -7.5 -7.9
104-106 34 94.8 -10.2 -8.5
107-109 17 99.0 -9.0 -10.0
110-112 12 103.0 -8.0 -10.9
113-115 6 98.6 -15.4 -10.7
116-118 4 97.0 -20.0 -15.1
119-121 1 105.3 -14.7 -13.2

 

This method can also be used to look at player performance. The variation in batting average due to luck uses the same formula, except the probability of success is more like 0.25 rather than 0.50. For a full season of 500 at-bats, the variation in hits is the square root of 0.25 times 0.75 times 500, which is 9.7 hits, or about 20 points of batting average. The table below shows batting average by decades for all players with at least 300 appearances (at-bats plus walks) in a season. Total, luck, and skill are in batting average points; that is, 37 means .037 of batting average. The variation in skill level has decreased, which indicates that the average level has probably increased, making it harder for the best players to exceed it. However, some of the decrease may be due to power hitters who sacrifice batting average for homers. The standard deviation from year to year is 1.4 (the square root of two) times the single-season value, which means five percent of the players can change by more than 60 points just due to luck.
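
The arithmetic behind those batting-average figures is short enough to show directly; this is just the formula from the text, not new analysis.

    ab, avg = 500, 0.250
    sd_hits = (avg * (1 - avg) * ab) ** 0.5          # about 9.7 hits
    sd_avg = sd_hits / ab                            # about .019, roughly 20 points
    sd_change = 2 ** 0.5 * sd_avg                    # year-to-year spread, about .027
    print(round(sd_hits, 1), round(sd_avg, 3), round(2 * sd_change, 3))
    # Two standard deviations of the year-to-year change is about 55 points here;
    # using the 21-point decade figures in Table 4 gives the 60 points cited above.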

 

Table 4

Years Players App Avg Total Luck Skill
1901-1910 1239 493 .265 37 21 30
1911-1920 1319 499 .272 36 21 29
1921-1930 1299 512 .300 35 21 28
1931-1940 1348 518 .289 32 21 24
1941-1950 1297 508 .273 30 21 22
1951-1960 1269 506 .273 30 21 21
1961-1970 1722 512 .263 30 20 22
1971-1980 2158 508 .268 30 21 21
1981-1990 2208 499 .268 28 21 19
1991-2000 2374 499 .276 31 21 22
2001-2010 2622 509 .274 28 21 18
2011-2016 1538 500 .264 30 21 21

* Note: Although the chart shows appearances (at-bats plus walks), actual at-bats were used to calculate the variance in batting average. At-bats usually run around 40 fewer than appearances. Example: for 1901-10 the average was 456 at-bats, so sigma is sqrt(456 x .265 x .735)/456, or 21 points. Appearances give credit to players who walked, but I used batting average as the criterion and didn’t want too many columns.

 

Normalized on-base plus slugging (NOPS) is a better measure of batting than average, though. The definition is the player’s on-base average (OBA) divided by the league OBA, plus the player’s slugging average (SLG) divided by the league SLG, minus 1, all times one hundred. The league averages do not include pitcher batting. This is then adjusted for park by dividing by the player’s park factor (PF). PF is basically runs scored per inning at home divided by runs scored per inning on the road, plus one, all divided by two. So a park where twenty percent more runs were scored at home than on the road would have a PF of 1.10. NOPS correlates directly with runs, in that a player with a 120 NOPS produces runs at a rate 20 percent higher than the league average. The standard deviation for NOPS is a bit more complicated. Slugging average is driven by home runs, so home run hitters have a higher standard deviation.
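
A small sketch of the NOPS and park-factor arithmetic, with made-up player numbers for illustration (the function and variable names are mine, not an official implementation):

    def park_factor(home_runs_per_inning, road_runs_per_inning):
        # (home rate over road rate, plus one, all divided by two)
        return (home_runs_per_inning / road_runs_per_inning + 1) / 2

    def nops(player_oba, league_oba, player_slg, league_slg, pf=1.0):
        raw = (player_oba / league_oba + player_slg / league_slg - 1) * 100
        return raw / pf                              # park adjustment by dividing by PF

    print(park_factor(1.2, 1.0))                               # 1.10, the example in the text
    print(round(nops(0.360, 0.330, 0.480, 0.420, pf=1.05), 1)) # 117.5 for this made-up player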

I ran a simulation in which every player from 1901 through 2016 with 300 or more plate appearances was run through 100 simulated seasons. The league standard deviation came out to 15 points, although it was more like 14.5 for the first half of the period and 15.5 for the second half, when homers were more frequent. If I divided the 2016 league in half by home run percentage, the top half had a standard deviation of 16.5 and the bottom half 14.5. I will use 15 for the analysis. The table below shows that the variation in skill level has remained fairly constant, with a slight dip recently, which, as with batting average, may indicate a rise in average skill.
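
The flavor of that resimulation can be captured with a simplified sketch: take one player’s assumed per-plate-appearance outcome probabilities, replay his season many times, and measure the spread in NOPS due to luck alone. This ignores hit-by-pitches, sacrifices, and park effects, and the outcome probabilities and league averages below are hypothetical.

    import random

    def nops_luck_sigma(probs, pa=600, seasons=100, lg_oba=0.330, lg_slg=0.420):
        outcomes, weights = list(probs), list(probs.values())
        values = []
        for _ in range(seasons):
            draws = random.choices(outcomes, weights=weights, k=pa)
            n = {o: draws.count(o) for o in outcomes}
            ab = pa - n["walk"]
            hits = n["single"] + n["double"] + n["triple"] + n["hr"]
            oba = (hits + n["walk"]) / pa
            slg = (n["single"] + 2 * n["double"] + 3 * n["triple"] + 4 * n["hr"]) / ab
            values.append((oba / lg_oba + slg / lg_slg - 1) * 100)
        mean = sum(values) / seasons
        return (sum((v - mean) ** 2 for v in values) / seasons) ** 0.5

    # Hypothetical outcome mix for a power hitter; probabilities sum to 1.
    probs = {"out": 0.62, "walk": 0.09, "single": 0.17, "double": 0.05,
             "triple": 0.01, "hr": 0.06}
    print(round(nops_luck_sigma(probs), 1))   # luck sigma in NOPS points, typically mid-teens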

 

Table 5

Years Players App NOPS Total Luck Skill
1901-1910 1239 493 104 28 15 23
1911-1920 1319 499 104 28 15 23
1921-1930 1299 512 104 30 15 25
1931-1940 1348 518 103 28 15 24
1941-1950 1297 508 105 27 15 23
1951-1960 1269 506 105 27 15 23
1961-1970 1722 512 105 28 15 24
1971-1980 2158 508 104 26 15 22
1981-1990 2208 499 104 25 15 19
1991-2000 2374 499 104 27 15 22
2001-2010 2622 509 104 25 15 20
2011-2016 1538 500 105 24 15 18

 

So for NOPS, five percent of the players can change 40 or more points from year to year due to luck alone.

For pitchers, I would use normalized ERA (NERA), which is simply the league ERA divided by the pitcher’s ERA, times 100. Again, a 120 NERA means roughly 20 percent fewer runs allowed than an average pitcher. I do have a quarrel with the way earned runs are charged, though. A pitcher is always charged with the runs scored by batters he allowed on base. It would be fairer if they were shared when a relief pitcher comes in. For example, if a pitcher left with the bases loaded and none out, he would be charged with 1.8 runs, the number of runs usually scored by those three runners. If the relief pitcher escaped the inning with no runs scored, he would be credited with minus 1.8 runs.

The actual value varies a bit from year to year and league to league, and it has a random factor associated with it. If runs were individual events like goals in a hockey game, the standard deviation would be the square root of the average number of runs, but in baseball scoring one run often leads to another, and a grand-slam homer can score four in one blow, so the actual value is the square root of twice the number of runs. In the bases-loaded case, the 1.8 figure comes from 701 runs scored in 389 such situations in the 2015 American League. The square root of 1,402 divided by 389 is about 0.1. A runner on third with none out scores about 82 percent of the time, while a runner on first with two outs comes in only 13 percent of the time.

For ERA, the luck factor is calculated the same way. A pitcher with 180 innings and an ERA of 4.00 would have allowed 80 earned runs, and the luck factor would be the square root of 160, times 9, divided by 180, or 0.63, a fairly hefty figure. That means five percent of pitchers could have their ERA off by more than 1.26 due to luck alone. For NERA, if the league ERA were 4.00, the luck factor would be 0.63 divided by 4.00, times 100, or 16. The table below shows all pitchers with 150 or more innings from 1901 to the present, by decades.
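
The ERA luck arithmetic, spelled out with the numbers from the example in the text:

    innings, era, league_era = 180, 4.00, 4.00
    earned_runs = era * innings / 9                  # 80 earned runs
    sd_runs = (2 * earned_runs) ** 0.5               # sqrt(160): runs come in bunches
    sd_era = sd_runs * 9 / innings                   # about 0.63 runs per nine innings
    sd_nera = sd_era / league_era * 100              # about 16 NERA points
    print(round(sd_era, 2), round(sd_nera))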

 

Table 6

Years Players Innings ERA NERA Total Luck Skill
1901-1910 658 253 2.65 112 28 16 23
1911-1920 745 232 2.85 110 26 16 20
1921-1930 711 217 3.93 110 20 15 13
1931-1940 663 212 3.91 111 21 15 15
1941-1950 627 207 3.48 113 22 16 16
1951-1960 576 207 3.65 110 20 15 13
1961-1970 768 216 3.40 110 22 16 16
1971-1980 918 219 3.52 109 20 15 13
1981-1990 836 204 3.72 109 21 15 15
1991-2000 819 197 4.05 113 25 15 20
2001-2010 905 194 4.10 111 23 15 18
2011-2016 529 189 3.76 109 23 16 17

 

I did a study of all players with at least 300 at-bats in each of their first two years, sorted by the difference in NOPS between those years, and noted whether they made 300 at-bats in their third year. It showed that most players who did worse in their second year improved in their third year, while most players who did better in their second year didn’t do as well in their third. About 30 percent of those who were worse in their second year did not get a third year, while only 10 percent of those who were better failed to. In the 30-point change group, those who improved were 37 points higher in year two, but only 5 points higher in year three. So it appears that a big improvement gets treated as a trend when it is really mostly luck. The 30-point players who improved did do a little better in the third year (up 7 points), while those who did worse went up only 4, but that is a pretty small difference. (See Table 7.)

 

Table 7



 

A handy rule for determining simulated series winners is that the probability of winning a seven-game series is equal to twice the one game percentage minus one half. So a .550 team will win the series sixty percent of the time. If you actually do the math, it turns out that a .500 team will win .500 of the time, obviously, while a .550 team will win .608, a .600 team will win .710 and a .650 team will win .800, but the rule of thumb is close enough.
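
For anyone who wants to check that math, the exact binomial calculation next to the rule of thumb looks like this (standard probability, nothing specific to the article’s data):

    from math import comb

    def series_win_prob(p, games=7):
        # Probability of winning at least 4 of 7 when each game is won with probability p.
        need = games // 2 + 1
        return sum(comb(games, k) * p ** k * (1 - p) ** (games - k)
                   for k in range(need, games + 1))

    for p in (0.500, 0.550, 0.600, 0.650):
        print(p, round(series_win_prob(p), 3), round(2 * p - 0.5, 3))
    # exact: .500, .608, .710, .800 -- rule of thumb: .500, .600, .700, .800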

I ran four separate sets of 5,000 simulated 162-game seasons, one for each playoff structure. The first was one league of 30 teams with no playoffs; the winner was simply the first-place team at the end of the season.

The next case was two leagues of 15 teams, with the two league winners in the playoffs. Then I tried two leagues of two divisions each, with four teams in the playoffs. Finally I used two leagues of three divisions each plus a wild card in each league, for eight playoff teams. As the number of playoff teams increased, the probability of the best team making the playoffs also increased, but the likelihood of the best team winning it all went down.

 

Table 8

Leagues  Divisions  Playoff Teams  Best Team in Playoffs  Best Team Wins
1        1          1              .40                    .40
2        1          2              .55                    .33
2        2          4              .72                    .28
2        3          8              .86                    .22

 

In real life, we do not know which team was really the best, but if we assume it was the team with the best record during the season, then that team has won the World Series five times since the wild card was introduced in 1995, although in one case there was a tie for the best record. That works out to about 20 percent, consistent with the table above. The average rank of the World Series winner was fourth out of the eight playoff teams. If the playoffs were completely random, the average rank would be 4.5. The worst of the playoff teams has won four times.

In a season, the variation due to luck is about 6.3 games, or about 40 points of winning percentage. In a seven-game series, the variation is the square root of ½ times ½ times 7, which is 1.32 games, or 188 points. For a single game it is the square root of ½ times ½ times 1, which is .5 wins, or 500 points. The average difference in team skill in a game is about 55 points, but if you include home-field advantage and the variation among starting pitchers, the actual difference per game is around 100 points, or about one run. We established that the variation due to chance is the square root of twice the number of runs involved. This means the variation in the run differential for a single game would be the square root of 18, or about 4.25 runs. That is over four times the variation due to skill.

Looking at other sports as comparisons, the skill factor is much more important in basketball and football. With fewer players on a basketball team, one star player can make a big difference. Football has a much shorter season, so the luck factor is higher per game, but skill still wins out. If you assume the skill factor would be the same regardless of the length of the season, then for a 162-game season basketball would be about 150 points of winning percentage, football about 140, and baseball 55 as shown above. Trying to hit a round ball with a round bat introduces a lot of variability which does not exist in other sports.

 

Table 9.1. Basketball

Years Teams Total Luck Skill
1946-1950 48 10.03 3.85 9.26
1950-1960 88 9.25 4.20 8.24
1960-1970 103 11.90 4.49 11.02
1970-1980 192 10.92 4.53 9.94
1980-1990 236 12.35 4.53 11.49
1990-2000 280 13.23 4.44 12.47
2000-2010 296 12.17 4.53 11.29
2010-2016 180 12.58 4.45 11.76

 

Table 9.2. Football

Years Teams Total Luck Skill
1920-1929 167 2.46 1.45 1.99
1930-1939 98 2.68 1.64 2.12
1940-1949 129 3.02 1.67 2.51
1950-1959 121 2.49 1.71 1.82
1960-1969 232 2.99 1.82 2.37
1970-1979 268 3.00 1.88 2.33
1980-1989 280 2.84 1.94 2.08
1990-1999 291 2.98 2.00 2.21
2000-2009 318 3.09 2.00 2.36
2010-2015 192 3.07 2.00 2.33

 

A team’s record from year to year includes a great deal of luck; as shown above, luck contributes about as much as skill to a team’s record over a full season and over four times as much as skill to the outcome of any single game. If all teams were equal, the standard deviation of the change in wins from one year to the next would be 9 games (the square root of 324/4), or alternatively the square root of 2 times the in-season variation (the square root of 162/4, or 6.36). That means every year there should be one or two teams with swings of 18 or more games just by luck. The results by decade are shown in Table 10 below. The real difference between teams is only about 7 games a year. There were 166 teams that gained 18 or more games from one year to the next, going from 67 wins to 90 on average. However, the next year they dropped back to 85, just like any other team. Likewise, there were 156 teams that dropped 18 or more wins, going from 91 to 68, but they won 75 the following year. Wins were normalized to 162 games to allow for schedule differences.
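
The arithmetic behind the 9-game figure and the expected number of 18-game swings is only a few lines (the normal-curve tail calculation is my own check, but it follows directly from the numbers above):

    from math import erf, sqrt

    luck_season = sqrt(162 / 4)                      # 6.36 wins of luck within one season
    luck_change = sqrt(2) * luck_season              # 9.0 wins for the change between seasons
    # Chance one team's year-to-year change exceeds 18 wins (two sigma) in either direction:
    p_big_swing = 2 * (1 - 0.5 * (1 + erf(18 / (luck_change * sqrt(2)))))
    print(round(luck_change, 2), round(30 * p_big_swing, 1))   # about 9.0, and 1.4 teams per year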

 

Table 10

Years Teams Total Luck Skill
1901-1910 144 15.3 9.0 12.4
1911-1920 160 14.8 9.0 11.7
1921-1930 160 11.6 9.0 7.3
1931-1940 160 11.9 9.0 7.8
1941-1950 160 12.6 9.0 8.8
1951-1960 160 10.6 9.0 5.6
1961-1970 198 11.0 9.0 6.2
1971-1980 246 10.3 9.0 5.0
1981-1990 260 11.8 9.0 7.6
1991-2000 278 12.2 9.0 8.2
2001-2010 300 11.0 9.0 6.3
2011-2016 180 11.4 9.0 7.0

 

Conclusion

Most people think luck is a lot less important than it is. A team’s record from year to year includes a great deal of luck, and luck contributes about as much as skill to a team’s eventual regular season record. (And in the postseason, it’s nearly all luck.)

PETE PALMER is the co-author with John Thorn of “The Hidden Game of Baseball” and co-editor with Gary Gillette of the “Barnes and Noble ESPN Baseball Encyclopedia” (five editions). Pete worked as a consultant to Sports Information Center, the official statisticians for the American League from 1976 to 1987. Pete introduced on-base average as an official statistic for the American League in 1979 and invented on-base plus slugging (OPS), now universally used as a good measure of batting strength. He won the SABR Bob Davids Award in 1989 and was selected by SABR in 2010 as a winner of the inaugural Henry Chadwick Award. Pete also edited with John Thorn seven editions of “Total Baseball.” He previously edited four editions of the “Barnes Official Encyclopedia of Baseball” (1974–79). A member of SABR since 1973, Pete is also the editor of “Who’s Who in Baseball,” which celebrated its 101st year in 2016.