Calculating Skill and Luck in Major League Baseball

April 20, 2017

This article was written by Pete Palmer

This article was published in Spring 2017 Baseball Research Journal

One of my favorite topics is the contributions of skill and luck in baseball. I recently ran one thousand simulations of a 162-game season using the schedule currently in use in the majors (two leagues of 15 teams, three divisions each, with interleague play), in which every team was the same and each game was decided by a coin flip (a random number generated by the computer). Of course you would not expect each team to win exactly 81 games, but you would expect the wins to follow a bell-shaped curve centered at 81.

This is of course what happened.

The width of the distribution is characterized by its standard deviation, defined as the square root of the average squared difference between team wins and 81: sum the squared differences, divide by the number of teams, and take the square root. You would expect about two-thirds of the values to be within plus or minus one standard deviation, and 95 percent to be within two.

There is a formula for what this number should be in a binomial distribution, which applies when there are only two outcomes, like heads or tails, or in this case, wins or losses, and which the normal distribution closely approximates over a full season. The formula is simply the square root of the win probability times the loss probability times the number of trials, or games. For a 162-game season, this would be the square root of ½ times ½ times 162, which is 6.36. In my simulation, of course, each of the 1000 seasons I ran would not come out at exactly that value, but the number should be close. It turns out the season value was 6.35 with a range of plus or minus 0.9. When I combined the data into 10-year periods, the range was down to 0.3, as expected, reduced by the square root of 10.
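As a rough check on the arithmetic above, here is a minimal Python sketch of the coin-flip experiment. It is not the author's simulator: it ignores the real schedule and simply gives every team 162 independent coin flips, and the function names, seed, and season count are illustrative assumptions.

```python
import math
import random

def binomial_sd(p, games):
    """Standard deviation of wins for a team with win probability p."""
    return math.sqrt(p * (1 - p) * games)

def simulated_sd(seasons=200, teams=30, games=162, seed=1):
    """SD of wins around games/2 when every game is a fair coin flip.

    Simplification: each team's games are flipped independently, so the
    real schedule (and the fact that every win is someone's loss) is ignored.
    """
    rng = random.Random(seed)
    mean = games / 2
    total_sq = 0.0
    for _ in range(seasons * teams):
        wins = sum(rng.random() < 0.5 for _ in range(games))
        total_sq += (wins - mean) ** 2
    return math.sqrt(total_sq / (seasons * teams))

sim_sd = simulated_sd()
print(round(binomial_sd(0.5, 162), 2))   # 6.36
print(round(sim_sd, 2))                  # should land near 6.36; depends on the seed
```

Pooling more seasons narrows the gap between the simulated and theoretical values, which is the same effect the text describes when seasons are combined into 10-year periods.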

If you assume luck and skill are independent, with the luck factor for a season being the 6.36 wins calculated above and the total variation being what happened on the field, then you can calculate the skill factor. It is simply the square root of total squared minus luck squared. I took the data by decades from 1871 to the present. The Union Association (1884) and Federal League (1914-15) were excluded.

Table 1

Years	Teams	Total	Luck	Skill
1871-1880	86	10.59	3.68	9.93
1881-1890	164	15.4	5.36	14.44
1891-1900	121	15.84	5.84	14.72
1901-1910	160	16.49	6.07	15.33
1911-1920	160	14.41	6.1	13.06
1921-1930	160	13.96	6.19	12.51
1931-1940	160	14.99	6.18	13.66
1941-1950	160	14.39	6.19	12.99
1951-1960	160	13.46	6.2	11.95
1961-1970	206	12.85	6.35	11.17
1971-1980	248	11.63	6.34	9.75
1981-1990	260	9.98	6.25	7.78
1991-2000	282	10.51	6.23	8.46
2001-2010	300	11.74	6.36	9.87
2011-2016	180	10.95	6.36	8.91

A variation in the luck factor of 0.3 wins or so would result in a change in skill of about 0.2.
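The skill column of Table 1 can be reproduced from the total and luck columns. The sketch below assumes, as the text does, that skill and luck are independent so their variances add; the function name is illustrative.

```python
import math

def skill_factor(total, luck):
    """Skill SD implied by total and luck SDs, assuming independence."""
    return math.sqrt(total ** 2 - luck ** 2)

# Rows taken from Table 1: (years, total, luck)
rows = [("1901-1910", 16.49, 6.07),
        ("1981-1990", 9.98, 6.25),
        ("2011-2016", 10.95, 6.36)]
for years, total, luck in rows:
    print(years, round(skill_factor(total, luck), 2))   # 15.33, 7.78, 8.91

# Sensitivity: shift the luck SD by 0.3 wins and see how much skill moves.
base = skill_factor(10.95, 6.36)
shifted = skill_factor(10.95, 6.36 + 0.3)
print(round(base - shifted, 2))                          # 0.22
```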

What this shows is that the skill factor, which was around 13 wins for 1901–1950, has been reduced significantly, so that it has been around 9 since 1971. This is a team skill factor, not a player skill factor, so basically the teams have become more evenly matched. It also shows that over the course of a full season, the skill and luck factors are almost equal. If you assume the 9 wins (.055 pct) would be constant for any length schedule, then you need 81 games before the skill factor and luck factor are equal. In 81 games, the skill factor would be 81 x .055 or 4.5 wins, while the luck factor (the square root of 81/4) would also be 4.5.
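The 81-game break-even point follows from setting the skill term (which grows with the number of games) equal to the luck term (which grows with its square root). A small sketch, with names chosen only for illustration:

```python
import math

SKILL_PER_GAME = 9 / 162   # about .055, from the recent rows of Table 1

def skill_wins(games):
    """Skill SD in wins, assuming a constant per-game skill spread."""
    return SKILL_PER_GAME * games

def luck_wins(games):
    """Binomial luck SD in wins for evenly matched teams."""
    return math.sqrt(games / 4)

def break_even_games():
    # skill * n == sqrt(n) / 2  =>  n == (0.5 / skill) ** 2
    return (0.5 / SKILL_PER_GAME) ** 2

n = break_even_games()
print(round(n), round(skill_wins(n), 2), round(luck_wins(n), 2))   # 81 4.5 4.5
print(round(skill_wins(162), 2), round(luck_wins(162), 2))         # 9.0 6.36
```

Below 81 games luck dominates; above it skill does, although the two stay within a few wins of each other over 162 games.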

Next I modified my simulation to use teams that were not the same. I chose teams with a standard deviation equal to 9 wins derived from the previous table. I ran 1000 seasons, each with a different set of expected team win percentages. I noted the number of teams within each win range and the number of expected wins for each team that actually won games in that range. I derived a formula for the probability of one team beating another, which is the difference in overall win probability of the teams plus one half.
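A simplified version of this unequal-teams simulation might look like the following. Only the win-probability rule and the 9-win skill spread come from the text; the normal distribution of true talent, the random daily pairings (instead of the real schedule), the seed, and all names are assumptions made for the sketch.

```python
import math
import random

def win_prob(p_a, p_b):
    """Chance that A beats B: difference in overall win probability plus one half."""
    return min(1.0, max(0.0, p_a - p_b + 0.5))

def simulate(seasons=100, teams=30, games=162, skill_sd_wins=9.0, seed=7):
    """Return (expected_wins, actual_wins) pairs for every team-season.

    Assumptions not taken from the article: true win percentages are normal,
    and each "day" the teams (an even number) are shuffled and paired at random.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(seasons):
        true_pct = [rng.gauss(0.5, skill_sd_wins / games) for _ in range(teams)]
        wins = [0] * teams
        order = list(range(teams))
        for _ in range(games):
            rng.shuffle(order)
            for i in range(0, teams, 2):
                a, b = order[i], order[i + 1]
                if rng.random() < win_prob(true_pct[a], true_pct[b]):
                    wins[a] += 1
                else:
                    wins[b] += 1
        results.extend((p * games, w) for p, w in zip(true_pct, wins))
    return results

def sd_about(values, center):
    """Root-mean-square spread of values around a fixed center."""
    return math.sqrt(sum((v - center) ** 2 for v in values) / len(values))

results = simulate()
print(round(sd_about([e for e, _ in results], 81), 1))  # spread of expected wins
print(round(sd_about([w for _, w in results], 81), 1))  # spread of actual wins
```

The second number should come out wider than the first, because actual wins carry both the skill spread and the luck spread.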

The results show the actual range is quite a bit broader than the expected one. From the previous table we would expect the actual range to have a standard deviation around 11, which it did. In Table 2, Exp is the number of teams whose expected wins fell in each range, Act is the number whose actual wins did, Dif is actual minus expected wins, and More is the fraction of teams in the range that won more games than expected. The important column is E/A, which is the expected number of wins for a team that actually won games in that range. For example, a team that actually won 59 to 61 games was expected to win 67.4 games, which was 7.4 wins more than actual. Only 8 percent of the teams that won 59-61 games won more games than expected. Looking at the 89-91 range, those teams were expected to win only 86.5 games, and about 70 percent of them won more games than expected.

Table 2

Wins	Exp	Act	E/A	Dif	More
41-43	0	4	58.0	-16.0	0.000
44-46	0	19	58.3	-13.3	0.000
47-49	0	40	60.7	-12.7	0.000
50-52	0	96	62.5	-11.5	0.000
53-55	66	194	63.9	-9.9	0.026
56-58	127	332	65.2	-8.2	0.054
59-61	267	551	67.4	-7.4	0.078
62-64	526	919	69.5	-6.5	0.103
65-67	988	1333	71.6	-5.6	0.131
68-70	1681	1847	73.5	-4.5	0.178
71-73	2333	2298	75.4	-3.4	0.237
74-76	3207	2773	77.1	-2.1	0.309
77-79	3918	3055	79.2	-1.2	0.366
80-82	3963	3091	81.1	-0.1	0.466
83-85	3789	3080	82.7	1.3	0.559
86-88	3140	2780	84.9	2.1	0.619
89-91	2356	2293	86.5	3.5	0.706
92-94	1653	1751	88.5	4.5	0.763
95-97	1011	1359	90.4	5.6	0.815
98-100	512	928	92.5	6.5	0.867
101-103	261	589	94.1	7.9	0.912
104-106	135	304	96.5	8.5	0.928
107-109	67	194	98.0	10.0	0.954
110-112	0	96	100.1	10.9	1.000
113-115	0	42	103.3	10.7	1.000
116-118	0	21	101.9	15.1	1.000
119-121	0	6	106.8	13.2	1.000
122-124	0	1	109.0	14.0	1.000
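The E/A column behaves roughly like a textbook shrinkage estimate. The sketch below is an interpretation, not the author's method: it assumes skill and luck are independent and roughly normal, uses the 9-win skill and 6.36-win luck figures from earlier in the article, and the function name is made up for illustration.

```python
SKILL_SD = 9.0    # team skill, in wins, from Table 1
LUCK_SD = 6.36    # binomial luck over 162 games

def expected_given_actual(actual_wins, mean=81.0):
    """Estimated true-talent wins for a team that actually won actual_wins."""
    # Share of the total variance that is due to skill (about two-thirds).
    weight = SKILL_SD ** 2 / (SKILL_SD ** 2 + LUCK_SD ** 2)
    return mean + weight * (actual_wins - mean)

for wins in (60, 75, 90, 105):
    print(wins, round(expected_given_actual(wins), 1))   # 67.0, 77.0, 87.0, 97.0
```

These land within about half a win of the simulated E/A values for the same ranges in Table 2 (67.4, 77.1, 86.5, and 96.5).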

I then looked at actual data showing the change in wins in the following year. I used all teams from 1901 to date and found, not surprisingly, that teams tended to move closer to .500. In fact the change was almost identical to the change found in the simulation above. Thus I believe that the so-called regression to the mean is simply due to luck. Of course, we don’t know the true win probabilities of the actual teams, but it does seem likely that they are similar to those in the simulation, and a real-life 90-win team is probably an 86.5-win team that was lucky. Wins were normalized to 162 games to account for varying season lengths. Diff1 is the difference between wins this year and next. Diff2 is the difference between expected and actual team wins from the previous table. The table below is from a much smaller sample, so the numbers are less uniform.

Table 3

Wins	Teams	Next	Diff1	Diff2
38-40	2	66.5	27.5
41-43	4	60.5	18.5	16.0
44-46	8	61.0	16.0	13.3
47-49	10	58.2	10.2	12.7
50-52	15	65.1	14.1	11.5
53-55	40	65.5	11.5	9.9
56-58	47	65.5	8.5	8.2
59-61	64	67.5	7.5	7.4
62-64	85	70.8	7.8	6.5
65-67	130	71.4	5.4	5.6
68-70	125	73.0	4.0	4.5
71-73	155	77.1	5.1	3.4
74-76	204	78.8	3.8	2.1
77-79	192	80.7	2.7	1.2
80-82	168	80.6	-0.4	0.1
83-85	204	82.4	-1.6	-1.3
86-88	221	84.1	-2.9	-2.1
89-91	177	86.6	-3.4	-3.5
92-94	159	87.4	-5.6	-4.5
95-97	148	89.8	-6.2	-5.6
98-100	105	90.7	-8.3	-6.5
101-103	68	94.5	-7.5	-7.9
104-106	34	94.8	-10.2	-8.5
107-109	17	99.0	-9.0	-10.0
110-112	12	103.0	-8.0	-10.9
113-115	6	98.6	-15.4	-10.7
116-118	4	97.0	-20.0	-15.1
119-121	1	105.3	-14.7	-13.2

This method can also be used to look at player performance. The variation in batting average due to luck uses the same formula, except the probability of success is more like 0.25 rather than 0.50. For a full season of 500 at-bats, the variation in hits is the square root of 0.25 times 0.75 times 500, which is 9.7, or about 20 points of batting average. The table below shows batting average by decades for all players with at least 300 appearances (at-bats plus walks) in a season. Total, luck, and skill are in batting average points; that is, 37 means .037 of batting average. The variation in skill level has decreased, which indicates that the average level has probably increased, making it harder for the best players to exceed it. However, some of the decrease may be due to power hitters who sacrifice batting average for homers. The standard deviation of the change from year to year is 1.4 (the square root of two) times the yearly value, so five percent of the players can change more than 60 points just due to luck.

Table 4

Years	Players	App	Avg	Total	Luck	Skill
1901-1910	1239	493	.265	37	21	30
1911-1920	1319	499	.272	36	21	29
1921-1930	1299	512	.300	35	21	28
1931-1940	1348	518	.289	32	21	24
1941-1950	1297	508	.273	30	21	22
1951-1960	1269	506	.273	30	21	21
1961-1970	1722	512	.263	30	20	22
1971-1980	2158	508	.268	30	21	21
1981-1990	2208	499	.268	28	21	19
1991-2000	2374	499	.276	31	21	22
2001-2010	2622	509	.274	28	21	18
2011-2016	1538	500	.264	30	21	21

* Note: Although the table shows appearances (AB + walks), actual at-bats were used to calculate the variance in batting average. AB are usually around 40 fewer than appearances. Example: for 1901-10 AB averaged 456, so sigma is sqrt(456 × .265 × .735)/456, or 21 points. Appearances give credit to players who walked, but I used batting average as the criterion and didn’t want too many columns.
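The luck column of Table 4 can be checked with the same binomial formula. This is a small sketch with an illustrative function name; the factor of the square root of two for year-to-year changes assumes the two seasons are independent.

```python
import math

def ba_luck_points(at_bats, avg):
    """Luck SD of batting average, in points (thousandths)."""
    return 1000 * math.sqrt(at_bats * avg * (1 - avg)) / at_bats

one_year = ba_luck_points(456, 0.265)        # the 1901-10 example from the note
year_to_year = math.sqrt(2) * one_year       # difference of two independent seasons
print(round(one_year), round(year_to_year), round(2 * year_to_year))   # 21 29 58
print(round(ba_luck_points(500, 0.25)))      # 19, the 500 at-bat, .250 case
```

Two year-to-year standard deviations come to just under 60 points, which is the band outside of which about five percent of players would fall by luck alone.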

Normalized on-base plus slugging (NOPS) is a better measure of batting than average, though. It is defined as the player’s on-base average (OBA) over the league OBA, plus the player’s slugging average (SLG) over the league SLG, minus 1, all times one hundred. The league averages do not include pitcher batting. This is then adjusted for park by dividing by the player’s park factor (PF). PF is basically runs scored per inning at home over runs scored per inning away, plus one, all divided by two. So a park where twenty percent more runs were scored at home than on the road would have a PF of 1.10. NOPS correlates directly with runs in that a player with a 120 NOPS produces runs at a rate 20 percent higher than the league average. The standard deviation for NOPS is a bit more complicated. Slugging average is driven by home runs, so homer hitters have a higher standard deviation.
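The NOPS and park factor definitions translate directly into code. The sketch below follows the wording above; the function names are illustrative and the sample hitter is hypothetical, not taken from the article.

```python
def park_factor(home_runs_per_inning, away_runs_per_inning):
    """(home / away + 1) / 2, as described in the text."""
    return (home_runs_per_inning / away_runs_per_inning + 1) / 2

def nops(oba, lg_oba, slg, lg_slg, pf=1.0):
    """Normalized OPS, divided by park factor; 100 is league average."""
    raw = (oba / lg_oba + slg / lg_slg - 1) * 100
    return raw / pf

print(round(park_factor(1.2, 1.0), 2))                       # 1.1
# Hypothetical hitter: .360 OBA and .480 SLG in a .320 / .400 league.
print(round(nops(0.360, 0.320, 0.480, 0.400), 1))            # neutral park
print(round(nops(0.360, 0.320, 0.480, 0.400, pf=1.10), 1))   # hitter-friendly park
```

The same raw line is worth less in the hitter-friendly park, since the division by PF discounts production that the park itself supplies.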

I ran a simulation where all players each year from 1901 through 2016 with 300 or more plate appearances were run through 100 seasons. The league standard deviation came out 15 points, although it was more like 14.5 for the first half of the period and 15.5 for the last half, where homers were more frequent. If I divided the league in half by homer percentage in 2016, the top half had 16.5 and the bottom half 14.5. I will use 15 for analysis. This table shows that the variation in skill level has remained fairly constant with a slight dip recently, which as with batting average may indicate a rise in average skill.

Table 5

Years	Players	App	NOPS	Total	Luck	Skill
1901-1910	1239	493	104	28	15	23
1911-1920	1319	499	104	28	15	23
1921-1930	1299	512	104	30	15	25
1931-1940	1348	518	103	28	15	24
1941-1950	1297	508	105	27	15	23
1951-1960	1269	506	105	27	15	23
1961-1970	1722	512	105	28	15	24
1971-1980	2158	508	104	26	15	22
1981-1990	2208	499	104	25	15	19
1991-2000	2374	499	104	27	15	22
2001-2010	2622	509	104	25	15	20
2011-2016	1538	500	105	24	15	18

So for NOPS, five percent of the players can change 40 or more points from year to year due to luck alone.

For pitchers, I would use normalized ERA (NERA), which is simply league ERA over pitcher ERA times 100. Again, a 120 NERA will result in 20 percent fewer runs allowed than an average pitcher. I do have a quarrel with the way earned runs are charged, though. A pitcher is always charged with runs scored by players he has allowed on base. It would be fairer if they were shared when a relief pitcher comes in. For example, if a pitcher left with the bases loaded and none out, he would be charged with 1.8 runs. This is the number of runs usually scored by the 3 runners. If the relief pitcher got out with no runs scored, he would get minus 1.8 runs.
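The proposed sharing of inherited runs can be sketched as follows. The function names are hypothetical, and the 1.8-run expectation is the bases-loaded, none-out figure quoted in the text; other base-out situations would need their own expected values.

```python
def nera(league_era, pitcher_era):
    """Normalized ERA: league ERA over pitcher ERA, times 100."""
    return league_era / pitcher_era * 100

def split_inherited_runs(expected_runs, actual_runs):
    """Share runs from inherited runners between two pitchers.

    The departing pitcher is charged the runs those runners usually score;
    the reliever is charged (or credited) the difference from what happened.
    Returns (departing_pitcher_charge, reliever_charge).
    """
    return expected_runs, actual_runs - expected_runs

print(round(nera(4.00, 3.33)))          # 120
print(split_inherited_runs(1.8, 0))     # (1.8, -1.8): reliever strands all three
print(split_inherited_runs(1.8, 3))     # reliever lets all three score
```

Under this scheme the two charges always add up to the runs that actually scored, so team totals are unchanged.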

The actual value varies a bit from year to year and league to league, and does have a random factor associated with it. If runs were individual events like goals in a hockey game, the standard deviation would be the square root of the average number of runs, but in baseball scoring one run can often lead to another and a grand slam homer can score four in one blow, so the actual value is the square root of twice the number of runs. In the bases loaded case, the 1.8 figure is 701 runs scored in 389 cases in the 2015 AL. The square root of 1402 over 389 is about 0.1. A runner on third with none out scores about 82 percent of the time, while a runner on first with two outs scores only 13 percent of the time.
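The bases-loaded arithmetic can be written out in a few lines. The function name is illustrative; the square root of twice the runs is the author's rule of thumb, not a derived result.

```python
import math

def run_luck_sd(total_runs, cases=1):
    """Luck SD of runs per case, using the sqrt(2 * runs) rule from the text."""
    return math.sqrt(2 * total_runs) / cases

# Bases loaded, none out, 2015 AL: 701 runs scored by the runners in 389 cases.
print(round(701 / 389, 2))                 # 1.8
print(round(run_luck_sd(701, 389), 3))     # 0.096
```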

For ERA, the luck factor is calculated the same way. A pitcher with 180 innings and an ERA of 4.00 would have allowed 80 runs, and the luck factor would be the square root of 160 times 9 over 180 or 0.63, a fairly hefty figure. That means five percent of the pitchers could have their ERA off by more than 1.26 due to luck alone. For NERA, the luck factor if the league ERA was 4.00 would be 0.63 divided by 4.00 times 100 or 16. The table below shows all pitchers with 150 or more innings from 1901 to the present by decades.

Table 6

Years	Players	Innings	ERA	NERA	Total	Luck	Skill
1901-1910	658	253	2.65	112	28	16	23
1911-1920	745	232	2.85	110	26	16	20
1921-1930	711	217	3.93	110	20	15	13
1931-1940	663	212	3.91	111	21	15	15
1941-1950	627	207	3.48	113	22	16	16
1951-1960	576	207	3.65	110	20	15	13
1961-1970	768	216	3.40	110	22	16	16
1971-1980	918	219	3.52	109	20	15	13
1981-1990	836	204	3.72	109	21	15	15
1991-2000	819	197	4.05	113	25	15	20
2001-2010	905	194	4.10	111	23	15	18
2011-2016	529	189	3.76	109	23	16	17
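The ERA luck calculation described before Table 6 can be sketched as below. The function names are illustrative, and the rule that run variation is the square root of twice the runs is taken from the text as given.

```python
import math

def era_luck(innings, era):
    """Luck SD of ERA: sqrt(2 * earned runs), scaled to runs per nine innings."""
    earned_runs = era * innings / 9
    return math.sqrt(2 * earned_runs) * 9 / innings

def nera_luck(innings, era, league_era):
    """The same luck SD expressed in NERA points."""
    return era_luck(innings, era) / league_era * 100

print(round(era_luck(180, 4.00), 2))          # 0.63
print(round(2 * era_luck(180, 4.00), 2))      # 1.26, the two-SD band
print(round(nera_luck(180, 4.00, 4.00)))      # 16
```

Because the luck term shrinks only with the square root of innings, pitchers with fewer innings than the 180 used here would carry a noticeably larger luck factor.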

I did a study of all players with at least 300 at-bats in their first two years, sorted by the difference in NOPS and by whether they made 300 at-bats in their third year. What it showed was that most players who did worse their second year improved in their third year, while most players who did better their second year didn’t do as well in their third year. About 30 percent of those who were worse their second year did not get a third year, while only 10 percent of those who were better missed out. In the 30-point change area, those who improved were 37 points higher in year two, but only 7 points higher in year three. So it appears that a big improvement is considered a trend, when it is really mostly luck. The 30-point players who improved did do a little better the third year (up 7 points), while those who did worse only went up 4, but that is a pretty small difference. (See Table 7.)

Table 7

A handy rule for determining simulated series winners is that the probability of winning a seven-game series is equal to twice the one game percentage minus one half. So a .550 team will win the series sixty percent of the time. If you actually do the math, it turns out that a .500 team will win .500 of the time, obviously, while a .550 team will win .608, a .600 team will win .710 and a .650 team will win .800, but the rule of thumb is close enough.
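The exact seven-game figures can be checked against the rule of thumb by summing over the number of losses the winner absorbs before its fourth win. A short sketch, assuming independent games with a fixed single-game probability (the function names are illustrative):

```python
from math import comb

def series_win_prob(p, wins_needed=4):
    """Exact chance of winning a best-of-(2 * wins_needed - 1) series."""
    q = 1 - p
    # The winner takes the final game; k is the number of losses before that.
    return sum(comb(wins_needed - 1 + k, k) * p ** wins_needed * q ** k
               for k in range(wins_needed))

def rule_of_thumb(p):
    """Twice the one-game percentage minus one half."""
    return 2 * p - 0.5

for p in (0.500, 0.550, 0.600, 0.650):
    print(p, round(series_win_prob(p), 3), round(rule_of_thumb(p), 3))
# 0.5 0.5 0.5
# 0.55 0.608 0.6
# 0.6 0.71 0.7
# 0.65 0.8 0.8
```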

I ran four separate runs of 5000 162-game season simulations based on playoff structure. The first was one league of 30 teams with no playoffs. The winner was the first place team at the end of the season.

The next case was 2 leagues of 15 teams with the league winners in the playoff. Then I tried two leagues of two divisions each and 4 teams in the playoffs. Finally it was two leagues, three divisions and a wild card, eight teams in the playoffs. As the number of playoff teams increased, the probability of the best team making the playoffs also increased, but the likelihood of the best team winning went down.

Table 8

Leagues	Divisions	Playoff teams	Best team in playoffs	Best team wins
1	1	1	.40	.40
2	1	2	.55	.33
2	2	4	.72	.28
2	3	8	.86	.22
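The simplest row of Table 8 (one league, no playoffs) can be approximated with a short simulation. This is only a sketch under stated assumptions, not the author's program: true talent is drawn from a normal distribution with a 9-win spread, teams are paired at random each day instead of following a real schedule, ties for first are broken at random, and the seed and names are arbitrary.

```python
import random

def best_team_finishes_first(seasons=300, teams=30, games=162,
                             skill_sd_wins=9.0, seed=11):
    """Fraction of seasons in which the team with the best true talent has the most wins."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(seasons):
        pct = [rng.gauss(0.5, skill_sd_wins / games) for _ in range(teams)]
        wins = [0] * teams
        order = list(range(teams))
        for _ in range(games):
            rng.shuffle(order)                      # random pairings for the day
            for i in range(0, teams, 2):
                a, b = order[i], order[i + 1]
                p = min(1.0, max(0.0, pct[a] - pct[b] + 0.5))
                if rng.random() < p:
                    wins[a] += 1
                else:
                    wins[b] += 1
        top = max(wins)
        first = rng.choice([t for t in range(teams) if wins[t] == top])
        best = max(range(teams), key=lambda t: pct[t])
        hits += first == best
    return hits / seasons

fraction = best_team_finishes_first()
print(fraction)
```

The printed fraction can be compared with the first row of Table 8; extending the sketch to divisions, wild cards, and playoff series would be needed to approach the other rows.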

In real life, we do not know which team was really the best, but if we assume that it was the team with the best record during the season, then that team has won the World Series five times since 1995 when the wild card was introduced, although in one case there was a tie for the best. That works out to about 20 percent, consistent with the above table. The average rank of the World Series winner was fourth out of eight. If the playoffs were completely random, the average rank would be 4.5. The worst team has won 4 times.

In a season, the variation due to luck is about 6.3 games or about 40 points of winning percentage. In a 7-game series, the variation is the square root of ½ times ½ times 7, which is 1.32 games or 189 points. For a single game it is the square root of ½ times ½ times 1, which is .5 wins or 500 points. The average difference in team skill in a game is about 55 points, but if you include home/away and variation among starting pitchers, the actual difference per game is around 100 points or one run. We established that the variation due to chance is the square root of twice the number of runs involved. This means the variation of the difference in runs for a single game would be the square root of 18 or 4.24. This is over four times the variation due to skill.
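The way luck grows as the sample shrinks can be tabulated with the same binomial formula, give or take rounding. The function name is illustrative, and the nine runs per game (both teams combined) is inferred from the text's square root of 18.

```python
import math

def luck_points(games, p=0.5):
    """Luck SD of winning percentage over a span of games, in points (thousandths)."""
    return 1000 * math.sqrt(p * (1 - p) * games) / games

for games in (162, 7, 1):
    print(games, round(luck_points(games)))   # 39, 189, 500

runs_per_game_both_teams = 9   # implied by the text's "square root of 18"
print(round(math.sqrt(2 * runs_per_game_both_teams), 2))   # 4.24
```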

Looking at other sports as comparisons, the skill factor is much more important in basketball and football. With fewer players on a basketball team, one star player can make a big difference. Football has a much shorter season, so the luck factor is higher per game, but skill still wins out. If you assume the skill factor would be the same regardless of the length of the season, then for a 162-game season basketball would be about 150 points of winning percentage, football about 140, and baseball 55 as shown above. Trying to hit a round ball with a round bat introduces a lot of variability which does not exist in other sports.

Table 9.1. Basketball

Years	Teams	Total	Luck	Skill
1946-1950	48	10.03	3.85	9.26
1950-1960	88	9.25	4.20	8.24
1960-1970	103	11.90	4.49	11.02
1970-1980	192	10.92	4.53	9.94
1980-1990	236	12.35	4.53	11.49
1990-2000	280	13.23	4.44	12.47
2000-2010	296	12.17	4.53	11.29
2010-2016	180	12.58	4.45	11.76

Table 9.2. Football

Years	Teams	Total	Luck	Skill
1920-1929	167	2.46	1.45	1.99
1930-1939	98	2.68	1.64	2.12
1940-1949	129	3.02	1.67	2.51
1950-1959	121	2.49	1.71	1.82
1960-1969	232	2.99	1.82	2.37
1970-1979	268	3.00	1.88	2.33
1980-1989	280	2.84	1.94	2.08
1990-1999	291	2.98	2.00	2.21
2000-2009	318	3.09	2.00	2.36
2010-2015	192	3.07	2.00	2.33
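To put the three sports on a common footing, the skill column can be divided by season length. In this sketch the season length is not taken from any schedule; it is backed out of each table's luck column using luck = sqrt(n / 4), which is an assumption that the luck figures are pure binomial values. The function names are illustrative.

```python
def season_length(luck_sd):
    """Games implied by a binomial luck SD: luck = sqrt(n / 4), so n = 4 * luck ** 2."""
    return 4 * luck_sd ** 2

def skill_points_per_game(skill_sd, luck_sd):
    """Skill SD as points (thousandths) of winning percentage."""
    return 1000 * skill_sd / season_length(luck_sd)

# Most recent rows of Tables 1, 9.1 and 9.2: (sport, skill, luck)
latest = [("baseball", 8.91, 6.36),
          ("basketball", 11.76, 4.45),
          ("football", 2.33, 2.00)]
for sport, skill, luck in latest:
    print(sport, round(season_length(luck)), round(skill_points_per_game(skill, luck)))
```

Using only the latest rows, basketball and football both come out at roughly two and a half times baseball's per-game skill spread, in line with the rounded figures quoted in the text.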

A team’s change in record from one year to the next also includes a great deal of luck; in most decades luck contributes more than skill. If all teams were equal, the standard deviation of the year-to-year change would be 9 games (the square root of 324/4), or alternatively the square root of 2 times the in-season variation (the square root of 162/4, or 6.36). That means every year there should be one or two teams with differences of 18 just by luck. Below are results by decade. The real difference between teams is only about 7 games a year. There were 166 teams that gained 18 or more wins from one year to the next, going from 67 wins to 90 on average. However, the next year they dropped back to 85, just like any other team. Likewise there were 156 teams that dropped 18 or more wins, going from 91 to 68, but won 75 the following year. Wins were normalized to 162 games to allow for schedule differences.

Table 10

Years	Teams	Total	Luck	Skill
1901-1910	144	15.3	9.0	12.4
1911-1920	160	14.8	9.0	11.7
1921-1930	160	11.6	9.0	7.3
1931-1940	160	11.9	9.0	7.8
1941-1950	160	12.6	9.0	8.8
1951-1960	160	10.6	9.0	5.6
1961-1970	198	11.0	9.0	6.2
1971-1980	246	10.3	9.0	5.0
1981-1990	260	11.8	9.0	7.6
1991-2000	278	12.2	9.0	8.2
2001-2010	300	11.0	9.0	6.3
2011-2016	180	11.4	9.0	7.0
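The 9-win luck figure in Table 10 and the "one or two teams" estimate can be checked with a normal approximation. The sketch assumes 30 teams and independent seasons; the names are illustrative.

```python
import math

SEASON_LUCK = math.sqrt(162 / 4)                 # 6.36 wins within one season
YEAR_TO_YEAR_LUCK = math.sqrt(2) * SEASON_LUCK   # 9.0 wins between two seasons

def two_sided_tail(z):
    """Normal-approximation chance of a change larger than z standard deviations."""
    return math.erfc(z / math.sqrt(2))

teams = 30
print(round(YEAR_TO_YEAR_LUCK, 1))                                 # 9.0
print(round(teams * two_sided_tail(18 / YEAR_TO_YEAR_LUCK), 1))    # 1.4
```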

Conclusion

Most people think luck is a lot less important than it is. A team’s record from year to year includes a great deal of luck, and luck contributes about as much as skill to a team’s eventual regular season record. (And in the postseason, it’s nearly all luck.)

PETE PALMER is the co-author with John Thorn of “The Hidden Game of Baseball” and co-editor with Gary Gillette of the “Barnes and Noble ESPN Baseball Encyclopedia” (five editions). Pete worked as a consultant to Sports Information Center, the official statisticians for the American League from 1976 to 1987. Pete introduced on-base average as an official statistic for the American League in 1979 and invented on-base plus slugging (OPS), now universally used as a good measure of batting strength. He won the SABR Bob Davids Award in 1989 and was selected by SABR in 2010 as a winner of the inaugural Henry Chadwick Award. Pete also edited with John Thorn seven editions of “Total Baseball.” He previously edited four editions of the “Barnes Official Encyclopedia of Baseball” (1974–79). A member of SABR since 1973, Pete is also the editor of “Who’s Who in Baseball,” which celebrated its 101st year in 2016.
