Why OPS Works
This article was written by Pete Palmer
This article was published in Fall 2019 Baseball Research Journal
Pete Palmer, the inventor of OPS (on-base plus slugging), explains how the offensive statistic was developed and why it remains robustly in use in the 21st century.
In this paper I’ll examine OPS (on-base plus slugging) and not only why I believe that the stat remains robustly in use in the twenty-first century, but how it was developed in the first place. I will recap my own early research and that of others in trying to relate batting performance to wins. Many formulas and schemes have been calculated both by me and others over the years, but the marginally more accurate methods are also more likely to be difficult to calculate or to understand, resulting in lower popularity.
I started trying to relate batting to team wins back in the 1960s. I had determined that 10 extra runs over the course of a season resulted in one more win for the team. A slightly more accurate method would be to use 10 times the square root of runs per inning by both teams—a figure usually between 9 and 11. The standard deviation of the difference between expected and actual wins from 1960 through 2018 was 4.03 for the simple method and 3.97 for the advanced one. Compare that to the “Pythagorean” method (runs squared over the sum of runs squared and runs allowed squared) where it was 4.07. This could be slightly improved to 3.99 by using 1.83 for the exponent instead of 2. Both simple and advanced methods had the same number of wins for 1016 out of 1552 teams and were off by 2 or 3 for only 47 teams. Only one team since 1971 differed by more than 2 wins: the 1996 Tigers, who allowed 1103 runs. Ten runs per win gave them 53 wins while Pythagoras had 56. They actually won 53. The runs over ten method was easier to calculate than other methods and also easier to relate individual performance to team performance.
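The estimators compared above can be sketched in a few lines of Python. This is only an illustration, not the code used for the study: the function names, the 162-game schedule, the 1,450-inning total, and the sample run totals are assumptions made for the example, and "runs per inning by both teams" is read here as combined runs divided by innings played.

```python
import math

def wins_runs_per_win(runs, runs_allowed, games=162, runs_per_win=10.0):
    """Simple method: each `runs_per_win` extra runs is worth one win."""
    return games / 2 + (runs - runs_allowed) / runs_per_win

def advanced_runs_per_win(runs, runs_allowed, innings):
    """Advanced method: 10 times the square root of runs per inning by both teams."""
    return 10.0 * math.sqrt((runs + runs_allowed) / innings)

def wins_pythagorean(runs, runs_allowed, games=162, exponent=2.0):
    """Pythagorean method; the text also tries an exponent of 1.83."""
    r = runs ** exponent
    ra = runs_allowed ** exponent
    return games * r / (r + ra)

# Hypothetical team (not real data): 800 runs scored, 700 allowed, 1,450 innings.
R, RA, INN = 800, 700, 1450
simple = wins_runs_per_win(R, RA)
rpw = advanced_runs_per_win(R, RA, INN)
advanced = wins_runs_per_win(R, RA, runs_per_win=rpw)
pyth_2 = wins_pythagorean(R, RA)
pyth_183 = wins_pythagorean(R, RA, exponent=1.83)
print(round(simple, 1), round(advanced, 1), round(pyth_2, 1), round(pyth_183, 1))
```

For a team this close to average the four estimates land within about a win of one another, which is consistent with the small differences in standard deviation reported above.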
My next step was to derive team runs from player stats. We had no play-by-play data available in those days except for World Series games as published in the annual baseball guides, so I started there. I analyzed 34 games 1956–60. I calculated the probability of scoring from each base depending on the number of outs. Later, I ran a paper-and-pencil simulation, using the various advanced data I compiled from the Series data, such as first to third on a single, first to home on a double, taking a base on an out, etc. The simulation also gave me the number of times each base-out situation occurred, so I determined how the scoring probabilities increased for each batting event. For example, a walk to lead off the inning would increase the scoring probability for that batter from .16 to .39—an increase of .23. If there was a runner on first, that runner would go from .39 to .62 for a total of .46. A single would be the same except a runner on first would go to third 45 percent of the time and the increase would be .57. I then summed up each situation weight by its frequency to get an overall value. A single came out .37 runs.
However, when I tried to project the number of team runs from its batting statistics, I did not find as close a relationship as I expected. Projections for teams with high on-base percentages came out too low, and vice versa. I realized I was doing something wrong. A player can produce runs for his team three ways:
1) He can advance himself around the bases.
2) He can advance teammates around the bases.
3) He can cause other batters to get up by not making an out.
This third factor could never be measured by just using base advances. What I needed to calculate was the number of runs expected from each starting point through the end of the inning. These vary from year to year and league to league depending on the average batting. The expected number of runs to be scored from leadoff position is usually around .50, which is simply the league total of runs per inning. With play-by-play data for hundreds of thousands of regular season games now available through Retrosheet, we can do this easily for any season. Since I didn’t have those data then, I expanded my paper-and-pencil simulation to calculate the values. Under the revised values, when a walk led off the inning, the run potential went from .48 to .82—an increase of .34.
Each positive batting event has an average increase of about .50 runs, while a negative event is around minus .25. An average player with an OBA of .333 will therefore have a net value of zero. This can be higher or lower based on the distribution of walks and hits. A portion of that comes from allowing extra batters to come up. Each batter is worth about .16 runs with none out, .12 with one out, and .08 with two outs, for an average of .12. But there is also the possibility of more than one additional batter appearing. This is a converging infinite series of 1 + 1/3 + 1/9 + 1/27 etc… which sums to one and a half. So each time the batter gets on base, he adds 1.5 batters and when he goes out, he adds none, for an average of one half batter per appearance. Getting on base adds one batter and going out subtracts a half from the average. So of the .50 runs for a positive event, .12 is due to the extra batters and .38 is due to advances of the batters and other runners if any. That is the power of on base average (OBA).
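The arithmetic in this paragraph can be checked with a short Python sketch. The run values are the rounded averages quoted above; the function names and the 200-term cutoff for the series are assumptions made for the illustration.

```python
# Rounded average run values quoted in the text.
POSITIVE_EVENT = 0.50   # runs gained when the batter reaches base
OUT_EVENT = -0.25       # runs lost when the batter makes an out

def net_value_per_appearance(oba):
    """Expected run value of one plate appearance at a given on-base average."""
    return oba * POSITIVE_EVENT + (1 - oba) * OUT_EVENT

def extra_batters(oba, terms=200):
    """Partial sum of 1 + oba + oba**2 + ...: batters added by reaching base."""
    return sum(oba ** k for k in range(terms))

break_even = net_value_per_appearance(1 / 3)  # zero, up to floating-point rounding
added_on_base = extra_batters(1 / 3)          # converges to 1 / (1 - 1/3) = 1.5
average_added = (1 / 3) * added_on_base       # one half batter per appearance
print(break_even, added_on_base, average_added)
```

Relative to that average of one half, reaching base is worth 1.5 minus 0.5, or one extra batter, and an out is worth minus one half, as stated above.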
It would still be a few more years before I would settle on the power of OPS, though. I made two mistakes in my original research. In the World Series data, there were only 5 cases of a runner on first when the batter doubled, and only one scored. So I used 20 percent for that. When more data became available, I found the real number was 40 percent, unless there were two outs when it was 60 percent. I did not consider intentional walks, since they had been first compiled in 1955. Worse, I made the bad assumption that walks occurred equally in all base-out situations. Actually walks—intentional or not—are more likely to occur in low value situations and less likely to occur in high value situations, thus the value of a walk was too high by about 10 percent. You can find a detailed study of intentional walks in the July 2017 edition of By the Numbers, the Statistical Analysis Committee bulletin. On average, intentional walks are worth about .15 runs.
However, you really have to look at the context of all walks, since good hitters are more apt to be walked in less favorable situations and also get more unintentional walks in intentional walk situations. Over the years the values were refined a bit, and by 1984 when John Thorn and I were writing The Hidden Game of Baseball, I had settled on .47 for a single, .83 for a double, 1.02 for a triple, 1.40 for a homer and .33 for a walk. The out factor was calculated to make the league total zero with pitcher batting subtracted and was usually around –.25. Thus an average player had a rating of zero. Subtracting pitcher batting puts the two leagues on an equal basis in the designated hitter era and also allows batters to be compared only with other non-pitchers for all years. Outs were defined as at bats minus hits. I called these linear weights. Although baseball is not linear, the values remain relatively constant over the range of environments found in normal play. High scoring years would result in the positive event values slightly higher and the out value slightly more negative. You can also calculate values for various other events. Most don’t have much influence on team stats. Stolen bases are more important for individuals. In the Deadball Era, everybody stole, but today they are more specialized. A few players may gain a fair number of runs from stealing. I used .22 for stolen bases and –.38 for caught stealing, although the impact on wins could be higher, since steals are more apt to occur in close games. The standard deviation in deriving team runs using the simple linear weight method is about 22 runs on the season. Adding steals and outs on base from caught stealing, double plays, and other items reduces the value to 20, which is about as low as you can get.
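As a rough illustration, the linear weights quoted above can be applied to a batting line as follows. The function name and the sample batting line are hypothetical, and the out value is fixed at the typical –.25 rather than being recalculated for a particular league and season.

```python
# Event values quoted in the text (The Hidden Game of Baseball, 1984).
WEIGHTS = {"1B": 0.47, "2B": 0.83, "3B": 1.02, "HR": 1.40, "BB": 0.33}
OUT_VALUE = -0.25                    # typical value; really set per league-season
SB_VALUE, CS_VALUE = 0.22, -0.38

def linear_weight_runs(ab, h, doubles, triples, hr, bb, sb=0, cs=0):
    """Runs above average for one batting line."""
    singles = h - doubles - triples - hr
    outs = ab - h                    # outs defined as at bats minus hits
    batting = (WEIGHTS["1B"] * singles + WEIGHTS["2B"] * doubles
               + WEIGHTS["3B"] * triples + WEIGHTS["HR"] * hr
               + WEIGHTS["BB"] * bb + OUT_VALUE * outs)
    return batting + SB_VALUE * sb + CS_VALUE * cs

# Hypothetical batting line, not a real player.
sample = linear_weight_runs(ab=550, h=160, doubles=30, triples=5, hr=25, bb=70)
with_steals = linear_weight_runs(ab=550, h=160, doubles=30, triples=5, hr=25,
                                 bb=70, sb=10, cs=5)
print(round(sample, 1), round(with_steals, 1))
```

Because the out value is chosen to make the league total zero, a league-average batting line would come out near zero runs under these weights.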
When correlating various measures to team runs, you can use runs per game, but a better method is to use runs per inning batted. Innings batted can vary based on extra-inning games and games won, since a home team does not bat in the last of the ninth if ahead. In fact, you can deduce wins for the season with a standard deviation of about two if you use innings batted and innings pitched. This is half the value found by using runs scored and allowed. Wins equals games over two plus innings pitched minus innings batted.
W = Games/2 + IP – IB
It does not work for teams with an imbalance of home and away games like in the strike year of 1994, since the real difference uses home wins and road losses. Innings batted can be calculated easily if team left on base is known. LOB has been kept since 1920, although in the early years the official figures had a lot of errors. Retrosheet’s box score project, headed by Tom Ruane, now has accurate LOB calculated from team data back to 1906. Innings batted is equal to plate appearances minus runs minus left on base, all divided by three.
IB = (PA – R – LOB)/3
Rearranging the wins formula, you can also estimate innings batted when LOB is not available by taking innings pitched minus one half of the difference between wins and losses. The only unknown variable is the number of outs in games won or lost in the bottom of the last inning. Innings batted per game has a standard deviation of about one percent, which would be 14 innings per year, equivalent to about 7 runs.
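The innings relationships in this section fit together as in the sketch below. The function names and the team totals are hypothetical, chosen so the arithmetic comes out evenly.

```python
def innings_batted(pa, runs, lob):
    """IB = (PA - R - LOB) / 3: batters who neither score nor are left on are outs."""
    return (pa - runs - lob) / 3

def wins_from_innings(games, ip, ib):
    """W = Games/2 + IP - IB."""
    return games / 2 + ip - ib

def innings_batted_estimate(ip, wins, losses):
    """Estimate of IB without LOB: IP minus half of (wins minus losses)."""
    return ip - (wins - losses) / 2

# Hypothetical team season, not real data.
ib = innings_batted(pa=6200, runs=750, lob=1130)
wins = wins_from_innings(games=162, ip=1450, ib=ib)
ib_check = innings_batted_estimate(ip=1450, wins=91, losses=71)
print(ib, wins, ib_check)
```

The third function is just the wins formula solved for innings batted, so it returns the same figure that the LOB method gives for this made-up team.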
But getting back to OPS. Soon after SABR was founded in 1971, Dick Cramer suggested a statistical analysis committee and I became the chairman. Dick recently published his autobiography titled When Big Data Was Small, which covered his work in baseball and science as well as his personal life. In it he mentions a paper written by his friend Paul Bamberg in 1959 for a science project. Unfortunately, Paul was ahead of his time and did not have help from people like Bob Davids, Bill James, John Thorn, or Gary Gillette to spread the word, so his work languished in obscurity until being included in Dick’s book in 2019. Others may have had the same problem of finding an outlet for their work. I was doing range factors in the 1960s and almost got an article on batting and pitching in The Sporting News in 1969, but they chickened out because they thought it was too complicated. George Lindsey published in Operations Research, but not many people noticed. Earnshaw Cook had to put out his own book and was then helped by Frank Deford, who noticed it and did a nice article in Sports Illustrated. Bill James also was aided when Dan Okrent did a piece there about his work.
At the time SABR’s publications concentrated on historical rather than analytical work, and the Statistical Analysis Committee did not yet publish its own bulletin.1 In 1973 Cramer contacted me about research he had done which showed that team runs were proportional to the product of on base percentage and slugging percentage. Dick created a simulation to measure this, entering individual batting data for Babe Ruth and others and calculating how many runs would score. I had reached a similar conclusion with my work with linear weights. I was looking at team runs as a function of their stats. We did a joint article in SABR’s Baseball Research Journal in 1974, coining the term Batter’s Run Average.
On base times slugging (OxS) exaggerates the individual player’s contribution when a team of nine identical players is used in the simulation. When I ran Ruth on the 1920 season, adding him to an average team added .79 runs per game, while a team of nine Babes scored 14.11 runs per game. That is 10.12 runs more than average, or 1.12 runs per player per game, 44 percent higher, since each Ruth had the benefit of other Ruths on the team. Ruth’s on-base average was 50 percent higher than the league’s and his slugging was double. The normalized formula for OxS is OBA/lg times SLG/lg, where lg is the league average. For Ruth that is 1.5 times 2, which would mean 3 times the average number of runs. For NOPS (normalized OPS) the formula is OBA/lg plus SLG/lg minus 1, which for Ruth is 1.5 plus 2 minus 1, or 2.5, which is about what he had. So by 1978 I had converted to OPS, which is easy to calculate and relates individual performance directly to team wins.
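The two normalized formulas can be compared directly. In this sketch the league averages are placeholder values, not the 1920 figures, and the batter is defined only by the ratios given in the text (on-base 50 percent above league, slugging double).

```python
def normalized_oxs(oba, slg, lg_oba, lg_slg):
    """Normalized on-base times slugging: (OBA/lg) * (SLG/lg)."""
    return (oba / lg_oba) * (slg / lg_slg)

def nops(oba, slg, lg_oba, lg_slg):
    """Normalized OPS: OBA/lg + SLG/lg - 1."""
    return oba / lg_oba + slg / lg_slg - 1

# Placeholder league averages (assumed for illustration only).
LG_OBA, LG_SLG = 0.340, 0.400
batter_oba, batter_slg = 1.5 * LG_OBA, 2.0 * LG_SLG

oxs_ratio = normalized_oxs(batter_oba, batter_slg, LG_OBA, LG_SLG)
nops_ratio = nops(batter_oba, batter_slg, LG_OBA, LG_SLG)
print(round(oxs_ratio, 2), round(nops_ratio, 2))
```

The product form credits such a batter with three times the average run production, the additive form with two and a half times, and the gap widens the further a player is from average.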
In 1920, Ruth was 110 linear weight runs above average, but he was helped considerably by playing in the Polo Grounds. His OPS at home was an incredible 1.535. Anyone who thinks Ruth and Gehrig were helped by the short right field porch in Yankee Stadium is mistaken. Most players have an OPS at home about 5 percent better than on the road. Ruth’s career figure in Yankee Stadium was only 2 percent higher, while Gehrig was actually 2 percent worse at home.
It turns out that the normalized version of OPS is directly proportional to a batter’s contribution to team wins. A player with a normalized OPS of 110 percent will on average contribute 10 percent more runs than the average player. A player with an OBP and slugging each 10 percent higher than league average will have a normalized OPS that is 20 percent higher than average and will produce 20 percent more runs. His raw OPS, however, would be only 10 percent higher than the league figure, even though he would still produce 20 percent more runs.
In OPS, a walk counts one for on-base and zero for slugging, while any hit counts one for on base and the number of bases for slugging. So counting both OBA and SLG, a single counts 2 and a homer 5. These are in about the same proportion as in linear weights. That is why OPS works almost as well.
Using an equation where OBP is multiplied by a factor (OBP times F plus SLG) gives a slightly more accurate correlation to actual team runs. However, the difference is very small. The standard deviation for the team runs projection per year for 1960–2018 is between 24.9 and 25.2 for any value of the multiplier between 1.4 and 2.4, but doing this complicates the calculation. Counting both equally is off by 26.4, only a little higher. Using OxS, you get a value of 25.4, a bit lower. If I had used 1.8 x OBP plus SLG, I don’t think OPS would have caught on so well. It is possible to adjust the OBP by adding stolen bases over two minus caught stealing minus grounded into double plays. This reduces the standard deviation by about half a run.
Using the normalized method helps reduce this error, since dividing by the league average makes 33 points of OBA equivalent to about 42 points of slugging, a factor of 1.3. OBA and slugging are highly correlated, since each depends heavily on hits over at-bats, so the multiplying factor has little effect.
Taking an average player and adding ten at-bats reduces his OPS by about 12 points. Adding 10 walks raises it by 11 points, while 10 singles adds 22 points. Doubles, triples, and homers increase the value by 40, 59, and 77 points respectively. This shows a ratio of 1-2-3-5-7, a bit higher than the 1-2-3-4-5 factors for linear weights. Thus slugging is a little heavier than it should be. Tom Tango addressed this problem in his wOBA calculation. He took the linear weight values for each event and created a pseudo OBA. The result looks very much like linear weight runs per appearance plus league average OBA. Tom also made an allowance for the fact that walks are more apt to occur in low value situations by reducing their value.
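The effect of ten added events on OPS can be reproduced approximately with the sketch below. The baseline batting line is an assumption (roughly a .333 on-base average and .420 slugging over 600 appearances), and on-base average is simplified to hits plus walks over at-bats plus walks, so the results come out close to, but not exactly at, the figures quoted above.

```python
def ops(ab, h, tb, bb):
    """OPS with on-base average simplified to (H + BB) / (AB + BB)."""
    return (h + bb) / (ab + bb) + tb / ab

# Hypothetical average player: 540 AB, 140 H, 227 TB, 60 BB.
BASE = dict(ab=540, h=140, tb=227, bb=60)

def ops_change(extra_ab=0, extra_h=0, extra_tb=0, extra_bb=0):
    """Change in OPS, in points (thousandths), after adding events to BASE."""
    new = ops(BASE["ab"] + extra_ab, BASE["h"] + extra_h,
              BASE["tb"] + extra_tb, BASE["bb"] + extra_bb)
    return 1000 * (new - ops(**BASE))

changes = {
    "out":    ops_change(extra_ab=10),
    "walk":   ops_change(extra_bb=10),
    "single": ops_change(extra_ab=10, extra_h=10, extra_tb=10),
    "double": ops_change(extra_ab=10, extra_h=10, extra_tb=20),
    "triple": ops_change(extra_ab=10, extra_h=10, extra_tb=30),
    "homer":  ops_change(extra_ab=10, extra_h=10, extra_tb=40),
}
for event, points in changes.items():
    print(event, round(points, 1))
```

With this baseline the hit values grow faster than the walk value as bases are added, which is the extra weight on slugging described above.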
You can adjust for park effects for either of these. The simple way of calculating park factor (PF) is to take runs scored per game by both teams in home games compared to road games. The park adjustment factor (PA) is that ratio plus one divided by two, since half the games are played at home and the road park factor is pretty close to one. Adjusted NOPS is just NOPS/PA. But park factor itself has a rather large error due to chance. Dallas Adams had a 1983 article in the Baseball Analyst which showed the run distribution per game for various levels of team scoring.2 From that I deduced that the standard deviation of runs in a game was equal to the square root of twice the number of runs. This is very handy when figuring if a difference in runs under various conditions is significant, either in a simulation or real life. So for a particular park, if 700 runs were scored by both teams in home games, the standard deviation of the total would be around 37. But when comparing it to road games it would be higher by the square root of two, since you are comparing two samples. This comes out to be around 52. The standard deviation of the yearly park factor itself due to chance would be around 52/700, about 7 percent. The total difference in parks is only about 10 percent. So that means the real difference between parks is also about 7 percent, as the total difference squared is equal to the actual difference squared plus the random difference squared. So you have to use a park factor over several years to get a better estimate of the true value. To adjust straight OPS, you divide by the square root of the park adjustment.
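The park calculations above can be followed step by step in a short sketch. The function names and the sample NOPS and park factor are assumptions made for the illustration; the 700-run total and the 10 and 7 percent figures are the ones used in the text.

```python
import math

def park_adjustment(park_factor):
    """(PF + 1) / 2: half the games are at home, road parks average about 1."""
    return (park_factor + 1) / 2

def adjust_nops(nops, adjustment):
    """Adjusted NOPS is NOPS divided by the park adjustment."""
    return nops / adjustment

def adjust_ops(ops, adjustment):
    """Straight OPS is divided by the square root of the park adjustment."""
    return ops / math.sqrt(adjustment)

def runs_sd(total_runs):
    """Rule of thumb from the text: SD of a run total is sqrt(2 * runs)."""
    return math.sqrt(2 * total_runs)

home_total = 700                            # runs by both teams in home games
sd_home = runs_sd(home_total)               # about 37
sd_home_vs_road = sd_home * math.sqrt(2)    # about 53; the text rounds to 52
sd_park_factor = sd_home_vs_road / home_total   # about 0.07
true_spread = math.sqrt(0.10 ** 2 - 0.07 ** 2)  # about 0.07

adjustment = park_adjustment(1.10)          # hypothetical 110 park factor
print(round(sd_home, 1), round(sd_park_factor, 3), round(true_spread, 3),
      round(adjust_nops(1.26, adjustment), 2))
```

Since the chance component for a single season is as large as the real spread between parks, a one-year park factor is roughly half noise, which is why the multi-year average is recommended.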
If you look at park factor for all decades by club since 1901, the standard deviation is 7, which means two thirds of the teams fall between 107 and 93. The random factor is reduced to about 2 by using a ten year period.3
Dick Cramer and I were far from the only ones to make attempts over the years at coming up with a formula for relating batting stats to team runs. The legendary F.C. Lane of Baseball Magazine had some articles in the 1910s, which included run values for various events. George Lindsey did work on scoring probabilities, winning percentage, and hit values in the 1960s. Both sets of event values were very close to mine, although neither had a negative value for an out. Earnshaw Cook in the 1960s came up with a formula for what he called DX, which was number of times on base multiplied by total bases. Bill James’s runs created is on-base average times slugging average times at-bats.
Branch Rickey had a famous piece in Life magazine in 1954 which referred to research by scientists at Princeton, although Allan Roth, longtime staff statistician for the Dodgers, told us at a SABR meeting years ago that he was actually the uncredited inventor of the formula. It was close to OPS, as it used hits plus walks plus three-quarters times extra bases over at-bats. Extra bases count one for a double, two for a triple and three for a home run, otherwise known as isolated power. I suspect he did that because he didn’t want to count hits twice—however he should have. Using OBA plus three-fourths ISO gave a standard deviation of 37 runs per season, while OBA plus three-fourths SLG came out 28.
Another feature of the Rickey article was a listing of career leaders in on base average, perhaps the first time that had ever been presented. I did an article on it in the Baseball Research Journal in 1973 and, as a consultant to the American League, helped introduce OBA as an official statistic in 1979. The National League didn’t publish it until 1984 and The Sporting News didn’t show it in the Baseball Guide until 1987, covering 1986. I had not counted sacrifice flies as outs in the AL version, but the NL did. When it finally made the guide, the NL version was used. This had one big problem. The sacrifice fly rule was in effect from 1908 through 1930, 1939, and 1954 to date. But in the first two instances, sacrifice flies were not recorded separately from bunts. Although Retrosheet has filled in much of the older play-by-play, we can never determine the exact OBA for Babe Ruth, Ty Cobb, Lou Gehrig, or other players from that era. The only way to do it is assume no sacrifice flies. Ten sacrifice flies would reduce OBA by about five points.
Bob Creamer in Sports Illustrated in 1956 invented a very simple measure for players called runs produced, which was simply runs plus runs batted in minus home runs. Home runs were subtracted because that would give credit to the same run twice. However, these values turn out to be very close to the linear weight values. A walk is about .25 runs, a single .25 runs and .25 RBI, a double .4 runs and .4 RBIs, and a triple .6 run and .6 RBIs. However a homer is 1 run and 1.6 RBIs, which is 2.6, well above the linear weight value 1.4, so subtracting homers brings runs produced pretty much in line. A homer gets too much credit for RBI, since many of those runs would have scored anyway without the homer and a single too little because advancing a runner from first to second or third results in neither a run nor an RBI.
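The comparison between runs produced and linear weights can be laid out event by event. This sketch uses the per-event run and RBI averages quoted in the paragraph above and the linear weights quoted earlier in the article; treating a walk as worth no RBIs is an assumption, since the text gives only its run value.

```python
# Average (runs, RBIs) per event as quoted in the text; walk RBIs assumed zero.
RUNS_RBI = {
    "walk":   (0.25, 0.0),
    "single": (0.25, 0.25),
    "double": (0.4, 0.4),
    "triple": (0.6, 0.6),
    "homer":  (1.0, 1.6),
}
LINEAR_WEIGHTS = {"walk": 0.33, "single": 0.47, "double": 0.83,
                  "triple": 1.02, "homer": 1.40}

def runs_produced_value(event):
    """Runs plus RBIs minus home runs, for one event."""
    runs, rbi = RUNS_RBI[event]
    homers = 1 if event == "homer" else 0
    return runs + rbi - homers

for event in RUNS_RBI:
    print(event, round(runs_produced_value(event), 2), LINEAR_WEIGHTS[event])
```

Without the subtraction a homer would be credited with 2.6 runs; with it the figure drops to 1.6, much nearer the linear weight value of 1.4.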
Steve Mann was one of the first analysts employed by a club, the Astros in the 1970s. (Steve and I created the BACBALL program for charting batted balls and pitches which the Phillies and Braves used for a number of years in the ’80s.) He developed Run Productivity Average, which was linear weights that were equal to the average number of runs and RBIs for each event. It was fairer than raw runs and RBIs because it did not favor players who have more opportunities for runs or RBIs because of their team or their lineup position. However, I believe it over-values home runs.
Chuck Mullen invented a system in the 1960s with linear weights for each event multiplied by a clutch factor depending on the inning, score, and base-out situation. Bob Sudyk wrote a story in The Sporting News about it, although General Electric, who had the computer, got the credit and Chuck wasn’t even mentioned. The Cardinals used it for a while then, and Chuck and I revived it in the 1990s for the Astros.
And the list goes on. Barry Codell invented Base-Out Percentage in the late 1970s, basically bases over outs. Tom Boswell’s Total Average came out about the same time and was similar. None of them really caught on, because I suspect people weren’t ready for them. OPS combines simplicity with reasonable accuracy and I think that is why it is popular.
PETE PALMER is the co-author with John Thorn of “The Hidden Game of Baseball” and co-editor with Gary Gillette of “The Barnes and Noble ESPN Baseball Encyclopedia” (five editions). Pete worked as a consultant to Sports Information Center, the official statisticians for the American League 1976–87. Pete introduced on-base average as an official statistic for the American League in 1979 and invented on-base plus slugging (OPS), now universally used as a good measure of batting strength. Among his many accolades, he won the SABR Bob Davids Award in 1989 and was selected as an inaugural recipient of the Henry Chadwick Award.
Sources
Adams, Dallas (1987). “The distribution of runs scored.” Baseball Analyst, Vol. 1, No. 1, pages 8-10.
Bamberg, Paul (1959). “Mathematical Analysis of Batting Performance.”
Codell, Barry (1979). “The Base-Out Percentage.” SABR Baseball Research Journal, No. 8, pages 35-39.
Cook, Earnshaw with Wendell R. Garner (1966). Percentage Baseball. Cambridge, MA: MIT Press.
Cramer, Richard D. and Pete Palmer (1974). “The Batter’s Run Average (BRA).” SABR Baseball Research Journal, No. 3, pages 50-57.
Cramer, Richard D. (2019). When Big Data Was Small. Lincoln, NE: University of Nebraska Press.
Deford, Frank (1964). “Baseball is Played All Wrong,” Sports Illustrated, Vol 20, issue 12, pages 14-17.
James, Bill (1978). The 1978 Baseball Abstract. Lawrence, KS: Bill James.
Krabbenhoft, Herm (2009). “Who Invented Runs Produced?” SABR Baseball Research Journal, No. 38, pages 135-138.
Lane, F. C. (1917, January). “Why the system of batting averages should be reformed.” Baseball Magazine, pages 52-60.
Lane, F. C. (1917, March). “The Base on Balls.” Baseball Magazine, pages 93-95.
Lindsey, G. R. (1959). “Statistical Data Useful for the Operation of a Baseball Team.” Operations Research, Vol. 7, pages 197-207.
Lindsey, George R. (1963). “An Investigation of Strategies in Baseball.” Operations Research, Vol. 11, pages 477-501.
Mann, Steve (2005). Interview. http://hendricks-sports.com/interview2.html.
Okrent, Dan (1981). “He Does It by The Numbers.” Sports Illustrated, May 25, 1981.
Palmer, Pete (1973). “On-base Average.” SABR Baseball Research Journal, No. 2, pages 87-91.
Palmer, Pete (1978). “AL Home Park Effects on Performance.” SABR Baseball Research Journal, No. 7, pages 50-60.
Palmer, Pete (2009). “McCracken and Wang Revisited.” By the Numbers, Vol. 19 No. 1, pages 9-13.
Palmer, Pete (2017). “Intentional Walks Revisited.” By the Numbers, Vol. 27 No. 1, pages 16-25.
Pankin, Mark (2004). “Relative value of on-base pct. and slugging avg.” Presented at the annual SABR convention and available at http://www.pankin.com/baseball.htm
Pankin, Mark (2005). “More on OBP vs. SLG.” By the Numbers, Vol. 15 No. 4, pages 13-15.
Pankin, Mark (2006). “Additional on-base worth 3x additional slugging?” Presented at the 2006 SABR convention and available at the Retrosheet research page.
Rickey, Branch (1954, August 2). “Goodby to some old baseball ideas.” Life, pages 78-86 and 89.
Sudyk, Bob (1966, April 16). “Computer Picks Top Clutch Hitters,” The Sporting News, pages 13 and 20.
Tango, Tom, Mitchel G. Lichtman and Andrew E. Dolphin (2006). The Book: Playing the Percentages in Baseball. TMA Press.
Thorn, John, and Pete Palmer (1984). The Hidden Game of Baseball. Garden City, NY: Doubleday.
Wang, Victor (2006). “The OBP/SLG ratio: What does history say?” By the Numbers, Vol. 16 No. 3, pages 3-4.
Wang, Victor (2007). “A closer look at the OBP/SLG ratio.” By the Numbers, Vol. 17 No. 1, pages 10-14.
Notes
1 In the early days of the Statistical Analysis Committee, I suggested that SABR members send their work to Bill James, who had started his Baseball Abstract in 1977 and began the Baseball Analyst with input from his readers in 1982. This ran through 1989. Don Coffin then started the SABR version as the stats committee bulletin and called it By The Numbers. Don published fairly regular issues of the bulletin through 1995. Neal Traven took over in 1997. Phil Birnbaum assumed charge of the bulletin in 1999 and restarted quarterly issues in 2001, although since 2009 the number of bulletins has been reduced to one or two per year, due to lack of contributions. He later became chairman.
2 Dallas Adams, “Team Won/Lost Percentage as a Function of Runs and Opponents Runs,” Baseball Analyst newsletter, April 1983 issue, pages 10-12.
3 The top 17 park factors are all from Colorado, covering every overlapping decade of their existence, with values around 130. In 1996 the Denver ballpark averaged 15 runs per game. Colorado is the only extreme park since the leagues started an unbalanced schedule in 1969, so you have to calculate each team’s road factors separately, since having Colorado as a road park raises the road average by about five points. The only other park over 120 was Philadelphia’s Baker Bowl, which dates back to 1895, but did not become a hitter’s park until around 1915. It was replaced by Shibe Park in 1938. The low end is monopolized by the Dodgers in the 60s and Padres in the 90s at around 85.