Using Z-Scores to Measure Player Performance
This article was written by Benjamin Alter
This article was published in Spring 2025 Baseball Research Journal
In recent years, statistics have been developed to facilitate comparisons of player performances across seasons and across generations. One such statistic, OPS+, places a player’s OPS (on-base plus slugging) into the context of the league’s OPS, adjusted by a park factor. An .800 OPS in a pitcher’s year such as 1968 results in a higher OPS+ than an .800 OPS in a hitter’s year such as 1930. Unfortunately, OPS+ understates how exceptional the most outstanding performances are. For instance, Barry Bonds’ OPS+ of 268 in 2002 is the all-time record (excepting the Negro League statistics).1 He broke his own record of 259, set in 2001, which broke Fred Dunlap’s record of 256, set in 1884 with the St. Louis Maroons of the Union Association. However, this paper will demonstrate that Bonds’ OPS+ in 2002 was not merely incrementally better than Ruth’s career-best 255 OPS+ in 1920; it reached a whole new level of performance.2 In addition, Dunlap’s OPS+ does not reflect the weakness of the UA.
Raw statistics also suffer from a lack of context. Babe Ruth’s 54 home runs in 1920 were almost 15% of the home runs hit by the entire AL. But on the all-time record charts, that total sits alongside Jose Bautista’s 54 home runs in 2010, which were but 2.4% of the AL’s total that season. Fortunately, there is a basic statistic that puts these and other accomplishments into proper perspective: the z-score.
USING Z-SCORES TO ASSESS HITTERS’ PERFORMANCES
Z-scores derive from standard deviations (SDs), which measure the variation of values from the mean of a dataset. Also known as “standard scores,” z-scores measure the number of SDs by which data points deviate from the mean. The lower the SD, the more data in a dataset cluster near the mean.
To illustrate the concept, below are two sets of hitters. Each set has a mean batting average of .270, and a hitter, Hitter A, who batted .310. The data in Set 2 are clustered closer together and therefore their .022 SD is smaller than the .032 SD of Set 1. In Set 1, Hitter A’s batting average z-score is 1.25, which was calculated as follows:
(.310 – .270) / .032 = 1.25
In Set 2, Hitter A’s batting average z-score is 1.82, which was calculated as follows:
(.310 – .270) / .022 = 1.82
Hitter A in Set 2 had a more impressive season than Hitter A in Set 1 because his .310 average stood out further from those of the other hitters in his set, which can be deduced intuitively and is quantified by his higher z-score.
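The two calculations above can be reproduced in a few lines of Python. This is a minimal sketch; the function name z_score is an illustrative choice and does not come from the study.

```python
def z_score(value, mean, sd):
    """Number of standard deviations by which value deviates from mean."""
    return (value - mean) / sd

# Hitter A bats .310 against a .270 mean in both sets; only the SD differs.
set_1 = z_score(0.310, 0.270, 0.032)
set_2 = z_score(0.310, 0.270, 0.022)

print(f"Set 1: {set_1:.2f}")  # 1.25
print(f"Set 2: {set_2:.2f}")  # 1.82
```

The same 40-point gap over the mean is worth more SDs in the tighter dataset, which is the whole point of the statistic.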
Table 1 shows the expected spread of data in a normal distribution. Most data will fall within one SD of the mean. The vast majority of data points fall within three SDs of the mean. To fall four or more SDs from the mean is extraordinary in a normal distribution. Values at or beyond five SDs should be next to impossible in datasets such as the ones analyzed in this paper.
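For readers who want to check these probabilities, the sketch below computes them for a standard normal distribution using only Python's standard library. The function names are illustrative, and the layout of Table 1 may differ from what is printed here.

```python
import math

def share_within(k):
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

def share_above(k):
    """Probability that a normally distributed value lies more than k SDs above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in range(1, 6):
    print(f"{k} SD: within = {share_within(k):.7f}, above = {share_above(k):.2e}")
```

Under this idealized model, roughly 68% of values fall within one SD and about 99.7% within three, while a value four SDs above the mean is expected only a few times in 100,000 draws.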
In this study, z-scores were calculated and analyzed for nine statistics:
- Batting Average (BA)
- On-Base Average (OBA)
- Slugging Average (SLG)
- OBA + SLG (OPS)
- Home Runs
- Runs Batted In (RBIs)
- Earned Run Average (ERA)
- Total Strikeouts (SO)
- Strikeouts per Nine Innings Pitched (SO/9)
PRIOR STUDIES USING Z-SCORES
Many researchers have applied z-scores to baseball statistics. Cottone & Wirchin (2015) used methodologies similar to those employed in this paper.3 However, their study was more narrowly focused and therefore missed certain significant data. For instance, in the Deadball Era, which they defined as 1901 to 1919, they only evaluated the home run z-scores for Gavy Cravath in 1915 and Babe Ruth in 1919.
Several other studies focused on one or a few particular statistics. Taylor and Krevisky (2006) calculated z-scores for home runs and slugging average to select the greatest sluggers of all time.4 James (2013) calculated OPS z-scores throughout baseball history.5 Colston (2011) calculated z-scores for batting average, home runs, and RBIs of selected player-seasons to determine which hitter had “the best season ever.”6 Schulman (2022) compared the z-scores for the record-breaking home run seasons of Babe Ruth in 1919 and Barry Bonds in 2001.7 Szymborski (2020) used z-scores to demonstrate the difficulty in batting .400.8 Cohen (2020) used z-scores from 2019 to predict players’ performances in the upcoming 2020 season.9 Kelly (2021) conducted a similar study to predict player performances in the upcoming 2022 season.10
METHODOLOGY
To calculate z-scores, a dataset’s mean and SD must be known. Each league’s average BA, OBA, SLG, and OPS for a given season are readily available, but SDs must be calculated. The SD of a dataset varies based on what is included in the dataset. A cutoff must be drawn to avoid including players with just a few at-bats, who would overwhelm the database and render the results useless.
Using the batting-title criterion in effect during each season was rejected as an option because the criterion to qualify for batting titles has changed over time. The current criterion has been in place since 1957. Several different rules prevailed before 1957, and the NL and AL had different criteria from 1936 to 1949. Inconsistency in the datasets would impact the ability to compare the results across seasons. Taylor and Krevisky used 200 at-bats or 250 plate appearances, adding 5% to these numbers in the Expansion Era. James used 300 PAs as a cut-off. Cottone & Wirchin appear to have used the current batting title criterion of 3.1 plate appearances per team game played. Gould (2003) and Szymborski (2020) do not indicate their cut-offs.11
This study uses the current criteria defining the minimum number of plate appearances for a hitter to qualify for the batting title (3.1 plate appearances per team game) and the minimum number of innings pitched for a pitcher to qualify for the ERA title (1 inning per team game). Doing so excludes part-time hitters and pitchers from the datasets, so the focus is on the “regular” batters and pitchers. Although these criteria have changed over the decades, applying the current criteria to every season allows for apples-to-apples comparisons between league-seasons. This is especially important for the shorter seasons in baseball history. For example, Negro League seasons were much shorter than contemporaneous AL/NL seasons, and the numbers of games played by teams within a given league often varied widely. An arbitrary cutoff of 100 plate appearances was applied to Negro League hitting data.
There are 17,569 hitter-seasons and 11,189 pitcher-seasons that meet these thresholds in the 538 league-seasons evaluated. SDs were calculated for each league-season for the nine statistics evaluated, from which z-scores were derived for each player.
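The procedure described above can be sketched for a single league-season as follows. This is an illustration under stated assumptions, not the study's actual code: the player records are invented, the SD is taken as the population SD of the qualifiers (the text does not say whether a population or sample SD was used), and the mean is supplied separately as the league-wide average.

```python
from statistics import pstdev

PA_PER_TEAM_GAME = 3.1  # current batting-title threshold cited in the text

def qualifies(plate_appearances, team_games):
    """True if a hitter meets the plate-appearance threshold (rounding rules ignored)."""
    return plate_appearances >= PA_PER_TEAM_GAME * team_games

def league_season_z_scores(players, league_mean):
    """players: list of (name, plate_appearances, team_games, value) tuples.

    Returns {name: z-score} for qualifying hitters only.
    """
    regulars = [(name, value) for name, pa, games, value in players
                if qualifies(pa, games)]
    # Assumption: population SD of the qualifiers' values.
    sd = pstdev([value for _, value in regulars])
    return {name: (value - league_mean) / sd for name, value in regulars}

# Hypothetical batting averages, invented for illustration only.
players = [
    ("Hitter A", 600, 162, 0.300),
    ("Hitter B", 550, 162, 0.260),
    ("Hitter C", 510, 162, 0.280),
    ("Hitter D", 100, 162, 0.400),  # too few plate appearances to qualify
]
z_scores = league_season_z_scores(players, league_mean=0.270)
for name, z in z_scores.items():
    print(f"{name}: {z:+.2f}")
```

Hitter D's .400 average in 100 plate appearances is dropped before the SD is computed, which is the effect the cutoff is meant to have.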
What does a typical dataset look like? Whereas a classical normal distribution is centered at the mean, the datasets analyzed for this study are skewed above the mean. For instance, Figure 1 shows the distribution of z-scores for batting averages from the National League in 1987. The result is a classic-looking bell curve, but one that is shifted to the right of the mean. The uptick at the far right of the curve is Tony Gwynn, whose all-time great season is discussed later in the paper.
This shift to the right is fully expected, and is present for all the databases analyzed in this study. Regulars are regulars because they are the best available players at their position. Lesser players and pitchers pull down the league average. The 56 regulars in the 1987 NL database batted a combined .284, well above the NL batting average of .261, and 0.94 SDs above the mean.
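The figures quoted for the 1987 NL also reveal the SD behind them. The short calculation below backs it out, assuming the 0.94 figure is the regulars' combined average minus the league average, divided by the SD; the variable names are illustrative.

```python
regulars_avg = 0.284   # combined average of the 56 regulars (from the text)
league_avg = 0.261     # NL batting average in 1987 (from the text)
z_of_regulars = 0.94   # stated distance above the mean, in SDs

# Rearranging z = (value - mean) / sd gives sd = (value - mean) / z.
implied_sd = (regulars_avg - league_avg) / z_of_regulars
print(f"Implied SD: {implied_sd:.4f}")  # 0.0245
```

Under that reading, one SD in this dataset is worth roughly 24 or 25 points of batting average.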
With the normal distribution shifted to the right, one could expect the probabilities of events to shift as well. In fact, it is much more common for data derived in this study to be 2 SDs and 3 SDs above the mean than the probabilities shown in Table 1 suggest. Data 4 SDs above the mean still are special, and data 5 SDs or more above the mean signify extraordinary events. This study uses 4 SDs as an arbitrary benchmark to signify an extraordinary achievement. The top 10 results are highlighted and discussed for databases with fewer than 10 datapoints at or above 4 SDs.
The next sections analyze the z-scores for the four hitting averages, starting with batting average (the metric covered most extensively in previous studies), followed by z-scores for home runs and RBIs.
BATTING AVERAGES
Table 2 lists the players with BA z-scores at or above 4.00. Surprises on the list include Harry Walker, whose .363 batting average in 1947 was a whopping 99 points above the NL average, and Ralph Garr, whose .353 batting average in 1974 was 98 points above the NL average.
Nap Lajoie (1901) and Tip O’Neill (1887) are the only .400 hitters on the list. Between 1902 and 1948, when most of the .400 averages were recorded, no hitters attained z-scores at or above 4.00. Ty Cobb had a 3.45 batting average z-score in 1911, when he batted .419. Rogers Hornsby’s .424 average in 1924 yielded a z-score of 3.75, while Ted Williams’ .406 average in 1941 yielded a z-score of 3.76. Due to the league’s lower batting average and SD, Williams’ .356 batting average in 1942 was a greater accomplishment, with a z-score of 3.90.
The early- to mid-1960s had overall low batting averages, but few individual hitters were able to post high batting averages during this period, with the notable exception of Norm Cash in 1961. In the 50 ensuing years, many players have attained high batting average z-scores due to relatively low batting averages and low SDs.
Tip O’Neill’s batting average z-score from 1887 is one of only three z-scores from the nineteenth century at or above 4.00 for the four averages evaluated in this paper. The others are O’Neill’s SLG z-score from the same season (see Table 4) and John McGraw’s OBA z-score in 1899 (see Table 3). Hitting datasets from the nineteenth century have much higher SDs than post-1900 datasets (other than datasets from the Negro Leagues, see below). Because of the higher SDs, it is much rarer for a hitting average datum from the nineteenth century to obtain a very high z-score.
A .400 batting average? Almost impossible in today’s game. Luis Arraez’s .354 batting average in 2023 yielded a z-score of 3.86. He would have needed a z-score of 5.08 to bat .400 that season. For hitters to have a reasonable chance to reach that lofty peak again, the tide would have to lift all boats. That is, league batting averages would have to be much higher, perhaps .280 or better.
The lowest batting average z-score is –2.74 by Jim West in 1931, who batted .133 for the Cleveland Cubs of the Negro National League (NNL) when the league average was .266. The lowest in modern times outside of the Negro Leagues is Chris Davis’s –2.59 in 2018, when he batted just .168 for the Baltimore Orioles.
ON-BASE AVERAGE
Barry Bonds occupies the top three slots on Table 3, which lists all OBA z-scores at or above 4.00. In 2004, his .609 on-base average set the all-time record as well as the all-time record for a z-score of any statistic evaluated in this study. More than six SDs from the mean, it is baseball’s equivalent of a unicorn, an event so extraordinary that it is unlikely to ever happen again. Even Bonds’ incredible OBA z-score of 5.31 in 2002 pales in comparison.
The other player dominating this list is Ted Williams, who has 5 of the 15 listings. Arky Vaughan is the only other player on the list between 1899 and 1961, a period of high average OBA SDs. Low OBAs in the 1960s opened the door for Norm Cash and Mickey Mantle, and an overall decrease in OBA SDs, especially since 2007, resulted in high OBA z-scores for Juan Soto and Aaron Judge. Judge’s epic 2024 season also places him on the all-time lists for SLG and OPS (Tables 4 and 5).
Missing from Table 3 are Billy Hamilton, Rogers Hornsby, Joe Kelley, Hugh Duffy, and Ed Delahanty, who posted OBAs above .500, but did so in years with high average league OBAs. The lowest OBA z-score is –2.54 by Germany Smith, who posted a paltry .233 OBA for the Brooklyn Grooms in 1897.
SLUGGING AVERAGE
Table 4, which lists all SLG z-scores at or above 4.00, features many familiar names. Hitters from every era are on this list. Babe Ruth appears five times, with four seasons each for Ted Williams and Barry Bonds. Honus Wagner, Rogers Hornsby, and Aaron Judge are on the list twice. Shohei Ohtani’s spectacular 2024 campaign places him fourth on this list and sixth on the OPS list (see Table 5).
Ironically, Ruth’s most famous season, 1927, does not appear on the list. Ruth and Lou Gehrig both posted top-10 all-time slugging averages, but their tandem exceptional seasons raised the SD for the AL, resulting in respective SLG z-scores of 3.85 and 3.78. Nick Punto’s .291 SLG for the Twins in 2007 netted him the lowest SLG z-score (-2.36) in baseball history.
Table 4 has two entries from the Negro Leagues: Josh Gibson in 1943, and Oscar Charleston in 1924. Both players also appear on the home run list (see Table 6, below). As with nineteenth century data, Negro League hitting datasets have much higher SDs than other contemporary major league datasets. Because of the higher SDs, it is much rarer for a hitting datapoint to obtain a very high z-score, making the achievements of Gibson and Charleston all the more extraordinary.
Table 5. Hitters with OPS Z-Scores at or Above 4.00
OPS
As with slugging average, the list of top OPS z-scores features the usual suspects in Bonds, Ruth, and Williams, who together occupy 18 of the 32 slots (see Table 5). Honus Wagner and Aaron Judge are the only other players who appear more than once.
Each player’s OPS+ is included for comparison. Since 1893, 51 hitters have attained an OPS+ of 200 or higher (not including the Negro Leagues). Most of the hitters on Table 5 have OPS+ scores above 200 and only 2 have OPS+ scores below 190, a good correlation. But the top OPS+ scores differ from the top OPS z-scores, as noted in the introduction. Not only is Bonds’ OPS z-score in 2004 the highest in history, it is in a class by itself. He owns all three z-scores calculated for the four averages (batting, on-base, slugging, and OPS) that exceed 5.00.
It is worth noting that Fred Dunlap, owner of the third highest OPS+ in baseball history, is not close to making this list. His OPS z-score of 3.59 in 1884 is due to the Union Association’s high OPS SD.
The lack of a park factor is the cause of some of the differences between the above OPS z-score list and the OPS+ list. For instance, Larry Walker’s dominant 1997 put him on the SLG list, but the large park factor adjustment applied to Colorado Rockies players lowered his OPS+ to a very good but not extraordinary 178.
In 1932, Fermin Valdés of Pollock’s Cuban Stars in the East-West League (EWL) posted an anemic .452 OPS, leading to an all-time low OPS z-score of -2.34. The lowest OPS z-score post-1893 and outside of the Negro Leagues is Tim Johnson’s -2.30 with the Milwaukee Brewers in 1973.
HOME RUNS
Unsurprisingly, Babe Ruth dominates the list of home run z-scores in Table 6. He accounts for nine of the 20 listings. His run of superior z-scores began in 1919 and continued into the late 1920s, indicating that it took the rest of the American League a decade to even begin catching up to the Bambino.
There are five home run z-scores above 5.00, more than the other eight statistics combined. A slugger before slugging became a valid strategy for winning ballgames, the all-but-forgotten Buck Freeman joins Bonds and Ruth as the only hitters to exceed 5.00. Freeman’s 25 home runs in 1899 more than doubled the production of the runner-up in that category and exceeded the totals of four teams. Pre-1920 sluggers Gavy Cravath, Tim Jordan, Jimmy Collins, Wally Pipp, and Harry Davis also posted double-digit home run totals before such achievements became common.
Ned Williamson, who held baseball’s home run record for 35 years, is not on this list. His 27 home runs in 1884 yield a z-score of 3.64. Aided by short fences in his home ballpark, he and five of his teammates reached double digits that season, raising the league average and SD and lowering their z-scores. Also falling short are Roger Maris in 1961 (3.00), Mark McGwire in 1998 (3.64), and Barry Bonds in 2001 (3.42), due to the high level of home run production in their leagues.
Aaron Judge’s home run z-score for 2022 is the only entry on this list from the last 90 years. Even though plenty of home runs were hit in the AL that season, Judge’s production greatly exceeded that of the other players.
The lowest HR z-score (-2.37) is owned by Yolmer Sanchez, who hit but two home runs in 555 plate appearances for the White Sox in 2019. It stands in sharp contrast to the high average number of home runs (25.7) hit by regular AL players that season.
RUNS BATTED IN
Table 7 lists the top RBI z-scores since 1893. No hitter has achieved an RBI z-score at or above 4. Driving in runs did not undergo the same revolution as did home runs, nor have there been stand-outs through the ages as there were for the four averages analyzed for this study.
All 10 seasons came before 1938, indicating that it has become increasingly difficult to stand out. Cap Anson posted RBI z-scores above 3.50 four times, all before 1893. By the late 1930s, each league had many run producers. Cecil Fielder’s 132 RBIs in 1990 yielded an RBI z-score of 3.26, the highest mark in the AL since 1935. Mike Schmidt’s 91 RBIs in the strike-shortened 1981 season yielded an RBI z-score of 3.20, the highest in the NL since 1937. Hack Wilson’s record 191 ribbies in 1930? A very high but unextraordinary z-score of 2.93 due to the offense-saturated season when it occurred.
The lowest RBI z-score after 1893 belongs to Jemile Weeks, whose 20 RBIs in 511 PAs for the Oakland Athletics in 2012 resulted in a z-score of –2.49.
USING Z-SCORES TO ASSESS PITCHING PERFORMANCES
Similar analyses were performed on the two basic pitching statistics, ERA and strikeouts, and one less common statistic, strikeouts per nine innings.
EARNED RUN AVERAGE
Table 8 contains the 10 lowest ERA z-scores in baseball history (negative ERA z-scores are better because lower ERAs are better). Just one pitcher has an ERA z-score four SDs from the mean: Pedro Martinez in 2000. Martinez’s record 291 ERA+ merely hints at his dominance in 2000, when he posted a 1.74 ERA, barely more than a third of the AL’s 4.91 ERA. His ERA z-score is more than 1 SD better than all but two other ERA z-scores, those of Zack Greinke in 2009 and Martinez himself in 1999.
Among the pitching achievements that just miss this list are Greg Maddux’s 1995 season (-3.11), Dwight Gooden’s 1985 season (-3.10), Bob Gibson’s 1968 season (-2.97), and Ron Guidry’s 1978 season (-2.96). As the number of pitchers qualifying for the ERA title has trended down, it has become easier for pitchers like Hyun Jin Ryu in 2019 and Blake Snell in 2023 to record impressive ERA z-scores.
ERA is unique among the nine performance statistics analyzed in this paper in that seven of its z-scores are more than three SDs worse than the mean. The highest (i.e., worst) is 3.56 by Bill Rotes of the Louisville Colonels in 1893. The highest ERA z-score in modern times outside of the Negro Leagues is 3.20, set by Jose Lima in 2005 when he posted a 6.99 ERA.
STRIKEOUTS
In contrast to the lone ERA z-score beyond four SDs, Table 9 shows 11 strikeout z-scores above 4.00. Randy Johnson has three of the listings, but it is Dazzy Vance who really stands out. In 1924, he recorded 262 punchouts when just one other NL pitcher recorded more than 86 (teammate Burleigh Grimes, with 135). His strikeout total was quadruple that of the average qualifying pitcher, and his SO z-score of 5.13 is arguably the greatest single-season pitching achievement in baseball history.
While Jim LaMarque of the Kansas City Monarchs had a strikeout z-score above 4.00 in 1947, this achievement is qualified, because his record contains 50% more innings than that of any other pitcher in the Negro American League (NAL) that season. His strikeout rate that season didn’t even lead the league, so LaMarque’s z-score is more a reflection of the number of innings he threw than of his ability to punch out hitters.
Ty Blach’s strikeout z-score of -2.41 in 2017 is the worst in baseball history. He struck out 73 hitters for the San Francisco Giants (which would have been an above-average haul in 1924).
STRIKEOUTS PER NINE INNINGS
Earlier in baseball history, when three- and four-man rotations were common, it was not unusual for pitchers to throw double the innings needed to qualify for the ERA title. Because such large variations can play a major part in the strikeout totals for a pitcher, it is more illuminating to evaluate their rate of strikeouts as opposed to their total strikeouts. Table 10 lists the pitchers with the highest z-scores based on strikeouts per nine innings pitched.
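The difference between the two measures is easy to see in a short sketch. The pitchers below are hypothetical and their numbers are invented for illustration; innings are written as ordinary decimals, not in the .1/.2 box-score notation for thirds of an inning.

```python
def strikeouts_per_nine(strikeouts, innings):
    """Strikeout rate scaled to nine innings pitched."""
    return 9 * strikeouts / innings

# Hypothetical pitchers, invented for illustration only.
workhorse = {"so": 150, "ip": 300.0}   # more strikeouts, far more innings
reliable = {"so": 120, "ip": 200.0}    # fewer strikeouts, higher rate

workhorse_rate = strikeouts_per_nine(workhorse["so"], workhorse["ip"])
reliable_rate = strikeouts_per_nine(reliable["so"], reliable["ip"])

print(f"Workhorse: {workhorse['so']} SO, {workhorse_rate:.1f} SO/9")  # 4.5
print(f"Reliable:  {reliable['so']} SO, {reliable_rate:.1f} SO/9")    # 5.4
```

The first pitcher leads in raw strikeouts only because of the extra 100 innings; on a rate basis the second pitcher is ahead.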
Using this metric, Dazzy Vance’s accomplishments stand out even more. Although he is the only pitcher in this table who never struck out 300 hitters in a season, he owns the two highest z-scores, and a whopping four positions among the top six. His 1924 campaign again easily tops the list. While Vance is not nearly as heralded as the others on this list, these data make a case for his being the greatest strikeout artist in the history of the game.
CONCLUSIONS
Z-scores provide a statistically rigorous, yet simple way to compare the performances of players throughout the history of baseball. They can be calculated for any hitting or pitching statistic, be it a raw statistic, an average, or even a sabermetric statistic such as WAR. Even though this analysis focused on players who qualified for their league’s batting or ERA titles, z-scores can be calculated for any player, even if they played in just one game. A player’s career z-score can be calculated as a weighted average of the z-scores from each of their seasons.
Z-scores can be calculated for any other sport, or any other database amenable to statistics, such as economic and demographic data, as demonstrated in Cottone and Wirchin. Who was greater in their respective sport, the Bambino or the Great Gretzky? Was it more likely that Dazzy Vance would punch out 262 hitters in 1924, or that you will live to the age of 110? Z-scores can provide the answers.
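One way to read the career calculation is sketched below. The choice of weight is an assumption: the text does not say what the seasons are weighted by, so plate appearances are used here purely for illustration, and the season values are invented.

```python
def career_z_score(seasons):
    """Weighted average of season z-scores.

    seasons: list of (z_score, weight) pairs. The weight is assumed here to be
    plate appearances; the text does not specify the weighting.
    """
    total_weight = sum(weight for _, weight in seasons)
    return sum(z * weight for z, weight in seasons) / total_weight

# Hypothetical seasons (z-score, plate appearances), invented for illustration.
seasons = [(2.0, 600), (1.0, 300), (3.0, 600)]
career = career_z_score(seasons)
print(f"Career z-score: {career:.2f}")  # 2.20
```

Weighting keeps a short season from counting as much as a full one; an unweighted mean of the same three seasons would be 2.00.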
BENJAMIN ALTER has been a member of SABR’s Elysian Fields Chapter since 2019. Prior to retiring in 2021, he was a principal and shareholder at an environmental engineering firm. When not indulging his passion for baseball, Ben may be found singing and/or playing the piano, working out at the gym, or writing articles for charitable organizations.
Sources
Schall, Edward M. and Smith, Gary, “Do Baseball Players Regress to the Mean?” The American Statistician, 54, no. 4 (2000), 231–35.
Zimmerman, Jeff and Bell, Tanner, The Process (Self-published, 2004).
Notes
1. Baseball-Reference.com, last accessed January 23, 2025.
2. This paper uses statistics from the so-called PED Era at face value, without making any subjective evaluation regarding their validity.
3. John G. Cottone and Jason Wirchin, Z-Score: How a Statistic Used in Psychology Will Revolutionize Baseball (Story Bridge Books, 2016).
4. Randy Taylor and Steve Krevisky, “Using Mathematics and Statistics to Analyze Who are the Great Sluggers in Baseball,” Seventh International Conference on Teaching Statistics Proceedings, 2006, https://iase-web.org/documents/papers/icots7/C432.pdf?1402524967.
5. Bill James, “Deviants At Work,” Bill James Online, June 11, 2013.
6. Gregory A. Colston Jr., “The Best Baseball Season Ever? A Triple Crown Perspective,” 2011, https://docslib.org/download/7954955/the-best-baseball-season-ever-a-triple-crown-perspective-gregory-a.
7. Kevin Schulman, “Data Analysis 101: The Z-Score is Your Friend,” The Agitator, August 10, 2022, https://agitator.thedonorvoice.com/data-analysis-101-the-z-score-is-your-friend/.
8. Dan Szymborski, “Toppling Ted: The 60-Game Season and the .400 Batting Average,” FanGraphs, July 21, 2020. https://blogs.fangraphs.com/toppling-ted-the-60-game-season-and-the-400-batting-average/.
9. Ariel Cohen, “Attacking Offensive Categories with Z-Scores,” RotoBaller, 2020, https://www.rotoballer.com/fantasy-baseball-adp-values-using-z-scores/753041.
10. Lucas Kelly, “Creating Your Rankings? Start with Z-Scores,” RotoGraphs, December 31, 2021, https://fantasy.fangraphs.com/creating-your-rankings-start-with-z-scores/.
11. Stephen Jay Gould, “Why No One Hits .400 Any More,” Triumph and Tragedy in Mudville (New York: W.W. Norton & Company, 2003), 151–72.
12. Negro League statistics are incomplete, especially for the NAL. Therefore, raw quantities often do not accurately reflect the performances of the players.