SABR

Does “Game Score” Still Work in Today’s High-Offense Game?

By Jeff Angus

This article was published in the Summer 2010 Baseball Research Journal.

When Bill James first made his Game Score widely public in the Historical Baseball Abstract (1988), he humbly called it a “garbage stat.” He did feature a three-page essay on it and sprinkled it about that book, his last Abstract. Since then, it’s been broadly used, but only shallowly, as though through his description of it (“my annual fun stat, a kind of garbage stat that I present not because it helps us understand anything in particular but because it is fun to play around with”) he had painted it a dull grey and buried the technique in the bottom of our cluttered toolboxes.

The real value of the Game Score tool is different from what its inventor claimed. It was an astoundingly useful measure that, while it didn’t come anywhere close to describing everything you need to know about pitching, described something critical at the time and, importantly, was accessible to casual fans. 

James revealed the Game Score (GS) stat using data from the 1987 season to illustrate its use. The question I can answer for you easily is “Given all the changes in the decades since then, does the stat hold up as an indicator?” 

Jake Peavy: In 2007, Padres ace led MLB with 27 Game Score Wins, against only 7 Game Score Losses.

The answer, surprisingly, is “Yes. Unequivocally.”

I’ll show you the particulars and explain why GS seems to hold up through pitching-rule changes, mutation of the ball, and the construction of new, mostly cozier ballparks that have led to what is popularly felt to be a hitter’s era. 

Later in this article, I’ll explain why GS is truly a significant measure that shows off the inventor’s brilliance and is something we should pay more attention to.

WHAT IS GAME SCORE AND HOW DO WE USE IT? 

Game Score is a measure of a starting pitcher’s performance, one that synthesizes the value of both a start’s quantity and its quality. 

The only widely distributed competitor is Quality Start, a binary (“yes,” it was a quality start; “no,” it wasn’t) measure. The Quality Start had a noble purpose: to free the starting pitcher from the oppression of the traditional won–lost record. And in its defense, it is simple to “measure”—a start of at least six innings where the pitcher gives up three or fewer earned runs is a Quality Start.

Its limits, though, are too constraining. The binary nature of the QS eliminates spectrum or shading. Further, the baseline for what constitutes a Quality Start should have been updated for the changed playing environment since MLB apparently juiced the ball after the 1993 season. (Since then, the average start is shorter and yields more runs on the average but still maintains an equal probability of helping the team win the game.) Game Score is more nuanced and useful. 

Bill James cleverly calibrated Game Score to a scale of 0 to 100 points, with 0 points being roughly the worst start a pitcher could have, 100 being the best, and 50 being the “average”. 

Graph 1 shows the frequency distribution for each Game Score, from 0 to 100, for the 2007 season. The peak incidence of Game Scores is between about 42 and 65, with more below than above. James has shown this shape to be pretty normal for distributions of baseball accomplishment. 

You may notice there are several Game Scores below the theoretical floor of zero. The low-end extremity for the 2007 season is a –12 delivered by Milwaukee’s promising Yovani Gallardo in an August 8 start against the Rockies in Denver. 

Brewers IP H R ER BB SO
Gallardo (L 4–2) 4.2 12 11 11 3 1

 

The high end was a 98 notched by Erik Bedard for Baltimore against the Rangers in Texas on July 7. 

Orioles IP H R ER BB SO
Bedard (W 7–4) 9 3 0 0 0 15

 

James cleverly set up GS so that someone who had mastered sixth-grade math could compute it from a scorecard or a box score in about 15 seconds. GS is accessible because it doesn’t require long division or decimal math, unlike ERA, which does. And, again, you can get all the components from a newspaper box score—such as the following pitching line (which I’ve truncated, leaving in only the items you’d use) for a roughly average start. 

Rays IP H R ER BB SO
Shields (ND 2–2) 7.0 6 4 2 1 2

 

Here’s the fastest way to compute it. Each game starts at 50 (which we’ll roll in as the last piece of computation). 

• +1 point for each out recorded (3 points for each full inning, 1 point for each additional third of an inning). In the example, 7 x 3 = 21 points. 

• +2 points for each full inning completed after the fourth inning. In the example, three full innings, 3 x 2 = 6 points. 

• –2 points for each hit surrendered. In the example, 6 x –2 = –12 points.

• The sum of strikeouts minus walks (usually a positive number you add, sometimes a negative number you subtract). In the example, 2 – 1 = +1 point.

• –4 points for each earned run and –2 points for each unearned run surrendered. In the example, 2 x –4 = –8 for the two earned runs and 2 x –2 = –4 for the two unearned runs, which add up to –12.

• +50 as the baseline the pitcher starts with.

 

So, in the preceding Shields example: 21 + 6 equals 27 for the length of the start; minus 12 for the hits equals 15; plus 1 for the strikeouts minus walks equals 16; minus 12 for the runs earned (and not) equals 4. Add 50 for the starting threshold, and the Game Score is 54, what James designed the system to describe as a start a bit above average. This GS number, as you will see later, argues that Shields’s start was above average in several ways, and the GS more closely measures the value of his start than ERA does (which, at 2.57 for the game, probably overstates his contribution), or won–lost record, which, at 0–0, screams an existential nothingness about Shields’s effort. 
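As a rough illustration, the computation above can be sketched in a few lines of Python. The function names (`outs_from_ip`, `game_score`) and the choice to pass innings pitched as a box-score string are assumptions made for this sketch; the scoring rules themselves are the ones listed above.

```python
def outs_from_ip(ip: str) -> int:
    """Convert box-score innings notation ('7.0', '4.2', '9') to outs recorded.

    The digit after the point counts thirds of an inning, not tenths.
    """
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def game_score(ip: str, hits: int, runs: int, earned_runs: int,
               walks: int, strikeouts: int) -> int:
    """Game Score from a pitching line, following the steps in the text."""
    outs = outs_from_ip(ip)
    full_innings = outs // 3
    unearned_runs = runs - earned_runs

    score = 50                                  # baseline every start begins with
    score += outs                               # +1 per out recorded
    score += 2 * max(0, full_innings - 4)       # +2 per full inning after the fourth
    score -= 2 * hits                           # -2 per hit
    score += strikeouts - walks                 # strikeouts minus walks
    score -= 4 * earned_runs + 2 * unearned_runs  # -4 per earned, -2 per unearned run
    return score


# The Shields line from the example: 7.0 IP, 6 H, 4 R, 2 ER, 1 BB, 2 SO
print(game_score("7.0", hits=6, runs=4, earned_runs=2, walks=1, strikeouts=2))  # 54
```

Run against the Shields line, the sketch reproduces the 54 worked out by hand, using nothing but integer addition and multiplication.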

GAME SCORE AS USED TODAY 

Today, Game Score is applied too infrequently. 

Unlike a lot of the other sabermetric stats that James and other researchers such as Dick Cramer and Pete Palmer invented or brought to public attention, GS hasn’t been internalized into the warp and weft of fan or researcher discussions. None of the half–dozen major-league organizations I’ve discussed pitching with seem to use it for much. James, I discovered after I finished the research for this study, uses it consistently in his annuals, The Bill James Gold Mine, though without basing a lot of significant observation on it. Researchers Mike Webber, Steve Treder, Rich Lederer, and Dan Fox have made concrete mentions. Sean Forman (who crafted a beautiful raw dataset for me to work from for this study) presents GS as part of the exhaustive game lines for pitchers on game-log pages of his incomparable Baseball Reference site. But no one I can find has made an effort to promote or headline a starting pitcher’s contribution to the team by using GS as a significant (and easy to “get”) starting point. 

My investigation has led me to the conclusion that GS reveals enough about a pitching start that researchers should explore it further—not just for other researchers but as a tool we can broadcast to the larger, less sabermetric population. 

HOW HAVE GAME SCORE’S AVERAGE RESULTS AND TAILS CHANGED WITH MLB’S OFFENSIVE EFFLORESCENCE? 

Remarkably, almost not at all. The changes have been small.1 

Year Mean Average Game Score
1987 49.2
2007 48.3

 

The mean average GS has changed by less than a single point over the past 20 years. The median Game Score in MLB for the 2007 season was 49.

So it’s consistent over time. But what makes it a good stat to proliferate beyond the stathead tribe? I believe there are three prerequisites for deciding which statistics are worth trying to popularize. 

  • The stat should mean something significant. 
  • The stat should retain the meaning of its numbers when you apply it in varying contexts (such as league and season). 
  • A stat one should try to popularize should not require the 50th-percentile fan to use a calculator. 

Further, the measure shouldn’t require adjustment by a pro, like Pete Palmer’s very valuable (but impossible to popularize) Adjusted Batting Runs or James’s own flotilla of Runs Created formulæ (about two dozen of them) that try to contextualize meaning over differing playing environments. 

Game Score, contrary to being a “garbage stat,” nails all three prerequisites. 

HOW MUCH HAVE GAME SCORES VARIED IN THE LAST 20 SEASONS? 

It’s fine to show that two seasons, 20 years apart, have a similar set of averages. But the average of any serious stat is never a truth in itself; the average is not the reality, though the average may illustrate a tiny facet of the reality. 

Johan Santana: In 2007, Twins ace led the American League with 26 Game Score Wins, against only 7 Game Score Losses.

(An interesting aside of marginal relevance: For all the whining about the diminished endurance of starters, the numbers indicate that, while innings pitched per start is going down a little, the number of batters faced is essentially the same. See the chart in appendix C. It shows that plate appearances (batters faced) per start has gone down by 1 in the past 20 years. That is a difference of 1 batter per start between then, when Michael Jackson was “Bad” and Twisted Sister was a hot ticket, and now.)

Before we explore how much Game Scores have varied over the years, it’s important to mention that the 1987 season, the one Bill James had as a backdrop for his tweaking the measure and sharing it, was an outlier itself. In 1987, there was an offensive uptick fueled by a home-run explosion. Sluggers like Kent Hrbek and Wally Joyner set their career highs in taters. So did more contact-oriented batters. Wade Boggs’s 24 home runs were more than twice his second-highest seasonal output. In between, gents such as Juan Samuel, whose 28 home runs that campaign eclipsed his second-most prolific season of 19 round trips, joined the Pounder’s Parade.2

But let me show you a chart that shows the rough variation of the individual components. It’s “rough” because one of the factors that shapes an individual Game Score result is the number of full innings from the fifth inning on that a pitcher labors. The following chart is a composite average: all starters’ stats combined, divided by the total number of starts. It is, therefore, not precise in cases where there’s a wide divergence in the distribution of outs recorded.3 We cannot derive the precise average Game Score from the average innings pitched per start because of the bonus for innings completed from the fifth on. 

But the numbers are close enough to be strong indicators of change in the composite GS and in all the measures except for the fifth-inning-on bonus. The fifth-inning-on bonus presented here is the composite, and therefore only an estimate. 

GS Points Difference from 1987
Year TOTAL Outs IP>4 H BB K ER UER
2007 –0.3 –0.3 –0.6 0.0 0.2 0.2 0.00 0.2

 

Negative numbers indicate an erosion of Game Score component averages. Positive numbers have raised the average GS since 1987. 

As you can see, none of the components varies even by a full point. The changes between the composite average in 1987 and 2007: 

  • Shorter average outings by starting pitching shaved about a point (.9) from composite average Game Score. 
  • More strikeouts per start and fewer walks per start added about half a point (.4) to composite average Game Score. 
  • A lower number of unearned runs added a soupçon (.2) to composite average GS. 

There’s a legitimate argument that an exceptional baseline year, such as 1987, is a bad foundation because comparing a baseline with extraordinarily high offense to a year such as 2007, which was normal for a big offensive era, is going to dampen larger differences. So, what about 1988 (a particularly good year for pitchers) or 1994 (powered by probably the liveliest ball since the 1950s)? 

Both years were extreme within the evolving norm for Major League Baseball. And both years varied noticeably from Game Score norms since 1987. But neither varied by enough to render Game Score a stat that needs a proliferation of special variants to make GS deliver the thumbnail results it aims to produce. 

More slugging appears to have led to harder swinging, which, apparently, increased strikeouts per starter inning while diminishing walks per starter inning. And while gross numbers of hits have gone up, outings by starters have also become slightly shorter, offsetting to some degree the effects of the increase in strikeouts and decrease in walks. 

I also believe (but have no numbers to support) that management and coaching tend to counter the kinds of trends that have mutated the game since 1987. In an environment where homers are more prevalent, which in turn makes walks more costly, pitching coaches develop tactics to help their charges diminish walks. They invest more in studying ways to limit exposure to homers. And pitchers who are walk- or homer-prone are marginally less likely to be drafted or invested in once drafted. The game, in sum, is an evolving system with some gravitational fields that tend to counteract disruptive trends.

Whether you agree with that last supposition or not, you can see from the following table just how minor the changes to components of composite average Game Score have been in the past 20 seasons. 

GS Points Difference from 1987
Year TOTAL Outs IP>4 H BB K ER UER
1988 2.1 0.3 0.6 0.1 0.2 –0.1 1.1 0.1
1989 1.7 0.1 0.2 0.2 0.1 –0.2 1.4 0.0
1990 1.3 0.0 0.0 0.2 0.1 –0.2 1.2 0.1
1991 1.3 0.0 0.0 0.2 0.1 –0.1 1.0 0.1
1992 1.8 0.1 0.2 0.1 0.1 –0.2 1.3 0.1
1993 0.3 0.0 0.1 –0.1 0.1 –0.2 0.3 0.1
1994 –0.4 0.0 0.0 –0.2 0.0 0.1 –0.4 0.1
1995 –0.3 –0.2 –0.3 0.0 0.0 0.1 0.0 0.0
1996 –0.9 –0.1 –0.2 –0.1 0.1 0.2 –0.6 0.0
1997 0.3 –0.1 –0.2 0.0 0.1 0.3 0.0 0.1
1998 0.0 0.0 0.0 –0.1 0.1 0.4 –0.3 0.1
1999 –1.3 –0.2 –0.4 –0.1 –0.1 0.1 –0.8 0.1
2000 –1.3 –0.2 –0.3 –0.1 –0.1 0.2 –0.9 0.1
2001 0.0 –0.2 –0.3 0.0 0.2 0.3 –0.1 0.1
2002 0.4 –0.2 –0.3 0.2 0.1 0.2 0.3 0.1
2003 0.0 –0.2 –0.4 0.1 0.2 0.1 0.1 0.1
2004 –0.3 –0.2 –0.4 0.1 0.2 0.2 –0.1 0.1
2005 0.6 –0.1 –0.2 0.0 0.3 0.1 0.3 0.2
2006 –0.5 –0.3 –0.5 0.0 0.2 0.1 –0.2 0.1
2007 –0.3 –0.3 –0.6 0.0 0.2 0.2 0.0 0.2

 

Game Scores were higher in 1988, as the leagues sacrificed some hitting for pitching. But 1988 shows the biggest divergence in average Game Score from 1987 since, well, 1987: 2 points per start. After 1993, the average GS started coming down, but since 2001 it’s been hovering in a narrow range, with variation affecting average GS being under a single point.

The average Game Score for starters in a season has been stable, certainly stable enough to validate GS as a great tool to describe the performance of starting pitchers through changing contexts.

IT’S EASY TO COMPUTE AND CONSISTENT ENOUGH OVER TIME, BUT WHAT MAKES IT SIGNIFICANT ENOUGH TO PAY ATTENTION TO? 

The Game Score proves to be a magnificent indicator of the most important thing a starting pitcher can do: Give his team a chance to win the ballgame. Bill James knows this. (He’s written about it, tangentially.) Everyone else I cited who uses GS knows this, I think. But they don’t follow up and apply that knowledge broadly. 

Let me make the argument for the significance of GS this way. A starting pitcher’s Game Score correlates remarkably well with the ability of the starter’s team to win. That is, if you chart the winning percentage for major-league teams at each Game Score, you see the correlation between the starter’s GS and the team’s likelihood of winning: the higher the GS, the greater the probability the team will win the game.

A team win is baseball's most basic currency. Anything a player does that increases the probability of his team winning is adding value.

The following chart reflects this distribution, showing the winning percentage for all games pitched by a starter at each Game Score. You’ll find the raw data for the chart as appendix A, at the end of the article. 

When James wrote his 1988 essay, he presented a table that showed the strong correlation between the starter’s Game Score and the team’s winning percentage. He used 10-point ranges to chunk the information. While this is sensible as a first cut at examining the results, I find his ranges more arbitrary than what he would have crafted were he looking for deeper significance. (With ranges such as 60–69 and 50–59, a game score of 62 is batched with a 68, six points away, but not with a 59, three points away.) I followed the pattern of his table for the 2007 data to show similarities and differences using his chosen ranges; they appear in their entirety as appendix B. Note, in the subset of that chart (see below), that every 10-point range features a higher win percentage than does the range below it, with one exception (the two bottom ranges for 1987, representing a small number of cases). The biggest difference worth noting between the 1987 and 2007 numbers is that Game Scores from 40 through 49 were 4 percentage points more likely to generate a win for the starter’s team in 2007 than in 1987.

1987 Team Win % Range 2007 Team Win %
93% 90–99 100%
93% 80–89 93%
84% 70–79 82%
73% 60–69 72%
58% 50–59 60%
42% 40–49 46%
26% 30–39 28%
20% 20–29 18%
10% 10–19 10%
23% Up to 9 4%

 

MEANINGFUL RANGES FOR GAME SCORES 

I think moving ranges are more flexible and show that, even when more finely graded, the correlation between increasing GS and increasing team wins holds up. While James shows fixed 10-point ranges, I’ll show you that the relationship between GS and team wins is clear even in more graduated pieces. 

The table below shows the percentage of games a team won in 2007 with a specific Game Score plus or minus 2. For example, the row that shows a game score of 49 +/–2 shows the win percentage for a team whose starter notched a GS of 51 through 47, while 50 +/–2 shows win percentage for a team that had a starter with any GS of 52 through 48.4 

It’s worth noting that, in the previous table, while James originally hoped to design a measure where a Game Score of 50 would win 50 percent of the games for the starter’s team, even at 47 +/–2 a team will win a little more than that. A glance at appendix A will confirm it’s not a fluke of the +/–2, but the rawer numbers show scores as low as 46 being good enough to support a team’s winning more than 50 percent of the time. 
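The moving plus-or-minus-2 ranges can be computed along these lines. This is only a minimal sketch: the function name, the `(game_score, team_won)` pair layout, and the seven sample starts are hypothetical stand-ins for illustration, not the 2007 data.

```python
def moving_window_win_pct(starts, center, half_width=2):
    """Team win percentage for starts with a Game Score within
    `half_width` points of `center` (inclusive on both ends).

    `starts` is an iterable of (game_score, team_won) pairs.
    Returns None when no start falls inside the window.
    """
    in_window = [won for gs, won in starts if abs(gs - center) <= half_width]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


# Hypothetical starts, invented purely to exercise the function.
sample_starts = [
    (47, True), (48, True), (49, False), (50, True),
    (51, False), (52, False), (53, True),
]

print(moving_window_win_pct(sample_starts, 49))  # window covers GS 47 through 51
print(moving_window_win_pct(sample_starts, 50))  # window covers GS 48 through 52
```

Because adjacent windows share four of their five Game Scores, the resulting percentages move smoothly from one row to the next, which is the point of using moving ranges instead of fixed 10-point bins.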

SO IF IT IS IMPORTANT, HOW SHOULD WE USE IT?

Bill James himself has suggested using the tool to adjust one’s perception of a starter’s seasonal won–lost record. When he first wrote about the measure—back when Dirty Dancing blew away the box-office numbers for Gone with the Wind and Buddy Biancalana celebrated his age-27 season with an OPS+ of 3—James suggested tracking two stats derivative of GS, Tough Losses and Cheap Wins. He suggested a Tough Loss was a game where a starter posted a Game Score of 50 or better but got the loss, and a Cheap Win was a start where he got the victory with a GS under 50.

Percentage of Games Won by Team When Starter Has Specific GS Ranges, 2007
GS range Win %   GS range Win %   GS range Win %
75+ 89%   55 +/˗ 2 61%   35 +/˗ 2 27%
74 +/˗ 2 82%   54 +/˗ 2 58%   34 +/˗ 2 27%
73 +/˗ 2 80%   53 +/˗ 2 56%   33 +/˗ 2 24%
72 +/˗ 2 79%   52 +/˗ 2 56%   32 +/˗ 2 25%
71 +/˗ 2 80%   51 +/˗ 2 56%   31 +/˗ 2 26%
70 +/˗ 2 79%   50 +/˗ 2 56%   30 +/˗ 2 23%
69 +/˗ 2 78%   49 +/˗ 2 54%   29 +/˗ 2 19%
68 +/˗ 2 78%   48 +/˗ 2 54%   28 +/˗ 2 18%
67 +/˗ 2 78%   47 +/˗ 2 51%   27 +/˗ 2 17%
66 +/˗ 2 77%   46 +/˗ 2 50%   26 +/˗ 2 17%
65 +/˗ 2 77%   45 +/˗ 2 46%   25 +/˗ 2 19%
64 +/˗ 2 74%   44 +/˗ 2 46%   24 +/˗ 2 21%
63 +/˗ 2 72%   43 +/˗ 2 42%   23 +/˗ 2 21%
62 +/˗ 2 67%   42 +/˗ 2 40%   22 +/˗ 2 19%
61 +/˗ 2 65%   41 +/˗ 2 37%   21 +/˗ 2 17%
60 +/˗ 2 62%   40 +/˗ 2 35%   20 +/˗ 2 16%
59 +/˗ 2 62%   39 +/˗ 2 33%   19 +/˗ 2 12%
58 +/˗ 2 63%   38 +/˗ 2 33%   18 & less 8%
57 +/˗ 2 64%   37 +/˗ 2 30%    
56 +/˗ 2 62%   36 +/˗ 2 28%    

 

Every season is stuffed with instances of starters who pitch consistently well with poor run support and have losing records (or, at the other extreme, average less than 5 innings for 20 starts while yielding 4.3 runs each start and have an 8–7 record over them).5

What I like about James’s suggestion is its cleanliness. The break point at 50 seems logical, and it’s easy to remember. And any rational proposal to fix the popular misperception that the won–lost record of an individual pitcher holds a lot of insight into his quality is a worthwhile effort.

However, I don’t propose we make use of GS by proliferating these derivatives. For one thing, James knew even then that Game Scores of 50 would yield better than .500 results (as did a GS of 49). So I believe his anchor point is misplaced. In addition, I think there’s a middle band of Game Scores that should qualify for neither; a grey zone in the middle where the team’s game prospects are between “should be confident of winning” and “can expect to lose.” 

A DRAFT PROPOSAL 

I have a draft proposal for a season measure that’s a derivative of Game Score. 

The measure isn’t as tidy-looking as James’s break point at 50, and I think if I looked at the fine details of multiple years of game logs, I would probably tweak the break points. But here’s a straw man we can play with that delivers a new version of won–lost record, using Game Scores, that reflects each starter’s contributions to his team’s record better than do the won–lost records currently tracked. 

For the purpose of this proposal, I’ll give this measure the working name “Game Score Won–Lost” (GSWL, pronounced “Gaz–Wall”). Let me pitch how it works. 

For all starts where the pitcher earns a GS of 55 or higher, the pitcher earns a Game Score Win, recognizing a game where his start gave his team a clear chance to win, whether the team went on to win or not. 

For all games where the pitcher earns a GS of 43 or lower, the pitcher earns a Game Score Loss, a game where he set his team up to lose, whether they went on to lose or not. 

For the roughly one quarter of all starts that fall between (a GS of 54 through 44, what I’ll call “Game Score Tweeners”), split them down the middle. Assign one half to Game Score Wins and one half to Game Score Losses. If the number of Game Score Tweeners is odd, round the Wins half up and truncate the Losses half down. This is not capricious; it’s based on the fact that the team winning percentage when the starter pitches a game that earns between 54 and 44 is .528, a little higher than even.

Relation of Game Score to Team Wins, 2007
Game Score Ranges Team Win % Team Wins Team Losses Number of Games % of Games
55+ .728 1,365 511 1,876 39
54 to 44 .528 626 559 1,185 24
43– .244 440 1,361 1,801 37

 

Total up the Game Score Wins and Game Score Losses and you get a season measure that looks like a traditional won–lost record, which I like because it’s easy for the uninitiated to map to an existing measure they think they understand. 
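To make the bookkeeping concrete, here is a small sketch of the GSWL tally as described above. The function name, parameter names, and the two made-up seasons are assumptions for illustration; only the break points (55 and up, 43 and down) and the odd-Tweener rounding rule come from the proposal.

```python
def gswl(game_scores, win_floor=55, loss_ceiling=43):
    """Return (Game Score Wins, Game Score Losses) for a list of starts.

    Starts at or above `win_floor` are wins, starts at or below
    `loss_ceiling` are losses, and the Tweeners in between are split
    evenly, with an odd Tweener going to the wins column.
    """
    wins = sum(1 for gs in game_scores if gs >= win_floor)
    losses = sum(1 for gs in game_scores if gs <= loss_ceiling)
    tweeners = len(game_scores) - wins - losses

    wins += (tweeners + 1) // 2   # round the wins half up
    losses += tweeners // 2       # truncate the losses half down
    return wins, losses


# Hypothetical seasons: the individual scores are invented, chosen only
# so that the counts per band are easy to see.
season_a = [60] * 8 + [40] * 5 + [50] * 2   # 8 wins, 5 losses, 2 Tweeners
season_b = [60] * 6 + [40] * 5 + [50] * 9   # 6 wins, 5 losses, 9 Tweeners

print(gswl(season_a))  # (9, 6)
print(gswl(season_b))  # (11, 9)
```

Wins and losses always add up to the number of starts, so the record carries workload information that a traditional W–L line does not.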

IS GSWL FAIR? IS IT ACCURATE? 

I think Game Score Won–Lost (GSWL) is fair in that a pitcher gets credit for a “win” in the cases where the team can expect to win about three quarters of the time, and he gets a “loss” in the cases where his performance puts the team in a situation where they can expect to lose about three quarters of the time. 
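The “about three quarters” figures can be checked directly from the team wins and losses in the 2007 table. The sketch below simply redoes that arithmetic; the dictionary layout and the function name are assumptions of this example, while the counts are copied from the table.

```python
# Team wins and losses by Game Score range, copied from the 2007 table.
RANGES = {
    "55+": (1365, 511),
    "54 to 44": (626, 559),
    "43-": (440, 1361),
}


def summarize(ranges):
    """Return {label: (team_win_pct, share_of_all_starts)} for each range."""
    total_games = sum(wins + losses for wins, losses in ranges.values())
    return {
        label: (wins / (wins + losses), (wins + losses) / total_games)
        for label, (wins, losses) in ranges.items()
    }


for label, (win_pct, share) in summarize(RANGES).items():
    print(f"{label:>8}  win% {win_pct:.3f}  "
          f"loss% {1 - win_pct:.3f}  share of starts {share:.0%}")
```

The top band comes out at a .728 winning percentage and the bottom band at a .756 losing percentage (1 minus .244), which is where the rough symmetry of the proposal comes from.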

We could just as easily ignore the Tweeners as divvy them up, but I lean toward leaving them in. For one thing, if you do, a pitcher’s GSWL more accurately reflects the number of starts the pitcher has (an improvement over the traditional W–L system). Parsing the leftover Tweener games lets you account for the pitchers who consistently throw in the middle range (2007 Kyle Kendrick, average GS = 50, 9 Tweeners of 20 starts) and reveals some differences from those who throw a higher concentration of GS Wins and GS Losses with the same average GS (2007 Ubaldo “No, You–Baldo” Jiménez, average GS = 50, 2 Tweeners of 15 starts).

So a GSWL that reflects number of starts helps a reader better ascertain quantity along with quality.

Starter Starts 2007 Avg Game Score GSWL w/o Tweeners GSWL w/ Tweeners
Jiménez 15 50 8–5 9–6
Kendrick 20 50 6–5 11–9

 

Kendrick labored more and achieved the same season average Game Score. With the Tweeners removed, the stat would broadcast that Jiménez worked a little more and achieved a more positive result for the season (same number of losses, a couple more wins). With the Tweeners added in, Kendrick’s bigger workload is reified and he looks almost comparable on quality. And either is more informative than Quality Starts (Kendrick 13, Jiménez 9).

Another cool side-benefit of including Tweeners is that the result delivers season won–lost counts that look more like twentieth-century baseball. There are 20-game winners using GSWL. Here are some final numbers for GSWL (with Tweeners) for the 2007 season, including all the 20-game GSWL winners.6

Starters, GSWL 20-Game Winners, 2007
Starter GSW GSL   Starter GSW GSL
Peavy, SD 27 7   Kazmir, TB 22 11
Santana, Minn 26 7   Escobar, LAA 21 9
Lackey, LAA 25 7   Vazquez, ChA 21 9
Haren, Oak 25 9   Lilly, ChN 21 10
Sabathia, Cle 24 9   Oswalt, Hou 21 10
Smoltz, Atl 24 8   Verlander, Det 21 10
Webb, Az 24 10   Francis, Col 21 13
Bedard, Bal 23 5   Harang, Cin 21 13
Beckett, Bos 23 7   Young, SD 21 8
Snell, Pit 22 10   Meche, KC 21 12
Penny, LAN 22 11   Zambrano, ChN 21 12
Shields, TB 22 8   Halladay, Tor 20 10
Cain, SF 22 9   R. Hill, ChN 20 12
Carmona, Cle 22 9   Maine, NYN 20 12
Hudson, Atl 22 10   Pettitte, NYA 20 14

 

This example shows that, while GSWL tends to confirm many preconceptions (the appearance of any of the top 9 on the previous table should be a surprise to no one who paid a lot of attention to the 2007 season), the measure allows us to find some overshadowed achievers. 

How can you not love the total justice of Matt Cain (median GS = 58) getting a GSWL of 22–9, the poor bustard having pitched in the top 15 percent of starters only to be embossed with a traditional W–L mark of 7–16, below even the dreaded Boom-Boom Beck Line?

A measure is worthwhile only if it shows off accomplishment at both ends of the spectrum. The bottom of the GSWL table shows off prolific losers pretty well, I think. 

Starters, GSWL, Lowest Scores, 2007
Starter GSW GSL   Starter GSW GSL
Willis, Fla 16 18   Millwood, Tx 12 18
Gaudin, Oak 15 18   Olsen, Fla 12 21
Suppan, Mil 14 19   Belisle, Cin 11 19
Jackson, TB 13 18   Davies, –– 10 18
Morris, Pit 13 18   Eaton, Phi 9 20
Byrd, Cle 12 18   Perez, KC 7 19
Chico, Was 12 19        

 

There were only a pair of 20-game GSWL losers in 2007: the Phillies’ Adam Eaton (his 10–10 traditional W–L record was enhanced by the Phils’ ability to score in his starts, notching 3 or fewer runs in only 8 of his starts), and a surprising guest appearance by the Marlins’ Scott Olsen. Olsen’s 10–15 traditional W–L record was actually hiding some of the deficits in his overall game-by-game performance. Odalis “Friend of David Hasselhoff” Perez, GSWL 7–19 . . . proving, I think for all time, that, if you have really marginal stuff, “pitching to contact”7 is Russian roulette with six bullets. 

CONCLUSION 

I believe I’ve shown compelling evidence that supports my idea that Game Score is the single most useful measure that a broad range of fans can calculate in real time to gauge the value of a starter’s performance to his team in the most important measure of success: the team’s ability to win a game. 

  • Game Score is a finer measure than Quality Start and appears to keep its relationship to winning through more contexts. 
  • Game Score measures beautifully a starter’s ability to deliver an important goal: the likelihood of his team winning the game. 
  • Game Score is calculated through universally accessible components, and the calculations required are accessible to all.

Despite James’s humble stance about his invention, I believe it’s got some serious applications, and I’d like to see us popularize it beyond the SABR community.

JEFF ANGUS, a baseball writer and management consultant, is the author of "Management by Baseball: The Official Rules for Winning Management in Any Field" (HarperCollins, 2005).

  • 1. I didn’t compute the 1987 mean. Bill James wrote it in his Game Score essay that appeared in the 1988 Baseball Abstract.
  • 2. And 1987 was the year the Houston Astros gave up their explosion-in-a-paint-factory unis in favor of muted, corporate-dull duds; this fact has nothing to do with the offensive surge.
  • 3. So if, for example, in 1987, starters averaged 6 1/3 innings per start but with low variation and with 12 percent lasting under 5 innings (where they would start racking up bonus points for innings completed), while in 2007 starters averaged 6 innings per start but 19 percent didn’t complete the fifth inning, and more starts went deeper so as to equalize the average, these numbers would diverge from actuals, which I can’t compute for 1987 because I don’t have the raw data.
  • 4. I combined all scores of 75 and above and all scores of 18 and lower because each GS in those areas is relatively scarce (each tail is 5 percent of the total) and there are many missing slots in those ranges.
  • 5. Horacio Ramirez, Mariners, 2007.
  • 6. Note that there were 30 starters who were GSWL 20-game winners, and there are 30 MLB teams. Coincidence, synchronicity or “Intelligent Design”? Only Carl Everett knows. Or merely thinks he does.
  • 7. Grant Sterling said of pitching to contact, “Pitching to contact has exactly the same record of success as appeasing Hitler.” There’s no vital reason to cite this except it’s one of my favorite recent baseball quotes, because Dr. Sterling is one of the smartest baseball minds I know, and because I suspect it’s true for a lot more pitchers than pitching coaches would like to think.