John Burgeson on the variability of baseball statistics

Editor’s note: The following is a series of posts made by longtime SABR member John Burgeson to the SABR-L listserv in November 2000. They were mentioned in a Grantland.com story by Bess Kalb on Tuesday, April 10 about his pioneering efforts to create a baseball simulation game on his IBM 1620 computer fifty years ago.

The inherent variability of statistics
By John Burgeson
November 13, 2000

Post 1:

Baseball depends on statistics; I understand that.

But, I assert, perhaps it does so by taking them much too seriously.

This will be the first post in a series in which I look at this idea in, perhaps, a different way.

Let’s start with batting averages. One could choose any measure, but that one is the easiest to focus upon.

Playing “god,” I just created twenty (20) pretty fair baseball players and placed them on major league teams. This is their rookie year. I’ll make sure each gets exactly 200 at bats, and the rest of their stats are about average (I’ll not measure them).

Each time one of these 20 players comes to bat, I’ll roll the dice in such a way that he has EXACTLY three chances in ten of getting a hit, excepting walks and HBPs. When the season ends, how will they look?

Of course, many possibilities exist. I ran the exercise on my PC; here is how the season ended:

  • HART 0.256
  • IVES 0.268
  • LUCAS 0.280
  • JONES 0.284
  • NORTON 0.288
  • FITCH 0.292
  • MORRIS 0.292
  • BARNES 0.300
  • PARKS 0.300
  • ADAMS 0.308
  • TURNER 0.308
  • KELLY 0.316
  • QUINN 0.316
  • GARCIA 0.320
  • UTLEY 0.320
  • OGDEN 0.324
  • RIEG 0.332
  • CODY 0.340
  • SPEAR 0.344
  • DOWNS 0.348

Well — Downs got favorable mention for “Rookie of the Year” while Hart was not sure he’d have a job the next season. Yet there was ABSOLUTELY NO DIFFERENCE between Downs and Hart. None at all. Nada. Zip. Next post I’ll look at the following season.
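For readers who want to try the exercise themselves, here is a minimal sketch in Python of the experiment as described. It is not the author's original program. The function name, the seed and the use of 250 at bats are assumptions; the averages listed above are all multiples of .004, which fits the 250 at bats given in the correction in post 4.

```python
import random

NAMES = ["ADAMS", "BARNES", "CODY", "DOWNS", "FITCH", "GARCIA", "HART",
         "IVES", "JONES", "KELLY", "LUCAS", "MORRIS", "NORTON", "OGDEN",
         "PARKS", "QUINN", "RIEG", "SPEAR", "TURNER", "UTLEY"]


def simulate_season(names, at_bats, true_avg, rng):
    """Return {name: batting average} for hitters who all share true_avg."""
    season = {}
    for name in names:
        # Each at bat is an independent trial that succeeds with true_avg.
        hits = sum(1 for _ in range(at_bats) if rng.random() < true_avg)
        season[name] = hits / at_bats
    return season


if __name__ == "__main__":
    rng = random.Random(2000)  # arbitrary seed so a run can be repeated
    season = simulate_season(NAMES, at_bats=250, true_avg=0.300, rng=rng)
    # Print lowest to highest, the same order used in the list above.
    for name, avg in sorted(season.items(), key=lambda kv: kv[1]):
        print(f"{name:<8}{avg:.3f}")
```

Changing the seed gives a different "season" each time, which is the point of the exercise: the names at the top and bottom change, the players do not.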


Post 2:

I[n] continuing the process in which I am the “god” that built these 20 players, each of which has exactly 3 chances in 10 of getting a hit every time he comes to the plate (walks, errors, HBPs, etc. excepted):

All 20 of my players were picked up for season 2, and I controlled that season so that each man got exactly 400 at bats during the season. As might be expected, the variability was down — perhaps not as much as one might expect.

  • KELLY   0.270
  • IVES    0.278
  • GARCIA  0.280
  • MORRIS  0.284
  • TURNER  0.286
  • QUINN   0.288
  • NORTON  0.302
  • ADAMS   0.304
  • DOWNS   0.304
  • UTLEY   0.306
  • BARNES  0.308
  • CODY    0.310
  • OGDEN   0.310
  • RIEG    0.310
  • PARKS   0.312
  • LUCAS   0.314
  • SPEAR   0.314
  • HART    0.316
  • FITCH   0.320
  • JONES   0.338

Downs dropped from his rookie year of .348 to .304; Hart improved from .256 to .316.

The sportswriter in Hart’s city wrote three columns on how Hart was improving; Baseball Weekly also mentioned him with favor. A “comer.” Jones also improved a lot — he even got a vote or two for MVP with his .338 average. Kelly’s year was a big disappointment in his city, dropping from .316 in his rookie year to .270.

But — there was ABSOLUTELY NO DIFFERENCE between any of these players — it was all chance that operated.
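Why the spread shrinks less than one might expect can be sketched with the usual binomial formula, assuming every at bat is an independent trial with the same .300 chance. The function name is hypothetical, and the at-bat counts of 250 and 500 are the corrected figures from post 4.

```python
import math


def batting_average_sd(true_avg, at_bats):
    """Standard deviation of an observed average over independent at bats."""
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


if __name__ == "__main__":
    for at_bats in (250, 500):
        sd = batting_average_sd(0.300, at_bats)
        print(f"{at_bats:>4} at bats: one standard deviation is about "
              f"{sd * 1000:.0f} points")
```

Under these assumptions the spread falls with the square root of the at bats, so doubling the season from 250 to 500 at bats only trims one standard deviation from about 29 points to about 20.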

The teams that had these players, because I made them this way, had good years and went to the playoffs. All 20 of these guys played. That’s the subject of the next post. 


November 14, 2000

Post 3:

Continuing the process of evaluating 20 players, all of equal (.300) batting capability.

By the end of the playoffs, each man had batted 20 times, excepting walks, errors, etc. Here is the outcome:

  • QUINN   0.050
  • CODY    0.100
  • ADAMS   0.200
  • BARNES  0.200
  • PARKS   0.200
  • TURNER  0.200
  • IVES    0.250
  • NORTON  0.250
  • UTLEY   0.250
  • DOWNS   0.300
  • HART    0.300
  • MORRIS  0.300
  • SPEAR   0.300
  • KELLY   0.350
  • LUCAS   0.350
  • RIEG    0.350
  • JONES   0.400
  • OGDEN   0.400
  • FITCH   0.450
  • GARCIA  0.550

The writers, of course, gave Garcia the MVP award,  and had harsh words for Quinn, pointing out that he had batted .316 his rookie year, dropped to .288 in the past season, and, when facing the superior pitching of the playoffs, had gone only 1 for 20.

A variability of 500 points. And not one cause of that variability except chance. That did not stop the baseball writers, of course. They had a lot to say.

No — I’m not knocking baseball writers. But I am suggesting that chance may play a larger part on how the stats turn out — and how players are perceived, than some people think.
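How unusual are lines like Quinn's 1 for 20 or Garcia's 11 for 20? A short sketch with the exact binomial distribution gives a feel for it, assuming 20 independent at bats at a true .300. The helper names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n at bats."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def prob_at_least(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


if __name__ == "__main__":
    n, p, players = 20, 0.300, 20
    cold = prob_at_most(1, n, p)    # 1 hit or fewer, like Quinn
    hot = prob_at_least(11, n, p)   # 11 hits or more, like Garcia
    print(f"One player, 1 hit or fewer:   {cold:.4f}")
    print(f"One player, 11 hits or more:  {hot:.4f}")
    # Chance that at least one of the 20 identical hitters goes that cold.
    print(f"Someone among {players} goes cold: {1 - (1 - cold) ** players:.3f}")
```

For any single player the cold line has a chance of under one percent, but with 20 players in the pool the chance that somebody posts it is much larger, and the writers will find him.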

Tomorrow I’ll jump 10 or 12 years into the future and see how these boys turned out. Then I’ll do some more interesting analyses involving teams, and players within those teams.


Post 4:

This is the 4th in a series of experiments in which I look at the variability of baseball.

In posts 1, 2 and 3,  I followed the careers of 20 players, each endowed by god (me) with the capability to bat successfully in exactly 3 out of every 10 at bats.

We have seen their rookie years, where they played a half season with 250 at bats; their first full-time years, with 500 at bats; and their performance in the playoffs, with 20 at bats. In this post I jump ahead in time to see how they did in a complete career.

(BTW, I erred in posts 1 and 2; the times at bat were 250 and 500, not 200 and 400. Sorry.)

All players wound up (because I said so) with 3,000 at bats. Here is how they finished:

  • UTLEY   0.286
  • IVES    0.287
  • CODY    0.290
  • MORRIS  0.293
  • TURNER  0.293
  • KELLY   0.295
  • JONES   0.298
  • ADAMS   0.300
  • QUINN   0.301
  • FITCH   0.302
  • OGDEN   0.303
  • SPEAR   0.303
  • DOWNS   0.305
  • PARKS   0.306
  • BARNES  0.307
  • HART    0.307
  • NORTON  0.307
  • LUCAS   0.315
  • RIEG    0.315
  • GARCIA  0.318

You remember Garcia, don’t you? He is the one who was the MVP in the playoffs. He went on to bat .318 lifetime. Utley, on the other hand, started with a .320 in his rookie year, dropped to .306 in his second season, did poorly in the playoffs (.250) and finished with .286 lifetime. A creditable career, but not one to remember particularly.

Yet, there was no difference at all between Garcia and Utley except chance: the vagrant gust of wind, the rough (or smooth) infield, the insect that encountered the pitched ball which changed the ball’s path ever so slightly. From a comfortable armchair, we SABRites look at Utley and Garcia, and while neither (at least on the basis of batting average alone) is a HOF candidate, Garcia at least is worth initial consideration; Utley is not.

Baseball is, of course, much more than chance, and my thesis is not that stats are without value. But we agonize (sometimes) that Mantle missed .300 by so little, and do not acknowledge that if the universe were replayed 10 or 20 times, he might well have had a final batting average much different than .298, perhaps higher, perhaps lower.

As you may suspect, I’m not done with this thesis. In the next post I’ll look at 10 players who are all genuine HOF candidates, again, only on the basis of their batting average. Then, in post #6 I’m going to turn my attention to the same kind of analysis, this time looking at team performance.

BTW, my protocol for the preceding was to set up a player as a spreadsheet, then run & print the spreadsheet exactly 20 times. I then wrote player names, in alphabetical order, on each of the 20 spreadsheets, and analysed the results. Clearly, I could do this n times, where n is any number I wanted. I did it exactly 20 times and stopped.

I could have also done it 100 times and selected the 20 I wanted. This protocol would clearly not have been a good example of anything.

The specific spreadsheet formula (uSoft WORKS) used for each at bat was =IF((RAND()-$B$6)<0,1,0)

where B6 was set to .3
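The spreadsheet formula translates almost word for word into other languages. Here is a rough Python equivalent, offered as a sketch only; the function names are hypothetical, and Python's random.random() stands in for the spreadsheet's RAND().

```python
import random


def at_bat(hit_probability, rng=random):
    """Mirror =IF((RAND()-$B$6)<0,1,0): return 1 for a hit, 0 for an out."""
    return 1 if (rng.random() - hit_probability) < 0 else 0


def career_average(at_bats, hit_probability, rng=random):
    """Batting average over a run of simulated at bats."""
    return sum(at_bat(hit_probability, rng) for _ in range(at_bats)) / at_bats


if __name__ == "__main__":
    rng = random.Random(20)  # arbitrary seed for a repeatable run
    print(f"{career_average(3000, 0.3, rng):.3f}")
```

Because random.random() returns a value from 0 up to but not including 1, the subtraction comes out negative with probability equal to hit_probability, which is the same trick the spreadsheet cell uses.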

Yes, I know that rigorous statistical analyses are also possible. But they (in general) don’t show what might actually happen. Much like an analysis of bridge hands is useful — but an actual deal will give a player more insight, even though that particular deal is so rare that he will likely never see it again.


Post 5:

This is post #5 on baseball variability. It is the last one I will make on player batting averages considered alone.

I reran the simulator using players with a .35 chance of a hit in each at bat. Each of these HOF-caliber players had 8,000 at bats. I also ran each player through 20 at bats in each of four World Series. Here are the results:

Name      Life avg  WS#1 avg  WS#2 avg  WS#3 avg  WS#4 avg
Abner       .344      .400      .400      .300      .150
Baker       .347      .350      .300      .200      .400
Champ       .350      .150      .550      .350      .450
Dempsey     .338      .300      .250      .400      .450
Epsley      .344      .250      .400      .150      .400
Folger      .358      .300      .300      .300      .300
Grimes      .353      .300      .450      .300      .350
Hanes       .350      .600      .350      .350      .500
Isley       .343      .300      .350      .400      .350
Jenkins     .353      .300      .250      .400      .250

 

This simulation is much less interesting. By the time 8,000 at bats are attained, the variability is down a great deal, the lifetime range above being only between Folger at .358 and Dempsey at .338. And all of these “greats” generally excelled in World Series play.

Still — would we not regard a .358 lifetime hitter as significantly better than one who hit .338?
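One way to put a number on that question is to ask how wide the gap between the best and worst of ten identical .350 hitters usually is after 8,000 at bats. The sketch below is an illustration under stated assumptions, not the author's method: it uses a normal approximation to each player's average rather than simulating every at bat, and the function name, seed and trial count are arbitrary.

```python
import math
import random


def simulated_ranges(players, at_bats, true_avg, trials, rng):
    """Best-minus-worst lifetime average for each simulated group of players."""
    sd = math.sqrt(true_avg * (1 - true_avg) / at_bats)
    ranges = []
    for _ in range(trials):
        # Normal approximation: each average is drawn around the true value.
        avgs = [rng.gauss(true_avg, sd) for _ in range(players)]
        ranges.append(max(avgs) - min(avgs))
    return ranges


if __name__ == "__main__":
    rng = random.Random(1948)
    ranges = simulated_ranges(10, 8000, 0.350, 10_000, rng)
    mean_gap = sum(ranges) / len(ranges)
    share = sum(r >= 0.020 for r in ranges) / len(ranges)
    print(f"Typical best-to-worst gap: {mean_gap * 1000:.0f} points")
    print(f"Share of groups with a gap of 20 points or more: {share:.2f}")
```

The script reports how often a group of ten identical hitters ends up at least as spread out as the ten above, which is one way to judge whether a 20-point gap by itself says much.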

In the next post I’ll look at team variability.


November 15, 2000

Post 6:

This is the 6th in a series of posts that discusses the inherent variability of baseball.

I’ve looked at batting average variability, and argued that chance can account for a wide range of results for any player, regardless of how good he is.

Now what about teams?

My protocol is as follows: I have created a league of eight teams. Each of these teams has the inherent capability, in terms of averages, ERAs, etc., of the 1948 Cleveland Indians. That team went 97 and 58. How would a season look if all eight teams were exactly the same as the 1948 Indians?

I have to differentiate somehow between the teams, so I’ll do so by color. I ran five seasons. Here are the results, which are somewhat surprising to me: 

Season #1
Team    Record
Pink    87 67
Green   86 68
Red     81 73
Aqua    79 75
Brown   78 76
Blue    72 82
Yellow  67 87
White   66 88

The managers of White and Yellow got fired. But their teams were EXACTLY the same as the others.

Season #2
Team    Record
Brown   85 69
Blue    85 69
Aqua    80 74
Green   77 77
Pink    75 79
Red     75 79
Yellow  72 82
White   67 87

Interesting that Yellow and White again finished last. The manager of Brown was praised to the skies for bringing his team up from 5th to first place.

Season #3
Team    Record
Yellow  85 69
Green   81 73
Pink    79 75
Blue    79 75
Brown   75 79
Red     75 79
White   73 81
Aqua    69 85

This year all the kudos went to Yellow. 

Season #4
Team    Record
Green   86 68
Red     83 71
Brown   78 76
Yellow  78 76
Pink    78 76
Aqua    73 81
Blue    72 82
White   68 86

Four years at or near the cellar and the owners of White are getting frustrated! 

 

Season #5
Team    Record
White   83 71
Brown   81 73
Yellow  80 74
Pink    80 74
Red     76 78
Aqua    75 79
Green   72 82
Blue    69 85

No, I didn’t “make” White win at last. That’s just the way it turned out. There was no difference at all between the eight teams.

So the next time the Cubs finish 14 games off of first, can I say it is just chance that did it? I think not — but one can say that chance has a role to play.
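A much cruder version of this league can be sketched in a few lines. It is not the program used above, which plays out games from team statistics. The sketch assumes that identical teams make every game a coin flip and that each pair of teams meets 22 times for a 154-game schedule; the function name and seed are hypothetical.

```python
import itertools
import random

TEAMS = ["Pink", "Green", "Red", "Aqua", "Brown", "Blue", "Yellow", "White"]


def simulate_league(teams, games_per_pairing, rng):
    """Every pair meets games_per_pairing times; each game is a coin flip."""
    wins = {team: 0 for team in teams}
    for a, b in itertools.combinations(teams, 2):
        for _ in range(games_per_pairing):
            winner = a if rng.random() < 0.5 else b
            wins[winner] += 1
    return wins


if __name__ == "__main__":
    rng = random.Random(154)  # arbitrary seed for a repeatable season
    games_each = 22 * (len(TEAMS) - 1)  # 7 opponents x 22 games = 154
    wins = simulate_league(TEAMS, 22, rng)
    print("Team    Record")
    for team, w in sorted(wins.items(), key=lambda kv: -kv[1]):
        print(f"{team:<8}{w} {games_each - w}")
```

Running it with different seeds produces pennant winners and cellar dwellers season after season, even though no team is better than any other.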


November 16, 2000

Post 7:

As an aside, before going ahead, thanks to several who sent me private emails of appreciation for this series (nobody has said, yet, anything bad about them).

I replied privately to all of these but at least one reply was returned because of a full recipient mailbox. So if you sent me a comment and did not get a reply, check your mailbox limits.

The question is how the last experiment might easily be replicated. I have the code; it is a variant of a computer baseball exercise written about ten years ago and sold as shareware (I bought it). I will package a set of files which will allow the preceding experiment to be performed easily but the main application program is disabled. If anyone wants a copy — email me privately; I’ll send them, with instructions (simple) as a ZIP file.

One correspondent thought that there had been a SABR article about 10 years ago which argued much the same thing, concluding that by the end of a season a variation of plus or minus 9 games for any team was within the probable error (50% chance).  That sounds somewhat high to me — does anyone remember the article in question?
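A rough check on that figure is possible if one assumes a true .500 team and 154 independent games, so that wins follow a binomial distribution and the probable error is about 0.6745 standard deviations. This sketch covers chance alone and may not match what the remembered article measured; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def probable_error_in_wins(games, win_probability=0.5):
    """Half-width of the band that holds a team's win total half the time."""
    sd = math.sqrt(games * win_probability * (1 - win_probability))
    # The 75th percentile of the standard normal is roughly 0.6745.
    return NormalDist().inv_cdf(0.75) * sd


if __name__ == "__main__":
    pe = probable_error_in_wins(154)
    print(f"Probable error over 154 games: about {pe:.1f} wins")
```

Under these assumptions the 50% band is roughly plus or minus four wins, so a figure of nine would have to include something more than game-to-game chance.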

Later…


November 17, 2000

Post 8:

I’ve looked at players (tests #1), and I’ve looked at teams (tests #2). For the last exercise (tests #3), I’ll look at players within teams.

Tests #1 can be replicated by anyone who is interested using a PC spreadsheet.

Tests #2 can be replicated by anyone who is interested by asking me for the code to do it.

Tests #3 can be replicated by anyone who has the PC shareware program SIMBASE. This was written about 1989, and may not be available any longer. The author / address is:

Phillip Smith
PMS Software of Canada
109 Tripp Crescent
Nepean, Ontario
Canada K2J 1E2

I think I paid $15 for the program; it is an excellent simulator.

What I will do in this series of tests is to take the 1987 Indians and have them play against each other, first for a season of 154 games; then for a stretch of 600 games, approximating four seasons (the code limits are 600 games at a time).

I’ll set up the same team of nine players for each game, and compare how they do vs one another. Here are the players I will use:

Visitors                Home
1. Julio Franco         1. Julio Franco
2. Brook Jacoby         2. Brook Jacoby
3. Joe Carter           3. Joe Carter
4. Mel Hall             4. Mel Hall
5. Cory Snyder          5. Cory Snyder
6. Carmelo Castillo     6. Carmelo Castillo
7. Eddie Williams       7. Eddie Williams
8. Junior Noboa         8. Junior Noboa
9. Tommy Hinzo          9. Tommy Hinzo

A set of fairly complete stats will be kept.

Results in the next post.


November 18, 2000

Post 9:

Here are the actual batter statistics for 1987 against all pitchers in the league. I will have Tom Candiotti pitch the games, and as he was somewhat different than average that year, the results will have some differences based on pitcher characteristics as well as chance.

 

                       +-------------------------+
Cleveland Indians   AB  1B  2B  3B  HR   H  BB  SO  OO     BA     SA

Julio Franco       495 123  24   3   8 158  60  56 281  0.319  0.428
Brook Jacoby       540 100  26   4  32 162  78  73 305  0.300  0.540
Joe Carter         588  94  27   2  32 155  36 105 328  0.263  0.479
Mel Hall           485  96  21   1  18 136  21  68 281  0.280  0.439
Cory Snyder        577  76  25   2  33 136  32 166 275  0.235  0.457
Carmelo Castillo   220  27  17   0  11  55  16  52 113  0.250  0.477
Eddie Williams     283  55  12   0  15  82  40  56 145  0.289  0.491
Junior Noboa       511  89  36   5  19 149  40  41 321  0.291  0.493
Tommy Hinzo        257  53   9   3   3  68  12  49 140  0.264  0.357

 

I just played 154 games. The results:

 ---------------------------------------------------------------------------
                     Visiting Team                 Home Team
                   CLEVELAND INDIANS           CLEVELAND INDIANS

                Runs    Hits   Errors       Runs    Hits   Errors

  Total number   900    1461      161        854    1368      195
  Average number 5.8     9.5      1.0        5.5     8.9      1.3
  Std. deviation 3.3     3.2        1        2.9     3.1      1.2
  Variance      10.9    10.3      0.9        8.5     9.8      1.5

              Number      Percentage      Number      Percentage
  Wins          74            48.1          80            51.9

 

 ---------------------------------------------------------------------------
And the player stats for the year:

                       +---------------------------+
 Cleveland Indians AB  1B  2B  3B  HR   H  BB  SO  OO     BA     SA
 (Visitors)
 Julio Franco     649 140  31   1   9 181  93  63 405  0.278  0.371
 Brook Jacoby     616 107  31   7  53 198 107  55 363  0.321  0.652
 Joe Carter       664  84  20   2  40 146  47  86 432  0.219  0.436
 Mel Hall         675 114  26   1  25 166  19  73 436  0.245  0.398
 Cory Snyder      621  65  37   0  44 146  54 136 339  0.235  0.507
 Carmelo Castillo 615  89  50   0  36 175  43 102 338  0.284  0.541
 Eddie Williams   551  86  18   0  35 139  99  84 328  0.252  0.475
 Junior Noboa     579  90  36  10  25 161  47  37 381  0.278  0.504
 Tommy Hinzo      576 109  19  10  11 149  32  67 360  0.258  0.383                   

 

                        +---------------------------+
 Cleveland Indians AB  1B  2B  3B  HR   H  BB  SO  OO     BA     SA
 (Home)
 Julio Franco     627 129  32   6   8 175  76  48 404  0.279  0.387
 Brook Jacoby     589  97  31   5  31 164  97  67 358  0.278  0.505
 Joe Carter       626  83  24   0  48 155  47  85 386  0.247  0.515
 Mel Hall         628 119  23   2  22 166  33  63 399  0.264  0.412
 Cory Snyder      594  76  22   2  43 143  50 145 306  0.240  0.501
 Carmelo Castillo 577  59  39   0  22 120  56 117 340  0.207  0.389
 Eddie Williams   527 103  15   0  31 149  85  86 292  0.282  0.487
 Junior Noboa     543 100  36   3  27 166  49  42 335  0.305  0.532
 Tommy Hinzo      541  99  15  10   6 130  34  74 337  0.240  0.338

 

Franco hit .278 and .279 -- pretty close.   difference  .001
But Jacoby hit .321 and .278.               difference -.043
Carter hit .219 and .247                    difference  .028
Hall hit .245 and .264                      difference  .019
Snyder hit .235 and .240                    difference  .005
Castillo hit .284 and .207!                 difference -.077
Williams hit .252 and .282                  difference  .030
Noboa hit .278 and .305                     difference  .027
Hinzo hit .258 and .240                     difference -.018

Since I don’t know the innards of the SIMBASE program, I don’t know if there is a home team / visiting team bias built in. There might be. But that bias is not likely to explain the differences shown above. Interested people can easily compare the other statistics. Castillo’s stats alone sort of boggle the mind. One guy we’d be giving a bonus to — the other is likely out of a job. Yet both are the same player, with the same capabilities, playing on the same team.
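The differences listed above can be recomputed from the hits and at bats in the two tables. The sketch below also sizes each gap against a pooled binomial standard error, which assumes every at bat is independent and that the player has the same hit chance on both sides. That yardstick is an assumption added here, not part of SIMBASE, and the function name is hypothetical.

```python
import math

# (player, visitors H, visitors AB, home H, home AB) from the tables above
SPLITS = [
    ("Franco", 181, 649, 175, 627),
    ("Jacoby", 198, 616, 164, 589),
    ("Carter", 146, 664, 155, 626),
    ("Hall", 166, 675, 166, 628),
    ("Snyder", 146, 621, 143, 594),
    ("Castillo", 175, 615, 120, 577),
    ("Williams", 139, 551, 149, 527),
    ("Noboa", 161, 579, 166, 543),
    ("Hinzo", 149, 576, 130, 541),
]


def split_gap(v_hits, v_ab, h_hits, h_ab):
    """Return (home average minus visitors average, standard error of the gap)."""
    pooled = (v_hits + h_hits) / (v_ab + h_ab)
    se = math.sqrt(pooled * (1 - pooled) * (1 / v_ab + 1 / h_ab))
    return h_hits / h_ab - v_hits / v_ab, se


if __name__ == "__main__":
    for name, v_hits, v_ab, h_hits, h_ab in SPLITS:
        gap, se = split_gap(v_hits, v_ab, h_hits, h_ab)
        print(f"{name:<9} gap {gap:+.3f}   standard error {se:.3f}   "
              f"gap in standard errors {gap / se:+.1f}")
```

The gaps are computed from unrounded averages, so the last digit can differ by a point from the figures above, which subtract the rounded averages.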

I wanted to look at possible home team bias, so I ran two tests of 600 games each, the equivalent of about four seasons each.

In test 1, the home team won, 301 to 299. The widest variance I found in the batters was Williams, who batted .292 as a member of the visiting team and .264 as a member of the home team. All the other variances were, however, in single digits.

In test 2, the visitors prevailed, 310 to 290. Batting variances were in a range of 1 to 16 points, most in double digits.

This seemed to indicate no home team bias, but not being sure, I ran 20 more series of 600 games. Here are the results (including the two tests above):

Test    Home    Visitors    Home win %

1       301     299         50.2%
2       290     310         48.3%
3       292     308         48.7%
4       301     299         50.2%
5       306     294         51.0%
6       336     264         56.0%
7       318     282         53.0%
8       313     287         52.2%
9       302     298         50.3%
10      330     270         55.0%
11      327     273         54.5%
12      289     311         48.2%
13      300     300         50.0%
14      301     299         50.2%
15      309     291         51.5%
16      305     295         50.8%
17      300     300         50.0%
18      308     292         51.3%
19      297     303         49.5%
20      301     299         50.2%
21      289     311         48.2%
22      305     295         50.8%

Totals  6720    6480        50.9%

This suggests to me that the “no home team bias” assumption might be true, but it cannot be confirmed from these results. However, since the generally accepted home team advantage is understood to be larger than the one measured here (51%), it does appear that if this simulator has one, it is lower than the accepted rates.
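One way to weigh the totals is a simple test against a fair coin. This is a sketch under the assumption that the 13,200 games are independent and that, with no bias, the home side wins each with probability one half; the function names are hypothetical.

```python
import math


def home_bias_z(home_wins, games):
    """Standard deviations between the home win total and an even split."""
    expected = games * 0.5
    sd = math.sqrt(games * 0.25)
    return (home_wins - expected) / sd


def two_sided_p(z):
    """Chance of a gap at least this large, in either direction, with no bias."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    z = home_bias_z(6720, 13200)
    print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

On these assumptions the home total sits about two standard deviations above an even split: enough to hint at a small edge, not enough to settle the matter.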

I’m going to quit this series here. I’ve made the argument that chance plays a large part in baseball, and that its influence on the outcome of games as well as the resulting statistics is often overlooked by some of us, fans, SABRites, writers and broadcasters. That does not diminish, in my judgement, either the inherent worth or the enjoyment of statistics. The thesis simply enjoins us to take them for what they are worth, imperfect measures of imperfect players made by imperfect people, some better than others, all talented far beyond the average person, who have given us over a hundred years of great enjoyment and will continue to do so for years to come. As a Christian, I fully expect to see many sports played in heaven. Baseball will be prominent among them. What delights we shall still see. Ruth batting against Feller? What joy.


Post 10:

I could not resist running one more test. Back in 1960/61,  I built a computer simulator for the IBM 1620 computer, and, over the next year, fooled around with tests on it a great deal.

One of the issues I was concerned with in that day was the relative worth of a “super-slugger” in the lineup. One of the tests I made was to create two teams equal in every way overall, but with one having every player of equal capability and the other having eight players of lesser capability with a super-slugger batting fourth. The question I had was — how much better would the second team be than the first?

That was 40 years ago — I recall that I could run about 100 games on that old clunker in 10 to 15 seconds; I recall running it many times. Notes indicate that overall I saw the balanced team win more frequently than the unbalanced one — which would argue against spending one’s salary dollars accordingly. But I did not keep the results, and so those tests are rubbish history except for the question they pose.

Today I pulled out SIMBASE again and went into the team data section to see what mischief I could do. I created two teams, exactly equal, except one had nine players with:

  • a .322 batting average (186 for 578),
  • a .425 slugging average,
  • 5 homers,
  • 8 triples,
  • and 29 doubles,

and the other had eight players with

  • a .299 batting average (173 for 578),
  • a .380 slugging average,
  • 3 homers,
  • 6 triples,
  • and 26 doubles,

and one player, batting fourth, with

  • a .501 batting average (290 for 578),
  • a .785 slugging average,
  • 21 homers,
  • 24 triples,
  • and 53 doubles.

The net of these two teams was that they were the same statistically, taken as a team.
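The claim that the two lineups net out the same can be checked by adding up the batting lines given above. This is a small sketch; the class and function names are hypothetical, and the numbers are the ones in the lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Batter:
    at_bats: int
    hits: int
    doubles: int
    triples: int
    homers: int

    @property
    def total_bases(self):
        singles = self.hits - self.doubles - self.triples - self.homers
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.homers


# Nine identical hitters versus eight lesser hitters plus one super-slugger.
BALANCED = [Batter(578, 186, 29, 8, 5)] * 9
SLUGGER = [Batter(578, 173, 26, 6, 3)] * 8 + [Batter(578, 290, 53, 24, 21)]


def team_line(batters):
    """Return (team batting average, team slugging average)."""
    at_bats = sum(b.at_bats for b in batters)
    hits = sum(b.hits for b in batters)
    bases = sum(b.total_bases for b in batters)
    return hits / at_bats, bases / at_bats


if __name__ == "__main__":
    for label, team in (("Balanced", BALANCED), ("Slugger", SLUGGER)):
        avg, slg = team_line(team)
        print(f"{label:<9} BA {avg:.3f}  SA {slg:.3f}")
```

Both lineups come to 1,674 hits and 2,214 total bases in 5,202 at bats, so as teams they are identical on paper.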

Well — I played 10 sets of 600 games between these teams. They wound up dead even; 3000 wins each.

Which shoots down my original thesis, but does still suggest that perhaps a super-slugger may not be worth the salary money he is paid if  lesser players have to be used to fill out the roster.

When an owner picks a super-player, of course, he also looks for the intangibles — how much inspiration he might be to other players — how many extra fans he will bring in, and so forth. I know that it was Feller on the Indians of the 30s to 50s that brought me out to the ballpark. Over that period of time I’d guess his pitching probably accounted for at least a couple dozen “extra” games for our family.

Another 2c worth.

John Burgeson (2001 has GOT to be the Tribe’s year!)

 

SABR members, you can subscribe to the SABR-L listserv at SABR.org/about/sabr-l.



Originally published: April 10, 2012. Last Updated: April 10, 2012.