John Burgeson on the variability of baseball statistics
Editor’s note: The following is a series of posts made by longtime SABR member John Burgeson to the SABR-L listserv in November 2000. They were mentioned in a Grantland.com story by Bess Kalb on Tuesday, April 10, about his pioneering efforts to create a baseball simulation game on his IBM 1620 computer fifty years ago.
The inherent variability of statistics
By John Burgeson
November 13, 2000
Post 1:
Baseball depends on statistics; I understand that.
But, I assert, perhaps it does so by taking them much too seriously.
This will be the first post in a series in which I look at this idea in, perhaps, a different way.
Let’s start with batting averages. One could choose any measure, but that one is the easiest to focus on.
Playing “god,” I just created twenty (20) pretty fair baseball players and placed them on major league teams. This is their rookie year. I’ll make sure each gets exactly 200 at bats, and the rest of their stats are about average (I’ll not measure them).
Each time one of these 20 players comes to bat, I’ll roll the dice in such a way that he has EXACTLY three chances in ten of getting a hit, excepting walks and HBPs. When the season ends, how will they look?
Of course, many possibilities exist. I ran the exercise on my PC; here is how the season ended:
- HART 0.256
- IVES 0.268
- LUCAS 0.280
- JONES 0.284
- NORTON 0.288
- FITCH 0.292
- MORRIS 0.292
- BARNES 0.300
- PARKS 0.300
- ADAMS 0.308
- TURNER 0.308
- KELLY 0.316
- QUINN 0.316
- GARCIA 0.320
- UTLEY 0.320
- OGDEN 0.324
- RIEG 0.332
- CODY 0.340
- SPEAR 0.344
- DOWNS 0.348
Well — Downs got favorable mention for “Rookie of the Year” while Hart was not sure he’d have a job the next season. Yet there was ABSOLUTELY NO DIFFERENCE between Downs and Hart. None at all. Nada. Zip. Next post I’ll look at the following season.
Post 2:
Continuing the process in which I am the “god” who built these 20 players, each of whom has exactly 3 chances in 10 of getting a hit every time he comes to the plate (walks, errors, HBPs, etc. excepted):
All 20 of my players were picked up for season 2, and I controlled that season so that each man got exactly 400 at bats. As expected, the variability was down, though perhaps not as much as one might think.
- KELLY 0.270
- IVES 0.278
- GARCIA 0.280
- MORRIS 0.284
- TURNER 0.286
- QUINN 0.288
- NORTON 0.302
- ADAMS 0.304
- DOWNS 0.304
- UTLEY 0.306
- BARNES 0.308
- CODY 0.310
- OGDEN 0.310
- RIEG 0.310
- PARKS 0.312
- LUCAS 0.314
- SPEAR 0.314
- HART 0.316
- FITCH 0.320
- JONES 0.338
Downs dropped from his rookie year of .348 to .304; Hart improved from .256 to .316.
The sportswriter in Hart’s city wrote three columns on how Hart was improving; Baseball Weekly also mentioned him with favor. A “comer.” Jones also improved a lot — he even got a vote or two for MVP with his .338 average. Kelly’s year was a big disappointment in his city, dropping from .316 in his rookie year to .270.
But — there was ABSOLUTELY NO DIFFERENCE between any of these players — it was all chance that operated.
The teams that had these players, because I made them this way, had good years and went to the playoffs. All 20 of these guys played. That’s the subject of the next post.
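A quick way to see why doubling the at bats narrowed the spread less than one might expect: if each at bat is an independent trial with hit probability p (the model these posts use), the standard deviation of a batting average over n at bats is sqrt(p(1 − p)/n), so doubling n shrinks it only by a factor of √2. The sketch below assumes that binomial model and uses the corrected figures of 250 and 500 at bats given in Post 4; `batting_average_sd` is a name made up for this illustration.

```python
import math

def batting_average_sd(p: float, at_bats: int) -> float:
    """Standard deviation of a batting average when each at bat is an
    independent trial with hit probability p (binomial model)."""
    return math.sqrt(p * (1 - p) / at_bats)

TRUE_AVERAGE = 0.300

for at_bats in (250, 500):  # rookie half season, then first full season
    sd = batting_average_sd(TRUE_AVERAGE, at_bats)
    low, high = TRUE_AVERAGE - 2 * sd, TRUE_AVERAGE + 2 * sd
    print(f"{at_bats} AB: SD {sd:.3f}; roughly 95% of players between {low:.3f} and {high:.3f}")

ratio = batting_average_sd(TRUE_AVERAGE, 250) / batting_average_sd(TRUE_AVERAGE, 500)
print(f"doubling the at bats divides the SD by {ratio:.3f}")
```

Under these assumptions the band is about .242 to .358 at 250 at bats and about .259 to .341 at 500, which brackets the spreads in the two seasons above (.256 to .348, then .270 to .338).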
November 14, 2000
Post 3:
Continuing the process of evaluating 20 players, all of equal (.300) batting capability.
By the end of the playoffs, each man had batted 20 times, excepting walks, errors, etc. Here is the outcome:
- QUINN 0.050
- CODY 0.100
- ADAMS 0.200
- BARNES 0.200
- PARKS 0.200
- TURNER 0.200
- IVES 0.250
- NORTON 0.250
- UTLEY 0.250
- DOWNS 0.300
- HART 0.300
- MORRIS 0.300
- SPEAR 0.300
- KELLY 0.350
- LUCAS 0.350
- RIEG 0.350
- JONES 0.400
- OGDEN 0.400
- FITCH 0.450
- GARCIA 0.550
The writers, of course, gave Garcia the MVP award, and had harsh words for Quinn, pointing out that he had batted .316 his rookie year, dropped to .288 in the past season, and, when facing the superior pitching of the playoffs, had gone only 1 for 20.
A variability of 500 points, from .050 to .550, and not one cause of that variability except chance. That did not stop the baseball writers, of course. They had a lot to say.
No, I’m not knocking baseball writers. But I am suggesting that chance may play a larger part in how the stats turn out, and in how players are perceived, than some people think.
Tomorrow I’ll jump 10 or 12 years into the future and see how these boys turned out. Then I’ll do some more interesting analyses involving teams, and players within those teams.
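To put the 1-for-20 and 11-for-20 extremes in perspective, the binomial distribution gives their exact probabilities for a true .300 hitter, assuming every at bat is independent. `prob_at_most` is a helper written for this illustration, and the 20 players are treated as independent of one another; `math.comb` needs Python 3.8 or later.

```python
from math import comb

def prob_at_most(hits: int, at_bats: int, p: float) -> float:
    """P(X <= hits) for X ~ Binomial(at_bats, p)."""
    return sum(comb(at_bats, k) * p**k * (1 - p) ** (at_bats - k) for k in range(hits + 1))

P_HIT, AT_BATS, PLAYERS = 0.300, 20, 20

p_cold = prob_at_most(1, AT_BATS, P_HIT)       # 1-for-20 (.050) or worse
p_hot = 1 - prob_at_most(10, AT_BATS, P_HIT)   # 11-for-20 (.550) or better

for label, p in (("1-for-20 or worse", p_cold), ("11-for-20 or better", p_hot)):
    # Chance that at least one of the 20 players does it.
    anyone = 1 - (1 - p) ** PLAYERS
    print(f"{label}: one player {p:.2%}, at least one of {PLAYERS} players {anyone:.1%}")
```

With these assumptions a 1-for-20 series is under a 1% shot for any one player but roughly a one-in-seven chance for somebody among twenty; an 11-for-20 series is about a 2% shot for one man, yet close to a 30% chance for somebody.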
Post 4:
This is the 4th in a series of experiments in which I look at the variability of baseball.
In posts 1, 2 and 3, I followed the careers of 20 players, each endowed by god (me) with the capability to bat successfully in exactly 3 out of every 10 at bats.
We have seen their rookie years, when they played a half season with 250 at bats; their first full-time years, with 500 at bats; and their performance in the playoffs, with 20 at bats. In this post I jump ahead in time to see how they did over a complete career.
(BTW, I erred in posts 1 and 2; the times at bat were 250 and 500, not 200 and 400. Sorry.)
All players wound up (because I said so) with 3,000 at bats. Here is how they finished:
- UTLEY 0.286
- IVES 0.287
- CODY 0.290
- MORRIS 0.293
- TURNER 0.293
- KELLY 0.295
- JONES 0.298
- ADAMS 0.300
- QUINN 0.301
- FITCH 0.302
- OGDEN 0.303
- SPEAR 0.303
- DOWNS 0.305
- PARKS 0.306
- BARNES 0.307
- HART 0.307
- NORTON 0.307
- LUCAS 0.315
- RIEG 0.315
- GARCIA 0.318
You remember Garcia, don’t you? He is the one who was the MVP in the playoffs. He went on to bat .318 lifetime. Utley, on the other hand, started with a .320 in his rookie year, dropped to .306 in his second season, did poorly in the playoffs (.250) and finished with .286 lifetime. A credible career, but not one to remember particularly.
Yet there was no difference at all between Garcia and Utley except chance: the vagrant gust of wind, the rough (or smooth) infield, the insect that struck the pitched ball and changed its path ever so slightly. From a comfortable armchair, we SABRites look at Utley and Garcia, and while neither (at least on the basis of batting average alone) is a HOF candidate, Garcia at least is worth initial consideration; Utley is not.
Baseball is, of course, much more than chance, and my thesis is not that stats are without value. But we agonize (sometimes) that Mantle missed .300 by so little, and do not acknowledge that if the universe were replayed 10 or 20 times, he might well have finished with a lifetime batting average quite different from .298, perhaps higher, perhaps lower.
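“Replaying the universe” is easy to imitate for a single player. The sketch below assumes a true .300 hitter whose 3,000 at bats are independent trials, replays that career 1,000 times, and reports how far the final averages wander; the function name, the number of replays, and the seed are arbitrary choices for illustration.

```python
import random
import statistics

def replay_career(p: float, at_bats: int, rng: random.Random) -> float:
    """One simulated career: each at bat is a hit with probability p."""
    hits = sum(1 for _ in range(at_bats) if rng.random() < p)
    return hits / at_bats

rng = random.Random(2000)  # arbitrary seed, so the run is repeatable
careers = [replay_career(0.300, 3000, rng) for _ in range(1000)]

print(f"mean {statistics.mean(careers):.3f}, SD {statistics.stdev(careers):.4f}")
print(f"lowest {min(careers):.3f}, highest {max(careers):.3f}")
below_290 = sum(avg < 0.290 for avg in careers) / len(careers)
print(f"share of replays finishing below .290: {below_290:.1%}")
```

The analytic standard deviation for this case, sqrt(0.3 × 0.7 / 3000), is about .008, so roughly 95% of replays should land between about .283 and .317; the twenty careers above ran from .286 to .318.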
As you may suspect, I’m not done with this thesis. In the next post I’ll look at 10 players who are all genuine HOF candidates, again, only on the basis of their batting average. Then, in post #6 I’m going to turn my attention to the same kind of analysis, this time looking at team performance.
BTW, my protocol for the preceding was to set up a player as a spreadsheet, then run & print the spreadsheet exactly 20 times. I then wrote player names, in alphabetical order, on each of the 20 spreadsheets, and analysed the results. Clearly, I could do this n times, where n is any number I wanted. I did it exactly 20 times and stopped.
I could have also done it 100 times and selected the 20 I wanted. This protocol would clearly not have been a good example of anything.
The specific spreadsheet formula (Microsoft Works) used for each at bat was =IF((RAND()-$B$6)<0,1,0)
where B6 was set to .3
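The spreadsheet protocol translates directly into code. In this sketch Python’s `random.random()` (uniform on [0, 1)) stands in for RAND(), the constant plays the role of cell B6, and each call to `run_sheet` is one run of the spreadsheet for one player. The function names and the seed are illustrative choices, not part of the original; 250 at bats is the corrected rookie figure from Post 4.

```python
import random

HIT_PROBABILITY = 0.3  # the value placed in cell B6

NAMES = ["ADAMS", "BARNES", "CODY", "DOWNS", "FITCH", "GARCIA", "HART",
         "IVES", "JONES", "KELLY", "LUCAS", "MORRIS", "NORTON", "OGDEN",
         "PARKS", "QUINN", "RIEG", "SPEAR", "TURNER", "UTLEY"]

def at_bat(rng: random.Random) -> int:
    """Mirror of =IF((RAND()-$B$6)<0,1,0): 1 for a hit, 0 otherwise."""
    return 1 if rng.random() - HIT_PROBABILITY < 0 else 0

def run_sheet(at_bats: int, rng: random.Random) -> float:
    """One run of the spreadsheet: a player's average over at_bats trials."""
    return sum(at_bat(rng) for _ in range(at_bats)) / at_bats

rng = random.Random(1)  # arbitrary seed; any seed gives an equally valid season
# Names are assigned to the 20 runs in alphabetical order, as in the protocol.
season = {name: run_sheet(250, rng) for name in NAMES}

for name, avg in sorted(season.items(), key=lambda item: item[1]):
    print(f"- {name} {avg:.3f}")
```

Running it 20 times and stopping, rather than running it 100 times and keeping the 20 most striking results, is what keeps the exercise honest.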
Yes, I know that rigorous statistical analyses are also possible. But they (in general) don’t show what might actually happen. It is much like bridge: an analysis of bridge hands is useful, but an actual deal will give a player more insight, even though that particular deal is so rare that he will likely never see it again.
Post 5:
This is post #5 on baseball variability. It is the last one I will make on player batting averages considered alone.
I reran the simulator using players with a .35 chance of a hit in each at bat. Each of these HOF-caliber players had 8,000 at bats. I also ran each player through 20 at bats in four world series. Here are the results:
Name | Life avg | WS#1 avg | WS#2 avg | WS#3 avg | WS#4 avg |
---|---|---|---|---|---|
Abner | .344 | .400 | .400 | .300 | .150 |
Baker | .347 | .350 | .300 | .200 | .400 |
Champ | .350 | .150 | .550 | .350 | .450 |
Dempsey | .338 | .300 | .250 | .400 | .450 |
Epsley | .344 | .250 | .400 | .150 | .400 |
Folger | .358 | .300 | .300 | .300 | .300 |
Grimes | .353 | .300 | .450 | .300 | .350 |
Hanes | .350 | .600 | .350 | .350 | .500 |
Isley | .343 | .300 | .350 | .400 | .350 |
Jenkins | .353 | .300 | .250 | .400 | .250 |
This simulation is much less interesting. By the time 8,000 at bats are attained, the variability is down a great deal, the lifetime range above being only between Folger at .358 and Dempsey at .338. And all of these “greats” generally excelled in World Series play.
Still — would we not regard a .358 lifetime hitter as significantly better than one who hit .338?
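One way to answer that question under the simulator’s own assumptions: every player had a .350 chance per at bat over 8,000 independent at bats. The sketch compares the .020 gap between Folger and Dempsey with the chance variation expected, first for two players singled out in advance and then for the best and worst of ten, which is how these two were actually picked. The ten-player part uses a normal approximation to the binomial via `random.gauss`; the seed and trial count are arbitrary.

```python
import math
import random

P_HIT = 0.350      # every player's true chance of a hit, as in Post 5
AT_BATS = 8000     # lifetime at bats per player
PLAYERS = 10

sd = math.sqrt(P_HIT * (1 - P_HIT) / AT_BATS)   # SD of one lifetime average
gap = 0.358 - 0.338                              # Folger minus Dempsey

# If the two players had been singled out in advance:
z_pair = gap / (math.sqrt(2) * sd)

# But they were the best and worst of ten. Replay ten identical careers many
# times and count how often best-minus-worst is at least the observed gap.
rng = random.Random(8000)  # arbitrary seed
TRIALS = 10_000
wide = 0
for _ in range(TRIALS):
    averages = [rng.gauss(P_HIT, sd) for _ in range(PLAYERS)]
    if max(averages) - min(averages) >= gap:
        wide += 1
share_wide = wide / TRIALS

print(f"SD of one lifetime average: {sd:.4f}")
print(f"gap for a pre-chosen pair: {z_pair:.2f} standard deviations")
print(f"ten-player groups with a gap of .020 or more: {share_wide:.0%}")
```

Under these assumptions a 20-point gap between two pre-chosen players is about 2.65 standard deviations, but a spread that wide between the best and worst of ten identical hitters turns up in a sizable minority of replays.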
In the next post I’ll look at team variability.
November 15, 2000
Post 6:
This is the 6th in a series of posts that discusses the inherent variability of baseball.
I’ve looked at batting average variability, and argued that chance can account for a wide range of results for any player, regardless of how good he is.
Now what about teams?
My protocol is as follows: I have created a league of eight teams. Each of these teams has the inherent capability, in terms of averages, ERAs, etc., of the 1948 Cleveland Indians. That team went 97 and 59. How would a season look if all eight teams were exactly the same as the 1948 Indians?
I have to differentiate somehow between the teams, so I’ll do so by color. I ran five seasons. Here are the results, which are somewhat surprising to me:
Season #1
Team | W | L |
---|---|---|
Pink | 87 | 67 |
Green | 86 | 68 |
Red | 81 | 73 |
Aqua | 79 | 75 |
Brown | 78 | 76 |
Blue | 72 | 82 |
Yellow | 67 | 87 |
White | 66 | 88 |
The managers of White and Yellow got fired. But their teams were EXACTLY the same as the others.
Season #2
Team | W | L |
---|---|---|
Brown | 85 | 69 |
Blue | 85 | 69 |
Aqua | 80 | 74 |
Green | 77 | 77 |
Pink | 75 | 79 |
Red | 75 | 79 |
Yellow | 72 | 82 |
White | 67 | 87 |
Interesting that Yellow and White again finished in the bottom two. The manager of Brown was praised to the skies for bringing his team up from 5th to first place.
Season #3
Team | W | L |
---|---|---|
Yellow | 85 | 69 |
Green | 81 | 73 |
Pink | 79 | 75 |
Blue | 79 | 75 |
Brown | 75 | 79 |
Red | 75 | 79 |
White | 73 | 81 |
Aqua | 69 | 85 |
This year all the kudos went to Yellow.
Season #4
Team | W | L |
---|---|---|
Green | 86 | 68 |
Red | 83 | 71 |
Brown | 78 | 76 |
Yellow | 78 | 76 |
Pink | 78 | 76 |
Aqua | 73 | 81 |
Blue | 72 | 82 |
White | 68 | 86 |
Four years at or near the cellar and the owners of White are getting frustrated!
Season #5
Team | W | L |
---|---|---|
White | 83 | 71 |
Brown | 81 | 73 |
Yellow | 80 | 74 |
Pink | 80 | 74 |
Red | 76 | 78 |
Aqua | 75 | 79 |
Green | 72 | 82 |
Blue | 69 | 85 |
No, I didn’t “make” White win at last. That’s just the way it turned out. There was no difference at all between the eight teams.
So the next time the Cubs finish 14 games off of first, can I say it is just chance that did it? I think not — but one can say that chance has a role to play.
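The team experiment can be roughly imitated without a full game simulator. Because all eight teams are identical, the sketch below assumes each game is a 50/50 coin flip (ignoring any home-field edge) and a balanced schedule of 22 games against each of the seven opponents, which gives every team 154 games. This is a deliberate simplification standing in for the game-by-game simulator used above; the team list, schedule, and seed are illustrative assumptions.

```python
import itertools
import random

TEAMS = ["Aqua", "Blue", "Brown", "Green", "Pink", "Red", "White", "Yellow"]
GAMES_PER_PAIR = 22  # 7 opponents x 22 games = 154-game schedule

def play_season(rng: random.Random) -> dict:
    """Win totals for one season in which every game is a coin flip."""
    wins = dict.fromkeys(TEAMS, 0)
    for a, b in itertools.combinations(TEAMS, 2):
        for _ in range(GAMES_PER_PAIR):
            winner = a if rng.random() < 0.5 else b
            wins[winner] += 1
    return wins

rng = random.Random(1948)  # arbitrary seed
for season in range(1, 6):
    wins = play_season(rng)
    print(f"Season #{season}")
    for team, w in sorted(wins.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {team:<7}{w:>4}{154 - w:>4}")
```

Every season produces 616 wins in all (28 pairings × 22 games), so one team’s good year is necessarily another’s bad one.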
November 16, 2000
Post 7:
As an aside, before going ahead, thanks to several who sent me private emails of appreciation for this series (nobody has said, yet, anything bad about them).
I replied privately to all of these but at least one reply was returned because of a full recipient mailbox. So if you sent me a comment and did not get a reply, check your mailbox limits.
The question is how the last experiment might easily be replicated. I have the code; it is a variant of a computer baseball exercise written about ten years ago and sold as shareware (I bought it). I will package a set of files that allows the preceding experiment to be performed easily, although the main application program is disabled. If anyone wants a copy, email me privately; I’ll send the files as a ZIP file, with (simple) instructions.
One correspondent thought that there had been a SABR article about 10 years ago which argued much the same thing, concluding that by the end of a season a variation of plus or minus 9 games for any team was within the probable error (50% chance). That sounds somewhat high to me — does anyone remember the article in question?
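The ±9-game figure can be checked against a simple model. The sketch assumes a true .500 team whose 154 games are independent coin flips, reads “±9 games” as finishing within 9 wins of 77, and ignores the fact that teams in one league are not independent of each other; all three are assumptions, not details from the article in question. It computes the exact binomial probability with integer arithmetic.

```python
from math import comb

GAMES = 154
MEAN_WINS = GAMES // 2  # 77 wins for a true .500 team

def prob_within(k: int) -> float:
    """Probability that a true .500 team finishes within k wins of 77."""
    favorable = sum(comb(GAMES, w) for w in range(MEAN_WINS - k, MEAN_WINS + k + 1))
    return favorable / 2**GAMES

print(f"within 9 wins of 77: {prob_within(9):.3f}")

k = 0
while prob_within(k) < 0.5:  # narrowest band with at least even odds
    k += 1
print(f"narrowest band with a 50% chance: +/-{k} wins ({prob_within(k):.3f})")
```

Under this model ±9 wins covers roughly 87% of seasons, and ±4 wins is the narrowest band with even odds, so “±9 games at 50%” does look high for a pure coin-flip team.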
Later…
November 17, 2000
Post 8:
I’ve looked at players (Tests #1) and at teams (Tests #2). For the last exercise (Tests #3), I’ll look at players within teams.
Tests #1 can be replicated by anyone who is interested using a PC spreadsheet.
Tests #2 can be replicated by anyone who is interested by asking me for the code to do it.
Tests #3 can be replicated by anyone who has the PC shareware program SIMBASE. This was written about 1989, and may not be available any longer. The author / address is:
Phillip Smith
PMS Software of Canada
109 Tripp Crescent
Nepean, Ontario
Canada K2J 1E2
I think I paid $15 for the program; it is an excellent simulator.
What I will do in this series of tests is take the 1987 Indians and have them play against each other, first for a season of 154 games, then for a stretch of 600 games, approximating four seasons (the program can play at most 600 games at a time).
I’ll set up the same team of nine players for each game, and compare how they do vs one another. Here are the players I will use:
Batting order | Visiting team | Home team |
---|---|---|
1 | Julio Franco | Julio Franco |
2 | Brook Jacoby | Brook Jacoby |
3 | Joe Carter | Joe Carter |
4 | Mel Hall | Mel Hall |
5 | Cory Snyder | Cory Snyder |
6 | Carmelo Castillo | Carmelo Castillo |
7 | Eddie Williams | Eddie Williams |
8 | Junior Noboa | Junior Noboa |
9 | Tommy Hinzo | Tommy Hinzo |
A set of fairly complete stats will be kept.
Results in the next post.
November 18, 2000
Post 9:
Here are the actual batter statistics for 1987 against all pitchers in the league. I will have Tom Candiotti pitch the games, and as he was somewhat different than average that year, the results will have some differences based on pitcher characteristics as well as chance.
Cleveland Indians | AB | 1B | 2B | 3B | HR | H | BB | SO | OO | BA | SA |
---|---|---|---|---|---|---|---|---|---|---|---|
Julio Franco | 495 | 123 | 24 | 3 | 8 | 158 | 60 | 56 | 281 | 0.319 | 0.428 |
Brook Jacoby | 540 | 100 | 26 | 4 | 32 | 162 | 78 | 73 | 305 | 0.300 | 0.540 |
Joe Carter | 588 | 94 | 27 | 2 | 32 | 155 | 36 | 105 | 328 | 0.263 | 0.479 |
Mel Hall | 485 | 96 | 21 | 1 | 18 | 136 | 21 | 68 | 281 | 0.280 | 0.439 |
Cory Snyder | 577 | 76 | 25 | 2 | 33 | 136 | 32 | 166 | 275 | 0.235 | 0.457 |
Carmelo Castillo | 220 | 27 | 17 | 0 | 11 | 55 | 16 | 52 | 113 | 0.250 | 0.477 |
Eddie Williams | 283 | 55 | 12 | 0 | 15 | 82 | 40 | 56 | 145 | 0.289 | 0.491 |
Junior Noboa | 511 | 89 | 36 | 5 | 19 | 149 | 40 | 41 | 321 | 0.291 | 0.493 |
Tommy Hinzo | 257 | 53 | 9 | 3 | 3 | 68 | 12 | 49 | 140 | 0.264 | 0.357 |
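The columns of this table hang together arithmetically, which makes a transcription easy to check. The sketch assumes H = 1B + 2B + 3B + HR, SA = total bases / AB, and that OO means outs other than strikeouts (AB − H − SO); that reading of “OO” is an inference from the numbers, not something the post states. It also assumes the averages are truncated to three places rather than rounded, which is what the figures suggest (Carter’s 155 for 588 is .2636, printed .263).

```python
# Rows from the 1987 table: name, AB, 1B, 2B, 3B, HR, SO, printed OO, BA, SA.
ROWS = [
    ("Julio Franco", 495, 123, 24, 3, 8, 56, 281, ".319", ".428"),
    ("Brook Jacoby", 540, 100, 26, 4, 32, 73, 305, ".300", ".540"),
    ("Joe Carter", 588, 94, 27, 2, 32, 105, 328, ".263", ".479"),
    ("Mel Hall", 485, 96, 21, 1, 18, 68, 281, ".280", ".439"),
    ("Cory Snyder", 577, 76, 25, 2, 33, 166, 275, ".235", ".457"),
    ("Carmelo Castillo", 220, 27, 17, 0, 11, 52, 113, ".250", ".477"),
    ("Eddie Williams", 283, 55, 12, 0, 15, 56, 145, ".289", ".491"),
    ("Junior Noboa", 511, 89, 36, 5, 19, 41, 321, ".291", ".493"),
    ("Tommy Hinzo", 257, 53, 9, 3, 3, 49, 140, ".264", ".357"),
]

def truncated(numerator: int, denominator: int) -> str:
    """Format a rate below 1 to three places by truncating, e.g. 155/588 -> '.263'."""
    return f".{numerator * 1000 // denominator:03d}"

mismatches = []
for name, ab, singles, doubles, triples, hr, so, oo, ba, sa in ROWS:
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    derived = (ab - hits - so, truncated(hits, ab), truncated(total_bases, ab))
    if derived != (oo, ba, sa):
        mismatches.append((name, derived, (oo, ba, sa)))

print("all rows consistent" if not mismatches else mismatches)
```

Integer division keeps the truncation exact, avoiding floating-point surprises on values such as 162/540 = .300.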
I just played 154 games. The results:
Statistic | Visitors runs | Visitors hits | Visitors errors | Home team runs | Home team hits | Home team errors |
---|---|---|---|---|---|---|
Total number | 900 | 1461 | 161 | 854 | 1368 | 195 |
Average number | 5.8 | 9.5 | 1.0 | 5.5 | 8.9 | 1.3 |
Std. deviation | 3.3 | 3.2 | 1 | 2.9 | 3.1 | 1.2 |
Variance | 10.9 | 10.3 | 0.9 | 8.5 | 9.8 | 1.5 |

Result | Visitors (Cleveland Indians) | Home team (Cleveland Indians) |
---|---|---|
Wins | 74 (48.1%) | 80 (51.9%) |
And the player stats for the year:
Cleveland Indians (Visitors) | AB | 1B | 2B | 3B | HR | H | BB | SO | OO | BA | SA |
---|---|---|---|---|---|---|---|---|---|---|---|
Julio Franco | 649 | 140 | 31 | 1 | 9 | 181 | 93 | 63 | 405 | 0.278 | 0.371 |
Brook Jacoby | 616 | 107 | 31 | 7 | 53 | 198 | 107 | 55 | 363 | 0.321 | 0.652 |
Joe Carter | 664 | 84 | 20 | 2 | 40 | 146 | 47 | 86 | 432 | 0.219 | 0.436 |
Mel Hall | 675 | 114 | 26 | 1 | 25 | 166 | 19 | 73 | 436 | 0.245 | 0.398 |
Cory Snyder | 621 | 65 | 37 | 0 | 44 | 146 | 54 | 136 | 339 | 0.235 | 0.507 |
Carmelo Castillo | 615 | 89 | 50 | 0 | 36 | 175 | 43 | 102 | 338 | 0.284 | 0.541 |
Eddie Williams | 551 | 86 | 18 | 0 | 35 | 139 | 99 | 84 | 328 | 0.252 | 0.475 |
Junior Noboa | 579 | 90 | 36 | 10 | 25 | 161 | 47 | 37 | 381 | 0.278 | 0.504 |
Tommy Hinzo | 576 | 109 | 19 | 10 | 11 | 149 | 32 | 67 | 360 | 0.258 | 0.383 |

Cleveland Indians (Home) | AB | 1B | 2B | 3B | HR | H | BB | SO | OO | BA | SA |
---|---|---|---|---|---|---|---|---|---|---|---|
Julio Franco | 627 | 129 | 32 | 6 | 8 | 175 | 76 | 48 | 404 | 0.279 | 0.387 |
Brook Jacoby | 589 | 97 | 31 | 5 | 31 | 164 | 97 | 67 | 358 | 0.278 | 0.505 |
Joe Carter | 626 | 83 | 24 | 0 | 48 | 155 | 47 | 85 | 386 | 0.247 | 0.515 |
Mel Hall | 628 | 119 | 23 | 2 | 22 | 166 | 33 | 63 | 399 | 0.264 | 0.412 |
Cory Snyder | 594 | 76 | 22 | 2 | 43 | 143 | 50 | 145 | 306 | 0.240 | 0.501 |
Carmelo Castillo | 577 | 59 | 39 | 0 | 22 | 120 | 56 | 117 | 340 | 0.207 | 0.389 |
Eddie Williams | 527 | 103 | 15 | 0 | 31 | 149 | 85 | 86 | 292 | 0.282 | 0.487 |
Junior Noboa | 543 | 100 | 36 | 3 | 27 | 166 | 49 | 42 | 335 | 0.305 | 0.532 |
Tommy Hinzo | 541 | 99 | 15 | 10 | 6 | 130 | 34 | 74 | 337 | 0.240 | 0.338 |
Visiting average first, then home average; each difference is home minus visitors:
- Franco hit .278 and .279, pretty close: difference +.001
- Jacoby hit .321 and .278: difference -.043
- Carter hit .219 and .247: difference +.028
- Hall hit .245 and .264: difference +.019
- Snyder hit .235 and .240: difference +.005
- Castillo hit .284 and .207! Difference -.077
- Williams hit .252 and .282: difference +.030
- Noboa hit .278 and .305: difference +.027
- Hinzo hit .258 and .240: difference -.018
Since I don’t know the innards of the SIMBASE program, I don’t know if there is a home team / visiting team bias built in. There might be. But that bias is not likely to explain the differences shown above. Interested people can easily compare the other statistics. Castillo’s stats alone sort of boggle the mind. One guy we’d be giving a bonus to — the other is likely out of a job. Yet both are the same player, with the same capabilities, playing on the same team.
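One rough yardstick for these gaps: if a player’s hits as a visitor and at home were independent binomial draws with the same hit probability, the standard error of the difference between his two averages is sqrt(p(1 − p)(1/n₁ + 1/n₂)), with p estimated from both halves pooled. The sketch applies that textbook two-proportion calculation to the hits and at bats in the two tables; it ignores everything the simulator models beyond a fixed hit rate, so the results are a rough guide only.

```python
import math

# (player, visiting hits, visiting AB, home hits, home AB) from the tables above
SPLITS = [
    ("Franco", 181, 649, 175, 627),
    ("Jacoby", 198, 616, 164, 589),
    ("Carter", 146, 664, 155, 626),
    ("Hall", 166, 675, 166, 628),
    ("Snyder", 146, 621, 143, 594),
    ("Castillo", 175, 615, 120, 577),
    ("Williams", 139, 551, 149, 527),
    ("Noboa", 161, 579, 166, 543),
    ("Hinzo", 149, 576, 130, 541),
]

def gap_in_standard_errors(h1: int, n1: int, h2: int, n2: int) -> float:
    """z for (home average - visiting average) under a pooled binomial model."""
    pooled = (h1 + h2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (h2 / n2 - h1 / n1) / se

for name, vh, vab, hh, hab in SPLITS:
    print(f"{name:<9} {gap_in_standard_errors(vh, vab, hh, hab):+.2f} standard errors")
```

Under these assumptions every gap except Castillo’s comes out within about 1.6 standard errors; Castillo’s is about 3.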
I wanted to look at possible home team bias, so I ran two tests of 600 games each, the equivalent of about four seasons each.
In test 1, the home team won, 301 to 299. The widest difference I found among the batters was Williams, who batted .292 as a member of the visiting team and .264 as a member of the home team. All the other differences, however, were in single digits.
In test 2, the visitors prevailed, 310 to 290. Batting differences ranged from 1 to 16 points, most of them in double digits.
This seemed to indicate no home team bias, but not being sure, I ran 20 more series of 600 games. Here are the results (including the two tests above):
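Test | Home wins | Visitor wins | Home win % |
---|---|---|---|
1 | 301 | 299 | 50.2% |
2 | 290 | 310 | 48.3% |
3 | 292 | 308 | 48.7% |
4 | 301 | 299 | 50.2% |
5 | 306 | 294 | 51.0% |
6 | 336 | 264 | 56.0% |
7 | 318 | 282 | 53.0% |
8 | 313 | 287 | 52.2% |
9 | 302 | 298 | 50.3% |
10 | 330 | 270 | 55.0% |
11 | 327 | 273 | 54.5% |
12 | 289 | 311 | 48.2% |
13 | 300 | 300 | 50.0% |
14 | 301 | 299 | 50.2% |
15 | 309 | 291 | 51.5% |
16 | 305 | 295 | 50.8% |
17 | 300 | 300 | 50.0% |
18 | 308 | 292 | 51.3% |
19 | 297 | 303 | 49.5% |
20 | 301 | 299 | 50.2% |
21 | 289 | 311 | 48.2% |
22 | 305 | 295 | 50.8% |
Totals | 6720 | 6480 | 50.9% |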
This suggests to me that the “no home team bias” assumption might be true, but it cannot be supported. However, since the generally accepted home team advantage is pretty well understood to be larger than the 50.9% measured here, it does appear that if this simulator has a home team bias, it is smaller than the accepted rate.
I’m going to quit this series here. I’ve made the argument that chance plays a large part in baseball, and that its influence on the outcome of games, as well as on the resulting statistics, is often overlooked by some of us: fans, SABRites, writers and broadcasters. That does not diminish, in my judgement, either the inherent worth or the enjoyment of statistics. The thesis simply enjoins us to take them for what they are worth: imperfect measures of imperfect players made by imperfect people, some better than others, all talented far beyond the average person, who have given us over a hundred years of great enjoyment and will continue to do so for years to come. As a Christian, I fully expect to see many sports played in heaven. Baseball will be prominent among them. What delights we shall still see. Ruth batting against Feller? What joy.
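One standard way to put a number on this is a one-sample test of the home team’s share of wins against 50%. The sketch assumes all 13,200 games are independent with a fixed home win probability and uses the normal approximation, computing the two-sided p-value with `math.erfc`.

```python
import math

HOME_WINS, GAMES = 6720, 13200          # totals from the 22 series above
share = HOME_WINS / GAMES
se = math.sqrt(0.5 * 0.5 / GAMES)       # standard error if home and visitors were truly even
z = (share - 0.5) / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))

print(f"home share {share:.1%}, z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
```

With these totals the home share sits about 2.1 standard errors above 50% (two-sided p around 0.04), which is consistent with a small home edge in the simulator, well below the real-world figure.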
Post 10:
I could not resist running one more test. Back in 1960/61, I built a computer simulator for the IBM 1620 computer, and, over the next year, fooled around with tests on it a great deal.
One of the issues I was concerned with in that day was the relative worth of a “super-slugger” in the lineup. One of the tests I made was to create two teams equal in every way overall, but with one having every player of equal capability and the other having eight players of lesser capability with a super-slugger batting fourth. The question I had was — how much better would the second team be than the first?
That was 40 years ago. I recall that I could run about 100 games on that old clunker in 10 to 15 seconds, and I recall running it many times. Notes indicate that overall I saw the balanced team win more frequently than the unbalanced one, which would argue against spending one’s salary dollars on a super-slugger. But I did not keep the results, and so those tests are rubbish history except for the question they pose.
Today I pulled out SIMBASE again and went into the team data section to see what mischief I could do. I created two teams, exactly equal, except one had nine players with:
- a .322 batting average (186 for 578),
- a .425 slugging average,
- 5 homers,
- 8 triples,
- and 29 doubles,
and the other had eight players with:
- a .299 batting average (173 for 578),
- a .380 slugging average,
- 3 homers,
- 6 triples,
- and 26 doubles,
and one player, batting fourth, with:
- a .501 batting average (290 for 578),
- a .785 slugging average,
- 21 homers,
- 24 triples,
- and 53 doubles.
Net of all this, the two teams were statistically identical when taken as a whole.
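The claim that the two teams match when taken as a whole can be checked from the per-player lines above. The sketch sums at bats, hits, doubles, triples, home runs, and total bases for each lineup, counting singles as hits minus extra-base hits; it covers only the figures listed in the post, since walks and other statistics are not given.

```python
# (at_bats, hits, doubles, triples, home_runs) per player, from the post
BALANCED = [(578, 186, 29, 8, 5)] * 9
UNBALANCED = [(578, 173, 26, 6, 3)] * 8 + [(578, 290, 53, 24, 21)]

def team_totals(players):
    """Sum a lineup's counting stats and derive total bases."""
    ab = sum(p[0] for p in players)
    hits = sum(p[1] for p in players)
    doubles = sum(p[2] for p in players)
    triples = sum(p[3] for p in players)
    homers = sum(p[4] for p in players)
    singles = hits - doubles - triples - homers
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return {"AB": ab, "H": hits, "2B": doubles, "3B": triples, "HR": homers, "TB": total_bases}

for label, lineup in (("balanced", BALANCED), ("with super-slugger", UNBALANCED)):
    t = team_totals(lineup)
    print(f"{label:<19} {t}  BA {t['H'] / t['AB']:.3f}  SA {t['TB'] / t['AB']:.3f}")
```

Both lineups come to 1,674 hits, 261 doubles, 72 triples, 45 home runs, and 2,214 total bases in 5,202 at bats.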
Well — I played 10 sets of 600 games between these teams. They wound up dead even; 3000 wins each.
That shoots down my original thesis, but it does still suggest that a super-slugger may not be worth the salary money he is paid if lesser players have to be used to fill out the roster.
When an owner picks a super-player, of course, he also looks for the intangibles: how much inspiration he might be to other players, how many extra fans he will bring in, and so forth. I know that it was Feller on the Indians of the 30s to 50s who brought me out to the ballpark. Over that period of time I’d guess his pitching accounted for at least a couple dozen “extra” games for our family.
Another 2c worth.
John Burgeson (2001 has GOT to be the Tribe’s year!)
SABR members, you can subscribe to the SABR-L listserv at SABR.org/about/sabr-l.
Originally published: April 10, 2012. Last Updated: April 10, 2012.