# John Burgeson on the variability of baseball statistics

Editor's note: The following is a series of posts made by longtime SABR member John Burgeson to the SABR-L listserv in November 2000. They were mentioned in a Grantland.com story by Bess Kalb on Tuesday, April 10 about his pioneering efforts to create a baseball simulation game on his IBM 1620 computer fifty years ago.

The inherent variability of statistics
By John Burgeson
November 13, 2000

Post 1:

Baseball depends on statistics; I understand that.

But, I assert, perhaps it does so by taking them much too seriously.

This will be the first post in a series in which I look at this idea in, perhaps, a different way.

Let's start with batting averages. One could choose any measure, but that one is the easiest to focus upon.

Playing "god," I just created twenty (20) pretty fair baseball players and placed them on major league teams. This is their rookie year. I'll make sure each gets exactly 200 at bats, and the rest of their stats are about average (I'll not measure them).

Each time one of these 20 players comes to bat, I'll roll the dice in such a way that he has EXACTLY three chances in ten of getting a hit, excepting walks and HBPs. When the season ends, how will they look?

Of course, many possibilities exist. I ran the exercise on my PC; here is how the season ended:

• HART 0.256
• IVES 0.268
• LUCAS 0.280
• JONES 0.284
• NORTON 0.288
• FITCH 0.292
• MORRIS 0.292
• BARNES 0.300
• PARKS 0.300
• TURNER 0.308
• KELLY 0.316
• QUINN 0.316
• GARCIA 0.320
• UTLEY 0.320
• OGDEN 0.324
• RIEG 0.332
• CODY 0.340
• SPEAR 0.344
• DOWNS 0.348

Well -- Downs got favorable mention for "Rookie of the Year" while Hart was not sure he'd have a job the next season. Yet there was ABSOLUTELY NO DIFFERENCE between Downs and Hart. None at all. Nada. Zip. Next post I'll look at the following season.

Post 2:

Continuing the process in which I am the "god" that built these 20 players, each of whom has exactly 3 chances in 10 of getting a hit every time he comes to the plate (walks, errors, HBPs, etc. excepted):

All 20 of my players were picked up for season 2, and I controlled that season so that each man got exactly 400 at bats during the season. As might be expected, the variability was down -- perhaps not as much as one might expect.

• KELLY   0.270
• IVES    0.278
• GARCIA  0.280
• MORRIS  0.284
• TURNER  0.286
• QUINN   0.288
• NORTON  0.302
• DOWNS   0.304
• UTLEY   0.306
• BARNES  0.308
• CODY    0.310
• OGDEN   0.310
• RIEG    0.310
• PARKS   0.312
• LUCAS   0.314
• SPEAR   0.314
• HART    0.316
• FITCH   0.320
• JONES   0.338

Downs dropped from his rookie year of .348 to .304; Hart improved from .256 to .316.

The sportswriter in Hart's city wrote three columns on how Hart was improving; Baseball Weekly also mentioned him with favor. A "comer." Jones also improved a lot -- he even got a vote or two for MVP with his .338 average. Kelly's year was a big disappointment in his city, dropping from .316 in his rookie year to .270.

But -- there was ABSOLUTELY NO DIFFERENCE between any of these players -- it was all chance that operated.

The teams that had these players, because I made them this way, had good years and went to the playoffs. All 20 of these guys played. That's the subject of the next post.

November 14, 2000

Post 3:

Continuing the process of evaluating 20 players, all of equal (.300) batting capability.

By the end of the playoffs, each man had batted 20 times, excepting walks, errors, etc. Here is the outcome:

• QUINN   0.050
• CODY    0.100
• BARNES  0.200
• PARKS   0.200
• TURNER  0.200
• IVES    0.250
• NORTON  0.250
• UTLEY   0.250
• DOWNS   0.300
• HART    0.300
• MORRIS  0.300
• SPEAR   0.300
• KELLY   0.350
• LUCAS   0.350
• RIEG    0.350
• JONES   0.400
• OGDEN   0.400
• FITCH   0.450
• GARCIA  0.550

The writers, of course, gave Garcia the MVP award,  and had harsh words for Quinn, pointing out that he had batted .316 his rookie year, dropped to .288 in the past season, and, when facing the superior pitching of the playoffs, had gone only 1 for 20.

A variability of 500 points. And not one cause of that variability except chance. That did not stop the baseball writers, of course. They had a lot to say.

No -- I'm not knocking baseball writers. But I am suggesting that chance may play a larger part on how the stats turn out -- and how players are perceived, than some people think.

Tomorrow I'll jump 10 or 12 years into the future and see how these boys turned out. Then I'll do some more interesting analyses involving teams, and players within those teams.

Post 4:

This is the 4th in a series of experiments in which I look at the variability of baseball.

In posts 1, 2 and 3,  I followed the careers of 20 players, each endowed by god (me) with the capability to bat successfully in exactly 3 out of every 10 at bats.

We have seen their rookie years, where they played a half season with 250 at bats; their first full-time years, with 500 at bats; and their performance in the playoffs, with 20 at bats. In this post I jump ahead in time to see how they did in a complete career.

(BTW, I erred in posts 1 and 2; the times at bat were 250 and 500, not 200 and 400. Sorry.)

All players wound up (because I said so) with 3,000 at bats. Here is how they finished:

• UTLEY   0.286
• IVES    0.287
• CODY    0.290
• MORRIS  0.293
• TURNER  0.293
• KELLY   0.295
• JONES   0.298
• QUINN   0.301
• FITCH   0.302
• OGDEN   0.303
• SPEAR   0.303
• DOWNS   0.305
• PARKS   0.306
• BARNES  0.307
• HART    0.307
• NORTON  0.307
• LUCAS   0.315
• RIEG    0.315
• GARCIA  0.318

You remember Garcia, don't you? He is the one who was the MVP in the playoffs. He went on to bat .318 lifetime. Utley, on the other hand, started with a .320 in his rookie year, dropped to .306  in his second season, did poorly in the playoffs (.250) and finished with .286 lifetime. A credible career, but not one to remember particularly.

Yet, there was no difference at all between Garcia and Utley except chance -- the vagrant gust of wind, the rough (or smooth) infield, the insect that encountered the pitched ball and changed its path ever so slightly. From a comfortable armchair, we SABRites look at Utley and Garcia, and while neither (at least on the basis of batting average alone) is a HOF candidate, Garcia at least is worth initial consideration; Utley is not.

Baseball is, of course, much more than chance, and my thesis is not that stats are without value. But we agonize (sometimes) that Mantle missed .300 by so little --- and do not acknowledge that if the universe was replayed 10 or 20 times, he might well have had a final batting average much different than .299 -- perhaps higher -- perhaps lower.

As you may suspect, I'm not done with this thesis. In the next post I'll look at 10 players who are all genuine HOF candidates, again, only on the basis of their batting average. Then, in post #6 I'm going to turn my attention to the same kind of analysis, this time looking at team performance.

BTW, my protocol for the preceding was to set up a player as a spreadsheet, then run & print the spreadsheet exactly 20 times. I then wrote player names, in alphabetical order, on each of the 20 spreadsheets, and analysed the results. Clearly, I could do this n times, where n is any number I wanted. I did it exactly 20 times and stopped.

I could have also done it 100 times and selected the 20 I wanted. This protocol would clearly not have been a good example of anything.

The specific spreadsheet formula (uSoft WORKS) used for each at bat was `=IF((RAND()-$B$6)<0,1,0)`, where B6 was set to .3
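The same per-at-bat test can be sketched outside a spreadsheet. This is a minimal Python illustration, not the author's actual worksheet: the function names, the seed, and the default of 250 at bats are assumptions chosen for the example, and the comparison mirrors the spreadsheet's `RAND() - B6 < 0` check.

```python
import random


def simulate_average(at_bats, p_hit=0.3, rng=random):
    """Return a batting average from `at_bats` independent chances."""
    # Same test as the spreadsheet cell: a hit when RAND() - p < 0.
    hits = sum(1 for _ in range(at_bats) if rng.random() - p_hit < 0)
    return hits / at_bats


def simulate_season(n_players=20, at_bats=250, p_hit=0.3, seed=None):
    """Return sorted batting averages for identical players."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatability
    averages = [simulate_average(at_bats, p_hit, rng) for _ in range(n_players)]
    return sorted(averages)


if __name__ == "__main__":
    for average in simulate_season(seed=2000):
        print(f"{average:.3f}")
```

Changing `at_bats` to 20, 500 or 3,000 gives a feel for the playoff, full-season and career runs described in the earlier posts. Each run with a different seed is one more "replay of the universe."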

Yes, I know that rigorous statistical analyses are also possible. But they (in general) don't show what might actually happen. Much like an analysis of bridge hands is useful -- but an actual deal will give a player more insight, even though that particular deal is so rare that he will likely never see it again.

Post 5:

This is post #5 on baseball variability. It is the last one I will make on player batting averages considered alone.

I reran the simulator using players with a .35 chance of a hit in each at bat. Each of these HOF-caliber players had 8,000 at bats. I also ran each player through 20 at bats in four world series. Here are the results:

| Name | Life avg | WS#1 avg | WS#2 avg | WS#3 avg | WS#4 avg |
|---|---|---|---|---|---|
| Abner | .344 | .400 | .400 | .300 | .150 |
| Baker | .347 | .350 | .300 | .200 | .400 |
| Champ | .350 | .150 | .550 | .350 | .450 |
| Dempsey | .338 | .300 | .250 | .400 | .450 |
| Epsley | .344 | .250 | .400 | .150 | .400 |
| Folger | .358 | .300 | .300 | .300 | .300 |
| Grimes | .353 | .300 | .450 | .300 | .350 |
| Hanes | .350 | .600 | .350 | .350 | .500 |
| Isley | .343 | .300 | .350 | .400 | .350 |
| Jenkins | .353 | .300 | .250 | .400 | .250 |

This simulation is much less interesting. By the time 8,000 at bats are attained, the variability is down a great deal, the lifetime range above being only between Folger at .358 and Dempsey at .338. And all of these "greats" generally excelled in World Series play.

Still -- would we not regard a .358 lifetime hitter as significantly better than one who hit .338?
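How fast the spread shrinks with more at bats can be estimated directly. The sketch below assumes each at bat is an independent chance with a fixed hit probability (the same assumption the simulations make) and uses the textbook binomial standard deviation of a proportion, sqrt(p(1-p)/n). The function name is hypothetical.

```python
import math


def batting_average_sd(p_hit, at_bats):
    """Standard deviation of a batting average under a binomial model."""
    return math.sqrt(p_hit * (1 - p_hit) / at_bats)


# The .300 hitters from posts 1 through 4.
for at_bats in (20, 250, 500, 3000):
    sd = batting_average_sd(0.300, at_bats)
    print(f"{at_bats:>5} at bats: about {sd * 1000:.0f} points")

# The .350 hitters from this post.
sd = batting_average_sd(0.350, 8000)
print(f" 8000 at bats: about {sd * 1000:.0f} points")
```

Under this model one standard deviation is roughly 100 points over 20 at bats and roughly 5 points over 8,000, which is why the playoff lines scatter so widely while the lifetime lines cluster.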

In the next post I'll look at team variability.

November 15, 2000

Post 6:

This is the 6th in a series of posts that discusses the inherent variability of baseball.

I've looked at batting average variability, and argued that chance can account for a wide range of results for any player, regardless of how good he is.

My protocol is as follows: I have created a league of eight teams. Each of these teams has the inherent capability, in terms of averages, ERAs, etc., of the 1948 Cleveland Indians. That team went 97 and 58. How would a season look if all eight teams were exactly the same as the 1948 Indians?

I have to differentiate somehow between the teams, so I'll do so by color. I ran five seasons. Here are the results, which are somewhat surprising to me:

Season #1

| Team | Won | Lost |
|---|---|---|
| Pink | 87 | 67 |
| Green | 86 | 68 |
| Red | 81 | 73 |
| Aqua | 79 | 75 |
| Brown | 78 | 76 |
| Blue | 72 | 82 |
| Yellow | 67 | 87 |
| White | 66 | 88 |

The managers of White and Yellow got fired. But their team was EXACTLY the same as that of the others.

Season #2

| Team | Won | Lost |
|---|---|---|
| Brown | 85 | 69 |
| Blue | 85 | 69 |
| Aqua | 80 | 74 |
| Green | 77 | 77 |
| Pink | 75 | 79 |
| Red | 75 | 79 |
| Yellow | 72 | 82 |
| White | 67 | 87 |

Interesting that Yellow and White again finished last. The manager of Brown was praised to the skies for bringing his team up from 5th to first place.

Season #3

| Team | Won | Lost |
|---|---|---|
| Yellow | 85 | 69 |
| Green | 81 | 73 |
| Pink | 79 | 75 |
| Blue | 79 | 75 |
| Brown | 75 | 79 |
| Red | 75 | 79 |
| White | 73 | 81 |
| Aqua | 69 | 85 |

This year all the kudos went to Yellow.

Season #4

| Team | Won | Lost |
|---|---|---|
| Green | 86 | 68 |
| Red | 83 | 71 |
| Brown | 78 | 76 |
| Yellow | 78 | 76 |
| Pink | 78 | 76 |
| Aqua | 73 | 81 |
| Blue | 72 | 82 |
| White | 68 | 86 |

Four years at or near the cellar and the owners of White are getting frustrated!

Season #5

| Team | Won | Lost |
|---|---|---|
| White | 83 | 71 |
| Brown | 81 | 73 |
| Yellow | 80 | 74 |
| Pink | 80 | 74 |
| Red | 76 | 78 |
| Aqua | 75 | 79 |
| Green | 72 | 82 |
| Blue | 69 | 85 |

No -- I didn't "make" White win at last. That's just the way it turned out. There was no difference at all between the eight teams.

So the next time the Cubs finish 14 games off of first, can I say it is just chance that did it? I think not -- but one can say that chance has a role to play.
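A stripped-down version of this league experiment can be run without a baseball simulator. The sketch below is not the program used for the seasons above: it assumes every game between two identical teams is a fair coin flip, and it assumes a balanced schedule of 22 games against each of the seven opponents (154 in all). The function name and seed are hypothetical.

```python
import itertools
import random

TEAMS = ["Pink", "Green", "Red", "Aqua", "Brown", "Blue", "Yellow", "White"]


def simulate_league(teams=TEAMS, games_per_pair=22, seed=None):
    """Play a balanced schedule where every game is a 50/50 coin flip."""
    rng = random.Random(seed)
    wins = {team: 0 for team in teams}
    losses = {team: 0 for team in teams}
    for team_a, team_b in itertools.combinations(teams, 2):
        for _ in range(games_per_pair):
            if rng.random() < 0.5:
                winner, loser = team_a, team_b
            else:
                winner, loser = team_b, team_a
            wins[winner] += 1
            losses[loser] += 1
    # Standings: most wins first.
    rows = [(team, wins[team], losses[team]) for team in teams]
    return sorted(rows, key=lambda row: -row[1])


if __name__ == "__main__":
    for team, won, lost in simulate_league(seed=1948):
        print(f"{team:<7}{won:>4}{lost:>4}")
```

Because the teams are identical, each one's expected record is 77 and 77. Any first-place or last-place finish in the output is produced by the coin alone.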

November 16, 2000

Post 7:

As an aside, before going ahead, thanks to several who sent me private emails of appreciation for this series (nobody has said, yet, anything bad about them).

I replied privately to all of these but at least one reply was returned because of a full recipient mailbox. So if you sent me a comment and did not get a reply, check your mailbox limits.

The question is how the last experiment might easily be replicated. I have the code; it is a variant of a computer baseball exercise written about ten years ago and sold as shareware (I bought it). I will package a set of files which will allow the preceding experiment to be performed easily but the main application program is disabled. If anyone wants a copy -- email me privately; I'll send them, with instructions (simple) as a ZIP file.

One correspondent thought that there had been a SABR article about 10 years ago which argued much the same thing, concluding that by the end of a season a variation of plus or minus 9 games for any team was within the probable error (50% chance).  That sounds somewhat high to me -- does anyone remember the article in question?
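For comparison, the "probable error" can be worked out for the simplest possible model. This sketch assumes a 154-game season in which every game is an independent 50/50 coin flip, and it uses the normal approximation to the binomial. The article being recalled may have measured something different, so treat the number as a baseline rather than a rebuttal. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def probable_error_in_wins(games=154, p_win=0.5):
    """Half-width of the band holding 50% of seasons (normal approximation)."""
    sd = math.sqrt(games * p_win * (1 - p_win))
    # The 75th percentile of the standard normal marks the central 50%.
    return NormalDist().inv_cdf(0.75) * sd


print(f"standard deviation: {math.sqrt(154 * 0.25):.1f} wins")
print(f"probable error:     {probable_error_in_wins():.1f} wins")
```

Under these assumptions the probable error comes out near four wins, not nine. A figure of nine would need something beyond game-by-game coin flipping, such as a different definition of the interval.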

Later...

November 17, 2000

Post 8:

I've looked at players (tests #1), and I've looked at teams (tests #2). For the last exercise (tests #3), I'll look at players within teams.

Tests #1 can be replicated by anyone who is interested using a PC spreadsheet.

Tests #2 can be replicated by anyone who is interested by asking me for the code to do it.

Tests #3 can be replicated by anyone who has the PC shareware program SIMBASE. This was written about 1989, and may not be available any longer. The author / address is:

Phillip Smith
109 Tripp Crescent
Nepean, Ontario

I think I paid $15 for the program; it is an excellent simulator.

What I will do in this series of tests is to take the 1987 Indians and have them play against each other, first for a season of 154 games; then for a stretch of 600 games, approximating four seasons (the code limits are 600 games at a time).

I'll set up the same team of nine players for each game, and compare how they do vs one another. Here are the players I will use:

| Visiting lineup | Home lineup |
|---|---|
| 1. Julio Franco | 1. Julio Franco |
| 2. Brook Jacoby | 2. Brook Jacoby |
| 3. Joe Carter | 3. Joe Carter |
| 4. Mel Hall | 4. Mel Hall |
| 5. Cory Snyder | 5. Cory Snyder |
| 6. Carmelo Castillo | 6. Carmelo Castillo |
| 7. Eddie Williams | 7. Eddie Williams |
| 8. Junior Noboa | 8. Junior Noboa |
| 9. Tommy Hinzo | 9. Tommy Hinzo |

A set of fairly complete stats will be kept.

Results in the next post.

November 18, 2000

Post 9:

Here are the actual batter statistics for 1987 against all pitchers in the league. I will have Tom Candiotti pitch the games, and as he was somewhat different than average that year, the results will have some differences based on pitcher characteristics as well as chance.

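| Cleveland Indians | AB | 1B | 2B | 3B | HR | H | BB | SO | OO | BA | SA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Julio Franco | 495 | 123 | 24 | 3 | 8 | 158 | 60 | 56 | 281 | 0.319 | 0.428 |
| Brook Jacoby | 540 | 100 | 26 | 4 | 32 | 162 | 78 | 73 | 305 | 0.300 | 0.540 |
| Joe Carter | 588 | 94 | 27 | 2 | 32 | 155 | 36 | 105 | 328 | 0.263 | 0.479 |
| Mel Hall | 485 | 96 | 21 | 1 | 18 | 136 | 21 | 68 | 281 | 0.280 | 0.439 |
| Cory Snyder | 577 | 76 | 25 | 2 | 33 | 136 | 32 | 166 | 275 | 0.235 | 0.457 |
| Carmelo Castillo | 220 | 27 | 17 | 0 | 11 | 55 | 16 | 52 | 113 | 0.250 | 0.477 |
| Eddie Williams | 283 | 55 | 12 | 0 | 15 | 82 | 40 | 56 | 145 | 0.289 | 0.491 |
| Junior Noboa | 511 | 89 | 36 | 5 | 19 | 149 | 40 | 41 | 321 | 0.291 | 0.493 |
| Tommy Hinzo | 257 | 53 | 9 | 3 | 3 | 68 | 12 | 49 | 140 | 0.264 | 0.357 |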

I just played 154 games. The results:

Visiting team: Cleveland Indians. Home team: Cleveland Indians.

| | Visitors: Runs | Visitors: Hits | Visitors: Errors | Home: Runs | Home: Hits | Home: Errors |
|---|---|---|---|---|---|---|
| Total number | 900 | 1461 | 161 | 854 | 1368 | 195 |
| Average number | 5.8 | 9.5 | 1.0 | 5.5 | 8.9 | 1.3 |
| Std. deviation | 3.3 | 3.2 | 1 | 2.9 | 3.1 | 1.2 |
| Variance | 10.9 | 10.3 | 0.9 | 8.5 | 9.8 | 1.5 |

| | Visitors: Number | Visitors: Percentage | Home: Number | Home: Percentage |
|---|---|---|---|---|
| Wins | 74 | 48.1 | 80 | 51.9 |

And the player stats for the year:

| Cleveland Indians (Visitors) | AB | 1B | 2B | 3B | HR | H | BB | SO | OO | BA | SA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Julio Franco | 649 | 140 | 31 | 1 | 9 | 181 | 93 | 63 | 405 | 0.278 | 0.371 |
| Brook Jacoby | 616 | 107 | 31 | 7 | 53 | 198 | 107 | 55 | 363 | 0.321 | 0.652 |
| Joe Carter | 664 | 84 | 20 | 2 | 40 | 146 | 47 | 86 | 432 | 0.219 | 0.436 |
| Mel Hall | 675 | 114 | 26 | 1 | 25 | 166 | 19 | 73 | 436 | 0.245 | 0.398 |
| Cory Snyder | 621 | 65 | 37 | 0 | 44 | 146 | 54 | 136 | 339 | 0.235 | 0.507 |
| Carmelo Castillo | 615 | 89 | 50 | 0 | 36 | 175 | 43 | 102 | 338 | 0.284 | 0.541 |
| Eddie Williams | 551 | 86 | 18 | 0 | 35 | 139 | 99 | 84 | 328 | 0.252 | 0.475 |
| Junior Noboa | 579 | 90 | 36 | 10 | 25 | 161 | 47 | 37 | 381 | 0.278 | 0.504 |
| Tommy Hinzo | 576 | 109 | 19 | 10 | 11 | 149 | 32 | 67 | 360 | 0.258 | 0.383 |

| Cleveland Indians (Home) | AB | 1B | 2B | 3B | HR | H | BB | SO | OO | BA | SA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Julio Franco | 627 | 129 | 32 | 6 | 8 | 175 | 76 | 48 | 404 | 0.279 | 0.387 |
| Brook Jacoby | 589 | 97 | 31 | 5 | 31 | 164 | 97 | 67 | 358 | 0.278 | 0.505 |
| Joe Carter | 626 | 83 | 24 | 0 | 48 | 155 | 47 | 85 | 386 | 0.247 | 0.515 |
| Mel Hall | 628 | 119 | 23 | 2 | 22 | 166 | 33 | 63 | 399 | 0.264 | 0.412 |
| Cory Snyder | 594 | 76 | 22 | 2 | 43 | 143 | 50 | 145 | 306 | 0.240 | 0.501 |
| Carmelo Castillo | 577 | 59 | 39 | 0 | 22 | 120 | 56 | 117 | 340 | 0.207 | 0.389 |
| Eddie Williams | 527 | 103 | 15 | 0 | 31 | 149 | 85 | 86 | 292 | 0.282 | 0.487 |
| Junior Noboa | 543 | 100 | 36 | 3 | 27 | 166 | 49 | 42 | 335 | 0.305 | 0.532 |
| Tommy Hinzo | 541 | 99 | 15 | 10 | 6 | 130 | 34 | 74 | 337 | 0.240 | 0.338 |

• Franco hit .278 and .279 -- pretty close (difference .001)
• But Jacoby hit .321 and .278 (difference -.043)
• Carter hit .219 and .247 (difference .028)
• Hall hit .245 and .264 (difference .019)
• Snyder hit .235 and .240 (difference .005)
• Castillo hit .284 and .207! (difference -.077)
• Williams hit .252 and .282 (difference .030)
• Noboa hit .278 and .305 (difference .027)
• Hinzo hit .258 and .240 (difference -.018)

Since I don't know the innards of the SIMBASE program, I don't know if there is a home team / visiting team bias built in. There might be. But that bias is not likely to explain the differences shown above. Interested people can easily compare the other statistics. Castillo's stats alone sort of boggle the mind. One guy we'd be giving a bonus to -- the other is likely out of a job. Yet both are the same player, with the same capabilities, playing on the same team.

I wanted to look at possible home team bias, so I ran two tests of 600 games each, the equivalent of about four seasons each.

In test 1, the home team won, 301 to 299. The widest variance I found in the batters was Williams, who batted .292 as a member of the visiting team and .264 as a member of the home team. All the other variances were, however, in single digits.

In test 2, the visitors prevailed, 310 to 290. Batting variances were in a range of 1 to 16 points, most in double digits.

This seemed to indicate no home team bias, but not being sure, I ran 20 more series of 600 games. Here are the results (including the tests above):

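| Test | Home | Visitors | Home win % |
|---|---|---|---|
| 1 | 301 | 299 | 50.2% |
| 2 | 290 | 310 | 48.3% |
| 3 | 292 | 308 | 48.7% |
| 4 | 301 | 299 | 50.2% |
| 5 | 306 | 294 | 51.0% |
| 6 | 336 | 264 | 56.0% |
| 7 | 318 | 282 | 53.0% |
| 8 | 313 | 287 | 52.2% |
| 9 | 302 | 298 | 50.3% |
| 10 | 330 | 270 | 55.0% |
| 11 | 327 | 273 | 54.5% |
| 12 | 289 | 311 | 48.2% |
| 13 | 300 | 300 | 50.0% |
| 14 | 301 | 299 | 50.2% |
| 15 | 309 | 291 | 51.5% |
| 16 | 305 | 295 | 50.8% |
| 17 | 300 | 300 | 50.0% |
| 18 | 308 | 292 | 51.3% |
| 19 | 297 | 303 | 49.5% |
| 20 | 301 | 299 | 50.2% |
| 21 | 289 | 311 | 48.2% |
| 22 | 305 | 295 | 50.8% |
| Totals | 6720 | 6480 | 50.9% |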

This suggests to me that the "no home team bias" assumption might be true, but cannot be supported. However, since the generally accepted home team advantage is understood to be larger than the one measured here (51%), it does appear that if this simulator has one, it is lower than the accepted rates.
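One rough way to put a number on "cannot be supported" is a normal-approximation test of the pooled totals against a 50/50 split. This is an illustrative check, not part of the original protocol. It assumes all 13,200 games are independent, and the function name is hypothetical.

```python
import math


def home_bias_z_test(home_wins, games, p_null=0.5):
    """Return (z, two-sided p-value) for home wins versus a 50/50 null."""
    expected = games * p_null
    sd = math.sqrt(games * p_null * (1 - p_null))
    z = (home_wins - expected) / sd
    # Two-sided tail probability of the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = home_bias_z_test(home_wins=6720, games=6720 + 6480)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these totals z comes out a little above 2. That is mild evidence for a small built-in home edge, and it is consistent with the reading that a pure "no bias" assumption is hard to defend while the edge itself is small.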

I'm going to quit this series here. I've made the argument that chance plays a large part in baseball -- and that its influence on the outcome of games as well as the resulting statistics is often overlooked by some of us, fans, SABRites, writers and broadcasters. That does not diminish, in my judgement, either the inherent worth or the enjoyment of statistics. The thesis simply enjoins us to take them for what they are worth, imperfect measures of imperfect players made by imperfect people, some better than others, all talented far beyond the average person, who have given us over a hundred years of great enjoyment and will continue to do so for years to come. As a Christian, I fully expect to see many sports played in heaven. Baseball will be prominent among them. What delights we shall still see. Ruth batting against Feller? What joy.

Post 10:

I could not resist running one more test. Back in 1960/61,  I built a computer simulator for the IBM 1620 computer, and, over the next year, fooled around with tests on it a great deal.

One of the issues I was concerned with in that day was the relative worth of a "super-slugger" in the lineup. One of the tests I made was to create two teams equal in every way overall, but with one having every player of equal capability and the other having eight players of lesser capability with a super-slugger batting fourth. The question I had was -- how much better would the second team be than the first?

That was 40 years ago -- I recall that I could run about 100 games on that old clunker in 10 to 15 seconds; I recall running it many times. Notes indicate that overall I saw the balanced team win more frequently than the unbalanced one -- which would argue against spending one's salary dollars accordingly. But I did not keep the results, and so those tests are rubbish history except for the question they pose.

Today I pulled out SIMBASE again and went into the team data section to see what mischief I could do. I created two teams, exactly equal, except one had nine players with:

• a .322 batting average (186 for 578),
• a .425 slugging average,
• 5 homers,
• 8 triples,
• and 29 doubles,

and the other had eight players with

• a .299 batting average (173 for 578),
• a .380 slugging average,
• 3 homers,
• 6 triples,
• and 26 doubles,

and one player, batting fourth, with

• a .501 batting average (290 for 578),
• a .785 slugging average,
• 21 homers,
• 24 triples,
• and 53 doubles.

The net of these two teams was that they were the same statistically, taken as a team.
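The claim that the two lineups match as teams can be checked by adding up the lines listed above. The sketch below uses only the figures given in this post; the class and variable names are hypothetical, and total bases are derived from hits, doubles, triples and homers in the usual way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    at_bats: int
    hits: int
    doubles: int
    triples: int
    homers: int

    def total_bases(self) -> int:
        # Every hit is worth one base; extra-base hits add the remainder.
        return self.hits + self.doubles + 2 * self.triples + 3 * self.homers


def team_totals(lineup):
    """Sum (AB, H, 2B, 3B, HR, total bases) over a lineup."""
    return (
        sum(p.at_bats for p in lineup),
        sum(p.hits for p in lineup),
        sum(p.doubles for p in lineup),
        sum(p.triples for p in lineup),
        sum(p.homers for p in lineup),
        sum(p.total_bases() for p in lineup),
    )


EVEN = BattingLine(578, 186, 29, 8, 5)        # each of the nine equal players
LESSER = BattingLine(578, 173, 26, 6, 3)      # each of the eight lesser players
SLUGGER = BattingLine(578, 290, 53, 24, 21)   # the super-slugger batting fourth

balanced = [EVEN] * 9
top_heavy = [LESSER] * 8 + [SLUGGER]

print(team_totals(balanced))
print(team_totals(top_heavy))
```

Both lineups add up to the same at bats, hits, doubles, triples, homers and total bases, so the team batting and slugging averages are identical. Only the distribution across the nine lineup spots differs.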

Well -- I played 10 sets of 600 games between these teams. They wound up dead even; 3000 wins each.

Which shoots down my original thesis, but does still suggest that perhaps a super-slugger may not be worth the salary money he is paid if lesser players have to be used to fill out the roster.

When an owner picks a super-player, of course, he also looks for the intangibles -- how much inspiration he might be to other players -- how many extra fans he will bring in, and so forth. I know that it was Feller on the Indians of the 30s to 50s that brought me out to the ballpark. Over that period of time I'd guess his pitching probably accounted for at least a couple dozen "extra" games for our family.

Another 2c worth.

John Burgeson (2001 has GOT to be the Tribe's year!)

SABR members, you can subscribe to the SABR-L listserv at SABR.org/about/sabr-l.

This page was last updated April 13, 2012 at 3:14 pm MST.