Simpson’s Paradox: Stats Often Can Deceive

November 11, 1984

This article was written by Karl Hiester

This article was published in 1984 Baseball Research Journal

A PARADOX, ACCORDING to Webster, is “a statement that seems contradictory, unbelievable, or absurd but that may actually be true in fact.” This is an apt description for several statements that could be made about the notorious split season of 1981. It is especially fitting for the well-publicized and lamentable fact that neither of the two National League teams with the best season won-lost percentages made the post-season playoffs that year.

It will be recalled that neither the Cardinals (59-43, .578) nor the Reds (66-42, .611) managed to lead their division in either of the two halves of that season. The Cards were topped by the Phillies in the first half and by the Expos in the second half, while the Reds lost out to the Dodgers and Astros, respectively.

What we’re seeing here is a variant of a little-known but remarkable statistical wrinkle called Simpson’s paradox, named for the mathematician who gave it a careful examination.

The pure and unvarnished version of this paradox is stranger yet. Consider the pitcher who states to a fellow moundsman: “Sure you had a better won-lost percentage than I did last year, and okay, you did again this year, but my won-lost record was better over the two-year span!”

Now, on the face of it, that claim appears to be contradictory and maybe even absurd, yet that is precisely the statement Tim Lollar of the San Diego Padres might have made to Oakland’s Steve McCatty at the conclusion of the 1983 season. Witness the numbers: In 1982 McCatty was 6-3 for a percentage of .667, and Lollar was 16-9 for .640. In 1983 McCatty’s record was 6-9 (.400) and Lollar’s was 7-12 (.368). McCatty had the better percentage each year, but note the overall records for 1982 and 1983 combined: McCatty 12-12 (.500), Lollar 23-21 (.523). So Lollar’s seemingly paradoxical statement is nevertheless a true one!
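The arithmetic behind the Lollar-McCatty comparison is easy to check by hand, but a short script makes the reversal plain. What follows is only a minimal sketch in Python; the function names `pct` and `combined` are illustrative choices, and the won-lost records are the ones quoted above.

```python
from fractions import Fraction

# (wins, losses) by season, as quoted in the article
mccatty = {1982: (6, 3), 1983: (6, 9)}
lollar = {1982: (16, 9), 1983: (7, 12)}


def pct(wins, losses):
    """Won-lost percentage as an exact fraction."""
    return Fraction(wins, wins + losses)


def combined(record):
    """Add up wins and losses across all seasons in the record."""
    wins = sum(w for w, _ in record.values())
    losses = sum(l for _, l in record.values())
    return wins, losses


# Season by season, then both seasons pooled
for year in (1982, 1983):
    mc, lo = pct(*mccatty[year]), pct(*lollar[year])
    print(f"{year}: McCatty {float(mc):.3f}  Lollar {float(lo):.3f}")

mc, lo = pct(*combined(mccatty)), pct(*combined(lollar))
print(f"Both: McCatty {float(mc):.3f}  Lollar {float(lo):.3f}")
```

Exact fractions are used so that the comparisons do not depend on rounding; the printed three-place figures should agree with those given in the text.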

Before attempting to shed some light on this little mystery, let us look at some batting averages (which are, in the precise sense, percentages) and see that Simpson’s paradox shows up here too.

Thanks to the two-part breakdowns of season batting averages provided in Bill James’ invaluable Baseball Abstract, we have the following comparison of the performances of Andre Dawson and Lee Lacy against right-handed and left-handed pitching in 1983:

Dawson	AB	H	Ave.	Lacy	AB	H	Ave.
Vs. RHP	474	134	0.283	Vs. RHP	139	37	0.266
Vs. LHP	159	55	0.346	Vs. LHP	149	50	0.336
Overall	633	189	0.299	Overall	288	87	0.302

We see that, while Dawson outhit Lacy against each kind of pitching, it was Lacy who ended up with the higher season average.

American Leaguers are by no means immune to Mr. Simpson’s curious twist of arithmetic either. This time we will consider the 1983 batting records of Toronto’s Damaso Garcia and the Yankees’ Ken Griffey on grass and on artificial turf:

Garcia	AB	H	Ave.	Griffey	AB	H	Ave.
Grass	183	44	0.240	Grass	399	118	0.296
Art. Turf	342	117	0.342	Art. Turf	59	22	0.373
Overall	525	161	0.307	Overall	458	140	0.306

Notice that the relatively large differences between the separate averages (Garcia was outhit by 56 points on grass and by 31 points on artificial turf) make Damaso’s higher average for the season seem even more unlikely.

Without being too mathematically burdensome, we will now try to see what lies behind these peculiar goings-on. The key thing to observe is that a hitter’s batting average for an entire season is a weighted average of his batting averages for two (or possibly more) parts of the season. The weights are proportional to the numbers of at-bats in the respective parts. Referring to the last example, we see that Griffey’s lower average on grass receives almost seven times the weight of his higher artificial turf average. On the other hand, Garcia’s higher average on turf gets nearly twice the weight that his lower average on grass receives. This proportional weighting is just sufficient to push Garcia’s overall season average slightly ahead of Griffey’s.
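For readers who like to see the weighting spelled out, here is a small Python sketch of the calculation just described. The helper name `weighted_average` is an assumption made for illustration; the at-bat and hit totals are those in the Garcia-Griffey table above.

```python
def weighted_average(splits):
    """Return (weights, split averages, overall average) for (at_bats, hits) pairs."""
    total_ab = sum(ab for ab, _ in splits)
    # Each split's weight is its share of the total at-bats
    weights = [ab / total_ab for ab, _ in splits]
    averages = [hits / ab for ab, hits in splits]
    # The season average is the weight-by-weight sum of the split averages
    overall = sum(w * a for w, a in zip(weights, averages))
    return weights, averages, overall


# (at_bats, hits) on grass and on artificial turf, from the table above
garcia = [(183, 44), (342, 117)]
griffey = [(399, 118), (59, 22)]

for name, splits in (("Garcia", garcia), ("Griffey", griffey)):
    weights, averages, overall = weighted_average(splits)
    print(name,
          "weights:", [round(w, 3) for w in weights],
          "averages:", [round(a, 3) for a in averages],
          "overall:", round(overall, 3))
```

The weighted sum is the same number one gets by dividing total hits by total at-bats; writing it this way simply shows how much each split contributes to the season figure.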

Finally, this hypothetical example may help to clarify the mechanism just described. We’ll christen our batters “Babe” and “Casey” and examine their averages over two consecutive imaginary seasons:

Babe	AB	H	Ave.	Casey	AB	H	Ave.
1st season	498	497	0.998	1st season	2	2	1.000
2nd season	2	0	0.000	2nd season	498	1	0.002
Overall	500	497	0.994	Overall	500	3	0.006

The extreme figures here are preposterous, of course, but they illustrate the effect of the weightings very clearly.
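The same test can be put to any pair of split records. The sketch below, again in Python, checks whether one player leads in every split yet trails overall; the function names are hypothetical, and the data are the Babe-Casey figures above along with the Dawson-Lacy table from earlier in the article.

```python
def overall(splits):
    """Overall average from a list of (at_bats, hits) pairs."""
    return sum(h for _, h in splits) / sum(ab for ab, _ in splits)


def wins_every_split(a, b):
    """True if player a has the higher average in every corresponding split."""
    return all(ha / aba > hb / abb for (aba, ha), (abb, hb) in zip(a, b))


def simpson_reversal(a, b):
    """True when one player is higher in every split but lower overall."""
    return ((wins_every_split(a, b) and overall(a) < overall(b)) or
            (wins_every_split(b, a) and overall(b) < overall(a)))


# (at_bats, hits) for each part, in the same order for both players
babe = [(498, 497), (2, 0)]
casey = [(2, 2), (498, 1)]
dawson = [(474, 134), (159, 55)]   # vs. RHP, vs. LHP
lacy = [(139, 37), (149, 50)]

print(simpson_reversal(babe, casey))
print(simpson_reversal(dawson, lacy))
```

Both calls report a reversal. With equal at-bats in every split the weights are identical for the two players, and no reversal can occur.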

So, while Simpson’s paradox may be only a curiosity of passing interest, it does serve to remind us that sometimes things are not what they seem to be. Such reminders are especially meaningful to interpreters of data of whatever kind. If we are occasionally too rigid or complacent in our statistical dealings, we might see behind some page of figures the faint visage of Simpson gazing back at us from the mathematical twilight zone. It is easy to imagine him giving a sly wink to Messrs. Griffey, Dawson and McCatty, to the Cardinals and the Reds, and ultimately to all of us.
