Simpson’s Paradox: Stats Often Can Deceive

This article was written by Karl Hiester

This article was published in the 1984 Baseball Research Journal


  A PARADOX, ACCORDING to Webster, is “a statement that seems contradictory, unbelievable, or absurd but that may actually be true in fact.” This is an apt description for several statements that could be made about the notorious split season of 1981. It is especially fitting for the well-publicized and lamentable fact that neither of the two National League teams with the best full-season won-lost percentages made the post-season playoffs that year.

  It will be recalled that neither the Cardinals (59-43 .578) nor the Reds (66-42 .611) were able to lead their divisions in either of the two halves of that season. The Cards were topped by the Phillies in the first half and by the Expos in the second half, while the Reds lost out to the Dodgers and Astros, respectively.

  What we’re seeing here is a variant of a little-known but remarkable statistical wrinkle called Simpson’s paradox, named for the mathematician who gave it a careful examination.

  The pure and unvarnished version of this paradox is stranger yet. Consider the pitcher who states to a fellow moundsman: “Sure you had a better won-lost percentage than I did last year, and okay, you did again this year, but my won-lost record was better over the two-year span!”

   Now, on the face of it, that claim appears to be contradictory and maybe even absurd, yet that is precisely the statement Tim Lollar of the San Diego Padres might have made to Oakland’s Steve McCatty at the conclusion of the 1983 season. Witness the numbers: In 1982 McCatty was 6-3 for a percentage of .667, and Lollar was 16-9 for .640. In 1983 McCatty’s record was 6-9 .400 and Lollar’s was 7-12 .368. McCatty had a better record each year, but note the overall records for 1982 and 1983 combined: McCatty 12-12 .500, Lollar 23-21 .523. So Lollar’s seemingly paradoxical statement is nevertheless a true one!
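For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the McCatty and Lollar percentages from the won-lost records given above. The names `records`, `pct` and `combined` are illustrative, not taken from any library, and exact fractions are used so that rounding cannot blur the comparison:

```python
from fractions import Fraction

# Won-lost records quoted in the article, as (wins, losses) by season.
records = {
    "McCatty": {1982: (6, 3), 1983: (6, 9)},
    "Lollar": {1982: (16, 9), 1983: (7, 12)},
}


def pct(wins: int, losses: int) -> Fraction:
    """Exact won-lost percentage: wins divided by decisions."""
    return Fraction(wins, wins + losses)


def combined(seasons: dict) -> tuple:
    """Add up wins and losses across all seasons."""
    wins = sum(w for w, _ in seasons.values())
    losses = sum(l for _, l in seasons.values())
    return wins, losses


# Season-by-season comparison.
for year in (1982, 1983):
    mccatty = pct(*records["McCatty"][year])
    lollar = pct(*records["Lollar"][year])
    print(year, f"McCatty {float(mccatty):.3f}", f"Lollar {float(lollar):.3f}")

# Two-year comparison: pool the decisions first, then divide.
mccatty_all = pct(*combined(records["McCatty"]))
lollar_all = pct(*combined(records["Lollar"]))
print("1982-83", f"McCatty {float(mccatty_all):.3f}", f"Lollar {float(lollar_all):.3f}")
```

McCatty leads in 1982 (.667 to .640) and in 1983 (.400 to .368), yet the combined line reads .500 for McCatty and .523 for Lollar.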

  Before attempting to shed some light on this little mystery, let us look at some batting averages (which are, in the precise sense, percentages) and see that Simpson’s paradox shows up here too.

  Thanks to the two-part breakdowns of season batting averages provided in Bill James’ invaluable Baseball Abstract, we have the following comparison of the performances of Andre Dawson and Lee Lacy against right-handed and left-handed pitching in 1983:

Dawson      AB     H   Ave.        Lacy        AB     H   Ave.
Vs. RHP    474   134   .283        Vs. RHP    139    37   .266
Vs. LHP    159    55   .346        Vs. LHP    149    50   .336
Overall    633   189   .299        Overall    288    87   .302

   We see that, while Dawson outhit Lacy against each kind of pitching, it was Lacy who ended up with the higher season average.
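The same check can be written once and applied to any pair of players. In this sketch (the helper names `avg`, `overall` and `is_simpson_reversal` are hypothetical), each player's splits are stored as at-bats and hits, and the function reports whether the first player leads in every split yet trails for the season:

```python
from fractions import Fraction

# 1983 splits quoted in the article: {split: (at_bats, hits)}.
dawson = {"vs RHP": (474, 134), "vs LHP": (159, 55)}
lacy = {"vs RHP": (139, 37), "vs LHP": (149, 50)}


def avg(ab: int, h: int) -> Fraction:
    """Batting average as an exact fraction: hits / at-bats."""
    return Fraction(h, ab)


def overall(splits: dict) -> Fraction:
    """Season average: total hits over total at-bats (not the mean of the splits)."""
    total_ab = sum(ab for ab, _ in splits.values())
    total_h = sum(h for _, h in splits.values())
    return Fraction(total_h, total_ab)


def is_simpson_reversal(a: dict, b: dict) -> bool:
    """True if player `a` beats `b` in every split yet trails `b` overall.

    Assumes both players have the same split keys.
    """
    wins_every_split = all(avg(*a[k]) > avg(*b[k]) for k in a)
    return wins_every_split and overall(a) < overall(b)


for name, splits in (("Dawson", dawson), ("Lacy", lacy)):
    parts = ", ".join(f"{k} {float(avg(*v)):.3f}" for k, v in splits.items())
    print(f"{name}: {parts}, overall {float(overall(splits)):.3f}")

print(is_simpson_reversal(dawson, lacy))
```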

   American Leaguers are by no means immune to Mr. Simpson’s curious twist of arithmetic either. This time we will consider the 1983 batting records of Toronto’s Damaso Garcia and the Yankees’ Ken Griffey on grass and on artificial turf:

Garcia        AB     H   Ave.        Griffey       AB     H   Ave.
Grass        183    44   .240        Grass        399   118   .296
Art. Turf    342   117   .342        Art. Turf     59    22   .373
Overall      525   161   .307        Overall      458   140   .306

    Notice that the relatively large differences between the separate averages (Garcia was outhit by 56 points on grass and by 31 points on artificial turf) make Damaso’s higher average for the season seem even more unlikely.

 Without being too mathematically burdensome, we will now try to see what lies behind these peculiar goings-on. The key thing to observe is that a hitter’s batting average for an entire season is a weighted average of his batting averages for two (or possibly more) parts of the season, with weights proportional to the number of at-bats in each part. Referring to the last example, we see that Griffey’s lower average on grass receives almost seven times the weight of his higher artificial turf average (399 at-bats against 59). On the other hand, Garcia’s higher average on turf gets nearly twice the weight that his lower average on grass receives (342 at-bats against 183). This weighting is just sufficient to push Garcia’s overall season average slightly ahead of Griffey’s.
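The weighted-average view can be made concrete with a short Python sketch using the Garcia and Griffey figures. It assumes only the at-bat and hit totals from the table; `weights` and `weighted_average` are illustrative helper names. Multiplying each split average by its share of the at-bats and summing gives back exactly total hits over total at-bats:

```python
from fractions import Fraction

# 1983 surface splits quoted in the article: {surface: (at_bats, hits)}.
garcia = {"grass": (183, 44), "turf": (342, 117)}
griffey = {"grass": (399, 118), "turf": (59, 22)}


def weights(splits: dict) -> dict:
    """Each split's share of the season's at-bats; these are the weights."""
    total_ab = sum(ab for ab, _ in splits.values())
    return {k: Fraction(ab, total_ab) for k, (ab, _) in splits.items()}


def weighted_average(splits: dict) -> Fraction:
    """Sum of (weight x split average); equals total hits / total at-bats."""
    w = weights(splits)
    return sum(w[k] * Fraction(h, ab) for k, (ab, h) in splits.items())


for name, splits in (("Garcia", garcia), ("Griffey", griffey)):
    shares = ", ".join(f"{k} weight {float(v):.3f}" for k, v in weights(splits).items())
    print(f"{name}: {shares}, season {float(weighted_average(splits)):.3f}")
```

Griffey’s grass weight comes out at about .871 against .129 for turf, while Garcia’s turf weight is about .651 against .349 for grass, which is the lopsided weighting described above.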

  Finally, this hypothetical example may help to clarify the mechanism just described. We’ll christen our batters “Babe” and “Casey” and examine their averages over two consecutive imaginary seasons:

Babe           AB     H   Ave.        Casey          AB     H   Ave.
1st season     498   497   .998        1st season       2     2  1.000
2nd season       2     0   .000        2nd season     498     1   .002
Overall        500   497   .994        Overall        500     3   .006

   The extreme figures here are preposterous, of course, but they illustrate the effect of the weightings very clearly.
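One way to see that the weights, and not the hitting, decide the outcome is to recompute the overall figures as if each season counted equally. The equal-weight comparison below is a hypothetical device added for illustration, not something the article proposes, and the helper names `by_at_bats` and `equal_weights` are likewise made up:

```python
from fractions import Fraction

# The article's imaginary seasons: {season: (at_bats, hits)}.
babe = {1: (498, 497), 2: (2, 0)}
casey = {1: (2, 2), 2: (498, 1)}


def by_at_bats(splits: dict) -> Fraction:
    """The real overall average: total hits / total at-bats (at-bat weights)."""
    return Fraction(sum(h for _, h in splits.values()),
                    sum(ab for ab, _ in splits.values()))


def equal_weights(splits: dict) -> Fraction:
    """Hypothetical comparison: every season's average counts the same."""
    avgs = [Fraction(h, ab) for ab, h in splits.values()]
    return sum(avgs) / len(avgs)


for name, splits in (("Babe", babe), ("Casey", casey)):
    print(name, f"at-bat weights {float(by_at_bats(splits)):.3f}",
          f"equal weights {float(equal_weights(splits)):.3f}")
```

With at-bat weights Babe leads .994 to .006; with equal weights the order flips back to Casey, about .501 to .499, matching his edge in each season.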

   So, while Simpson’s paradox may be only a curiosity of passing interest, it does serve to remind us that sometimes things are not what they seem to be. Such reminders are especially meaningful to interpreters of data of whatever kind. If we are occasionally too rigid or complacent in our statistical dealings, we might see behind some page of figures the faint visage of Simpson gazing back at us from the mathematical twilight zone. It is easy to imagine him giving a sly wink to Messrs. Griffey, Dawson and McCatty, to the Cardinals and the Reds, and ultimately to all of us.