# Albert: Was Joe DiMaggio streaky?

From SABR member Jim Albert at Baseball Prospectus on September 24, 2013:

Joe DiMaggio’s 56-game hitting streak in the 1941 season is one of baseball’s most famous events. This record has fascinated many people, including many academics. Much of the discussion of the accomplishment centers on how often a streak of this magnitude should occur, and whether we should expect to see it equaled or surpassed in our lifetimes. But this article is not about the extreme nature of DiMaggio’s improbable feat. I’m not going to compute the probability of a batter going through a 56-game hitting streak during a season. Rather, I’m wondering whether DiMaggio’s pattern of hitting during the 1941 season, at the at-bat level, was unusually streaky.

Since the distinction between game-level streakiness and at-bat-level streakiness can be confusing, let me explain in terms of coin flipping. Suppose you flip a coin 500 times. I simulated this process on a computer and found some interesting streaky patterns. I had a streak of eight consecutive heads, six occurrences of six consecutive heads, and so on. Should I be excited by these clusters? Of course not. I’m just observing some streaky patterns that are a by-product of chance. A coin is truly consistent in that the chance of heads is always 50 percent and outcomes of different flips are independent.
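The coin-flipping experiment is easy to try at home. The following is a minimal sketch in Python, not the simulation actually used for this article; the function names and the seed are made up for illustration, and a different seed will give different runs than the ones described above.

```python
import random
from collections import Counter
from itertools import groupby


def simulate_flips(n_flips=500, seed=None):
    """Flip a fair coin n_flips times; True means heads."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n_flips)]


def head_run_lengths(flips):
    """Return the length of every maximal run of consecutive heads."""
    return [sum(1 for _ in group) for is_head, group in groupby(flips) if is_head]


if __name__ == "__main__":
    flips = simulate_flips(500, seed=1941)  # seed is arbitrary
    runs = head_run_lengths(flips)
    counts = Counter(runs)
    print("longest run of heads:", max(runs, default=0))
    for length in sorted(counts):
        print(f"runs of exactly {length} heads: {counts[length]}")
```

Running this a few times with different seeds tends to turn up long runs of heads fairly often, even though the coin never changes.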

Let’s move to the hitting outcomes of a baseball player like Joe DiMaggio. We observe his sequence of individual hits and outs during the 1941 season, but unlike the coin, we don’t really know DiMaggio’s true hitting behavior. There are two possibilities. Maybe DiMaggio is truly consistent (like the coin) and the chance that he gets a hit does not change across the season. Or maybe DiMaggio is truly streaky—this would mean that during part of the season, he’s hot and hitting with a higher-than-average probability, and other times, he’s cold and hitting with a lower probability. By the way, baseball players and managers believe in hot and cold hitting, and managers make decisions about lineups on the basis of these beliefs about hot and cold.
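To make the two possibilities concrete, here is a rough sketch in Python of what each kind of hitter might look like as a simulation. Everything in it is an assumption for illustration: the function names, the probabilities, and especially the idea that a streaky hitter alternates between hot and cold in fixed-length blocks of at-bats, which is only one of many ways true streakiness could work.

```python
import random


def consistent_hitter(n_at_bats, p_hit, rng):
    """Every at-bat has the same hit probability (like the coin)."""
    return [rng.random() < p_hit for _ in range(n_at_bats)]


def streaky_hitter(n_at_bats, p_hot, p_cold, block, rng):
    """Hypothetical streaky model: alternate hot and cold blocks of at-bats."""
    outcomes = []
    for i in range(n_at_bats):
        is_hot = (i // block) % 2 == 0  # even-numbered blocks are hot
        p_hit = p_hot if is_hot else p_cold
        outcomes.append(rng.random() < p_hit)
    return outcomes


if __name__ == "__main__":
    rng = random.Random(56)  # arbitrary seed
    # Illustrative values only, not estimates for any real player.
    steady = consistent_hitter(500, 0.300, rng)
    streaky = streaky_hitter(500, 0.400, 0.200, 50, rng)
    print("consistent hitter average:", sum(steady) / len(steady))
    print("streaky hitter average:   ", sum(streaky) / len(streaky))
```

With equal-length hot and cold blocks, both hitters have the same expected season average, so the average by itself cannot tell them apart. Any difference has to show up in how the hits and outs are ordered.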

Until recently, we had access only to DiMaggio’s game-to-game hitting data for the 1941 season. But this summer Retrosheet made available play-by-play data for some early seasons, including 1941. We now can look more carefully at DiMaggio’s sequence of hit/out data for each of his at-bats. We can use this 1941 data to see if DiMaggio’s pattern of hitting is any different from that of a truly consistent hitter with a constant chance of hitting throughout the season.