56-Game Hitting Streaks Revisited
This article was written by Michael Freiman
This article was published in the 2002 Baseball Research Journal
In an article in the 1994 Baseball Research Journal, Charles Blahous explained a system to determine the probability of various players in various seasons putting together a 56-game hitting streak.
I will describe some improvements to Mr. Blahous’s method, which I believe result in probabilities that are more accurate and, in almost all cases, lower than the probabilities he found. Also, I will answer what is probably the most interesting question: What is the probability that there would be some player, at some point in the history of major league baseball, who would have a 56-game hitting streak?
Mr. Blahous began by determining the probability of a given player — for example, Lave Cross — getting a hit in a given game. His method was reasonable, but I modified it so that, I hope, it will more accurately reflect the player’s chances.
During each game, Cross had a limited number of plate appearances in which to attempt to get a hit. For each plate appearance, the likelihood that Cross got a hit is just the ratio of his hits to his plate appearances for the season. The probability of Cross’s not getting a hit in a given game is one minus his probability of getting a hit in a given opportunity, to the power of his number of opportunities per game; the probability of getting a hit in the game is one minus the probability of not getting a hit.
Now we have to determine the number of plate appearances that Cross received in a given game. In 1901 for example, Cross had 450 plate appearances in 100 games played, which works out to 4.50 plate appearances per game. This presents something of a problem, as clearly Cross did not have any games during the 1901 season (or any season) in which he had exactly 4.50 plate appearances.
We solve this problem by assuming that Cross had at least four plate appearances in each game, adding a fifth one in however many games are necessary to make the average 4.50. In this case, we assume Cross had four plate appearances in 50 games, and five in the other 50.
So to figure out the probability of Cross’s having a hit in any one game in 1901, we consider each game to have a 50 percent chance of being a four-plate appearance game and a 50 percent chance of being a five-plate appearance game. Then the probability of Cross’s getting a hit in a game in 1901 is just the average of his probability of getting a hit in a four-plate appearance game and his probability of getting a hit in a five-plate appearance game. (In most cases, this does not work out as nicely as in the case of Cross. If a player had 4.77 plate appearances per game, we would have to take a weighted average of his probability of a hit in four-plate appearance games and in five-plate appearance games, with the five-plate appearance games having 77% of the weight.)
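The per-game calculation described above can be sketched in a few lines of Python. This is only an illustration of the weighting scheme, not the code used for the article; the function name is made up, and the hit total of 150 in the example is a hypothetical figure chosen to make the arithmetic easy (only the 450 plate appearances and 100 games come from the text).

```python
import math

def hit_probability_per_game(hits: int, plate_appearances: int, games: int) -> float:
    """Chance of at least one hit in a game, mixing the two nearest whole PA counts."""
    h = hits / plate_appearances             # chance of a hit in one plate appearance
    pa_per_game = plate_appearances / games  # e.g. 4.50
    low = math.floor(pa_per_game)            # every game gets at least this many PA
    frac = pa_per_game - low                 # share of games that get one extra PA
    p_low = 1 - (1 - h) ** low               # at least one hit in a `low`-PA game
    p_high = 1 - (1 - h) ** (low + 1)        # at least one hit in a (`low` + 1)-PA game
    return (1 - frac) * p_low + frac * p_high

# Hypothetical hit total (150); 450 PA in 100 games as in the Cross example.
example = hit_probability_per_game(150, 450, 100)
print(f"{example:.3f}")
```

With 4.50 plate appearances per game the two cases get equal weight; with 4.77 the five-plate appearance case would get 77% of the weight.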
Having figured out a player’s probability of getting a hit in a given game, Mr. Blahous then determines the probability of the player’s having a 56-game hitting streak in a given 56-game span, which is just the probability of a hit in a given game, taken to the 56th power. He then finds the probability of the player’s not having a 56-game hitting streak in each of the overlapping 56-game spans making up his season (a player who plays 155 games in a season may be considered to have 100 56-game spans: games 1 through 56, 2 through 57, etc., up to 100 through 155).
Mr. Blahous multiplies these probabilities together to find the probability of the player’s not having a 56-game streak during the entire season, and subtracts this probability from 1 to find the likelihood that the player would have a 56-game hitting streak at some point during the season.
In this last multiplication lies a subtle but major flaw in Mr. Blahous’s method. It is true that we can sometimes find the probability of multiple events all occurring (in this case a player failing to have a 56-game hitting streak in various 56-game spans) by multiplying together their probabilities, but this method works only when the events whose probabilities are being multiplied are unrelated to each other, or, in mathematical terms, when the events are independent.
It should be clear that since many of the 56-game spans Blahous examines overlap (for example, games 1 through 56 overlap with games 2 through 57), the probabilities of the player’s not having a 56-game hitting streak in these spans are not independent.
To take a more concrete example, suppose there is a player who plays a three-game season and has a 50% chance of getting a hit in any given game. There are eight equally likely possibilities of which games this player can get a hit in (for example, he could get a hit in all three games, no games, just the first game, just the second and third games, etc.). Of these, only three possibilities (hits in the first two games only, the last two games only, or all three games) result in his having a two-game hitting streak.
Clearly then, the probability of the player having a two-game hitting streak is 3/8, or 37½%. Yet using Mr. Blahous’s method, the probability is found to be 7/16, or 43¾%. The difference between 3/8 and 7/16 may not seem like much, but over the course of a full season, the correct probabilities and those arrived at by Mr. Blahous can differ by a factor of 8 or more.
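The three-game example can be checked by brute force. The sketch below, with function names invented for illustration, enumerates every hit/no-hit sequence for the exact answer and compares it with the result of treating the overlapping spans as independent.

```python
from itertools import product

def exact_streak_probability(games: int, streak: int, p: float) -> float:
    """Add up the probability of every hit/no-hit sequence containing the streak."""
    total = 0.0
    for outcome in product([True, False], repeat=games):
        run = best = 0
        for hit in outcome:
            run = run + 1 if hit else 0   # length of the current run of hit games
            best = max(best, run)
        if best >= streak:
            hits = sum(outcome)
            total += p ** hits * (1 - p) ** (games - hits)
    return total

def independent_spans_estimate(games: int, streak: int, p: float) -> float:
    """Multiply the no-streak probabilities of all spans as if they were independent."""
    spans = games - streak + 1
    return 1 - (1 - p ** streak) ** spans

print(exact_streak_probability(3, 2, 0.5))    # 0.375  (3/8)
print(independent_spans_estimate(3, 2, 0.5))  # 0.4375 (7/16)
```

Enumeration is practical only for very short seasons, since the number of sequences doubles with every game.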
The method used to correct this problem is somewhat more complicated than Mr. Blahous’s method, but it does not use any mathematics beyond basic algebra. First let us define p as the probability that a player (let’s use Joe DiMaggio this time) gets a hit in any given game. Also, let q equal the probability of DiMaggio’s having a 56-game hitting streak in any particular 56-game span. Then q equals p to the 56th power.
Let us denote by D(n) the probability that DiMaggio has a 56-game hitting streak at some point during the first n games of the season. Clearly D(0)=D(1)=D(2)= … = D(55)=0, because it is impossible for DiMaggio to have a 56-game hitting streak before he has played 56 games. Also, D(56)=q, since in order to have a 56-game hitting streak in the first 56 games, DiMaggio must get a hit in every game.
Now consider the first n games of the season, where n is a number greater than 56. In order for DiMaggio to have a 56-game hitting streak in the first n games, he must either have a 56-game hitting streak in the first n-1 games or have his first 56-game hitting streak in the last 56 games. (Here I am considering streaks of, say, 57 games as two overlapping 56-game streaks.)
The probability of a 56-game hitting streak in the first n-1 games is D(n-1). In order to have his first 56-game hitting streak in the last 56 games, DiMaggio must not have a 56-game hitting streak in the first n-57 games (the probability of which is 1-D(n-57)), then not get a hit in game number n-56 (the probability of which is 1-p), and then get a hit in each of games n-55 through n (the probability of which is q). Hence the probability that DiMaggio has a 56-game hitting streak in the first n games is
D(n) = D(n-1) + (1 - D(n-57)) (1 - p) q
If we want to find the probability of DiMaggio’s having a 56-game hitting streak during the whole season, we first find D(1), then D(2), and continue until we find D(g), where g is the total number of games DiMaggio plays during the entire season. This formula can be implemented without too much trouble on any spreadsheet. (For a given number of games, this formula also reduces to a polynomial in the variable p, which is easier to use than the recursive formula.)
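For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the recursion. The function name and the `streak` parameter are my own additions; setting `streak` to a small number makes it easy to check the formula against cases that can be worked out by hand.

```python
def streak_probability(p: float, games: int, streak: int = 56) -> float:
    """D(games): chance of at least one hitting streak of `streak` games."""
    q = p ** streak                  # chance of a hit in every game of one span
    d = [0.0] * (games + 1)          # d[n] holds D(n); D(n) = 0 for n < streak
    for n in range(streak, games + 1):
        if n == streak:
            d[n] = q                 # must hit in every one of the first `streak` games
        else:
            # Either a streak already occurred in the first n-1 games, or the
            # first one ends at game n: no streak in the first n-streak-1 games,
            # a hitless game, then `streak` straight games with a hit.
            d[n] = d[n - 1] + (1 - d[n - streak - 1]) * (1 - p) * q
    return d[games]

# A three-game season, a two-game streak and a 50% chance of a hit per game.
print(streak_probability(0.5, 3, streak=2))  # 0.375
```

The check at the end reproduces the 3/8 figure from the three-game example, which the independent-spans method put at 7/16.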
Okay, now for the good stuff. Table 1 lists the 45 players who have had the best chance to have a 56-game hitting streak in a given season. (The columns list the player’s name, year, batting average, hits per plate appearance, probability of getting a hit in any given game, and probability of having a 56-game hitting streak at some point during the season.)
Table 1. Best Chances for a 56-Game Hit Streak in a Single Season
PLAYER | YEAR | AVG H/PA | HIT PROB | STREAK PROB |
---|---|---|---|---|
Hugh Duffy | 1894 | .440 | 90.8% | 3.28% (1 in 31) |
Ross Barnes | 1876 | .385 | 93.0% | 2.93% (1 in 34) |
Willie Keeler | 1897 | .424 | 90.2% | 2.50% (1 in 40) |
Tip O’Neill | 1887 | .435 | 89.7% | 1.84% (1 in 54) |
Jesse Burkett | 1896 | .410 | 89.4% | 1.69% (1 in 59) |
Nap Lajoie | 1901 | .426 | 89.2% | 1.53% (1 in 65) |
Fred Dunlap | 1884 | .412 | 89.9% | 1.42% (1 in 71) |
Sam Thompson | 1895 | .392 | 88.8% | 1.06% (1 in 94) |
George Sisler | 1922 | .420 | 88.3% | 1.04% (1 in 96) |
Sam Thompson | 1894 | .407 | 89.2% | 0.96% (1 in 104) |
Ty Cobb | 1911 | .420 | 87.8% | 0.84% (1 in 119) |
Ed Delahanty | 1894 | .407 | 88.3% | 0.74% (1 in 134) |
Jesse Burkett | 1895 | .409 | 87.8% | 0.71% (1 in 142) |
George Sisler | 1920 | .407 | 87.2% | 0.65% (1 in 154) |
Lave Cross | 1894 | .386 | 87.8% | 0.59% (1 in 170) |
Sam Thompson | 1893 | .370 | 87.4% | 0.54% (1 in 185) |
Al Simmons | 1925 | .387 | 86.9% | 0.52% (1 in 194) |
Tuck Turner | 1894 | .416 | 88.8% | 0.47% (1 in 211) |
Bill Terry | 1930 | .401 | 86.7% | 0.47% (1 in 211) |
Willie Keeler | 1894 | .371 | 87.1% | 0.47% (1 in 215) |
Willie Keeler | 1898 | .385 | 86.6% | 0.46% (1 in 216) |
Lefty O’Doul | 1929 | .398 | 87.0% | 0.44% (1 in 227) |
Billy Hamilton | 1894 | .404 | 86.9% | 0.43% (1 in 231) |
Ed Delahanty | 1899 | .401 | 86.7% | 0.41% (1 in 243) |
Willie Keeler | 1896 | .371 | 85.7% | 0.39% (1 in 255) |
Ed Delahanty | 1893 | .385 | 85.7% | 0.37% (1 in 269) |
Pete Browning | 1887 | .398 | 86.0% | 0.36% (1 in 278) |
Rogers Hornsby | 1922 | .404 | 86.1% | 0.35% (1 in 286) |
Ed Delahanty | 1895 | .357 | 85.6% | 0.34% (1 in 294) |
Ty Cobb | 1912 | .386 | 85.6% | 0.33% (1 in 301) |
Paul Hines | 1879 | .408 | 85.8% | 0.32% (1 in 309) |
Chuck Klein | 1930 | .363 | 85.9% | 0.31% (1 in 320) |
Joe Jackson | 1911 | .377 | 85.8% | 0.30% (1 in 329) |
Steve Brodie | 1894 | .366 | 85.4% | 0.30% (1 in 333) |
Harry Heilmann | 1921 | .331 | 85.2% | 0.29% (1 in 343) |
Rogers Hornsby | 1924 | .394 | 85.6% | 0.28% (1 in 352) |
Hughie Jennings | 1896 | .408 | 85.0% | 0.27% (1 in 369) |
Billy Hamilton | 1895 | .389 | 86.3% | 0.27% (1 in 369) |
Ed Delahanty | 1896 | .397 | 85.0% | 0.26% (1 in 383) |
Jesse Burkett | 1901 | .345 | 86.3% | 0.25% (1 in 399) |
Babe Herman | 1930 | .424 | 85.0% | 0.25% (1 in 404) |
Sam Thompson | 1887 | .401 | 85.2% | 0.24% (1 in 417) |
Heinie Manush | 1928 | .389 | 85.6% | 0.23% (1 in 431) |
Dan Brouthers | 1883 | .320 | 85.0% | 0.23% (1 in 437) |
“Hit Prob” is the probability of a batter getting a hit in a given game.
“Streak Prob” is the probability of a 56-game hitting streak during a season.
Of these players, a majority played during the nineteenth century, and no player made the list in a season after 1930. DiMaggio did not come close to being on the list; his probability of having a 56-game hitting streak in 1941 was only .01% (1 in 9,545). In fact, 1941 was only DiMaggio’s fourth most likely season to put together such a streak, behind 1936, 1939 and 1937. Note that even Duffy, the leader, would have to play for 21 seasons at his 1894 level to have even a 50-50 chance of a 56-game hitting streak.
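The 21-season figure for Duffy can be reproduced with a short calculation, assuming each season is an independent trial with the 3.28% chance from Table 1. The function name below is hypothetical.

```python
import math

def seasons_for_even_odds(season_prob: float) -> int:
    """Smallest number of seasons n with 1 - (1 - season_prob)**n >= 0.5."""
    return math.ceil(math.log(0.5) / math.log(1 - season_prob))

print(seasons_for_even_odds(0.0328))  # 21
```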
It is often stated that DiMaggio’s 56-game hitting streak is a record that will last forever. However, such statements are rarely accompanied by an explanation of any way the game has changed since 1941 that would preclude the possibility of such a streak. It is true that a 56-game streak is unlikely now, but this analysis shows that it was unlikely in 1941 also (indeed, if such a streak were to have happened at all, it “should” have been before 1941, when the players in Table 1 were playing).
In fact, there have been several instances in the 1990s alone in which a player has had a significantly better probability of having such a streak than DiMaggio had in 1941, including such less-than-legendary players as Lance Johnson in 1996.
So, while the long odds demonstrated by the calculations above show that it is unlikely that there will be a 56-game hitting streak in any given decade, or maybe even any given century, DiMaggio’s 56-game streak shows that such odds can be overcome, if only very rarely. In short, it may be very, very hard to break Joltin’ Joe’s record, but forever is a long time.
It is interesting to note that a player’s having a large number of walks works against his chances to have a long hitting streak, since as far as hitting streaks are concerned, a walk is a missed opportunity to get a hit. Thus Babe Ruth, who until 2001 held the single-season and career records for walks, failed to have even a 1-in-10,000 chance to have a 56-game hitting streak in any season, despite the fact that he batted over .370 six times.
Similarly, Ted Williams, in his fabled 1941 season, had a chance of only 1 in 41,058 to have such a long streak, less than one fourth of DiMaggio’s likelihood the same year, even though Williams’ batting average was 49 points higher than DiMaggio’s. The reason for this disparity is that Williams walked 147 times that year, thus “wasting” 147 of his opportunities to get a hit, as compared to 76 for DiMaggio.
Table 2. Players Most Likely to Have a 56-Game Hitting Streak During Their Careers
PLAYER | PROBABILITY |
---|---|
Willie Keeler | 4.23% (1 in 24) |
Tip O’Neill | 3.52% (1 in 28) |
Hugh Duffy | 3.42% (1 in 29) |
Ross Barnes | 2.93% (1 in 34) |
Sam Thompson | 2.73% (1 in 37) |
Ed Delahanty | 2.20% (1 in 45) |
George Sisler | 1.93% (1 in 52) |
Nap Lajoie | 1.89% (1 in 53) |
Ty Cobb | 1.66% (1 in 60) |
Fred Dunlap | 1.43% (1 in 70) |
Jesse Burkett | 1.25% (1 in 80) |
Al Simmons | 0.91% (1 in 110) |
Rogers Hornsby | 0.89% (1 in 112) |
Billy Hamilton | 0.79% (1 in 127) |
Bill Terry | 0.64% (1 in 156) |
Pete Browning | 0.63% (1 in 159) |
Lave Cross | 0.61% (1 in 164) |
Lefty O’Doul | 0.53% (1 in 190) |
Tuck Turner | 0.48% (1 in 210) |
Cap Anson | 0.44% (1 in 226) |
Chuck Klein | 0.42% (1 in 241) |
Harry Heilmann | 0.40% (1 in 251) |
Joe Jackson | 0.39% (1 in 257) |
Paul Hines | 0.36% (1 in 277) |
Dan Brouthers | 0.36% (1 in 281) |
Paul Waner | 0.34% (1 in 295) |
Hughie Jennings | 0.34% (1 in 298) |
Heinie Manush | 0.30% (1 in 330) |
Steve Brodie | 0.27% (1 in 369) |
Dave Orr | 0.27% (1 in 374) |
Babe Herman | 0.26% (1 in 384) |
Lloyd Waner | 0.25% (1 in 408) |
Freddy Lindstrom | 0.24% (1 in 409) |
Jim O’Rourke | 0.22% (1 in 454) |
Tony Gwynn | 0.21% (1 in 478) |
Joe Medwick | 0.21% (1 in 484) |
Fred Clarke | 0.20% (1 in 493) |
Jack Tobin | 0.19% (1 in 530) |
Deacon White | 0.18% (1 in 545) |
Rod Carew | 0.18% (1 in 566) |
Bobby Lowe | 0.15% (1 in 651) |
Cal McVey | 0.15% (1 in 675) |
Tris Speaker | 0.15% (1 in 688) |
Stan Musial | 0.14% (1 in 715) |
Roger Connor | 0.14% (1 in 717) |
In a similar vein, Table 2 lists the 45 players most likely to have a 56-game hitting streak in their careers, on the assumptions that what happens in each season is independent of what happens in any other season and that streaks spread across multiple years don’t count.
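Under those two assumptions, a career figure is one minus the product of the chances of having no streak in each season. The sketch below uses a hypothetical function name and, as an example, splits Hugh Duffy’s career into his 1894 season (3.28%) and all of his other seasons taken together (.15%).

```python
from math import prod

def career_streak_probability(season_probs: list[float]) -> float:
    """Chance of a streak in at least one season, treating seasons as independent."""
    return 1 - prod(1 - s for s in season_probs)

# Duffy: 1894 alone, then the rest of his career combined.
duffy = career_streak_probability([0.0328, 0.0015])
print(f"{duffy:.2%}")
```

Because the inputs are rounded, the result agrees with the figure in Table 2 only to within about a hundredth of a percentage point.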
Though this career list is not as skewed toward the earlier part of baseball history as the single-season list, there is still a dramatic paucity of recent players here. Among the top 45, only Stan Musial, Rod Carew, and Tony Gwynn played in any season after 1948. As before, Babe Ruth and Ted Williams are absent from the list (neither even made the top 100), and Joe DiMaggio just misses (1 in 826).
It is also worth noting that for the purpose of hitting streaks it is better to have one ridiculously good season than to be very good over a long period of time, an observation that is illustrated by Hugh Duffy’s being more than twice as likely as Ty Cobb to have a 56-game hitting streak at some point in his career, even though everyone would agree that Cobb was better than Duffy at getting hits. Without his monstrous 1894 season, Duffy’s probability of a 56-game streak would fall to .15%, or 1 in 660.
Another question you might ask is whether DiMaggio’s 56-game streak is the most unlikely hitting streak that anyone has put together in the history of the major leagues. It appears that it is (at least among streaks of 30 games or more). Other unlikely streaks include Tony Eusebio’s 24-game streak in 2000 (there was only a 1 in 7,436 chance that he would have a streak that long that year), Pete Rose’s famous 44-game hitting streak in 1978 (1 in 5,159), Rowland Office’s 29-game streak in 1976 (1 in 2,136) and Ken Landreaux’s 31-gamer in 1980 (1 in 1,918).
Though DiMaggio’s streak seems to be the most unlikely hitting streak of the usual variety (at least one hit in the most consecutive games), there are other sorts of offensive streaks that one may consider. The following four streaks were even less likely than DiMaggio’s.
Table 3. Probability of Some Other Streaks
PLAYER | YEAR | TYPE OF STREAK | PROBABILITY |
---|---|---|---|
Earl Sheely | 1926 | extra-base hits in 7 consecutive at-bats | 1 in 39,703 |
Walt Dropo | 1952 | hits in 12 consecutive at-bats | 1 in 12,281 |
Dale Long | 1956 | home runs in 8 consecutive games | 1 in 12,048 |
Paul Waner | 1927 | extra-base hits in 14 consecutive games | 1 in 11,024 |
The probability listed in Table 3 is the likelihood that the player would have the streak listed in the given year. Amazingly, Earl Sheely’s chance of having 7 consecutive extra-base hits in 1926 was smaller than Hugh Duffy’s chance of having a 115-game hitting streak in 1894!
It is easy to be blown away by the sheer improbability of some of these streaks. However, given the number of players who have played major league baseball, it is inevitable that some extreme long shots would materialize.
One shortcoming in the methodology used here is that it does not take into account the number of plate appearances each player had in each game of the season, but instead assumes that his number of plate appearances per game is almost constant (except for differences of one plate appearance between games to make the averages work out right).
The significance of this shortcoming is minimized by using plate appearances instead of at bats as the measure of the number of opportunities a player has to get a hit, since presumably the number of plate appearances a player has in each game varies less over the course of a season than his number of at bats does.
To fix this completely would require a list of how many plate appearances each player had in each game of the season, which is not readily available for all players. This modification would also make the computations involved considerably more complicated, and would make the resulting probabilities somewhat lower.
Finally, if one considers the probability of every major-league player in every season having a 56-game hitting streak, the overall probability of such a streak occurring at some point in the history of the major leagues from 1876 to 2002, using the method I have described, is around 39%. (If you consider National Association players, who are not included in the lists above, as major leaguers, the probability rises to 45%.)
So, while one would expect that there would probably not be a 56-game hitting streak in major league history, it is not a great surprise that someone, at some time, would put together such a streak. However, the probability of such a streak occurring in the past 72 years (1931-2002) is a mere 5%, as compared to 36% for the first 55 years of major league play. (The numbers 36% and 5% do not add up to 39% because of the possibility that there would be a 56-game streak in both periods.) Thus the most surprising part of the DiMaggio streak may not be that it happened at all, but that it happened so late in the history of baseball.
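The way the two era figures combine can be spelled out in a few lines, assuming the two periods are independent of each other. The variable names are my own, and the inputs are the rounded percentages quoted above.

```python
early = 0.36           # chance of a 56-game streak in 1876-1930
late = 0.05            # chance of a 56-game streak in 1931-2002
both = early * late    # chance of a streak in both periods
either = early + late - both  # count the overlap only once
print(f"{either:.0%}")  # 39%
```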
MICHAEL FREIMAN is an undergraduate math major at the University of Pennsylvania. He has been an enthusiastic baseball fan since 1993, when he discovered John Kruk, Mitch Williams and Total Baseball.
STATISTICAL SOURCES
The calculations for this article were performed with the data from Sean Lahman’s baseball database, which is available at www.baseball1.com. Some data on historical streaks were taken from Retrosheet (www.retrosheet.org) and from The Sporting News Complete Baseball Record Book. I would also like to thank Pete Palmer for providing data on sacrifice hits in 1894, which were necessary to compute the number of plate appearances for players in that year, and for clearing up some statistical issues.