Rosenheck: Spring training stats matter — yes, they do

From Dan Rosenheck at The Economist on March 4, 2015:

IT WAS still snowing in the north-eastern United States, but two American rituals of seasonal renewal got underway on March 3rd. The first is the start of games during the month-long spring training period in Florida and Arizona that precedes the baseball season. Invariably, the welcome return of the crack of the bat was accompanied by a torrent of tweets and blogs by statistical analysts of the sport, intended to preemptively disabuse gullible fans of the perilous notion that these contests could contain any useful information.

There are few areas where the consensus among quantitative baseball researchers is stronger than the truism that “spring training stats don’t matter”, because the players are simply shaking the rust off and getting back into shape rather than trying to win. “Spring training stats are meaningless,” wrote Joe Sheehan of Baseball Prospectus back in 2008. “It’s the single most important thing to keep in mind every March.” Dave Cameron of Fangraphs echoed this view in 2010, saying that “Spring training numbers just don’t mean a thing. At all. Anything…Ignore the numbers coming from the Cactus and Grapefruit Leagues.” The conventional wisdom has not budged since then. Indeed, many fans who lack the ability to actually play baseball compete instead to find the most egregious example of a major-leaguer who showed up to camp “in the best shape of his life”, crushed the ball during spring training, and inevitably returned to mediocrity once the games started to count.

There’s no doubt that spring-training matchups are a far cry from meaningful baseball games. Half of them take place in high-and-dry Arizona, where balls regularly fly out of stadiums; the rest are held near sea level in humid Florida, where the heavy air turns would-be home runs into harmless pop flies. Players show up in a wide range of conditions: some spend their offseasons fishing or sunbathing, while others compete in Latin American winter leagues and arrive in midseason form. Pitchers often use the games to test out a new type of pitch; some players are learning new defensive positions. The quality of competition varies wildly, from green teenage prospects to established superstars. And spring training does not last long enough to generate robust samples of performance: a typical player will get just 50-100 plate appearances or batters faced, a small fraction of the 600 for hitters and 800 for pitchers that they will see over the course of the year.

Yet in spite of all these caveats, the claim that spring-training numbers are useless is wrong. Not a little bit wrong, not debatably wrong—demonstrably and conclusively wrong. To be sure, the figures are noisy. But they still contain a signal. At the MIT Sloan Sports Analytics Conference held in Boston on February 27th-28th, I presented a study (see slides) that explained how to extract the statistical golden nuggets buried in this troublesome dataset, and offered some lessons this example provides for the practice of quantitative sports research more broadly.

