Bergstrom: Where the (ballpark) weather comes from

From SABR member Richard Bergstrom at The Hardball Times on June 6, 2017, with mention of SABR members Mark Pankin and David Kagan:

In February, I decided to build a Retrosheet database to help analyze some data on the Colorado Rockies. I was a bit surprised, at first, to see there were temperature, wind speed and wind direction data in the data set. I say “at first,” because I realized that when I’ve seen a box score on the internet or in a newspaper, it usually includes those weather details in the same section as the official attendance. Since weather gets wacky in Colorado, especially in April and May, I thought that would be a neat topic.

A few weeks after building the database, I was at the SABR Analytics conference in Phoenix and found I had a problem. A conversation with Mitchel Lichtman, co-author of The Book: Playing the Percentages, turned toward the Rockies, then toward the ideas I had for weather research. He said I should check the data because they might be unreliable. He had heard the weather readings might not even be taken from the stadium itself.

So research into the validity of the data began. In the Retrosheet database, box score weather was recorded in 10 to 30 percent of the games from 1950 to 1987. In 1988, there were attempts to report it more regularly, with individual years like 1992 reporting a temperature for each game. However, even as recently as 1994, 24.7 percent of major league games didn’t have weather data. In 1996, only six percent of the games didn’t have temperature, and since 1998, only a Toronto Blue Jays game on April 13, 1999, is missing a box score temperature.

Read the full article here:

Originally published: June 7, 2017. Last Updated: June 7, 2017.