Retrosheet announces Summer 2013 update
By David W. Smith
I am very pleased to announce that Retrosheet.org has made its annual summer release and upgrade to our website. Our last update was in November of 2012 and we have accomplished a lot in the intervening eight months. It has been quite some time since I gave a detailed report, so I will take this opportunity to summarize our activities by topic and indicate the contributions of volunteers for each portion.
- Event file additions. We have added five more seasons and the second league for another. The new seasons are 1943, 1942, 1941, 1940 and 1938. The 1931 AL has also been added, whereas we only had the 1931 NL before. As always, there are many games for these seasons that are not included. However, the new files do contain over 3000 games that are posted for the first time, which is something to be very proud of. The World War II years are sparse, as expected. For example, the event files for those years represent only 37% of the games that were played. On the other hand, we now have over 1000 games for 1931, or 81% of that season. The total number of regular season games now on our site is over 130,000. We also have all the All-Star games and all postseason games (except for the 19th century World Series games).
- The process for getting event files ready for release is arduous. Each player in each game must be checked against the official totals for that game. The large majority of this work for most of the newly released games was done by Clem Comly. Clem identified all the discrepancies between our files and the official data and then resolved them by looking at scoresheets and newspapers and making common-sense decisions. This can be especially challenging for the proper assignment of earned runs and RBI, where official scorers were apparently often not following the rules for these determinations. For the 1938 files alone, we have over 1000 discrepancies with the official records, about half of which were identified by Clem in recent months. Tom Ruane found the first half when the discrepancy files were created a few years ago. Clem also did the work to complete the 1931 AL along with the games from 1942. Without Clem's diligence and rigor, we would be nowhere near as far along on our backwards journey through time.
- Event file editing. David Vincent went through all of our released files from 2012 back to 1927 and checked for syntax consistency. This included establishing detailed formats for ejections and for umpire changes during games, along with other things such as inserting debut dates for players and umpires. As a result, our event files will parse much more smoothly, and the more systematic records should make many investigations easier to do and more reliable.
- Box score files and discrepancy files for seasons without event files. Tom Ruane has managed this enormous effort (along with so much else on our site) for several years. He has had help from several volunteers, notably Dave Lamoureaux and Tom Bradley. As a result, we now have box scores back to 1914, except for the 1914 Federal League, which means the entire career of Babe Ruth is covered! Discrepancy files are complete back to 1915, except for the 1915 Federal League. This work gets harder as we go further back, often because the official records have less information and in many cases are less reliable. We now have the box scores for all games in the last 99 seasons, with this fall's release of the 2013 data taking us to 100. Many thanks to Tom and his collaborators and to Ted Turocy, who was instrumental in the creation of the box score file format many years ago.
- Inputting. We continue to enter more games into our event file format on a steady basis. The two who are doing the most in this area at this time are Dick Cramer and Cliff Blau. Both of them have spent a lot of time deciphering and entering play by play accounts from the 1930s and I hope that event files for one or more of those seasons will be included in our release this fall. In addition to Dick and Cliff, Clem Comly has entered many games as part of the proofing I mentioned above, and additional games have been completed by Jim Herdman (our Federal League expert), David Vincent, Pete Cottrell, Sheldon Miller, Bob Kapla and Doug Burks. I also enter games as needed to fill in specific targets, such as the completion of the 1941 Yankees, so we have now posted complete play by play details for Joe DiMaggio's 56-game hitting streak.
- Of course we depend mightily on the continued acquisition of new play by play accounts. They come from three main sources: a) scorebooks with many games from writers, announcers and some fans; b) newspaper accounts, especially for pre-World War II games; and c) fan scorecards. Luke Kraemer is an avid scorebook collector and has found many of these treasure troves in on-line auctions. He then generously shares copies with us. I have also copied about 3000 games from scorebooks in the collection of the National Baseball Hall of Fame Library, whose staff has very graciously allowed me to make several multi-day visits to make photographic copies. For newspaper accounts, Walter LeConte has worked steadily to get images from a number of newspapers, primarily St. Louis. Walt Wilson did a very thorough and systematic sweep through Chicago newspapers covering over 50 years. Herm Krabbenhoft finds play by play accounts during his many research projects and Mike Cantor in Detroit has done a remarkable job in finding play by play accounts on-line from some unexpected sources (Rochester, Portsmouth and Youngstown) along with more conventional suppliers such as the New York Post. Bill Ivimey is very efficient at collecting Pittsburgh Press accounts from the web and Jim Considine in Baltimore has spent a lot of time obtaining images of the Baltimore Star for the 1914 and 1915 Federal League Terrapins. This is a great help because the only library we have found with this paper requires that it be used at their site, and Jim has gone there for us. Finally, scorecards from fans continue to be very valuable. There are a surprising number of these scorecards available on-line for auction and Joe Stillwell is extremely focused on finding them, often persuading the sellers to make separate scans of their items for us.
- There is also a lot of effort expended on deducing play by play accounts for games for which we cannot find scorecards or newspaper accounts. This requires sophisticated detective work in piecing together information from multiple newspaper stories. The final products are remarkably consistent with the official records for basic batting and pitching data, but it is not an effort for the faint-hearted. Dick Cramer has dubbed it "baseball Sudoku" and the most prolific current practitioners are Richard Weston, Mark Pankin and John Gabcik, although others have given their time in the past, including Dick Cramer. Sheldon Miller, Clem Comly and Cliff Blau also do deductions to complete the inputting of a game for which we have six or seven innings of play by play information from a newspaper.
- Another broad category of volunteer help is the conversion of the official daily data to digital format. We start with the official ledger images as PDF files and they must be entered line by line into spreadsheets. Dave Lamoureaux leads the way in this essential and painstaking work and several others have contributed as well: Ron Weaver, Walter LeConte, Ryan Jones, Jack Myers and Howard Johnson. Many thanks to this group for their very important efforts.
- Finally, I would like to thank the many individuals who contact us when they find problems with items on our site. Our webmaster, Mark Pankin, fields most of these and I am grateful to him for being the front man on them. Baseball Reference, which uses much of our data, also hears from fans who have found problems and they pass those along to us as well.
Thanks to everyone for your patience in reading this long discussion of Retrosheet activities and I hope you find the new items of interest in your own baseball research.
David W. Smith is the founder and president of Retrosheet.
This page was last updated July 29, 2013 at 12:55 pm MST.