Thanks to SABR member Pete Palmer, the following invaluable research files are being made available on SABR.org.
Frank Williams is a SABR member from Phoenix who spent most of his life in Bridgeport, Connecticut, until he retired. He is a devoted fan of Boston teams in every sport and has compiled many records. He received a SABR Salute in 1999. Frank authored an article in the premiere issue of The National Pastime in 1982 called “All the Record Books are Wrong.” The article outlined the many errors in the official pitching statistics of the American League from 1901-19. His research has been used by all of the major sources of MLB statistics: the Macmillan Baseball Encyclopedia, Total Baseball, the ESPN Baseball Encyclopedia, and Baseball-Reference.com.
The PDF files in the link above contain a listing by year for each AL team from 1901-19. For each pitcher there is a record by date of every game pitched, showing whether the appearance was a start, a complete game, or a relief appearance, and the decision or save, if any. Unfortunately, the American League official statistics in this period were riddled with errors. Players would have their records copied onto the wrong player sheet, pitchers would be given the stats of their opponent, and some games would have two wins or two losses while others would have none. There were probably twenty to thirty errors in wins and losses alone every year. Frank discovered two wins for Walter Johnson in 1910 that were officially marked as losses.
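The kind of error described above (a game with two wins, two losses, or no decision at all) can be caught with a simple consistency check. The following is only a minimal sketch, not a description of how Frank did his research: the record layout `(game_id, pitcher, decision)`, the game identifiers, and the pitcher names are all hypothetical and do not reflect the format of the PDF files.

```python
from collections import Counter

def find_decision_errors(records):
    """Return {game_id: (wins, losses)} for games without exactly one W and one L.

    `records` is an iterable of (game_id, pitcher, decision) tuples, where
    decision is "W", "L", or None. This layout is an assumption made for
    illustration only.
    """
    wins, losses = Counter(), Counter()
    games = set()
    for game_id, _pitcher, decision in records:
        games.add(game_id)
        if decision == "W":
            wins[game_id] += 1
        elif decision == "L":
            losses[game_id] += 1
    # Tie games legitimately have no decisions, so flagged games
    # still need to be reviewed by hand.
    return {
        g: (wins[g], losses[g])
        for g in sorted(games)
        if (wins[g], losses[g]) != (1, 1)
    }

# Hypothetical sample records (placeholder names, not real data).
sample = [
    ("game-1", "pitcher_a", "W"),
    ("game-1", "pitcher_b", "L"),
    ("game-2", "pitcher_a", "W"),
    ("game-2", "pitcher_c", "W"),   # two wins recorded, no loss
    ("game-3", "pitcher_d", None),  # no decision recorded
]
flagged = find_decision_errors(sample)
```

A check like this only points to games that need a second look; deciding which pitcher actually deserved the decision still requires going back to the game accounts.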
Alex Haas was a SABR member from Berkeley, California, who spent many years doing baseball research. He received a SABR Salute in 1980. He turned over 13,000 pages of his research to the Hall of Fame, which unfortunately have not yet been cataloged. Alex was a retired railroad worker who had a pass to ride anywhere in the country, and he made many trips to various libraries to look up stats. Among this research was a daily record of batters hit by pitch from 1909 until the leagues started compiling the data officially (1917 NL and 1920 AL). Hit-by-pitch data for pitchers had been kept by the leagues since 1902 in the NL and 1908 in the AL, but not for batters. His primary source was the New York Times, although he did have to consult other papers in a few cases.
The PDF files in the link above contain a listing for each year, by team, of all the batters hit by pitch, with the date and opposing pitcher.
This data was used by Eldon and Harlan Mills to compute Player Win Averages. The data does not describe the play, but it does show the batter and pitcher for each play and the results after the play (outs, base runners, runs scored on the play). See the data.txt file for the format. Player Win Averages calculated the difference in win probability for the batter and pitcher before and after each at-bat. Although the book came out only after the 1969 season (the 1970 data was never published), others have carried on the work using the Mills brothers’ principles up through the present day. See Pete Palmer’s article, “Player Win Averages (1946-2015),” in the Spring 2016 Baseball Research Journal.
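The before-and-after idea behind Player Win Averages can be illustrated with a short sketch. It is not the Mills brothers' actual computation and does not read the data.txt format: the function name, the player names, and the win-probability values are all hypothetical. It assumes that each play already carries the batting team's win probability before and after the at-bat, and simply credits the change to the batter and charges the opposite amount to the pitcher.

```python
from collections import defaultdict

def credit_play(totals, batter, pitcher, wp_before, wp_after):
    """Credit one at-bat to the batter and charge it to the pitcher.

    wp_before and wp_after are the batting team's win probability
    (0.0 to 1.0) before and after the at-bat. These values are assumed
    to be supplied by some win-probability table not shown here.
    """
    delta = wp_after - wp_before
    totals[batter] += delta   # batter gains what his team gained
    totals[pitcher] -= delta  # pitcher is charged the same amount
    return delta

# Hypothetical plays: (batter, pitcher, wp_before, wp_after).
plays = [
    ("batter_a", "pitcher_x", 0.50, 0.56),
    ("batter_b", "pitcher_x", 0.56, 0.51),
    ("batter_a", "pitcher_y", 0.40, 0.65),
]

totals = defaultdict(float)
for batter, pitcher, before, after in plays:
    credit_play(totals, batter, pitcher, before, after)
```

Because every change is credited to one player and charged to another, the totals across all batters and pitchers sum to zero in this sketch.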