2018 SABR Analytics Conference research presentations
SABR and Baseball Info Solutions are pleased to announce the research presentations for the seventh annual SABR Analytics Conference, which will be held March 9-11, 2018, at the Hyatt Regency Phoenix in Phoenix, Arizona. See below for full abstracts and presenter bios. For more information, visit SABR.org/analytics.
Click on a link below for audio highlights and PowerPoint slides, where available.
2:00-2:30 p.m., Friday, March 9
RP1: WAR: We'll Try to Resist all Puns and Still Tell You What It Is Good For
Sean Forman and Hans Van Slooten
- Audio: Listen to Sean Forman's presentation at SABR Analytics (MP3; 32:44)
- Slides: Download Sean Forman's presentation slides from SABR Analytics (PDF)
Baseball-Reference.com added Sean Smith's (rallymonkey at BaseballThinkFactory.org) Wins Above Replacement in 2010 and, over the past eight years, has built on Smith's work by adding further factors and calculations to the metric. Sean Forman and Hans Van Slooten will walk you through the main ideas of WAR and some implementation details, discuss the differences between the three primary WAR systems, and answer your questions.
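As a rough illustration of the "main ideas" of WAR, a position player's value is commonly described as a sum of run components measured against average, plus a replacement-level adjustment, converted to wins with a runs-per-win factor. The sketch below follows that general shape. The component names loosely mirror the categories Baseball-Reference publishes, but the input numbers and the 10-runs-per-win figure are hypothetical, and none of the site's actual calculations are reproduced here.

```python
from dataclasses import dataclass


@dataclass
class RunComponents:
    """Runs relative to average for one player-season (hypothetical inputs)."""
    batting: float       # runs from hitting
    baserunning: float   # runs from baserunning
    double_plays: float  # runs from avoiding (or hitting into) double plays
    fielding: float      # runs from defense
    positional: float    # adjustment for the difficulty of the position played
    replacement: float   # runs separating an average player from replacement level


def runs_above_replacement(c: RunComponents) -> float:
    """Sum the run components into runs above replacement."""
    return (c.batting + c.baserunning + c.double_plays
            + c.fielding + c.positional + c.replacement)


def wins_above_replacement(c: RunComponents, runs_per_win: float) -> float:
    """Convert runs to wins; runs_per_win depends on the run environment."""
    return runs_above_replacement(c) / runs_per_win


season = RunComponents(batting=25.0, baserunning=2.0, double_plays=1.0,
                       fielding=5.0, positional=7.0, replacement=20.0)
print(wins_above_replacement(season, runs_per_win=10.0))  # 6.0
```

Keeping runs_per_win as a parameter rather than a constant reflects that the conversion shifts with the league's scoring level.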
Sean Forman launched Baseball Reference in the spring of 2000 while avoiding his Ph.D. dissertation at the University of Iowa. He eventually completed his dissertation in Applied Mathematics and taught math and computer science for six years at Saint Joseph's University in Philadelphia. Sports Reference LLC was formed in the fall of 2007 to combine the baseball, basketball, and football sites; it has since grown to ten employees and now also covers college sports and hockey, with world football coming soon. Sean was named a Henry Chadwick Award winner in 2011 and continues to serve as Sports Reference's President. Sports Reference has been named a top 50 site by Time magazine, won a Sloan Conference Alpha Award in 2013, and is the current statistical partner for the National Baseball Hall of Fame.
Hans Van Slooten is entering his third year as the Manager of Baseball Operations at Sports Reference, after serving in the same role for Hockey-Reference.com beginning in 2013. Hans is a graduate of the University of Illinois at Urbana-Champaign in Computer Science and served as a software engineer for comScore and OLSON prior to joining Sports Reference. He is currently the membership chair of SABR's Halsey Hall Chapter in the Twin Cities.
9:45-10:45 a.m., Saturday, March 10
RP2 and RP3 will take place back-to-back in a single session.
RP2: Differentiating Performance vs. Potential in Player Development
- Audio: Listen to Vince Gennaro's presentation at SABR Analytics (MP3; 37:39)
- Slides: Download Vince Gennaro's presentation slides from SABR Analytics (.pptx)
Identifying and developing talented young players who will materialize into major leaguers is a challenge for every MLB team. It is also the foundation of an efficiently constructed, competitive roster, and for many low-revenue MLB teams, it is their only hope to contend for the elusive championship. Why do some prospects perform well at AA and AAA but fail at the major league level, while others who seem to be only average minor league performers become bona fide major leaguers? By employing a tool used by many major corporations to assess management talent potential, major league organizations may be able to improve their ability to identify and develop prospects into future major leaguers.
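The abstract does not name the corporate assessment tool. One widely used example is the nine-box grid, which places each person on separate performance and potential axes. The sketch below assumes a tool of that kind purely for illustration; the 0-100 score scales, the cutoffs, and the prospect names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Prospect(NamedTuple):
    name: str
    performance: float  # hypothetical 0-100 score of current results (e.g., AA/AAA stats)
    potential: float    # hypothetical 0-100 score of projected growth (e.g., assessments)


def band(score: float, cuts: Tuple[float, float]) -> str:
    """Place a score in the low, medium, or high third of a grid axis."""
    low_cut, high_cut = cuts
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def grid_cell(p: Prospect, cuts: Tuple[float, float] = (40.0, 70.0)) -> Tuple[str, str]:
    """Return the (performance, potential) cell of a 3x3 grid."""
    return band(p.performance, cuts), band(p.potential, cuts)


for prospect in (Prospect("Prospect A", 82.0, 35.0), Prospect("Prospect B", 55.0, 78.0)):
    print(prospect.name, grid_cell(prospect))
```

Prospect A, a strong minor league performer with a low potential rating, lands in a different cell from Prospect B, whose modest numbers sit alongside a high potential rating. Separating the two axes is what lets an organization tell those cases apart.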
Vince Gennaro is the President of SABR's Board of Directors, author of Diamond Dollars: The Economics of Winning in Baseball, and host of a weekly national radio show, Behind the Numbers: Baseball SABR Style on SiriusXM. He is also the Associate Dean and Clinical Associate Professor of NYU's Preston Robert Tisch Institute for Global Sport. He consults for MLB teams and appears regularly on MLB Network. Gennaro is the architect of the Diamond Dollars Case Competition series, which brings students together with MLB team and league executives and serves as both a unique learning experience and a networking opportunity for aspiring sports executives.
RP3: Defense in PARTs: Splitting Positioning, Airballs, Range, and Throwing
- Audio: Listen to Brian Reiff's presentation at SABR Analytics (MP3; 25:14)
- Slides: Download Brian Reiff's presentation slides from SABR Analytics (.pptx)
Introduction: With batted ball location and timer data for plays, one can estimate the defensive value of players by comparing their ratios of outs made per opportunity to the ratios of other players on similar balls in play. However, some have found such metrics limiting because they do not attempt to isolate factors such as starting positioning, especially with the dramatic increase in shift usage across baseball. Our research incorporates new data from Baseball Info Solutions such as fielder starting position coordinates and baserunner times to break down infield defense into separate components of positioning, range, and throwing.
Methods: Our evaluations are based on four calculations for each play.
A. The probability the play is made, calculated by comparing similar batted balls based on Baseball Info Solutions’ batted ball location and timer data.
B. The probability the play is made given fielders’ starting positioning relative to the batted ball location.
C. The probability the play is made at the point when the fielder touches the ball, based on his proximity to the bag and time to throw.
D. Whether the play resulted in an out or not.
Those calculations relied on a plus/minus mathematical approach where we bucketed plays based on the similarity of relevant factors (e.g. ball in play location and velocity).
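A bucketed plus/minus estimate of an out probability amounts to counting: group plays whose relevant factors fall in the same bin, then use the out rate within each bin. The minimal sketch below does this for probability A using only two factors. The field layout (spray angle and exit velocity) and the 5-unit bin widths are assumptions; probabilities B and C would add further bucket dimensions such as fielder starting position or distance to the bag.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (spray angle in degrees, exit velocity in mph, out recorded?) -- hypothetical fields
Play = Tuple[float, float, bool]
Bucket = Tuple[int, int]


def bucket_of(angle_deg: float, velocity_mph: float,
              angle_width: float = 5.0, velocity_width: float = 5.0) -> Bucket:
    """Assign a batted ball to a fixed-width (angle, velocity) bin."""
    return int(angle_deg // angle_width), int(velocity_mph // velocity_width)


def out_rates(plays: Iterable[Play]) -> Dict[Bucket, float]:
    """Out rate among all plays sharing a bucket, used as the out probability."""
    outs: Dict[Bucket, int] = defaultdict(int)
    totals: Dict[Bucket, int] = defaultdict(int)
    for angle, velocity, out in plays:
        key = bucket_of(angle, velocity)
        totals[key] += 1
        outs[key] += int(out)
    return {key: outs[key] / totals[key] for key in totals}


plays = [(12.0, 88.0, True), (13.5, 86.0, False), (14.0, 89.0, True), (31.0, 101.0, False)]
print(out_rates(plays))  # the first three plays share bucket (2, 17)
```

Narrower bins make plays within a bucket more alike but leave fewer plays per bucket, so bin width trades bias against noise.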
With those four pieces, we calculated the runs defenders have saved with their positioning (B minus A), range (C minus B), and throws (D minus C).
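The positioning, range, and throwing split above can be written per play as three differences. In the minimal sketch below, converting outs to runs with a single runs_per_out constant is a hypothetical simplification (the run value of an out or hit varies by situation and hit type), and the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PlayEstimate:
    p_batted_ball: float  # A: out probability from batted ball location and timer alone
    p_positioning: float  # B: out probability given fielder starting position
    p_at_fielding: float  # C: out probability when the fielder touches the ball
    out_made: bool        # D: whether the play actually became an out


def split_play(e: PlayEstimate) -> Dict[str, float]:
    """Split one play's credit into positioning, range, and throwing (in outs)."""
    d = 1.0 if e.out_made else 0.0
    return {
        "positioning": e.p_positioning - e.p_batted_ball,  # B - A
        "range": e.p_at_fielding - e.p_positioning,        # C - B
        "throwing": d - e.p_at_fielding,                   # D - C
    }


def runs_saved(plays: Iterable[PlayEstimate], runs_per_out: float) -> Dict[str, float]:
    """Total each component over many plays and convert outs to runs."""
    totals = {"positioning": 0.0, "range": 0.0, "throwing": 0.0}
    for play in plays:
        for part, outs in split_play(play).items():
            totals[part] += outs * runs_per_out
    return totals


play = PlayEstimate(p_batted_ball=0.40, p_positioning=0.55, p_at_fielding=0.80, out_made=True)
print(split_play(play))
```

The three parts always sum to D minus A (here 1.0 - 0.40 = 0.60). The split reallocates a play's total credit among positioning, range, and throwing without creating any.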
Brian Reiff is a Research Associate at Baseball Info Solutions. He initially joined the R&D group as an intern during his senior year at Lehigh University and became a full-time member of the staff after graduating in May 2017.
2:30-3:30 p.m., Saturday, March 10
RP4 and RP5 will take place back-to-back in a single session.
RP4: Knowledge and Evolution in the Free Agent Market
- Audio: Listen to Matt Swartz's presentation at SABR Analytics (MP3; 25:02)
- Slides: Download Matt Swartz's presentation slides from SABR Analytics (.pptx)
Earlier in the sabermetric revolution, I published numerous articles exploring which groups of players provided the most value on the free agent market. However, unlike hard science where findings are usually unconditional, empirical economic evidence can change as market participants learn. The physics of batted ball contact do not change if the players learn about them, nor do the mathematics of run scoring. Yet when front offices learn that shortstops provide more production per dollar, they can bid up their prices in free agency, nullifying the future applicability of the finding itself.
In this presentation, I will demonstrate which of my prior findings still hold as teams have learned more, and which value differences have eroded as teams’ knowledge has increased. I will revisit whether teams still overpay for production from free agents signed away from other teams, or if they have redirected more resources toward re-signing their own players. I will also check whether the positions on the diamond that provided the most value per dollar remain bargains, or if teams have changed their spending patterns by position. In addition, I will explore whether certain statistical profiles still correspond to better value, or if teams have started spending more on good base runners, contact hitters, and pitchers with strong peripherals.
Throughout the presentation, I will explain the economics behind the results and what economic theory tells us about their durability or lack thereof. This will give better insight into how the smartest teams will extract more wins from their free agent dollars going forward.
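A basic value-per-dollar comparison divides salary by wins above replacement for each group of signings. Running the same calculation separately for an early window and a recent window would show whether a group's discount has eroded. The sketch below is not necessarily how Swartz measures value; the grouping scheme is an assumption and the numbers are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# (group, salary in dollars, WAR produced) -- the group could be a position,
# a "re-signed" vs. "signed away" label, or a (period, position) pair.
Signing = Tuple[Hashable, float, float]


def dollars_per_war(signings: Iterable[Signing]) -> Dict[Hashable, float]:
    """Total salary divided by total WAR for each group with positive WAR."""
    salary: Dict[Hashable, float] = defaultdict(float)
    war: Dict[Hashable, float] = defaultdict(float)
    for group, dollars, wins in signings:
        salary[group] += dollars
        war[group] += wins
    return {g: salary[g] / war[g] for g in salary if war[g] > 0}


example = [("SS", 20e6, 3.0), ("SS", 10e6, 1.0), ("1B", 24e6, 2.0)]
print(dollars_per_war(example))  # {'SS': 7500000.0, '1B': 12000000.0}
```

Pooling salary and WAR before dividing, rather than averaging each contract's own ratio, keeps one small or near-zero WAR season from dominating a group's figure.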
Dr. Matt Swartz’s research and writing have appeared at MLB Trade Rumors, The Hardball Times, FanGraphs, Baseball Prospectus, and SB Nation, and he builds the arbitration salary projection model for MLB Trade Rumors. Matt consults for a major-league team with which he connected at this very conference in 2013. He graduated from the University of Pennsylvania in 2009 with a Ph.D. in Economics and in 2003 with a B.A. in Mathematics and Economics. Matt is a native Philadelphian and lives there now with his wife, Laura, and his 4-year-old daughter, Maya.
RP5: A Revised Look at Clutch Hitting
Rob Mains and Pete Palmer
- Audio: Listen to Rob Mains's presentation at SABR Analytics (MP3; 32:48)
- Slides: Download Rob Mains's presentation slides from SABR Analytics (.pptx)
In 1977, Dick Cramer wrote the seminal “Do Clutch Hitters Exist?” in SABR’s Baseball Research Journal, confirming earlier work by Pete Palmer that answered the rhetorical question in the negative. In 2004, Bill James wrote “Underestimating the Fog,” also in the BRJ, suggesting that prior analyses, including Cramer’s, are limited by the available data, and that clutch hitters may, in fact, exist even if we do not know how to identify them. Many subsequent studies have tried to settle the question, and while most side with Cramer, it remains open.
We performed a new analysis by simulating an entire season for 18 equally skilled players and measuring the difference between their player win averages (i.e., the situation-dependent difference in win probability before and after each appearance) and the run expectancy of each appearance (which is not situation-dependent). We simulated 1,000 seasons and used the standard deviation of the difference between the two measures as our baseline.
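The abstract does not spell out the win-probability model, so the toy below replaces it with two loud assumptions. First, each plate appearance has a context-neutral value of +0.3 if the batter reaches base and -0.2 otherwise. Second, its situation-dependent value is that same amount scaled by a random leverage factor with mean 1.0. Both measures are kept in one unit here, unlike the real win-probability and run-expectancy scales. Every simulated batter has identical skill, so any spread in the difference is pure chance, and its standard deviation plays the role of the baseline described above.

```python
import random
import statistics


def simulate_null_sd(n_seasons: int = 1000, n_players: int = 18,
                     pa_per_player: int = 600, on_base_prob: float = 0.33,
                     seed: int = 1) -> float:
    """SD of (situation-dependent - context-neutral) value for equally skilled batters.

    Toy assumptions: a PA is worth +0.3 if the batter reaches and -0.2 if not
    (context-neutral value), and its situation-dependent value is that amount
    times a random leverage factor with mean 1.0.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_seasons):
        for _ in range(n_players):
            situational = neutral = 0.0
            for _ in range(pa_per_player):
                value = 0.3 if rng.random() < on_base_prob else -0.2
                leverage = rng.expovariate(1.0)  # exponential with mean 1.0
                situational += leverage * value
                neutral += value
            diffs.append(situational - neutral)
    return statistics.pstdev(diffs)


print(simulate_null_sd(n_seasons=100))  # fewer seasons keeps pure Python fast
```

Because leverage has mean 1.0 and is drawn independently of the outcome, the expected difference is zero. Only its spread carries information.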
We then analyzed every player with 500 or more plate appearances between 1946 and 2017 and calculated a z-score equal to the difference between the player’s situation-dependent and non-situation-dependent batting performance divided by the standard deviation calculated above. This enabled us to identify batters who performed better or worse in high-leverage situations compared to their overall norms.
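Once both measures are on the same scale, the z-score is a one-line calculation. The sketch below assumes per-season rows and applies the 500-plate-appearance threshold to each season separately. The abstract's "in each season" suggests this reading but does not state it outright. The row layout is hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (player, season, plate appearances, situation-dependent value, context-neutral value),
# both values already expressed on the same scale -- hypothetical layout.
Row = Tuple[str, int, int, float, float]


def clutch_z(situational: float, neutral: float, null_sd: float) -> float:
    """z = (situation-dependent - context-neutral) / SD from the equal-player simulation."""
    if null_sd <= 0:
        raise ValueError("null_sd must be positive")
    return (situational - neutral) / null_sd


def season_z_scores(rows: Iterable[Row], null_sd: float,
                    min_pa: int = 500) -> Dict[Tuple[str, int], float]:
    """z-score for every player-season that meets the plate appearance threshold."""
    return {(player, season): clutch_z(sit, neu, null_sd)
            for player, season, pa, sit, neu in rows if pa >= min_pa}


rows = [("Batter A", 1990, 620, 3.1, 2.4), ("Batter B", 1990, 450, 2.0, 1.0)]
print(season_z_scores(rows, null_sd=0.5))  # Batter B is dropped (under 500 PA)
```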
We found that in each season, there were, in fact, some players who performed very well in the clutch and some who performed poorly. However, we feel this is more than countered by two other findings:
Performing better in pressure situations was not a replicable skill. Only one batter finished in the top 5% in clutch hitting more than four times in his career, whereas players who excel at other metrics consistently rank among the best.
The best and worst clutch performers were in many cases the same players. Several batters ranked in both the top 5% and the bottom 5% at various points in their careers, highlighting the apparent randomness.
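Both repeatability checks come down to counting. For each player, count the seasons that land in the top or bottom 5%, then look for repeat appearances and for names that show up in both tails. The minimal sketch below assumes the 5% cutoffs are taken within each season's pool of qualified players; the abstract does not say exactly how the tails were defined. The 20-player data set is a synthetic toy.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tail_counts(z_scores: Dict[Tuple[str, int], float], tail: float = 0.05):
    """Count, per player, seasons in the top and bottom `tail` of that season's z-scores."""
    by_season: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for (player, season), z in z_scores.items():
        by_season[season].append((z, player))
    top: Dict[str, int] = defaultdict(int)
    bottom: Dict[str, int] = defaultdict(int)
    for rows in by_season.values():
        rows.sort()
        k = max(1, int(len(rows) * tail))  # number of players in each tail
        for _, player in rows[:k]:
            bottom[player] += 1
        for _, player in rows[-k:]:
            top[player] += 1
    return dict(top), dict(bottom)


def repeat_clutch(top: Dict[str, int], min_seasons: int = 5) -> List[str]:
    """Players in the top tail in at least `min_seasons` seasons (i.e., more than four)."""
    return sorted(p for p, n in top.items() if n >= min_seasons)


def in_both_tails(top: Dict[str, int], bottom: Dict[str, int]) -> List[str]:
    """Players who were a top clutch performer in some seasons and a bottom one in others."""
    return sorted(set(top) & set(bottom))


z_scores = {}
for i in range(20):
    z_scores[(f"p{i}", 2016)] = float(i)    # p19 best, p0 worst
    z_scores[(f"p{i}", 2017)] = float(-i)   # order reversed
top, bottom = tail_counts(z_scores)
print(in_both_tails(top, bottom))  # ['p0', 'p19']
```

With very small pools, the max(1, ...) guard can put the same player in both tails of a single season, so the sketch is only meaningful for realistically sized seasons.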
This is not to say that some batters do not perform better when the pressure’s on, but rather that those who do tend to be those who do best in all situations, clutch or not.
Rob Mains is an author at Baseball Prospectus, where his "Flu-Like Symptoms" column runs twice weekly. He is also a contributor to Banished to the Pen. He is a SABR member, a Retrosheet volunteer, and a former Wall Street equities analyst.
Pete Palmer is the co-author with John Thorn of The Hidden Game of Baseball and co-editor with Gary Gillette of the Barnes and Noble ESPN Baseball Encyclopedia (five editions). Pete worked as a consultant to Sports Information Center, the official statisticians for the American League, from 1976 to 1987. Pete introduced on-base average as an official statistic for the American League in 1979 and invented on-base plus slugging (OPS), now universally used as a good measure of batting strength. He won the SABR Bob Davids Award in 1989 and was selected by SABR in 2010 as part of the inaugural class of Henry Chadwick Award recipients.
8:30-9:00 a.m., Sunday, March 11
RP6: Augmenting Statcast Hit Probability with Hit Direction and Sprint Speed
- Audio: Listen to Travis Petersen's presentation at SABR Analytics (MP3; 20:44)
- Slides: Download Travis Petersen's presentation slides from SABR Analytics (PDF)
Statcast hit probability measures the likelihood that a batted ball will become a hit by using exit velocity and launch angle. The metric attempts to strip away effects of defense and ballpark to determine the hit probability of a batted ball the moment it leaves the bat.
Using machine learning and statistical techniques, we have added hit direction and sprint speed to improve the accuracy of the hit probability model. These additional variables help us better understand the potential for hits on infield ground balls and deep fly balls.
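The abstract does not give the model's details. A k-nearest-neighbors estimate is one simple stand-in that shows why extra inputs help: the hit probability of a new batted ball is the hit rate among the most similar past batted balls, and adding hit direction and sprint speed changes which past balls count as similar. The feature names, scaling constants, k, and the four-ball history are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

BattedBall = Dict[str, float]  # feature name -> value, plus "hit" as 1.0 or 0.0


def knn_hit_probability(ball: BattedBall, history: List[BattedBall],
                        features: Sequence[str], scales: Dict[str, float],
                        k: int = 25) -> float:
    """Hit rate among the k past batted balls closest to `ball` on the chosen features."""
    if not history:
        raise ValueError("history must not be empty")

    def distance(other: BattedBall) -> float:
        # Divide by a per-feature scale so mph, degrees, and ft/s are comparable.
        return math.sqrt(sum(((ball[f] - other[f]) / scales[f]) ** 2 for f in features))

    nearest = sorted(history, key=distance)[:k]
    return sum(b["hit"] for b in nearest) / len(nearest)


SCALES = {"ev": 5.0, "la": 5.0, "spray": 10.0, "sprint": 1.0}  # hypothetical
history = [  # invented infield grounders: identical contact, different runners
    {"ev": 85.0, "la": -5.0, "spray": 0.0, "sprint": 29.5, "hit": 1.0},
    {"ev": 85.0, "la": -5.0, "spray": 0.0, "sprint": 29.0, "hit": 1.0},
    {"ev": 85.0, "la": -5.0, "spray": 0.0, "sprint": 25.0, "hit": 0.0},
    {"ev": 85.0, "la": -5.0, "spray": 0.0, "sprint": 25.5, "hit": 0.0},
]
grounder = {"ev": 85.0, "la": -5.0, "spray": 0.0, "sprint": 29.2}
print(knn_hit_probability(grounder, history, ["ev", "la"], SCALES, k=4))            # 0.5
print(knn_hit_probability(grounder, history, ["ev", "la", "sprint"], SCALES, k=2))  # 1.0
```

With exit velocity and launch angle alone, every grounder in the toy history looks identical, so the estimate is 0.5. Once sprint speed is included, the fast runner's grounder is matched to other fast runners' grounders.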
Travis Petersen is a Senior Data Scientist at MLB Advanced Media and an adjunct professor at Fordham University. He previously worked at IBM and has a master’s degree in Business Analytics from Fordham University.
10:15-11:15 a.m., Sunday, March 11
RP7 and RP8 will take place back-to-back in a single session.
RP7: Modeling Batter Characteristics Using Hit-Tracking Data
Glenn Healey and Shiyuan Zhao
- Audio: Listen to Glenn Healey's presentation at SABR Analytics (MP3; 27:36)
- Slides: Download Glenn Healey's presentation slides from SABR Analytics (.pptx)
The Statcast system measures several physical parameters of batted balls. We use multidimensional distributions defined over these parameters to develop a new approach for characterizing hitters which leads to a batter similarity measure that captures information about bat speed and swing plane. The new metric is applied to Statcast data from the 2016 and 2017 seasons to identify batter groups and unique batters. By applying the metric to individual players over time, we identify batters with swings that are the most stable and the least stable. Leaderboards and visualizations generated using non-metric multidimensional scaling are presented to illustrate the new approach.
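The abstract does not give the exact form of the distributions or the similarity measure. As a stand-in (not the authors' metric), the sketch below summarizes each batter by a normalized two-dimensional histogram of exit velocity and launch angle and compares two histograms with the Bhattacharyya coefficient. Comparing one batter's 2016 and 2017 histograms the same way gives a crude swing-stability score. The bin widths and the batted balls in the usage lines are arbitrary.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def ev_la_distribution(balls: Iterable[Tuple[float, float]],
                       ev_width: float = 5.0, la_width: float = 10.0) -> Dict[Cell, float]:
    """Normalized 2-D histogram of (exit velocity mph, launch angle degrees)."""
    counts = Counter((int(ev // ev_width), int(la // la_width)) for ev, la in balls)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one batted ball")
    return {cell: c / n for cell, c in counts.items()}


def bhattacharyya(p: Dict[Cell, float], q: Dict[Cell, float]) -> float:
    """Overlap of two distributions: 1.0 for identical, 0.0 for no shared cells."""
    return sum(math.sqrt(p[cell] * q[cell]) for cell in p.keys() & q.keys())


season_2016 = ev_la_distribution([(95.0, 20.0), (101.0, 24.0), (88.0, 5.0), (72.0, -12.0)])
season_2017 = ev_la_distribution([(96.0, 22.0), (99.0, 31.0), (87.0, 8.0), (70.0, -15.0)])
print(bhattacharyya(season_2016, season_2017))  # closer to 1.0 means a more stable profile
```

Pairwise similarities computed this way could feed a multidimensional scaling step for visualization, but that step is omitted here.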
The new metric has applications in a range of areas. The measure allows direct comparison of batter swing characteristics across contexts including MLB, MiLB, amateur, and foreign leagues. Identifying similar batters increases the sample sizes available for predicting the outcome of batter/pitcher matchups and lets forecasting systems regress toward more appropriate population means. The measure can also be used to develop improved models of the aging patterns associated with different swing types. The approach also allows pitchers to optimize pitch selection according to batter strengths and weaknesses recovered by applying machine learning techniques to groups of similar batters. In addition, the measure allows batters to be monitored over time to detect changes in swing mechanics or health.
Glenn Healey is a professor of electrical engineering and computer science at the University of California, Irvine where he is director of the computer vision laboratory. He received the B.S.E. degree in Computer Engineering from the University of Michigan and the M.S. degree in computer science, the M.S. degree in mathematics, and the Ph.D. degree in computer science from Stanford University. Dr. Healey's professional life is dedicated to combining physics, statistical signal processing, and machine learning methods for the development of algorithms that extract information from large sets of data.
Shiyuan Zhao is a Ph.D. student in Electrical Engineering and Computer Science at the University of California, Irvine. She received the B.S.E. degree in Aerospace Engineering from Beihang University and the M.S. degree in Electrical Engineering from the University of California, Irvine.
RP8: Has the Shift Seen Its Day?
- Audio: Listen to Mark Simon's presentation at SABR Analytics (MP3; 18:32)
- Slides: Download Mark Simon's presentation slides from SABR Analytics (.ppt)
Despite past work on the success of defensive shifting, the strategy continues to have its naysayers. Sports Info Solutions has chosen to revisit the topic and examine the 2017 season to see whether the positive effects of shifting still hold or whether things have changed. This presentation will show our research, which indicates that the shift continues to have a net positive effect on what it was intended to do: reduce hits on ground balls and short line drives. We’ll also update our estimate of how much more valuable a full Ted Williams shift is than a partial shift.
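One straightforward way to ask whether a shift "works" is to compare how often the same hitters' ground balls and short line drives become hits with and without it. The sketch below makes that comparison batter by batter, so a hitter who is shifted constantly is not measured against the league as a whole. Running it once for full shifts and once for partial shifts gives a rough full-versus-partial comparison. The alignment and ball-type labels are hypothetical, and the presenters' actual methodology may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

IN_SCOPE = {"ground_ball", "short_line_drive"}  # hypothetical labels
# (batter, ball type, defensive alignment: "none" / "partial" / "full", became a hit?)
BallInPlay = Tuple[str, str, str, bool]


def hits_saved(balls: Iterable[BallInPlay], alignment: str) -> float:
    """Hits prevented by `alignment`, comparing each batter only with himself."""
    # tallies[batter][alignment] = [hits, balls in play]
    tallies: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for batter, ball_type, align, hit in balls:
        if ball_type in IN_SCOPE and align in (alignment, "none"):
            tally = tallies[batter][align]
            tally[0] += int(hit)
            tally[1] += 1
    saved = 0.0
    for by_align in tallies.values():
        hits_s, n_s = by_align[alignment]
        hits_u, n_u = by_align["none"]
        if n_s and n_u:  # need both alignments to compare the batter with himself
            saved += (hits_u / n_u - hits_s / n_s) * n_s
    return saved


balls = ([("X", "ground_ball", "full", i < 2) for i in range(10)]
         + [("X", "ground_ball", "none", i < 3) for i in range(10)])
print(round(hits_saved(balls, "full"), 2))  # 1.0
```

A batter seen only against one alignment contributes nothing, which avoids guessing his unshifted hit rate at the cost of discarding some data.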
Mark Simon is a Senior Research Analyst for Baseball Info Solutions, joining the company in February 2018 after working for nearly 16 years as a researcher at ESPN. He worked for nine years on ESPN's "Baseball Tonight" and five years helping run the @ESPNStatsInfo Twitter feed and blog. He also wrote regularly about baseball for ESPN.com, voiced features for ESPN Radio, and was co-host of the podcast “Baseball Today.” He is the author of The Yankees Index, published by Triumph Books in 2016.
12:45-1:45 p.m., Sunday, March 11
RP9 and RP10 will take place back-to-back in a single session.
RP9: Synthetic Statcast Data
- Audio: Listen to Alex Vigderman's presentation at SABR Analytics (MP3; 23:02)
- Slides: Download Alex Vigderman's presentation slides from SABR Analytics (.pptx)
Major League Baseball’s Statcast product has given teams and fans alike the ability to assess players’ skill sets in ways that were previously very difficult. This is particularly true of batted ball data, as fans have become increasingly comfortable with terms like “exit velocity” and “launch angle” through online access to the data and the use of those metrics on broadcasts. That data has its limitations, though: pitches are sometimes missing, and batted balls are sometimes not tracked or are tracked poorly. With that in mind, Baseball Info Solutions (BIS) created Synthetic Statcast Data, which uses BIS’s proprietary hit location and timer data to model the primary batted ball launch characteristics associated with Statcast (e.g., exit velocity and launch angle).
Our presentation will describe the issues with Statcast data that Synthetic Statcast resolves, explain how it is created, and illustrate some examples of the data at work. In particular, it will highlight the use of Synthetic Statcast to create batted ball launch data for levels and seasons where such data is not publicly available, notably the high minor leagues dating back to 2013 and the major leagues dating back to 2010.
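BIS's model is proprietary, so the following is only a physics toy showing why hit location plus a timer reading carries launch information at all. If air resistance is ignored and the ball lands at the height it left the bat, distance and hang time determine a unique launch speed and angle. Real batted balls are strongly affected by drag, so these estimates will be biased. A production model would instead be fit to tracked batted balls. The function names are hypothetical.

```python
import math

G = 32.174                   # gravitational acceleration, ft/s^2
FT_S_PER_MPH = 5280 / 3600   # 1 mph in ft/s


def drag_free_launch(distance_ft: float, hang_time_s: float):
    """Launch speed (mph) and angle (degrees) implied by distance and hang time, ignoring drag.

    Assumes the ball lands at the same height it left the bat.
    """
    if distance_ft <= 0 or hang_time_s <= 0:
        raise ValueError("distance and hang time must be positive")
    vx = distance_ft / hang_time_s  # constant horizontal speed without drag
    vy = G * hang_time_s / 2.0      # vertical speed that returns to launch height at t = T
    speed_mph = math.hypot(vx, vy) / FT_S_PER_MPH
    angle_deg = math.degrees(math.atan2(vy, vx))
    return speed_mph, angle_deg


def drag_free_flight(speed_mph: float, angle_deg: float):
    """Forward model: distance (ft) and hang time (s) for the same idealized flight."""
    v = speed_mph * FT_S_PER_MPH
    vx = v * math.cos(math.radians(angle_deg))
    vy = v * math.sin(math.radians(angle_deg))
    hang_time = 2.0 * vy / G
    return vx * hang_time, hang_time


print(drag_free_launch(300.0, 4.0))
```

The forward function inverts the first one exactly, which makes it a convenient check. Running a real 300-foot, 4-second fly ball through drag_free_launch shows how far the drag-free assumption is from reality.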
Alex Vigderman is a Research Associate at Baseball Info Solutions, where he takes proprietary baseball and football charting data and bakes it into exciting analysis. After graduating from the University of Pennsylvania with a degree in Psychology and working in the healthcare software industry, he interned in the Boston Red Sox analytics department.
RP10: Visual Tracking: An Indicator of Health and Performance
Dorion Liston, Raine Chen, Quinn Kennedy, Philip Okafor, Li Li, Maheen Adamson
- Audio: Listen to Dorion Liston's presentation at SABR Analytics (MP3; 35:15)
- Slides: Download Dorion Liston's presentation slides from SABR Analytics (.pptx)
Eye movements offer a window onto brain health and performance. For decades, researchers in psychology, neurology, and physiology have compiled an extensive catalogue of oculomotor signs of brain insult, injury, and disease. Recent developments in oculomotor physiology have delivered standardized, deployable tests that yield quantitative metrics of dynamic vision along the continuum of health and performance, from disease states to normal to supranormal function.
Ted Williams’ classic text The Science of Hitting (1972) relates an exchange between Williams and Ty Cobb that will frame our discussion of dynamic visual performance. Cobb is quoted as saying, “Williams sees more of the ball than any man alive …”, suggesting exceptional innate dynamic visual skills. In response, Williams thoughtfully noted that his 20/10 static visual acuity was not unusual in professional baseball and therefore could not account for a .400 batting average. While we cannot resolve this historical argument, physiological studies of the component skills underlying batting have shown substantial across-subject variability in aspects of dynamic vision (Bahill and LaRitz, 1984, American Scientist, 72: 249-253) distinct from static visual acuity, and we are left with related questions that modern baseball analytics can address: Do the gifted hitters of today show visual performance metrics that correlate with batting? Do their visual skills extend beyond static visual acuity? Are these visual skills inherent, or have they been modified by deliberate practice?
We will describe methods and an analytical framework to address these questions, discussed within the context of visual tracking data from civilian, veteran, and active military populations with brain injury or disease (Parkinson’s disease, traumatic brain injury, hepatic encephalopathy), healthy controls, and professional baseball players.
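The abstract leaves the analytical framework unspecified. One elementary version of the question "do visual metrics correlate with batting?" is to reduce each player's tracking trace to a summary number and correlate it with a batting measure across players. Smooth-pursuit gain (eye velocity divided by target velocity) serves as the example metric here. Which metrics the authors actually compute is not stated, and the function names and usage values are hypothetical.

```python
import math
from typing import Sequence


def pursuit_gain(eye_velocity_deg_s: Sequence[float], target_velocity_deg_s: float) -> float:
    """Mean eye velocity over a pursuit epoch divided by target velocity (1.0 = matched)."""
    return (sum(eye_velocity_deg_s) / len(eye_velocity_deg_s)) / target_velocity_deg_s


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between a visual metric and a batting metric across players."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


traces = ([18.0, 19.5], [15.0, 16.0], [19.0, 20.5])  # hypothetical eye velocities, deg/s
gains = [pursuit_gain(trace, 20.0) for trace in traces]
batting = [0.360, 0.310, 0.370]  # hypothetical batting metric, one per player
print(pearson_r(gains, batting))
```

With only a handful of players, a correlation like this is very noisy, so any real analysis would need a confidence interval alongside the point estimate.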
Dorion Liston cofounded neuroFit, a startup company that develops products for objective, eye-movement-based measurement of neural health and function; its work grew out of projects for the US Air Force, the US Navy, and NASA.
Raine Chen recently completed her Ph.D. in Cognitive Psychology at Hong Kong University and is Lecturer at United International College.
Quinn Kennedy is Senior Lecturer in the Operations Research Department at the Naval Postgraduate School.
Philip Okafor is Assistant Professor of Gastroenterology and Hepatology at the Stanford School of Medicine.
Li Li is Associate Professor of Neural Science and Psychology at the New York University campus in Shanghai.
Maheen Adamson is Associate Professor of Neurosurgery and of Psychiatry and Behavioral Sciences at the Stanford School of Medicine, and the Senior Scientific Research Director at the Defense & Veterans Brain Injury Center at VA Palo Alto.
For more coverage of the 2018 SABR Analytics Conference, visit SABR.org/analytics.
This page was last updated March 14, 2018 at 12:45 am MST.