2019 SABR Analytics Conference research presentations
SABR and Baseball Info Solutions are pleased to announce the research presentations for the eighth annual SABR Analytics Conference, presented by MLB and KinaTrax, which was held March 8-10, 2019, at the Hyatt Regency Phoenix in Phoenix, Arizona.
Click on a link below for audio highlights and PowerPoint slides, where available.
2:00-2:30 p.m., Friday, March 8
RP1: The Effectiveness of Strategic Outfield Positioning
Brian Reiff
- Audio: Listen to Brian Reiff’s presentation at SABR Analytics (MP3; 22:34)
- Slides: Download Brian Reiff’s presentation slides from SABR Analytics (.pptx)
About a decade ago, teams started consistently repositioning their infielders to counter batters with extreme groundball tendencies, often moving three or more infielders to one side of the field to prevent a potential base hit. A similar trend has taken off in the outfield, as teams are starting to move their outfielders more and more to counter a batter’s flyball tendencies. Using Statcast data, Baseball Info Solutions has analyzed these movements to get a better idea of how teams are employing this tactic. This presentation will address teams’ usage of strategic outfield positioning and how it has changed over the years for which data is available. It will also explore the strategy’s effectiveness at different magnitudes and against different groups of hitters.
Brian Reiff is a Research Associate at Baseball Info Solutions. He initially joined the R&D group as an intern during his senior year of school at Lehigh University and became a full-time member of the staff after graduating in May 2017.
4:00-5:00 p.m., Friday, March 8
RP2 and RP3 will take place back-to-back in a single session.
RP2: Introducing Pitch Score, Pitch Matrices, and Rapscore
Rohan Gupta
- Audio: Listen to Rohan Gupta’s presentation at SABR Analytics (MP3; 12:16)
- Slides: Download Rohan Gupta’s presentation slides from SABR Analytics (.pptx)
Currently, scouting is fairly subjective, based largely on the “eye test.” Teams grade players on certain tools according to the 20-80 scale, where 50 is major league average and 10 represents one standard deviation. In particular, pitchers are graded by each of their pitch types, as well as other overarching descriptive qualities. The method has several drawbacks. Primarily, the subjective nature of the art means that each scout will have his own grading standards. Even when data is used in grading, it is limited in scope. While velocity is a starting point, other components influence the effectiveness of certain pitches, such as command, movement and deception, so it makes sense to formalize these components.
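As a rough illustration of the 20-80 scale described above, the sketch below converts a raw measurement into a grade using only the two facts stated in the abstract (50 is average, 10 points is one standard deviation). The function name, the clamping to the 20-80 range, and the sample velocity figures are assumptions made for illustration; they are not part of the presenter's method.

```python
def scouting_grade(value, league_mean, league_sd):
    """Map a raw measurement onto the 20-80 scale (50 = average, 10 = one SD)."""
    z_score = (value - league_mean) / league_sd
    grade = 50 + 10 * z_score
    # Assumption: grades outside the scale are clamped to its endpoints.
    return max(20.0, min(80.0, grade))


# Hypothetical numbers: a 95 mph fastball against a 93 mph average with a 2 mph SD.
print(scouting_grade(95.0, 93.0, 2.0))  # 60.0
```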
Using 882,717 pitches thrown by Rapsodo users and 7,747,112 pitches thrown in MLB since 2008 (via Statcast), we are able to more holistically understand the variables that affect pitch quality. The pitch score metric will allow us to grade individual pitches quickly and objectively. It was created by modeling measures of velocity, movement and location to predict the likelihood of a swinging strike, since we wanted to best evaluate a pitcher’s “stuff” in terms of his ability to deceive and overpower hitters.
We also developed four supplementary statistics to visualize and quantify the interaction between a player’s pitch types in terms of location, movement, release and velocity, called the pitch matrices. Each tool can be used to impartially evaluate thousands of pitchers at every level — MLB, college, high school and youth — and creatively to aid in player development: Pitchers and coaches can receive instantaneous feedback, gauge progress and generate training plans.
Rohan Gupta is a senior at Washington University in St. Louis majoring in mathematics and economics and strategy. He and the Washington University team won the Diamond Dollars Case Competition at the 2018 SABR Analytics Conference. Currently a sports analytics intern with Rapsodo, he will join the New York Yankees as a baseball operations associate in June 2019.
RP3: Wall Balls: Incorporating the Outfield Wall into Defensive Runs Saved
Andrew Kyne
- Audio: Listen to Andrew Kyne’s presentation at SABR Analytics (MP3; 20:48)
- Slides: Download Andrew Kyne’s presentation slides from SABR Analytics (.pptx)
Plays at the outfield wall are inherently difficult. The presence of a physical obstacle alters the approach fielders must take to get to the ball. They cannot always continue running at full speed and sometimes must leave their feet to make the play. Using batted ball data and wall distance measurements, Baseball Info Solutions (BIS) quantified this difficulty and found that plays within three feet of the wall are converted into outs much less frequently than similar flyballs that are further from the wall. Correspondingly, it was found that the expected out rates on those plays were being overestimated in the company’s Defensive Runs Saved (DRS) metrics. As a result, outfielders tended to be penalized too much for not making plays at the wall and not be rewarded enough when they did. This presentation will discuss that research and how to incorporate the wall into outfield defense evaluation.
Andrew Kyne is a Research Associate at Baseball Info Solutions. He is a 2018 graduate of Duquesne University, where he studied Information Systems and Economics. While in college, he interned in R&D with BIS and in Baseball Informatics with the Pittsburgh Pirates.
9:45-10:45 a.m., Saturday, March 9
RP4 and RP5 will take place back-to-back in a single session.
RP4: Pitching: The Power of Pitch Sequencing vs. Stuff
Vince Gennaro
- Audio: Listen to Vince Gennaro’s presentation at SABR Analytics (MP3; 26:08)
- Slides: Download Vince Gennaro’s presentation slides from SABR Analytics (.pptx)
Analysts often attribute pitch outcomes (e.g., quality of contact, swinging strikes, called strikes) to the characteristics of a given pitch. Others argue that pitch outcomes are highly dependent on context, such as the sequence of pitches leading up to an outcome, represented by their characteristics and differences from one another. This analysis attempts to assess and isolate the impact of the current pitch and the previous pitch on the outcome of a pitch. The analysis also attempts to identify MLB pitchers who are “sequencers,” those whose outcomes are highly dependent on the current and previous pitch.
Vince Gennaro is the President of SABR’s Board of Directors, author of Diamond Dollars: The Economics of Winning in Baseball, and host of a weekly national radio show, Behind the Numbers: Baseball SABR Style on SiriusXM. He is also the Associate Dean and Clinical Associate Professor of NYU’s Preston Robert Tisch Institute for Global Sport. He is a consultant to MLB teams and appears regularly on MLB Network. He is also the architect of the Diamond Dollars Case Competition series, which brings together students and MLB team and league executives and serves as a unique learning experience, as well as a networking opportunity for aspiring sports executives.
RP5: Optimizing the Swing: A Physics-Based Approach
Alan Nathan
- Audio: Listen to Alan Nathan’s presentation at SABR Analytics (MP3; 29:04)
- Slides: Download Alan Nathan’s presentation slides from SABR Analytics (.pptx)
In the past decade, physics-based models for the ball-bat collision have progressed to the point that they can reliably predict exit velocity, launch angle, and spin axis — and perhaps less reliably, spin rate — of the batted ball if the properties of the swing, pitch, and impact are known, largely based on laboratory experiments done under controlled conditions. The swing properties include the orientation, attack angle, and speed of the bat; the pitch properties include the speed, spin, spin axis, and approach angle of the ball; and the impact parameters include the location on the surface of the bat where the collision occurs. In a series of articles from several years ago, several simplifying assumptions for the batter’s swing were used to investigate how a batter might optimize the swing parameters, particularly the attack angle, to obtain certain outcomes, such as high on-base probability or maximum fly ball distance.
The primary goal of this talk is to develop this topic further by relaxing the simplifying assumptions used in the earlier analysis. First, the latest laboratory experiments used to develop the collision model will be reviewed, mainly to give a flavor for how we know — and how well we know — what we know. Next, the results of new calculations will be discussed, in the context of showing how exit velocity, launch angle, direction, and spin axis are related to the swing and pitch parameters. Finally, the calculations will be used along with Statcast data to address the “reverse engineering” problem, addressing the question of whether batted ball data can be used to infer — or at least constrain — the parameters of the batter’s swing, with particular emphasis on the attack angle.
Alan Nathan is Professor Emeritus of Physics at the University of Illinois. After a long career doing experimental nuclear physics, he now spends his time doing research on the physics of baseball. On this topic he has written many articles, both for academic journals and online baseball publications; he has given numerous talks to a variety of different audiences; and he maintains an oft-visited website, baseball.physics.illinois.edu, that many people have found to be a useful resource. He is interviewed regularly by the media and has consulted for various organizations, such as MLB, USA Baseball, NCAA, and several MLB clubs.
2:30-3:30 p.m., Saturday, March 9
RP6 and RP7 will take place back-to-back in a single session.
RP6: A Theory of Expected Performance
Jonathan Judge
- Audio: Listen to Jonathan Judge’s presentation at SABR Analytics (MP3; 24:39)
- Slides: Download Jonathan Judge’s presentation slides from SABR Analytics (PDF)
First there was batting average, then on-base percentage. Dick Cramer and then Pete Palmer focused on On-Base-Plus-Slugging or OPS, and subsequent work by Tom Tango et al. brought us weighted on-base average or wOBA. We develop a unified theory to demonstrate how a baseball statistic’s performance can be evaluated, which demonstrates the improvement in accuracy brought by each generation of baseball statistics.
We isolate the performance of a baseball statistic into three Contribution Measures: Descriptiveness (correlation to same-season run-scoring), Reliability (correlation to next year’s ratings of the same players), and Predictiveness (correlation to next season’s run-scoring). We will show that each of baseball’s best-known statistics has different levels of performance in these measures, that the best of them manage to exceed previous efforts in all three categories, and that park-adjusted statistics follow a similar hierarchy.
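The three Contribution Measures can be sketched as plain correlations. The example below is a minimal illustration that assumes an unweighted Pearson correlation over per-player (or per-team) values; the abstract does not say which correlation or weighting the authors use, and the function and argument names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Unweighted Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def contribution_measures(stat_year1, runs_year1, stat_year2, runs_year2):
    """Return the three measures for one statistic across consecutive seasons."""
    return {
        # Correlation to same-season run-scoring.
        "descriptiveness": pearson(stat_year1, runs_year1),
        # Correlation to next year's ratings of the same players.
        "reliability": pearson(stat_year1, stat_year2),
        # Correlation to next season's run-scoring.
        "predictiveness": pearson(stat_year1, runs_year2),
    }
```

Each input list is assumed to be aligned so that position i refers to the same player or team in every list.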
We conclude with an introduction to DRC+, a new statistic from Baseball Prospectus that maximizes all three Contribution Measures better than any other park-adjusted statistic. DRC+ aims to estimate a player’s expected contribution, and its performance in the Contribution Measures notably exceeds most competing statistics. The performance of DRC+ raises new and interesting questions about what additional areas of study can help us better understand hitter contributions in baseball.
Jonathan Judge has a degree in piano performance from the Lawrence University Conservatory of Music and a law degree from the University of Wisconsin. He is a trial lawyer specializing in the defense and regulation of consumer products. He is a senior member of the Stats Team at Baseball Prospectus, and has been heavily involved in the rollout of mixed modeling to drive a new generation of baseball statistics. He believes that analytics can play an important role in driving better legal decisions.
RP7: Spin Signatures for Pitcher Evaluation and Development
Glenn Healey
- Audio: Listen to Glenn Healey’s presentation at SABR Analytics (MP3; 28:04)
- Slides: Download Glenn Healey’s presentation slides from SABR Analytics (.pptx)
The Trackman radar allows recovery of three-dimensional pitch and batted ball trajectories which have been used by machine learning techniques to quantify the run value of a pitch as a function of variables that include velocity, location, and movement. This has led to the definition of pitcher statistics that are independent of outcomes and contextual variables such as the defense, ballpark, umpire, and catcher and that can be used to directly compare pitchers across environments. Beyond its use for pitcher evaluation, this approach provides a quantitative framework to guide pitcher development and pitch design. An important component of this framework is the ability to relate pitch trajectory parameters to characteristics of a pitcher’s delivery. Pitch movement, for example, is a complicated function of the forces on a baseball when it leaves the pitcher’s hand and the atmospheric conditions. Since the Trackman system measures a pitch’s total spin along with trajectory information, we can separate the factors that contribute to movement.
We leverage a computational process proposed by Alan Nathan in combination with fine-grained weather data to analyze the movement of every MLB pitch thrown over a full season. This effort has led to an improved physical model for the relationship between the lift coefficient and the scaled spin parameter (Bauer units). This relationship determines pitch movement and is exploited by a robust process to recover the useful spin, spin efficiency, and spin axis for each pitch type thrown by each MLB pitcher. These recovered parameters are used to define the spin diagram for an individual pitch type in terms of the direction of the Magnus force and the spin efficiency. The information in spin diagrams is combined to define the spin signature for a pitcher’s collection of pitches. In addition to their utility for pitcher evaluation and development, spin signatures can be used by projection systems to model the aging characteristics associated with different pitcher types. The new approach can also be used to assist with sensor calibration and pitch classification and to analyze the effects of altitude and weather on pitch characteristics.
Glenn Healey is a professor of electrical engineering and computer science at the University of California, Irvine where he is director of the computer vision laboratory. He received the B.S.E. degree in Computer Engineering from the University of Michigan and the M.S. degree in computer science, the M.S. degree in mathematics, and the Ph.D. degree in computer science from Stanford University. Dr. Healey’s professional life is dedicated to combining physics, statistical signal processing, and machine learning methods for the development of algorithms that extract information from large sets of data.
8:30-9:30 a.m., Sunday, March 10
RP8 and RP9 will take place back-to-back in a single session.
RP8: The Relationship Between Release Parameters and Pitch Location in Baseball Pitching
Ayane Kusafuka
- Audio: Listen to Ayane Kusafuka’s presentation at SABR Analytics (MP3; 20:05)
- Slides: Download Ayane Kusafuka’s presentation slides from SABR Analytics (.pptx)
In pitching, the ability to deliver the ball accurately to a target location is one of the most important skills. Neither the mechanical nor the neural mechanisms underlying pitch accuracy are entirely clear, although many studies have approached the question from different points of view. The objective of this study is to understand the mechanism of pitching accuracy; in particular, the influence on pitch location of the mechanical parameters at ball release, which are called the release parameters in this study.
With recent advances in science and technology, measurement equipment has made remarkable progress, making it possible to measure various parameters and the trajectory of the ball with high accuracy in real time. This study investigates which parameters are important for pitch location by measuring release parameters with TrackMan and developing a simulation that predicts the pitch location from the measured parameters. By comparing the variation in pitch location caused by changing various release parameters in the simulation, and verifying the results with multiple regression analysis, it was found that the parameters affecting the vertical pitch location were the elevation pitching angle and velocity, i.e., the velocity vector of the ball, while the parameters affecting the horizontal pitch location were the azimuth pitching angle and horizontal release point. Further, the vertical release point appears not to be a factor that directly determines the pitch location, but rather one of the means of adjusting the velocity vector. Moreover, a regression model for the vertical pitch location using only the elevation pitching angle and velocity was prepared, and it showed similar results for every pitcher. This indicates that elevation pitching angle and velocity are common factors in determining the pitch location, while the other parameters, such as the release point, are less critical and differ from pitcher to pitcher. By combining the measured data, the computer simulation, and the statistical analysis, this study revealed the influence of each release parameter on the pitch location, and it is expected to contribute to understanding the neural mechanisms underlying accurate ball control. It may also lead to the establishment of appropriate training and teaching methods.
Ayane Kusafuka is a master’s student in the Department of Life Science, Graduate School of Arts and Sciences, at the University of Tokyo. She received her bachelor’s degree in Engineering from Waseda University, Tokyo. Her research aims to understand the mechanism underlying pitching accuracy from the viewpoint of biomechanics and neuroscience.
RP9: Modern Roster Construction, Payroll Considerations, and the Next Collective Bargaining Agreement
Rob Mains
- Audio: Listen to Rob Mains’s presentation at SABR Analytics (MP3; 34:52)
- Slides: Download Rob Mains’s presentation slides from SABR Analytics (.pptx)
- Read more: Read Rob Mains’s article based on his SABR Analytics presentation (Baseball Prospectus)
Many analysts have noted that baseball is benefiting from the contributions of young players who are promoted rapidly through the minor leagues and given full-time jobs. The influx of young talent, while exciting, has changed the calculus of team payrolls. Players with less than three years’ service time have no bargaining leverage with their employer, other than a floor of a major-league minimum salary. Players with at least three but fewer than six years of service time can file for binding salary arbitration. Once a player has six years’ service time he is eligible for free agency once his contract expires.
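The three service-time tiers described above can be written as a simple lookup. This is only a sketch of the thresholds as stated in this abstract; the function name and tier labels are made up for illustration, and any exceptions in the actual collective bargaining agreement are ignored.

```python
def service_time_tier(years_of_service):
    """Classify a player by the service-time thresholds given in the abstract."""
    if years_of_service < 3:
        # No bargaining leverage beyond the major-league minimum salary.
        return "pre-arbitration"
    if years_of_service < 6:
        # May file for binding salary arbitration.
        return "arbitration-eligible"
    # Eligible for free agency once the current contract expires.
    return "free-agent-eligible"


print(service_time_tier(1.5))  # pre-arbitration
print(service_time_tier(4.0))  # arbitration-eligible
print(service_time_tier(6.0))  # free-agent-eligible
```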
Young players lacking arbitration rights will receive the major league minimum salary of $555,000 this season unless they can negotiate a higher figure. Some observers have expressed concern that a large pool of young talent is driving down club payrolls at a time in which baseball is generating record revenues, healthy operating margins, and unprecedented franchise values.
This study examines service time data from 2008 to 2018 to identify trends in player seniority. The key driver from a payroll perspective is not youth, but service time. Max Muncy was a breakout star for the Dodgers last year, but as a minor-league free agent signed by the club and called up last April, he made only about seven percent as much as teammate Yasiel Puig, who is about three months his junior, because of differences in service time. I examined service time averages for each season as well as the percentage of players falling into various service time cohorts. I also considered the differences between hitters and pitchers to determine the impact of the gradual shift in rosters away from position players and toward pitchers.
The results are useful not only to help understand the financial implications of service time trends but also to identify what are likely to be key points of contention when the current collective bargaining agreement expires on December 1, 2021.
Rob Mains is a writer for Baseball Prospectus. His “Flu-Like Symptoms” column runs twice a week. He is a former equities analyst and was a finalist for the 2018 SABR Analytics Conference Research Award for Historical Analysis/Commentary.
10:45-11:45 a.m., Sunday, March 10
RP10 and RP11 will take place back-to-back in a single session.
RP10: Optimal Fielder Positioning Model
Clinton Hausman, Michael Shames, and Bradley Waddell
- Audio: Listen to Clinton, Michael, and Bradley’s presentation at SABR Analytics (MP3; 21:30)
- Slides: Download Clinton, Michael, and Bradley’s presentation slides from SABR Analytics (PDF)
There is substantial research regarding shifting and offensive and defensive strategies around the shift. There does not, however, appear to be much (if any) public research concerning where to position fielders from a defensive management standpoint. The goal of this research was to create a model which optimizes the positioning of a team’s fielders for the best defensive output based on a given hitter’s batted ball data.
This model was created with the hope of building a tool that managers can use to align their fielders optimally based on the match-up with the specific batter the team is facing. To convert this defensive management decision problem into a mathematical model, we partitioned the field into 96 segments and modeled these segments as a graph, ultimately computing a total “contribution score” that a player positioned in a particular segment could have in all of the other segments. The contribution score consisted of fielder-dependent measures as well as fielder-independent measures to mitigate the endogeneity of the fielder positions present in the outcomes for batted ball data.
Fielder-independent measures included exit velocity and launch angle of each batted ball, batted ball density in each segment, compiled in one aggregate severity score for each segment, while fielder-dependent measures included distance between segments and the angle to the ball from each segment (representing the distance a fielder would have to run to track down a ball in another segment and the angle at which the fielder would have to run for that ball, respectively).
We ran this optimization process for three subjects of interest: Mike Trout, Brian Dozier, and Christian Yelich. Our results are the seven contribution-maximizing fielder coordinates that the model would prescribe to the defense for each hitter. Such information would contribute greatly to the field of baseball analytics from the perspective of defensive management.
Additionally, and more related to the advancement of baseball analytics as a field, this research demonstrates the need for fielder position data to be tracked on a pitch-by-pitch basis so that future improvements can be made in this space, opening up opportunities for a new class of analyses for future modeling and research.
Clinton Hausman is a senior at Tufts University majoring in Biology and minoring in History. He discovered an interest in sabermetrics after taking Andy Andres’ class at Tufts this past fall. He will be attending graduate school for Biology in the fall.
Michael Shames is a junior at Tufts University majoring in Economics and minoring in Philosophy. At Tufts, he serves as president of Baseball Analysis at Tufts (BAT) and is also an editor for The Tufts Daily. Last fall, he captained the Tufts team to a first-place finish in SABR’s regional Diamond Dollars Case Competition at New York University. He became interested in sabermetrics when he realized he could not hit a fastball over 80 mph.
Bradley Waddell is a senior at Tufts University studying Applied Math and Economics. His interest in sabermetrics stems from both his time playing baseball through high school as well as his passion for analytics. He will be working for Deloitte Consulting after graduation.
RP11: Pitch Sequence Optimization in Major League Baseball
Kevin Antonevich
- Audio: Listen to Kevin Antonevich’s presentation at SABR Analytics (MP3; 32:45)
- Slides: Download Kevin Antonevich’s presentation slides from SABR Analytics (.pptx)
Pitch sequencing has historically been a difficult aspect of baseball to analyze, as the interdependence of pitcher-specific, batter-specific and context-based factors makes it challenging to properly assign credit or blame for the outcome of a pitch or plate appearance. Previous research has supported traditional beliefs that changing pitch speeds and heights on successive pitches leads to more favorable outcomes for pitchers. The introduction of tunneling metrics into public research in the past few years has shed further light on how pitchers are able to deceive hitters and maximize the performance of their pitches. While these analyses help us better understand pitch sequencing along specific dimensions, it’s difficult to estimate their aggregate impact and how they’ll apply to a specific pitcher. In an attempt to produce a holistic study of pitch sequencing and aggregate the individual components identified in previous research, I model the probability of a swinging strike based on the physical characteristics and locations of both a single pitch and the pitch preceding it. Using these models, I construct a program that sequentially generates the optimal sequence of pitch types and locations for a specific batter-pitcher matchup, maximizing expected swinging strike rate at each iteration. While analyzing one optimized sequence shows a potential approach for a pitcher in a specific matchup, generating many sequences while holding the batter constant, for example, could expose a specific batter’s strengths and weaknesses against particular combinations of pitch types and locations. It can help us answer questions such as “After a first-pitch fastball to Mike Trout, is it better to throw an average curveball or an elite changeup low in the strike zone?” or “Assuming a certain level of pitch quality, what sequences does the program expect Aaron Judge to struggle against?”
I believe that such research has numerous applications and potential extensions in baseball to help us further understand this complex area of the sport.
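The sequential generation step described in this abstract, choosing at each iteration the pitch that maximizes expected swinging strike rate given the previous pitch, can be sketched as a greedy loop. Everything below is an assumption made for illustration: the function names, the toy probability table, and the repeat penalty stand in for the presenter's fitted models, which are not described in enough detail to reproduce.

```python
from itertools import product

# Hypothetical base swinging-strike rates and location adjustments (not real data).
BASE_RATE = {"fastball": 0.08, "curveball": 0.12, "changeup": 0.14}
LOCATION_BONUS = {"high": 0.01, "low": 0.03}


def toy_whiff_prob(candidate, previous):
    """Toy stand-in for a model of P(swinging strike | this pitch, previous pitch)."""
    pitch_type, location = candidate
    prob = BASE_RATE[pitch_type] + LOCATION_BONUS[location]
    if previous is not None and previous[0] == pitch_type:
        prob -= 0.05  # assumed penalty for repeating the same pitch type
    return prob


def greedy_sequence(whiff_prob, pitch_types, locations, length):
    """Pick, at each step, the (type, location) with the highest modeled whiff rate."""
    sequence, previous = [], None
    for _ in range(length):
        best = max(product(pitch_types, locations),
                   key=lambda cand: whiff_prob(cand, previous))
        sequence.append(best)
        previous = best
    return sequence


print(greedy_sequence(toy_whiff_prob, list(BASE_RATE), list(LOCATION_BONUS), 3))
# [('changeup', 'low'), ('curveball', 'low'), ('changeup', 'low')]
```

A greedy loop optimizes one pitch at a time, which matches the "at each iteration" wording in the abstract; it does not guarantee the best sequence overall.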
Kevin Antonevich is a senior at William & Mary studying Applied Mathematics and Economics. He is passionate about leveraging quantitative and qualitative analytical techniques to better understand athlete performance in a variety of sports, specifically in baseball. Kevin has previously worked as a Quantitative Analyst Intern with the Philadelphia Phillies and will be joining the Baltimore Orioles’ front office as a Baseball Analytics Fellow after graduation this spring.
12:15-12:45 p.m., Sunday, March 10
RP12: DeepBall: Modeling Expectation and Uncertainty with Recurrent Neural Networks
Daniel Calzada
- Audio: Listen to Daniel Calzada’s presentation at SABR Analytics (MP3; 30:43)
- Slides: Download Daniel Calzada’s presentation slides from SABR Analytics (.pptx)
Making reliable player preseason predictions is an issue of utmost importance to both teams and fans wishing to infer a player’s underlying talent or predict future performance. This is a well-studied and notoriously difficult problem. To varying degrees, leading prediction systems rely on baseball experts to isolate relevant predictive variables and combine them in logical ways. In recent years, the data science community has advocated using large datasets to train complex models rather than relying upon often-biased domain knowledge. However, applying these complex and expressive models to baseball has proven difficult due to the inherent randomness in baseball as well as the lack of abundant, clean data.
In this work, I will discuss the DeepBall projection system, a recurrent neural network that, once trained primarily on Retrosheet data, achieves performance comparable to other state-of-the-art public player prediction systems for common offensive statistics. It achieves this with minimal human guidance and domain knowledge, also overcoming the issue of limited data. Furthermore, the model is naturally extendible to other prediction tasks. We can apply standard machine learning techniques to have DeepBall model the uncertainty in its own predictions, estimating a fully defined probability distribution over potential outcomes for each player. These distributions can be studied, compared, or, for simulation purposes, sampled from. DeepBall is easily coerced into predicting multiple years in the future, which is useful for evaluating the long-term effects of a trade. The same neural network architecture is adaptable to predict other offensive statistics or even to pitcher predictions. We believe that DeepBall can benefit both teams and fans in many ways by modeling expectation and uncertainty.
Daniel Calzada is a recent graduate of the Computer Science program (MS ’18) at the University of Illinois Urbana-Champaign, where he concentrated in applied machine learning. During his time as a student, he studied making preseason batter predictions using deep learning. Daniel founded DeepBall Data (www.deepball.net) to host these predictions and has a vision to present baseball data in an intuitive, visual way for new baseball fans and an accessible way for baseball analysts. He now works full-time applying his skills to machine learning research and development in Albuquerque, New Mexico.
For more coverage of the 2019 SABR Analytics Conference, visit SABR.org/analytics.
Originally published: February 13, 2019. Last Updated: February 13, 2019.