2022 SABR Virtual Analytics Conference: Research Presentations


2022 SABR Virtual Analytics Conference presentations

SABR and Sports Info Solutions are pleased to announce the list of research presentations for the SABR Virtual Analytics Conference on March 17-20, 2022.

Abstracts and presenter bios for each research presentation can be found below. Click on a link below to watch a video replay or download PowerPoint slides (where available).

Thursday, March 17

7:00-7:30 p.m. EDT
RP1—How Do Drag and Spin Affect the Flight of a Baseball?
Alan Nathan

Video: Click here to watch a video replay of Alan’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Alan’s presentation (.pptx)

In a “Physics 101” world, atmospheric effects are neglected, the downward pull of gravity is the only force, and the distance a fly ball carries is determined entirely by the exit velocity and launch angle. In the real world, there are two additional forces that greatly affect fly ball distances. These are the drag and the Magnus forces, the latter due to the spin on the ball. The effect of drag is to reduce the speed of the ball without changing its direction, resulting in a considerable reduction in distance. The effect of Magnus is to change the direction of the ball without changing its speed. Its effect on distance is mixed, depending not only on the spin rate (colloquially, the rpm) but also on the spin axis. A ball hit with backspin experiences an upward force opposing gravity, allowing the ball to stay in the air longer and therefore travel farther than it would otherwise. The opposite is true for a ball hit with topspin. The primary effect of sidespin is to cause the ball to swerve horizontally, either hooking to the pull side or slicing to the opposite side. The present research uses a combination of Statcast data and physics analysis to examine how and why spin rate, spin axis, and drag affect fly ball distance. Two particularly interesting applications will be discussed: the dependence of fly ball distance on spray angle and the dependence of home run rates on drag. Prospects for further research will be discussed.
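The qualitative claims above can be illustrated with a minimal trajectory sketch. It is not the model used in the presentation: the drag coefficient `cd`, the lift coefficient `cl` (standing in for the Magnus force from backspin or topspin), the ball constants, and the contact height are all assumed values chosen for illustration, and sidespin is ignored.

```python
import math

# All constants below are illustrative assumptions, not values from the presentation.
MASS = 0.145                   # kg, approximate baseball mass
RADIUS = 0.0366                # m, approximate baseball radius
AREA = math.pi * RADIUS ** 2   # m^2, cross-sectional area
RHO = 1.2                      # kg/m^3, air density near sea level
G = 9.81                       # m/s^2


def fly_ball_distance(exit_speed_mph, launch_deg, cd=0.0, cl=0.0, dt=0.001):
    """Integrate a 2-D trajectory (Euler steps) and return the carry in feet.

    cd: drag coefficient (0 turns drag off).
    cl: lift coefficient for the Magnus force; positive for backspin,
        negative for topspin, 0 turns Magnus off.
    """
    v0 = exit_speed_mph * 0.44704          # mph -> m/s
    theta = math.radians(launch_deg)
    x, y = 0.0, 1.0                        # contact assumed 1 m above the ground
    vx, vy = v0 * math.cos(theta), v0 * math.sin(theta)
    k = 0.5 * RHO * AREA / MASS
    while y > 0.0:
        speed = math.hypot(vx, vy)
        # Drag points opposite the velocity; Magnus is perpendicular to it.
        ax = -k * cd * speed * vx - k * cl * speed * vy
        ay = -G - k * cd * speed * vy + k * cl * speed * vx
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
    return x / 0.3048                      # m -> ft


vacuum = fly_ball_distance(100, 28)                      # "Physics 101" world
drag_only = fly_ball_distance(100, 28, cd=0.35)          # drag, no spin
backspin = fly_ball_distance(100, 28, cd=0.35, cl=0.2)   # drag plus backspin
topspin = fly_ball_distance(100, 28, cd=0.35, cl=-0.2)   # drag plus topspin
```

With these assumed coefficients, the ordering matches the abstract: the vacuum carry is longest, drag shortens it considerably, backspin recovers some distance, and topspin costs more.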

Alan Nathan (@pobguy) is Professor Emeritus of Physics at the University of Illinois with a specialty in the physics of baseball. His research has evolved from the study of collisions of subatomic particles to, among other things, the collision of a baseball with a bat. He has written many articles, both for academic journals and online baseball publications; he has given numerous lectures to a variety of audiences; and he maintains an oft-visited website, baseball.physics.illinois.edu, that many people have found to be a useful resource. He is interviewed regularly by the media and has consulted for various organizations, including MLB, NCAA, USA Baseball, and several MLB clubs, as well as various technology companies.

7:30-8:00 p.m. EDT
RP2—Hip Abductor Strength Asymmetry: Relationship to Upper Extremity Injury in Professional Baseball Players
Hillary Plummer

Video: Click here to watch a video replay of Hillary’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Hillary’s presentation (.pptx)

Hip strength is an important factor for control of the lumbo-pelvic-hip complex. Deficits in hip strength may impact throwing performance and contribute to upper extremity injuries. The purpose of this study was to characterize the influence of hip abductor strength on the incidence of upper extremity injuries. Minor League baseball players (n=188, age=21.5 ± 2.2 years; n=98 pitchers; n=90 position players) volunteered. Hip abduction isometric strength was assessed bilaterally with a handheld dynamometer in side lying, expressed as torque using leg length (Nm). Hip abduction strength asymmetry was represented by [(trail leg / lead leg) × 100]. Overuse or non-traumatic throwing arm injuries were prospectively tracked by the medical staff. Hip abduction asymmetry ranged from 0.05 to 57.5%. During the first 2 months of the season, 18 players (n=12 pitchers) sustained an upper extremity injury. In pitchers, every 5% increase in hip abduction asymmetry was associated with a 1.24-fold increase in the risk of sustaining a shoulder or elbow injury. No relationship between hip abduction strength and injury was observed for position players. Hip abduction asymmetry in pitchers was related to subsequent upper extremity injuries. The observed risk ratio indicates that hip abduction asymmetry may contribute a significant but small increased risk of injury. Hip abduction muscle deficits may impact pitching mechanics and increase arm stress. Addressing hip asymmetry deficits that exceed 5% may be beneficial in reducing upper extremity injury rates in pitchers.
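As a rough illustration of the arithmetic in this abstract, the sketch below computes the trail-to-lead ratio and scales the reported risk ratio. Two things are assumptions rather than statements from the study: that the reported asymmetry (0.05 to 57.5%) is the absolute deviation of the ratio from 100, and that the 1.24 risk ratio compounds multiplicatively for each additional 5% of asymmetry. The function names are hypothetical.

```python
def hip_abduction_ratio(trail_torque_nm, lead_torque_nm):
    """Trail-to-lead hip abduction strength ratio, in percent."""
    return trail_torque_nm / lead_torque_nm * 100.0


def asymmetry_percent(trail_torque_nm, lead_torque_nm):
    """Assumed reading: asymmetry is the absolute deviation of the ratio from 100."""
    return abs(100.0 - hip_abduction_ratio(trail_torque_nm, lead_torque_nm))


def relative_risk(asymmetry_pct, rr_per_step=1.24, step_pct=5.0):
    """Assumed log-linear scaling: risk multiplies by rr_per_step per step_pct."""
    return rr_per_step ** (asymmetry_pct / step_pct)


# Hypothetical pitcher: trail leg 90 Nm, lead leg 100 Nm.
example_asymmetry = asymmetry_percent(90.0, 100.0)
example_risk = relative_risk(example_asymmetry)
```

Under those assumptions, a pitcher with 10% asymmetry would carry about 1.24 × 1.24 ≈ 1.54 times the baseline risk.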

Hillary Plummer is a research fellow with Oak Ridge Institute for Science and Education. She obtained a Master’s degree in Athletic Training from the University of Arkansas and completed a PhD in Kinesiology at Auburn University. Her research examines readiness, resiliency, and recovery in order to maximize human performance in Army aviators.

Friday, March 18

6:30-7:00 p.m. EDT
RP3—The Third-Time-Through-The-Order Penalty Is Worse Than You Thought
Rob Mains

Video: Click here to watch a video replay of Rob’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Rob’s presentation (.pptx)

Increasingly, starting pitchers’ roles are limited by the number of times they face opposing lineups. In 2021, the average starting pitcher faced only 21 batters. A decade earlier, the average was 26. The reason for the decline is the understanding that pitchers perform worse each subsequent time they face their opponents’ batting order. In 2021, pitchers allowed a .705 OPS the first time through the order, .728 the second time, and .767 the third time, using data from Baseball-Reference.

This change has proved contentious, with many decrying the seemingly automatic removal of pitchers before or during their third time through the order. The complaints reached a crescendo in Game 6 of the 2020 World Series, when Tampa Bay’s Blake Snell was removed with a 1-0 lead, having retired 15 of the 18 batters he’d faced, after facing the Dodgers lineup twice. Los Angeles promptly scored what proved to be the winning runs. In Game 3 of the 2021 World Series, Atlanta’s Ian Anderson was similarly removed after facing the Houston lineup exactly twice, and while the Braves won the game, Anderson was pulled with a no-hitter in progress!

In this presentation, I will demonstrate that by adjusting for lineup positions faced and starters who do not face the opposition a third time, the third-time-through-the-order penalty is actually worse than the OPS figures listed above. Hitters perform markedly better facing a starting pitcher a third time, considerably more than publicly available statistics would suggest.

I will then illustrate the history of the third-time-through-the-order penalty. I will trace the penalty through the Divisional Era (1969-present) to identify when it became more pronounced. Some complain that the penalty has resulted in starters trained to throw only five or six maximum-effort innings; I will identify how the penalty has informed this trend. I will suggest underlying causes and offsets.

Rob Mains is a writer for Baseball Prospectus. His “Veteran Presence” column runs twice a week. He is a former equities analyst and was a finalist for the 2018 SABR Analytics Conference Research Award for Historical Analysis/Commentary. He is a SABR Analytics Certification course reviewer.

8:00-8:30 p.m. EDT
RP4—Stops and Sends: 3B Coach Data and Analytics
Sarah Thompson

Video: Click here to watch a video replay of Sarah’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Sarah’s presentation (.pptx)

Up until now, the decision-making of MLB third base coaches and the subsequent outcomes of those decisions have been largely unknown to people who aren’t part of coaching staffs or major league rosters. Because signals and outcomes haven’t been diligently tracked, any judgment of a third base coach from an outsider’s perspective can only be anecdotal, and any analysis less than rigorous. Furthermore, if there are any current inefficiencies in the decision-making process, it’s nearly impossible to identify them.

To address this dearth of data, Sports Info Solutions began collecting data on third base coach signals in 2021, recording every visibly clear stop and send from a third base coach. While one year of data is insufficient to draw conclusions, the availability of this data allows us to dig deep and identify signaling trends across various circumstances.

Sarah Thompson is a Research Associate at Sports Info Solutions. She previously worked in the healthcare software industry and received her B.S. in Mathematics from Ursinus College and her M.S. in Statistics from Villanova University.

8:30-9:00 p.m. EDT
RP5—Biomechanical Predictors of Pitch Efficiency in Professional and Collegiate Pitchers
Ryan L. Crotin, Jonathan Swolik, Gene Brewer, and Glenn Fleisig

Video: Click here to watch a video replay of Ryan’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Ryan’s presentation (.pptx)

Baseball pitchers are considered of highest value to teams when demonstrating elite performance while avoiding injury. Previous studies on pitching performance have investigated biomechanics related to fastball velocity [e.g., 1–3]. Other studies focused on injury risk – particularly UCL (“Tommy John”) injuries – have looked at pitching biomechanics related to elbow varus torque [4–6]. However, no previous study has looked at pitch efficiency in professional pitchers – that is, maximizing the ratio of fastball velocity to elbow varus torque. The purpose of this work was to evaluate biomechanical qualities that influence pitch efficiency in elite-level pitchers.

Biomechanical data from 545 healthy adult (i.e., professional or collegiate) pitchers tested by the American Sports Medicine Institute were used. For each pitcher, pitch efficiency was calculated as his fastball velocity divided by normalized elbow torque (varus torque divided by bodyweight and height). This new metric has been termed Normalized Pitch Efficiency (NPE). In addition, 21 kinematic variables were calculated for each pitcher. All variables were examined for collinearity and multiple linear regression was used to model the combination of biomechanical inputs that explained the highest amount of variance in NPE. Within the sample of 545 pitchers, upper and lower thirds for NPE formed high and low NPE groups that were compared through Mann-Whitney independent group t-tests. All statistical tests had an a priori α = 0.05.
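The NPE definition above can be written out as a short sketch. The abstract does not state units, so the choice here (velocity in m/s, torque in N·m, bodyweight in newtons, height in meters) is an assumption, as are the function name and the example pitcher.

```python
def normalized_pitch_efficiency(velocity, varus_torque, bodyweight, height):
    """Fastball velocity divided by normalized elbow varus torque.

    Normalized torque is varus torque divided by (bodyweight * height),
    following the definition in the abstract. Units are assumed:
    velocity in m/s, torque in N*m, bodyweight in N, height in m.
    """
    normalized_torque = varus_torque / (bodyweight * height)
    return velocity / normalized_torque


# Hypothetical pitcher: 38 m/s fastball, 90 N*m varus torque, 900 N, 1.88 m.
example_npe = normalized_pitch_efficiency(38.0, 90.0, 900.0, 1.88)
```

With these assumed units the hypothetical pitcher scores about 714, the same order of magnitude as the group means reported in the results. That is suggestive of the unit choice but does not confirm it.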

Level of competition (professional or collegiate) and 11 kinematic variables explained 27% of the variance in pitch efficiency. Six of the kinematic variables occurred at the instant of front foot contact (elbow flexion, shoulder abduction, horizontal abduction, shoulder external rotation, trunk separation, and pelvic rotation angle). The other kinematic variables were maximal external shoulder rotation (MER), trunk side tilt at ball release, shoulder horizontal abduction at MER, maximum trunk angular velocity and knee extension.

Comparisons between high and low NPE groups were significant for 8 kinematic variables (p<0.05). At foot contact, elbow flexion (high vs. low: 94±16° vs. 100±16°), shoulder abduction (86±11° vs. 91±10°), and the pelvic rotational angle (33±12° vs. 36±12°) were significantly lower in the high NPE group, while shoulder external rotation (60±28° vs. 54±6°) was significantly greater. MER was significantly greater in the high NPE group (166±11° vs. 160±11°), while shoulder horizontal adduction at MER (19±7° vs. 21±7°), knee extension (6±12° vs. 10±11°), and trunk side tilt at ball release (20±14° vs. 23±10°) were significantly lower than in the low NPE group. Professional pitchers showed greater NPE than collegiate pitchers (711±101 vs. 657±99).

This study showed that one thing distinguishing professional pitchers from collegiate pitchers is greater pitch efficiency. Pitching efficiency is also affected by the 11 kinematic variables identified in this study. Thus, pitchers and baseball organizations should focus on these factors to maximize performance and minimize risk of injury.

Ryan L. Crotin, Ph.D., is a Research Associate and Adjunct Thesis Advisor in the Department of Kinesiology at Louisiana Tech University. He is the Vice President of ArmCare.com and he previously worked as the Director of Performance Integration with the Los Angeles Angels from 2017 to 2021.

Saturday, March 19

1:00-1:30 p.m. EDT
RP6—Emerging Technologies in Biomechanics are Changing Baseball
Glenn Fleisig

Video: Click here to watch a video replay of Glenn’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Glenn’s presentation (.pptx)

Research studies have quantified the biomechanics of elite baseball pitchers. These studies have established the biomechanics for maximizing performance and minimizing risk of injury. Traditionally, baseball biomechanics have been limited to laboratory settings. But recent advances in technology have now made data capture on-field and in-game possible. In this presentation, Dr. Glenn Fleisig will explore these recent advances and critically evaluate their validity.

A November 2021 survey by the American Baseball Biomechanics Society showed that most Major League Baseball organizations now use wearable sensors and marker-less motion capture. The most popular wearable sensor in baseball biomechanics is the Driveline Pulse (formerly known as Motus). The Pulse is a six-degree-of-freedom inertial measurement unit (“IMU”) which measures the biomechanics of the arm during throwing. A study by Camp CL et al. (Am J Sports Med 49:3094-3101, 2021) tested 10 high school pitchers and compared biomechanical measurements from the Pulse IMU against a “gold standard” marker-based motion capture system (Motion Analysis Corporation, Rohnert Park, CA). Measurements from the IMU and marker-based system were significantly different. The study concluded that the IMU was not accurate across subjects, but may be useful for monitoring measurements and changes within an individual baseball player.

Marker-less motion capture is an emerging technology in sports, particularly in baseball. Research is underway by Major League Baseball, individual MLB clubs, and research institutions to assess the validity of this technology. A study by the American Sports Medicine Institute compared pitching biomechanics from a marker-less motion capture system (Dari Motion, Overland Park, KS) against a “gold standard” marker-based motion capture system (Motion Analysis Corporation, Rohnert Park, CA). A total of 30 baseball pitchers, ranging from youth to professional levels, were tested. This study found high repeatability within each motion capture system, but significant differences between the two systems. It was concluded that marker-less motion capture appears viable for measuring pitching biomechanics, but caution is warranted in comparing directly against marker-based data.

In conclusion, the ability to analyze baseball biomechanics with today’s emerging technologies is promising. Biomechanical analysis can lead to improved assessment, coaching, and training of baseball players.

Dr. Glenn Fleisig is the Biomechanics Research Director of the American Sports Medicine Institute and the Founding President of the American Baseball Biomechanics Society. He is also an advisor to Major League Baseball, Little League Baseball, and USA Baseball. He earned his engineering degrees from MIT, Washington University, and UAB. Ranked by Expertscape as the top expert in the world on baseball science and medicine, Dr. Fleisig has published 200 scientific articles, delivered 350 presentations throughout the world and has been interviewed for thousands of stories by the media.

1:30-2:00 p.m. EDT
RP7—Machine Learning and Statistical Prediction of Fastball Velocity with Biomechanical Predictors
Kristen Nicholson

Video: Click here to watch a video replay of Kristen’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Kristen’s presentation (.pptx)

In recent years, fastball velocity has been one of the most important factors for success among baseball pitchers. The purpose of this study was (1) to develop statistical and machine learning models of fastball velocity, (2) to identify the strongest predictors of fastball velocity, and (3) to compare the models’ prediction performances. Three-dimensional biomechanical analyses were performed on high school (n = 165) and college (n = 62) baseball pitchers. A total of 16 kinetic and kinematic predictors from the entire pitching sequence were included in regression and machine learning models. All models were internally validated through ten-fold cross-validation. Model performance was evaluated through root mean square error (RMSE) and calibration with 95% confidence intervals. Gradient boosting machines demonstrated the best prediction performance [RMSE: 0.34; Calibration: 1.00 (95% CI: 0.999, 1.001)], while regression demonstrated the greatest prediction error [RMSE: 2.49; Calibration: 1.00 (95% CI: 0.85, 1.15)]. Maximum elbow extension velocity (relative influence: 19.3%), maximum humeral rotation velocity (9.6%), maximum lead leg ground reaction force resultant (9.1%), trunk forward flexion at release (7.9%), and time difference of maximum pelvis rotation velocity and maximum trunk rotation velocity (7.8%) demonstrated the greatest influence on pitch velocity. Gradient boosting machines demonstrated better calibration and reduced RMSE compared to regression. The influence of lead leg ground reaction force resultant and trunk and arm kinematics on pitch velocity demonstrates the interdependent relationship of the entire kinetic chain during the pitching motion. Coaches, players, and performance professionals should focus on the identified metrics when designing pitch velocity improvement programs.
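The validation procedure named in this abstract, ten-fold cross-validation scored by RMSE, can be sketched with the standard library alone. This covers only the evaluation loop with a one-predictor least-squares line; it does not reproduce the study's gradient boosting model or its 16 predictors. The data are synthetic, and the predictor, its range, and the noise level are assumptions for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def kfold_rmse(xs, ys, k=10, seed=0):
    """Pooled out-of-fold RMSE from k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k roughly equal folds
    sq_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score only the pitchers the model did not see during fitting.
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Synthetic data: velocity rises with one made-up predictor plus unit noise.
rng = random.Random(1)
predictor = [rng.uniform(1800.0, 2600.0) for _ in range(227)]  # 165 + 62 pitchers
velocity = [20.0 + 0.006 * p + rng.gauss(0.0, 1.0) for p in predictor]
cv_rmse = kfold_rmse(predictor, velocity)
```

Because every prediction comes from a model that never saw that pitcher, the resulting RMSE estimates out-of-sample error, which is the basis on which models such as those in the study can be compared.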

Kristen Nicholson (kfnichol@wakehealth.edu) is an assistant professor of orthopedic surgery and director of the Biomechanics and Pitching Laboratory at Wake Forest School of Medicine. She earned her undergraduate degree in Mathematical Science from Clemson University and her M.S. and Ph.D. in Biomechanics and Movement Science from the University of Delaware. Dr. Nicholson uses the pitching laboratory to enhance player development and conduct biomechanics research. Her research focuses on defining pitching efficiency: reducing injury risk while promoting ball velocity.

2:00-2:30 p.m. EDT
RP8—Neuroscience at the Plate: Neural Activity during Computerized Task is Related with In-Game Hitting Performance
Jason Themanson

Video: Click here to watch a video replay of Jason’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Jason’s presentation (.pptx)

Research has shown relationships between hitters’ neural activity and their task performance in a laboratory-based computerized pitch perception task, providing valuable insights into hitters’ cognitive links to their hitting behavior. However, the existing research has not linked neural activity with actual in-game hitting performance. The current study examines the relationships between hitters’ neural activity in the computerized task and their in-game hitting performance. Collegiate baseball players completed a computerized video task assessing whether thrown pitches were balls or strikes. These pitches were viewed from the visual perspective of a hitter in the batter’s box, and hitters were given umpire feedback on the accuracy of their choice following each pitch. Players’ neural activity to both the pitches and the umpire feedback was recorded throughout the task. Additionally, each player’s publicly available hitting statistics (BA, OBP, SLG, OPS) for the following baseball season and for their college careers were collected. Results showed that neural activity in response to the pitches during the computerized task was associated with all measures of in-game hitting performance, both for the subsequent college season and for players’ career hitting statistics. Significant relationships were present between neural activity and in-game hitting performance even after accounting for other variables related to in-game hitting statistics collected for the study. These findings indicate that players’ neural activity measured in a laboratory environment shows a translational relationship with in-game hitting statistics. Neural activity provides a more objective analysis of players’ ongoing cognitive processes during hitting and a better understanding of the cognitive self-regulatory and learning processes associated with hitting performance.

Self-regulatory cognitive control is adaptable and trainable, and this research advances the objective measurement of cognitive/psychological variables much as high-speed cameras, motion capture, and force plates have done for physical/physiological variables. These findings suggest that players’ real-time implementation of cognitive control processes, obtained through direct measures of their neural activity, is related to their on-field hitting performance. This level of measurement provides a data advantage to organizations, players, and player development personnel in understanding the cognitive processes associated with better hitting performance.

Jason Themanson is a Professor in the Department of Psychology and the Neuroscience Program at Illinois Wesleyan University. He received his B.S. in Psychology from the University of Illinois, his M.A. in Social Psychology from the University of Connecticut, and his Ph.D. in Kinesiology (with an emphasis in exercise psychology) from the University of Illinois. Dr. Themanson’s research utilizes both neural and behavioral measures to examine cognitive processes related to learning, decision-making, and control during task performance.

3:30-4:00 p.m. EDT
RP9—Rounding Second – A Probabilistic Investigation of the MLB Modified Extra Innings Rule
David Hyland

Video: Click here to watch a video replay of David’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from David’s presentation (.pptx)

In 2020, MLB introduced a rule to end extra-inning games in a timelier fashion due to the pandemic environment. In this paper we use run scoring probabilities with the modified rules to analyze the probabilities of the number of innings that would be played out under the regular and modified rules scenarios. We find that approximately 53% more games would end in 10 innings under the modified rule instead of going longer, as they would under the regular rules.

The purpose of the paper is to use the run scoring probabilities from past seasons to investigate the effect of a modified rule. This lets us predict how many innings games are likely to last without having to use the rule for a season or more to find out from actual games. We use conditional probabilities based on where runners start to examine how many more or fewer runs would score based on the state at the start of the inning. Normally an inning starts with nobody on and nobody out. We examine the expected difference in run scoring with a runner starting on second.
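A minimal sketch of this kind of calculation follows. It assumes both teams draw their runs independently from the same per-half-inning distribution and ignores strategy changes such as bunting or playing for one run, so it is a simplification rather than the paper's method. The two run distributions are hypothetical placeholders, not figures from the paper, and the top bucket lumps together three or more runs.

```python
def tie_probability(run_dist):
    """P(both teams score the same number of runs in one extra inning)."""
    return sum(p * p for p in run_dist.values())


def prob_game_ends_by(extra_innings, run_dist):
    """P(the game is decided within the given number of extra innings)."""
    return 1.0 - tie_probability(run_dist) ** extra_innings


# Hypothetical probabilities of scoring 0, 1, 2, or 3+ runs in a half inning.
REGULAR = {0: 0.72, 1: 0.15, 2: 0.07, 3: 0.06}            # bases empty, no outs
RUNNER_ON_SECOND = {0: 0.42, 1: 0.33, 2: 0.13, 3: 0.12}   # runner on second, no outs

ends_in_10th_regular = prob_game_ends_by(1, REGULAR)
ends_in_10th_modified = prob_game_ends_by(1, RUNNER_ON_SECOND)
ends_by_11th_modified = prob_game_ends_by(2, RUNNER_ON_SECOND)
```

A game continues only when both teams match each other's run total, so starting states that spread probability across more run totals lower the chance of a tie and shorten games. Swapping in a distribution for runners on first and second would evaluate other candidate rules the same way.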

The modified extra innings rule is expected to reduce innings beyond 10 by 53 percent. Additionally, 92 percent of the extra innings games are expected to end by the 11th inning and there is a very low probability of going beyond 14 innings. Other potential rules such as putting runners on first and second can be analyzed with this methodology. Further study at the end of 2021 to analyze how long extra innings games actually last under the current modified rules will be interesting and provide additional guidance.

David Hyland is a Professor of finance and sabermetrics at Xavier University and a minority partner in the Florence Y’Alls of the Frontier League. He founded XBAT (Xavier Baseball Analytics Team) with the Xavier University baseball coaches to provide opportunities for students to do analytic work on real data, helping players develop and compete in the Big East.

5:00-5:30 p.m. EDT
RP10—Behavioral Biases in Daily Fantasy Baseball: The Case of the Hot Hand
Jeremy Losak, Andrew Weinbach, and Rodney Paul

Video: Click here to watch a video replay of Jeremy’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Jeremy’s presentation (.pptx)

Whether the hot hand exists is a subject of ongoing debate in sport. It is often hypothesized that recent successful performances indicate the likelihood of subsequent successes. However, early research in the field of statistics and economics suggested that belief in the hot hand could be attributed to a misconception of chance in random sequences. Recently, there has been renewed evidence of a hot hand effect in baseball. Green and Zwiebel (2018) find evidence of a hot hand in batting performances in MLB, and also identify that pitchers appear to respond to a hot hand. When pitchers face batters with particularly high percentages of home runs or extra base hits in the recent past, the batters are more likely to be walked.

Our study adds to the hot hand literature in two critical ways. First, it considers hot hand in a new setting and with a unique data set. Second, it borrows from prior literature, focusing on the hot hand in sport betting to examine a new marketplace: daily fantasy sports. It also considers behavioral biases related to the hot hand.

Daily Fantasy Sports (DFS) offer daily versions of season-long fantasy contests and operate in a similar fashion to traditional betting markets. DFS participants compete by selecting individual players to form lineups. The better performing lineups win some monetary award depending on the contest. The existence of the hot hand has been tested in betting markets on numerous occasions, but this is one of the first studies to consider the hot hand in the daily fantasy space.

The key questions in this paper can be summarized as follows. First, is the hot hand prevalent in DFS baseball scoring? Second, do DFS prices accurately capture the hot hand? Third, how do consumers respond to players on a hot streak? In addressing these questions, we utilize DraftKings data for batters from the 2019 MLB season. We illustrate that while there is no evidence of a hot hand strategy, and while hot hand appears to be accurately captured in DFS prices, consumers are heavily biased towards selecting hot players. This is clear evidence of a hot hand behavioral bias from participants.

Jeremy Losak is an Assistant Professor of Sport Analytics in the Department of Sport Management at Syracuse University’s David B. Falk College of Sport and Human Dynamics. His research focuses on the economics of sports, particularly baseball labor markets, attendance at sporting events, gambling markets, and college athletics. Co-authors on the project include Andrew Weinbach, Professor of Economics at Coastal Carolina University, and Rodney Paul, Director of the Sport Analytics Program and Professor at Syracuse University.

5:30-6:00 p.m. EDT
RP11—SEAM Methodology for Context-Rich Player Matchup Evaluations in Baseball
Julia Wapner, David Dalpiaz, Daniel Eck

Video: Click here to watch a video replay of Julia’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Julia’s presentation (.pptx)

Spray charts are an important tool in evaluating batted-ball outcomes and player tendencies, such as the area of the field a batter is most likely to hit the ball. However, one concern with the use of spray chart distributions is the potential sparsity of data when looking at a specific batter-pitcher matchup.

To bring useful context to spray charts, we develop the SEAM (synthetic estimated average matchup) method for describing batter versus pitcher matchups in baseball. We first estimate the distribution of balls put into play by a batter facing a pitcher, called the empirical spray chart distribution. Many individual matchups have a sample size that is too small to be reliable for use in predicting future outcomes. Synthetic versions of the batter and pitcher under consideration are constructed in order to alleviate these concerns. Weights governing how much influence these synthetic players have on the overall spray chart distribution are constructed to minimize expected mean square error.
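The idea of choosing weights to minimize expected mean square error can be illustrated with a generic two-estimator blend. This is not the SEAM weighting formula, which the abstract does not give. It assumes a noisy but unbiased empirical estimate with known variance, a synthetic estimate with a known bias and negligible variance, and independence between the two. All names are hypothetical.

```python
def mse(weight, empirical_variance, synthetic_bias):
    """Expected squared error of weight * empirical + (1 - weight) * synthetic."""
    return weight ** 2 * empirical_variance + (1.0 - weight) ** 2 * synthetic_bias ** 2


def optimal_weight(empirical_variance, synthetic_bias):
    """Weight on the empirical estimate that minimizes mse() above.

    Setting the derivative of mse() to zero gives b^2 / (v + b^2).
    Undefined when both inputs are zero.
    """
    b2 = synthetic_bias ** 2
    return b2 / (empirical_variance + b2)


def blended_estimate(empirical, synthetic, weight):
    """Convex combination of the two estimates."""
    return weight * empirical + (1.0 - weight) * synthetic


# Sparse matchup: large empirical variance, modest synthetic bias (made-up numbers).
w = optimal_weight(empirical_variance=4.0, synthetic_bias=1.0)
blended = blended_estimate(empirical=10.0, synthetic=20.0, weight=w)
```

The weight on the empirical matchup shrinks as its variance grows, which captures the motivation in the abstract: the fewer observations a specific batter-pitcher matchup has, the more the estimate leans on the synthetic players.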

We provide a Shiny app that allows users to visualize and evaluate any batter-pitcher matchup that has occurred or could have occurred in the last five years. This provides a tool that could be used to determine defensive alignments, lineup construction, or pitcher selection through estimation of spray densities based on any input matchup. The computational speed with which the method calculates the spray densities allows the app to display the visualizations for any input almost instantly. SEAM therefore offers computationally fast distributional interpretations of dependent matchup data.

Julia Wapner is a senior at the University of Illinois Urbana-Champaign, where she is majoring in Statistics and Spanish. She has worked with the data analytics group for the Illinois baseball team for four years, and is working on this project as undergraduate research with two statistics professors, Daniel Eck and David Dalpiaz. After graduation, Julia will be joining the Baltimore Orioles as an Analytics Fellow.

6:00-6:30 p.m. EDT
RP12—Improving Batter-Pitcher At-Bat Modeling Using K-Means Clustering and Mixture Models
Xander Schwartz

Video: Click here to watch a video replay of Xander’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Xander’s presentation (.pptx)

In baseball, a common refrain is that a batter “hits this pitcher well” due to their success against the pitcher. This sentiment rarely holds predictive value given the typically small sample sizes of batter-pitcher matchup history. This study aims to increase sample sizes by grouping similar pitchers together so that the question is whether a batter “hits this type of pitcher well.” This study uses k-means clustering and mixture models to group pitchers to answer this question while also comparing the two grouping methods. The pitchers are grouped by their pitch rates, velocities, spin rates, and extensions using Baseball Savant data from 2015-2019. The findings of this study indicate that a batter’s history against similar pitchers is far more predictive of the outcome of an at-bat against a particular pitcher than the batter’s history against that specific pitcher alone. Between the two grouping methods, mixture models produced more predictive groupings and provided a framework for future analytic research.
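The k-means half of the grouping step can be sketched in plain Python. This is a generic Lloyd's algorithm, not the study's code, and it does not cover the mixture models. The six pitchers and their three features are made up for illustration, whereas the study used pitch rates, velocities, spin rates, and extensions from Baseball Savant.

```python
import random


def standardize(rows):
    """Z-score each column so no single unit (e.g. rpm) dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical pitchers: (fastball mph, fastball spin rpm, fastball usage share).
pitchers = [
    (97.5, 2450, 0.62), (96.8, 2400, 0.58), (98.1, 2500, 0.65),
    (89.5, 2100, 0.35), (90.2, 2150, 0.38), (88.9, 2050, 0.33),
]
labels = kmeans(standardize(pitchers), k=2)
```

Once each pitcher has a label, a batter's plate appearances against every pitcher sharing that label can be pooled, which is how the grouping enlarges the sample behind a matchup estimate.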

The baseball implications of this study are extensive. Knowing a batter’s history against a group of similar pitchers can help make decisions from lineup construction to betting. If a manager is debating which batter to start when the first batter has good numbers against the pitcher and the second batter has good numbers against the pitcher’s entire “type,” this study would advocate starting the second batter. Furthermore, the inverse of this example can also be true; if a manager is debating which reliever to call in, they could consider using the pitcher who belongs to the group that the batter has historically performed worst against. Similarly, if a gambler is looking for a team or player in daily fantasy that historically has had good results against a pitcher’s mixture, this could provide some edge for the gambler.

Xander Schwartz is a junior at Amherst College, where he majors in Statistics and Computer Science. He is president of the school’s Sports Analytics and Business Club and has experience in modeling and researching baseball, basketball, and European football. This summer, he will be interning with the Cleveland Guardians in Research and Development.

Sunday, March 20

1:00-1:30 p.m. EDT
RP13—Measuring the 6%: Using Generalized Linear Mixed Models to Compare DRS, UZR, FRAA, and Statcast Team Season Fielding Metrics
David Schmerfeld

Video: Click here to watch a video replay of David’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from David’s presentation (.pptx)

In 1984, The Hidden Game of Baseball stated that fielding accounts for 6% of baseball. Times have changed, and the true percentage is debatable, but even 6% adds up to almost 10 games in a full season. Traditionally, fielding in baseball is measured by errors, while advanced statistics include Defensive Runs Saved (DRS), Ultimate Zone Rating (UZR), Fielding Runs Above Average (FRAA), and Outs Above Average, which Statcast converts to Runs Prevented (RP). Five generalized linear mixed models were prepared using data from the 2016–2019 and 2021 seasons. Each model uses one of errors, DRS, UZR, FRAA, or RP to measure team fielding. The dataset includes 150 observations and each team’s runs allowed in a season is the response. Home runs allowed, strikeouts, and walks allowed are also included as fixed effects to explain the variation in runs allowed due to pitching. Individual teams were included as a random effect.

Including the pitching statistics in the models allows for a check on the quality of the models in addition to fit because the coefficients from the models can be compared to the predictors’ true run expectancy values. Meanwhile, the coefficients of the fielding predictors indicate whether each of the fielding measures accurately values fielding as one run saved. The models using DRS, UZR, and RP provided an improvement over the model using errors to describe fielding. The DRS model had the best overall fit (conditional pseudo-R² = 0.89). The DRS model’s mean absolute error averages 22 runs/season, while the errors model has a mean absolute error of 26 runs. The DRS model also had the lowest variance in mean absolute error with a standard deviation of 17, compared to 19 or more for each of the other models. However, the coefficient for DRS suggests that DRS as currently scaled overvalues fielding because 1 DRS only corresponds to 0.28 to 0.87 runs saved.

In the other models, the coefficients for the fielding metrics as converted to run values all included 1 within 95% confidence intervals. Accordingly, DRS was rescaled by a factor of 0.58, reducing its standard deviation by 42%, which yielded a coefficient of 1 while producing a model with the same fit and R² value.
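Why rescaling a predictor changes its coefficient but not the fit can be seen in a much simpler setting than the study's mixed models. The sketch below is an ordinary one-predictor least-squares fit on made-up numbers, not the generalized linear mixed model described above; the data, the response defined as runs prevented, and the helper `ols_slope_r2` are assumptions for illustration only.

```python
def ols_slope_r2(x, y):
    """Slope and R-squared of a simple least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Hypothetical team-seasons: a fielding metric and runs actually prevented.
metric = [-40, -20, 0, 20, 40]
runs_prevented = [-23, -12, 0, 12, 23]

slope, r2 = ols_slope_r2(metric, runs_prevented)

# Multiply the metric by 0.58: its spread shrinks by 42%,
# the slope is divided by 0.58, and R-squared does not change.
rescaled = [0.58 * m for m in metric]
slope_rescaled, r2_rescaled = ols_slope_r2(rescaled, runs_prevented)
```

With these numbers the original slope is about 0.58 and the rescaled slope is about 1, so one unit of the rescaled metric corresponds to one run while the quality of fit is untouched.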

David Schmerfeld is a sports analytics student in the graduate certificate program at the University of Missouri. He holds a Bachelor of Science in Chemical Engineering from the University of Pittsburgh and a Juris Doctor in Law from the University of Minnesota.

1:30-2:00 p.m. EDT
RP14—Rebuilding Competitive Balance in MLB: A Multi-Dimensional Perspective
Scott A. Brave, Andrew Butters, and Kevin Roberts

Video: Click here to watch a video replay of Scott’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Scott’s presentation (.pptx)

Competitive balance remains a contentious issue in MLB to a large extent because there is general disagreement on what constitutes a competitive league. The conventional view focuses on a heavy concentration of overall wins amongst a few teams as lacking competitive balance. Others have instead focused their attention on teams frequently moving up and down in the standings as a sign of a competitively balanced league. Ultimately, competitive balance is a by-product of the distribution of talent across teams. Since we only observe the outcomes of this distribution, it should not be surprising that there exists disagreement on the appropriate way to measure it. Instead, the lack of consensus strongly suggests that competitive balance is multi-dimensional. What has been left largely unanswered then is the extent to which these competing notions are linked.

Using a simple framework that encapsulates multiple views on competitive balance, we show how the overall competitive landscape of MLB can be captured along two interrelated dimensions: (i) the disparity across teams within a season (measured by the standard deviation of winning percentages), and (ii) the immobility of teams across seasons (measured by the pooled first-order autocorrelation of winning percentages). Our model suggests a trade-off often exists between these two dimensions; for example, efforts to increase parity can come at the expense of mobility. It also highlights, however, league rules that can improve both dimensions simultaneously. Changes in the contractual rights of players are one such instance. For example, we show that the elimination of the reserve clause and the rise of free agency after the 1968 season dramatically lowered both measures in a way that persists to this day.
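The two measures can be sketched on toy data as follows. This is an illustrative reading, not the authors' code: it assumes the population standard deviation for disparity and takes "pooled first-order autocorrelation" to mean the Pearson correlation between each team's winning percentage and its previous-season value, pooled over all teams and seasons. The team names and winning percentages are made up.

```python
import statistics

# Hypothetical winning percentages: team -> consecutive seasons.
win_pct = {
    "Team1": [0.600, 0.580, 0.550],
    "Team2": [0.500, 0.520, 0.490],
    "Team3": [0.400, 0.400, 0.460],
}

def disparity(win_pct, season):
    """Standard deviation of winning percentages across teams in one season."""
    return statistics.pstdev([seasons[season] for seasons in win_pct.values()])

def immobility(win_pct):
    """Correlation of (previous season, current season) pairs pooled over teams."""
    prev = [s[t - 1] for s in win_pct.values() for t in range(1, len(s))]
    curr = [s[t] for s in win_pct.values() for t in range(1, len(s))]
    mp, mc = statistics.fmean(prev), statistics.fmean(curr)
    spc = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    spp = sum((a - mp) ** 2 for a in prev)
    scc = sum((b - mc) ** 2 for b in curr)
    return spc / (spp * scc) ** 0.5

first_season_disparity = disparity(win_pct, 0)
league_immobility = immobility(win_pct)
```

A lower disparity means teams are bunched closer together within a season, and a lower immobility means last year's record says less about this year's.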

More recently, the collective bargaining agreement (CBA) that expired in December 2021 was an instance where both disparity and immobility increased. We relate this fact to an important structural change in the economics of baseball over the last decade: namely the rising importance of non-gate revenues. Substantial increases in non-gate revenues that were largely untied to a team’s on-field performance increased the bargaining power of owners in subsequent CBAs. We show how this could explain the greater emphasis teams have come to place on developing young talent in comparison to free-agent acquisitions along with the coincident decline in competitive balance.

Scott A. Brave is the Lead Consumer Spending Economist for decision intelligence company Morning Consult, where he leverages the company’s high-frequency proprietary data in forecasting and predictive analytics to help clients make better business decisions. Before joining Morning Consult, he was a senior economist in the economic research department of the Federal Reserve Bank of Chicago, where his responsibilities included the releases for the Chicago Fed’s economic and financial activity indexes. Brave’s research in sports economics has been previously shared at the SABR Analytics (2020) and MIT Sloan Sports Analytics (2016 & 2017) conferences and published in the Journal of Sports Economics and the Journal of Sports Analytics.

3:30-4:00 p.m. EDT
RP15—Quantifying Hitter Plate Discipline in Major League Baseball
Joshua Mould and David Anderson

Video: Click here to watch a video replay of Joshua’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Joshua’s presentation (.pptx)

Pitch selection is an important part of hitting performance, and it remains under-studied. Current measures like walk rate, K/BB ratio, and O% all fall short because they are season-level aggregates of outcomes rather than direct measurements of decision making. We use Statcast data from the 2016-2021 Major League Baseball seasons to quantify the ability of players to make correct decisions about whether or not to swing at each pitch. We named this metric EAGLE, which stands for Expected Additional runs Gained by Looking/swinging Estimate.

Using player hitting statistics and pitch characteristics, we train an XGBoost model to predict the likelihood of each possible outcome from a swing (Miss, Foul, Out, Single, Double, Triple, Home Run) and from not swinging (Ball, Strike). For each pitch, we calculate the change in run expectancy given the situation for each outcome. We then multiply the change in run expectancy of each outcome by the probability that it occurs given the pitch information. Finally, we subtract the predicted run expectancy of a take from the predicted run expectancy of a swing to get the expected runs gained by swinging. If this is positive, the hitter should have swung; if negative, the hitter should have taken the pitch. Then, based on the hitter’s actual decision, we attribute a positive or negative value to this difference and quantify a hitter’s plate discipline by averaging the resulting expected-run values over all pitches thrown to that hitter.
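The per-pitch arithmetic can be sketched as below. The outcome probabilities would come from the trained model and the run values from a run-expectancy table that depends on the game situation; here both are made-up constants, and the function names `expected_value` and `pitch_credit` are hypothetical rather than taken from the authors' work.

```python
# Hypothetical run-expectancy changes per outcome (situation-independent for brevity).
RUN_VALUES = {
    "miss": -0.05, "foul": -0.05, "out": -0.25, "single": 0.45,
    "home_run": 1.40, "ball": 0.05, "strike": -0.05,
}

def expected_value(probs, run_values=RUN_VALUES):
    """Probability-weighted change in run expectancy over the possible outcomes."""
    return sum(p * run_values[outcome] for outcome, p in probs.items())

def pitch_credit(swing_probs, take_probs, swung):
    """Expected runs gained by swinging, signed by what the hitter actually did."""
    gain_from_swinging = expected_value(swing_probs) - expected_value(take_probs)
    return gain_from_swinging if swung else -gain_from_swinging

# Two hypothetical pitches: one over the plate (hitter swung), one in the dirt (hitter took).
pitches = [
    ({"miss": 0.10, "foul": 0.30, "out": 0.40, "single": 0.15, "home_run": 0.05},
     {"ball": 0.05, "strike": 0.95}, True),
    ({"miss": 0.70, "foul": 0.20, "out": 0.10},
     {"ball": 0.98, "strike": 0.02}, False),
]

credits = [pitch_credit(s, t, swung) for s, t, swung in pitches]
discipline_score = sum(credits) / len(credits)  # average credit per pitch seen
```

Both decisions earn positive credit here: swinging at the hittable pitch and taking the pitch in the dirt each match the sign of the expected gain.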

EAGLE explains hitter decision making much better than current and commonly used statistics like O%. EAGLE predicts OPS better than O%, with a correlation of .459, because it takes into account the slugging aspect of plate discipline that is commonly overlooked. In addition to this unique aspect, EAGLE explains the common plate discipline metrics like BB% and O% very well. We also show that this metric is stable over time, with a correlation coefficient of 0.43 from year to year, making it a reliable metric for prediction.

EAGLE accurately determines how good players are at choosing which pitches to swing at and can improve predictions of future hitting performance. It has better prediction of future OPS than other commonly used plate discipline metrics, like O% or K/BB ratio. EAGLE shows that it is not just important to take bad pitches but that swinging at good pitches has a big impact on production as well.

Joshua Mould is a junior at Villanova University from Wellesley, Massachusetts, double majoring in Computer Science and Statistics. He is a co-founder and co-president of Villanova’s Sports Analytics Club, and he won the student methods track at the 2021 Carnegie Mellon Sports Analytics Conference for his paper “Quantifying Hitter Plate Discipline in Major League Baseball.” He was recently hired as a summer intern with the Philadelphia Phillies as an Associate Quantitative Analyst.

4:00-4:30 p.m. EDT
RP16—Exit Velocity Over Expected: An Evaluation of MLB Batted Ball Data
Sean Sullivan

Video: Click here to watch a video replay of Sean’s presentation (YouTube)
Slides: Click here to download PowerPoint slides from Sean’s presentation (.pptx)

As baseball entered the 21st century, Major League Baseball (MLB) teams began to embrace the usage of data and analytics at unprecedented rates. There are a variety of case studies and examples of how teams utilize data to extract insights and gain an advantage over their competition. This analysis sought to provide the baseball community with a metric that would aid in the evaluation of batters’ and pitchers’ ability to produce, or limit, batted balls hit harder or softer than expected. The metric, Exit Velocity Over Expected (EVOE), was generated by training an eXtreme Gradient Boosting (XGBoost) regressor to predict exit velocity on MLB batted balls from the 2017-2020 seasons. The 2021 season was then run through the trained model, the predicted exit velocity was subtracted from the actual exit velocity, and the difference was captured as EVOE. The batted ball data was gathered from MLB’s Baseball Savant Statcast website with the Python “pybaseball” package. Simply put, EVOE compares a player’s measured exit velocity with the expected exit velocity, given the attributes of a pitch (spin rate, pitch velocity, horizontal movement, etc.). This metric provides an advantage over simply using exit velocity because of the added context the “expected” value provides. Because exit velocity is an important factor in whether a batted ball becomes a hit, understanding how players over- or underperform given the context of the pitch will provide valuable insights for teams.
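The residual step of EVOE (actual minus predicted exit velocity, summarized per player) can be sketched as follows. The predictor here is a made-up linear rule standing in for the trained gradient-boosting model, and the field names, player labels, and numbers are hypothetical.

```python
from collections import defaultdict

def predict_exit_velocity(batted_ball):
    """Stand-in for a trained model: a made-up linear rule, NOT the study's regressor."""
    return 60.0 + 0.3 * batted_ball["release_speed"]

def evoe_by_batter(batted_balls, predict):
    """Mean of (actual - predicted) exit velocity for each batter, in mph."""
    residuals = defaultdict(list)
    for bb in batted_balls:
        residuals[bb["batter"]].append(bb["exit_velocity"] - predict(bb))
    return {batter: sum(r) / len(r) for batter, r in residuals.items()}

# Hypothetical held-out batted balls (speeds in mph).
batted_balls = [
    {"batter": "X", "release_speed": 95.0, "exit_velocity": 95.0},
    {"batter": "X", "release_speed": 85.0, "exit_velocity": 90.0},
    {"batter": "Y", "release_speed": 90.0, "exit_velocity": 82.0},
]

evoe = evoe_by_batter(batted_balls, predict_exit_velocity)
```

A positive value means the batter hit the ball harder than the stand-in model expected for those pitches; grouping by pitcher instead of batter would give the corresponding pitcher-side view.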

Sean Sullivan is the founder of URAM Analytics (@URAM_Analytics), where he posts original sports data science research, shares and amplifies the work of other researchers, and posts open jobs in the sports data science and analytics space. He is a recent graduate of DePaul University (MS Data Science, 2021) and currently works as a Data Scientist in the retail industry.

For more coverage of the 2022 SABR Virtual Analytics Conference, visit SABR.org/analytics/2022.
