2024 SABR Analytics Conference Research Presentations
SABR and Sports Info Solutions are excited to announce the list of research presentations for the SABR Analytics Conference, which will be held on March 8-10, 2024, in Phoenix, Arizona. Abstracts and presenter bios for each research presentation can be found below. Click on a link below to watch a video replay or download PowerPoint slides (where available).
Friday, March 8
2:25-2:55 p.m. MST
RP1: The Carrot and the Stick: Cleaning Up PED Use in Minor League Baseball
Scott Brave and Levi Bognar
- Video: Click here to watch a video replay of Levi’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Levi’s presentation (.pdf)
The last three years have brought about massive changes in minor league baseball (MiLB). From a contraction in major league (MLB) affiliates to better pay and support for living expenses and the acceptance of MiLB players into the MLBPA, decades of precedent have been overturned. With so much upheaval, it is perhaps not surprising that some consequences of these developments have slipped through the cracks. Here, we bring one of those developments back to the forefront; namely, the reduction in performance-enhancing drug (PED) use in MiLB.
Back in March 2020, we presented research at the SABR Analytics Conference that suggested a link existed between the pay structure of MiLB and PED use. While the details have changed a bit since then, the main argument is the same: For players at the margins of each minor league level, there are potentially large pecuniary gains from even small increases in performance. This was particularly true in the old MiLB at the lowest levels, and often provided an incentive for PED use even after taking into account the penalties that were to accrue if the player was caught.
Two seasons into this new MiLB regime, PED use has fallen considerably. For MiLB players in domestic leagues without MLB experience, the average number of MiLB PED suspensions per season during the 2022-2023 seasons was roughly a third of the average during the 2005-2018 seasons. While suspensions continued to be clustered at the lowest levels of MiLB even after the contraction in affiliates, PED use over the last two seasons shows a remarkably lower and flatter pattern across all levels. Coincidentally, after the changes made to pay in 2022 and 2023, MiLB wages across the levels are now higher and flatter as well.
In this paper, we formalize and calibrate a game theoretic model of how MLB can alter the incentives faced by the marginal PED-using player at each level of MiLB and, hence, their likelihood of PED use, with both “carrot” and “stick” type policies. We show that both the level and the slope of the pay curve matter for MiLB PED use. Furthermore, we show that our model predicts the decline in PED use for the 2022-2023 seasons quite well. This underscores an important framework for performing cost-benefit analysis when evaluating policies designed to deter PED use.
Levi Bognar is a Senior Analyst at the economic consulting firm Compass Lexecon. Prior to this, he was a Research Assistant at the Federal Reserve Bank of Chicago. His research has appeared in multiple Chicago Fed outlets including the Chicago Fed Working Paper Series, Chicago Fed Letter, Chicago Fed Insights, and the Midwest Economy Blog. He has also presented research on sports analytics topics, including a presentation at the 2023 MIT Sloan Sports Analytics Conference. Levi received his B.A. in Economics with Honors from the University of Notre Dame in 2021.
2:55-3:25 p.m. MST
RP2: Regression Based Approach for Predicting Ground Reaction Forces in Pitching
Tim Niiler
- Video: Click here to watch a video replay of Tim’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Tim’s presentation (.pptx)
Previous studies have shown that various measures of ground reaction force (GRF) have a relationship with pitching performance and joint loading in the pitching arm. However, since direct force collection is not possible on the field, players and coaches have no access to in-game pitching forces for training or assessment. Here we describe a method for predicting GRF measures using pitching kinematics as inputs to a statistical model. Retrospective pitching mocap data comprising both kinematic and force plate data from 6488 pitches thrown by 201 players were obtained from four MLB bullpen settings as part of players’ standard monitoring and assessment. For both plates, each force axis (M/L, Vertical, A/P) for every frame of data was modeled using AIC regression with kernel density-based weighting. To ensure generalizability, data were randomly split into 10 groups and k-fold cross-validation was used. The accuracy of the models was assessed using relative root mean square (RRMS). Additional testing was done using 592 additional trials which were not used in either training or cross-validation of the models. Combined model results on training and cross-validation data had RRMS values of 10.4 ± 4.9% (M/L), 6.9 ± 3.4% (Vertical), and 6.2 ± 6.3% (A/P) for the rear plate, and 15.1 ± 6.9% (M/L), 11.7 ± 6.1% (Vertical), and 10.8 ± 5.4% (A/P) for the front plate. These values were commensurate for the 592 additional trials tested. Within-player results were very consistent, with RRMS variability not exceeding 3.8%.
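As a rough illustration of the evaluation described above, the following Python sketch computes a relative RMS error and builds a random 10-fold split. The abstract does not give the exact RRMS formula, so normalising the RMS error by the range of the measured signal is an assumption, and the function names `rrms_percent` and `k_fold_indices` are hypothetical.

```python
import math
import random


def rrms_percent(measured, predicted):
    """RMS error as a percentage of the measured signal's range.

    Assumption: the abstract does not define RRMS; range normalisation
    is one common convention and is used here only for illustration.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("signals must be non-empty and of equal length")
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured signal has zero range")
    return 100.0 * math.sqrt(mse) / span


def k_fold_indices(n_items, k=10, seed=0):
    """Randomly split item indices 0..n_items-1 into k near-equal folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [order[i::k] for i in range(k)]
```

Each fold would serve once as the held-out set while a model is fit on the other nine; the 592 additional trials mentioned above would stay outside this split entirely.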
Tim Niiler is a Senior Software Engineer at KinaTrax. He started in physics and astronomy and finished his studies with a PhD in Biomechanics and Movement Sciences from the University of Delaware. For the past 14 years, he has taught physics at Penn State University-Brandywine and provided statistical support for the Gait Lab at Nemours Children’s Hospital while also building custom physics applications in JavaScript for students.
Saturday, March 9
10:00-10:15 a.m. MST
SP1: Bringing Pitching+ to DIII Baseball: The Art of W.A.R. Model
Henry Gliedman and Benjamin Reinhard
- Video: Click here to watch a video replay of Henry and Benjamin’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Henry and Benjamin’s presentation (.pptx)
Quantitative models used to measure the quality of a pitcher’s “stuff” have become prominent in baseball analytics in recent years. These models are commonly referred to as “Stuff+.” They attempt to evaluate solely the physical metrics of a pitch to determine what characteristics of a pitch lead to the best results. According to Rob Arthur’s research, exit velocity is roughly five parts hitter to one part pitcher, meaning pitchers have minimal impact on how hard a ball is hit. However, it is generally accepted that pitchers have some impact on whether a ball is hit in the air or on the ground. Thus, Stuff+ models typically weigh heavily the ability of a pitch to get swings and misses. The vast majority of these models use run values, including Pitching Bot, Stuff+, and Driveline’s Stuff+ model, arguably the three most prominent public Stuff+ models.
A common method among public Stuff+ models has been the use of machine learning algorithms. We set out to create one of these models for our Division III baseball program but quickly found a problem in applying current methodologies to our goal. These models largely rely on expensive camera-based and ball-tracking systems, such as Statcast and Trackman. Additionally, these models are virtually meaningless at lower levels of baseball, where the quality of pitches is vastly different from that of Major League pitchers. This discrepancy led to the larger research question for our work: Can we create a system in which statistics and basic ball metrics collected from a Rapsodo device enable us to create a Stuff+ or Pitching+ model for Division III baseball that is comparable to expensive camera systems? For pitchers at this level, an accessible, skill-level adjacent model could revolutionize player development. We constructed our model using basic thrown ball data and charted location-based statistics to accurately predict pitcher performance. Machine learning algorithms were then used to create and periodically refine our initial Pitching+ model. Our Pitching+ model generated an R^2 value against ERA of .619 after a single collegiate season, predicting ERA nearly 20% better than FIP (Fielding Independent Pitching).
Not only does our model accurately predict performance, but it is also designed to assist in player development by identifying key weaknesses. Our model identifies what metrics are causing the weaknesses in a pitcher’s “Stuff” grade, enabling them to quickly identify what they should improve. These two separate, but equally important aspects have enabled this model to become a crucial part of our program’s pitching development. Our research helps make advanced analytics a reality for the wider baseball community, encompassing all ages, abilities, and financial situations. With the distribution of our model and our companion program, we aspire to make quality pitching analytics and coaching accessible to players of all levels and situations.
Henry Gliedman is currently a junior mathematics major at St. Olaf College in Minnesota. He serves as a Student Assistant Coach for the school’s Division III baseball team, where he also co-founded the program’s analytics department. Gliedman recently co-founded a baseball technology company (Art of W.A.R. Baseball Technologies), with the mission of bringing financially accessible, data-driven coaching to high school and college programs nationwide. Gliedman aims to pursue a position working in baseball after graduation.
Benjamin Reinhard is a senior quantitative economics and sociology/anthropology major with a statistics and data science concentration at St. Olaf College in Minnesota. Reinhard is currently a student coach for the St. Olaf College baseball team, working as the Director of Pitching Development and co-founder of that program’s analytics department. Reinhard also co-founded the start-up baseball technology company, Art of W.A.R. Baseball Technologies. Reinhard is pursuing a position in sports analytics after graduation.
10:15-10:30 a.m. MST
SP2: How Did the Pitch Clock Affect Pitcher Performance?
Thomas Stanton
- Video: Click here to watch a video replay of Thomas’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Thomas’s presentation (.pptx)
MLB’s implementation of a pitch clock at the start of the 2023 season represented a massive change to America’s pastime. While controversial for a number of reasons, the pitch clock was unquestionably effective at speeding the game up. In just one year, average MLB game time dropped by nearly half an hour, to 2 hours and 39 minutes per game.
One of the main issues experts raised with the pitch clock was the potential for increased injuries for pitchers. Sonne and Keir (2016) found that the faster game speed caused by pitch clocks increases muscle fatigue in arm muscles, which can lead to injury. Fatigue, however, can also lead to reduced pitch velocity. This research aims to determine if the institution of the pitch clock (and its subsequent increase in pace of play and shortening of time in between pitches) led to decreased performance in pitchers as measured by pitch velocity.
Because the pitch clock was first instituted in an MLB game just eight months ago, there is very little literature available on its effects on pitcher performance. As such, this will be the first research to examine how the MLB pitch clock and its shorter recovery times affected pitch velocity and spin. Szymborski (2023) analyzed how changes in time between pitches may have affected the results-based performance of certain pitchers, finding no evidence that pitchers with a higher pace change between 2022 and 2023 were getting worse results. Using Szymborski’s approach, I found no evidence that a pitcher’s average difference in pitch tempo was positively related to a change in his average four-seam fastball velocity, although the efficacy of the model was hurt by a lack of data points.
I then used linear regression with pitcher fixed effects to quantify the effect of the pitch clock on pitch velocity on more granular pitch data, taken from the MLB Stats API. First, I investigated how the effect of pitch count on fastball velocity changed between the 2022 and 2023 seasons. Although velocity among qualified pitchers did drop between seasons, there is little statistical evidence that the effect of pitch count became more negative in 2023. Next, I used another fixed effects model to quantify the value of rest on pitch velocity. In this case, I found that the relationship between the natural log of pitch tempo and velocity became much more positive between 2022 and 2023, demonstrating that rest became more valuable once the pitch clock was instituted.
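To illustrate what a regression with pitcher fixed effects does, here is a minimal Python sketch of the within estimator for a single regressor: each pitcher's observations are demeaned, which removes pitcher-specific baselines, and a pooled slope is fit on the demeaned data. It is not the author's specification; the function name `fixed_effects_slope` and the row layout are hypothetical, and the regressor could be, for example, the natural log of pitch tempo.

```python
from collections import defaultdict


def fixed_effects_slope(rows):
    """Within-pitcher OLS slope of y on x.

    rows: iterable of (pitcher_id, x, y), e.g. x = ln(pitch tempo) and
    y = pitch velocity. Demeaning by pitcher absorbs each pitcher's
    own intercept (the fixed effect).
    """
    by_pitcher = defaultdict(list)
    for pitcher_id, x, y in rows:
        by_pitcher[pitcher_id].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for obs in by_pitcher.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("no within-pitcher variation in x")
    return sxy / sxx
```

Fitting this separately on 2022 and 2023 pitches and comparing the two slopes would mirror the season-to-season comparison described above, although a full analysis would also need standard errors.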
Thomas Stanton is an undergraduate senior at Northwestern University. He is majoring in Economics and Mathematical Methods in the Social Sciences and minoring in Data Science. He is a member of the Northwestern Sports Analytics Group, and will be working as an economic consulting analyst for Epsilon Economics post-graduation.
10:35-10:50 a.m. MST
SP3: Influence of Leverage Index on Pitching Biomechanics
Adam Nebel
- Video: Click here to watch a video replay of Adam’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Adam’s presentation (.pptx)
Background: Consistent release parameters in baseball pitching positively influence season-long performance and lead to greater consistency in the ball’s location crossing home plate. Additionally, the phenomenon of choking under pressure suggests there is an interaction between motor control and psychological stress that, in some cases, leads to decreased performance. Because of advancements in sabermetrics, in-game situational pressure can be estimated using the Leverage Index (hereafter referred to as Leverage), which measures the potential change in win expectancy. Because of the effect psychological pressure has on motor control and the influence consistent release parameters have on performance, this study looks to identify if there is a change in release parameter variability as Leverage changes.
Methods: Arm slot angles of thirty-five NCAA Division I baseball pitchers (1.89 ± 0.1 m; 92.7 ± 8.9 kg) throwing fastballs were analyzed using markerless motion capture. Arm slot was defined as the angle between the vertical axis of the world and the vector between the shoulder and hand at ball release, where 0° is perpendicular to the ground, and 90° is parallel to the ground. Play-by-play data were scraped for Leverage components (score, innings, outs, and baserunners), and Leverage was calculated and matched to the appropriate pitch. Leverage was grouped using cutoffs provided by Tom Tango on insidethebook.com (low: Leverage < 0.7; moderate: Leverage between 0.7 and 1.5; high: Leverage > 1.5). After grouping by pitcher and Leverage, the standard deviation of arm slot identified the variability of release within each Leverage situation for each pitcher. Pitchers were required to have at least 5 fastballs in each grouping to be included in the analysis. A repeated measures ANOVA was performed to identify whether arm slot variability for each pitcher varied across Leverage situations.
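The arm slot definition and Leverage grouping above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: the world vertical is assumed to be the z-axis, joint positions are assumed to be (x, y, z) tuples, and the names `arm_slot_degrees`, `leverage_group`, and `arm_slot_variability` are hypothetical.

```python
import math
from statistics import stdev

LEVELS = ("low", "moderate", "high")


def arm_slot_degrees(shoulder, hand, vertical=(0.0, 0.0, 1.0)):
    """Angle between the world vertical and the shoulder-to-hand vector.

    0 degrees is perpendicular to the ground, 90 degrees is parallel to it.
    Assumption: the vertical axis is z.
    """
    vec = tuple(h - s for h, s in zip(hand, shoulder))
    norm_vec = math.sqrt(sum(c * c for c in vec))
    norm_up = math.sqrt(sum(c * c for c in vertical))
    if norm_vec == 0 or norm_up == 0:
        raise ValueError("zero-length vector")
    cos_angle = sum(a * b for a, b in zip(vec, vertical)) / (norm_vec * norm_up)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def leverage_group(leverage_index):
    """Bucket a Leverage Index value with the 0.7 and 1.5 cutoffs."""
    if leverage_index < 0.7:
        return "low"
    if leverage_index > 1.5:
        return "high"
    return "moderate"


def arm_slot_variability(pitches, min_pitches=5):
    """Per-pitcher standard deviation of arm slot in each Leverage group.

    pitches: iterable of (pitcher_id, leverage_index, arm_slot_deg).
    Pitchers lacking min_pitches fastballs in any group are dropped.
    """
    grouped = {}
    for pitcher_id, li, slot in pitches:
        grouped.setdefault(pitcher_id, {}).setdefault(leverage_group(li), []).append(slot)
    result = {}
    for pitcher_id, by_level in grouped.items():
        if all(len(by_level.get(level, [])) >= min_pitches for level in LEVELS):
            result[pitcher_id] = {level: stdev(by_level[level]) for level in LEVELS}
    return result
```

The resulting per-pitcher standard deviations are the kind of values a repeated measures ANOVA across the three Leverage groups would take as input.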
Results: Variability in arm slot was statistically significant across Leverage situations (F(2, 34) = 4.60, p = .017). Post hoc analysis conducted with Bonferroni corrections revealed a significant difference (adjusted p = .023) in arm slot variability between high (3.0 ± 2.8°) and low Leverage situations (2.3 ± 2.3°), where the high Leverage situation had increased variability in arm slot compared to the low Leverage situations.
Discussion: Differences were observed in arm slot variability between high and low-leverage situations, with the high-leverage situations resulting in increased variability of the release parameters. These findings support prior research in motor control, which suggests when psychological pressure increases, motor control becomes impaired. These findings are important as they identify what biomechanical factors may be altered during situations of high pressure, which may influence outcomes. The integration of sabermetrics and biomechanics offers a valuable tool for athletes and coaches in identifying movement patterns that are influenced by specific situations, as well as optimizing mechanics to achieve ideal outcomes at the pitch-level. Further research should investigate the influence Leverage has on other biomechanical alterations, along with the effect Leverage has on pitch metrics and outcomes.
Adam Nebel is a Graduate Assistant of Sports Science with Auburn University’s baseball program, where he uses KinaTrax motion capture technology to work collaboratively with the coaching, strength and conditioning, and athletic training staffs. He previously worked as a student athletic trainer and researcher with the University of Arkansas baseball program from 2020 to 2022. He is working toward his Ph.D. in Kinesiology with an emphasis in Biomechanics and is expected to graduate from Auburn in 2025. He earned his bachelor’s degree in Kinesiology from Boise State University in 2020 and master’s in Athletic Training from Arkansas in 2022.
10:50-11:05 a.m. MST
SP4: Optimizing Lineup Simulators with Ball Tracking Data
Reece Calvin
- Video: Click here to watch a video replay of Reece’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Reece’s presentation (.pptx)
1. Research. In the public realm, there is little consensus on how to optimize a lineup. There is the traditional view: a high-batting-average hitter at leadoff, a power hitter at cleanup, and the rest filled in order of talent. This is not the only non-data-driven method. In certain international leagues, teams tend to bat their fastest players first and second, regardless of statistics. One of the first public deep dives into lineup optimization came from Tom Tango and The Book. The results can be summarized as follows: Highest OBP, Highest wOBA, third highest wOBA, Home Run Hitter, second highest wOBA, then sort the rest by wOBA. In past conferences, Connor Turner built a Markov chain to simulate individual pitcher-batter matchups, while Jeff Jin and Chris Zexin Chen subsequently attempted to improve the predictiveness of plate appearance (PA) outcomes. Here I attempted to refine this approach and test how it performs when simulating several MLB games.
A major hurdle I wanted to clear with this model was the ability to work with small sample sizes. Most Markov chain models look at a player’s past PA results and calculate their distribution. The issue is that, in small sample sizes, these results tend to vary greatly, even three-quarters of the way into the season. My solution was to use batted ball types instead of results. To do this, I kept strikeouts and walks, but for all batted balls I used the type (ground ball, fly ball, etc.) and classified each as Weak, Topped, Under, Burner, Solid Contact, or Barrel.
Using this method, Jin and Chen’s technique showed that 48% of the effect on the result came from the pitcher and 52% from the batter. This tracks with the sabermetric belief that a pitcher has little control over the actual outcome of balls in play but does influence batted ball type.
Knowing these numbers, I developed a Markov Chain simulator that takes a starting pitcher and lineup as inputs and returns the projected runs. It calculates each batter result for the current pitcher, based on the 52/48 rule defined earlier.
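One way to picture the 52/48 rule is as a weighted mix of the batter's and pitcher's outcome distributions, from which each plate appearance is then sampled. The Python sketch below assumes a simple linear blend, which the abstract does not confirm; the function names and the outcome labels in the usage note are hypothetical.

```python
import random

BATTER_WEIGHT = 0.52   # share attributed to the batter in the abstract
PITCHER_WEIGHT = 0.48  # share attributed to the pitcher in the abstract


def blend_outcomes(batter_probs, pitcher_probs):
    """Mix two outcome distributions with the 52/48 weights.

    Assumption: a linear mixture; the actual combination rule used in
    the simulator is not described in the abstract.
    """
    outcomes = set(batter_probs) | set(pitcher_probs)
    mixed = {
        o: BATTER_WEIGHT * batter_probs.get(o, 0.0)
        + PITCHER_WEIGHT * pitcher_probs.get(o, 0.0)
        for o in outcomes
    }
    total = sum(mixed.values())
    if total <= 0:
        raise ValueError("distributions carry no probability mass")
    # Renormalise so the blended probabilities sum to one.
    return {o: p / total for o, p in mixed.items()}


def sample_outcome(probs, rng):
    """Draw one plate-appearance outcome from a blended distribution."""
    outcomes = sorted(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=1)[0]
```

For example, `blend_outcomes({"strikeout": 0.2, "walk": 0.1, "barrel": 0.7}, {"strikeout": 0.3, "walk": 0.1, "barrel": 0.6})` gives a matchup distribution that `sample_outcome` can draw from with a seeded `random.Random`. A full simulator would also need base-out state transitions, which are omitted here.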
2. Results. For the empirical validation of the model, the distribution of PA outcomes for each player was computed up to the conclusion of August. Subsequently, every game played in September was subjected to 1500 Markov chain Monte Carlo (MCMC) simulations, yielding an average of projected runs for each iteration. When compared to betting Over/Under, the model returned an accuracy of 57%. Even disregarding missing context such as park factors, weather, batter speed, defense, injuries, etc., the results show great promise.
Reece Calvin is an undergrad at Northeastern University, majoring in Data Science and Economics. He has worked with the Northeastern Huskies’ baseball team since 2021 and previously led Research and Development for the Hiroshima Toyo Carp. Additionally, he founded the Sports Analytics Club at Northeastern University.
11:05-11:20 a.m. MST
SP5: Kenny Lofton Portfolio for Induction Into the National Baseball Hall of Fame
Erick Figueroa, John Nunez, Renee Brown, Pamela Carbajal, Nogales High School Sports Analytics Club
- Video: Click here to watch a video replay of Nogales High School’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Nogales High School’s presentation (.pptx)
Kenny Lofton’s potential journey to the Baseball Hall of Fame hinges on a holistic assessment of his career, encompassing three key dimensions: overall performance, defensive prowess, and offensive statistics. In this abstract, we delve into the following nuanced metrics that define Lofton’s legacy.
1. Overall Performance (WAR, Peak WAR, JAWS)
Lofton’s overall performance shines when evaluated through the lens of advanced metrics. His Wins Above Replacement (WAR) of 68.3 places him in elite company among center fielders. JAWS (Jaffe WAR Score System) further underscores his excellence, with a score surpassing the average Hall of Fame center fielder. When considering Peak WAR, Lofton’s peak seasons compare favorably to Hall of Fame standards, signifying sustained excellence throughout his career.
2. Defensive Stats (dWAR, DRP, TZ, and Fielding Percentage)
Lofton’s defensive skills were a cornerstone of his impact on the game. Defensive Wins above Replacement (dWAR) highlights his contribution in preventing runs. Metrics like Defensive Runs Prevented (DRP) and Total Zone (TZ) reveal his ability to consistently save runs with his remarkable fielding and range. A high fielding percentage (FPct) further underscores his reliability in the outfield, demonstrating Lofton’s complete defensive package.
3. Offensive Stats (oWAR, OBP, and Stolen Bases)
Lofton’s offensive contributions, particularly his ability to get on base, are noteworthy. His Offensive Wins above Replacement (oWAR) reflects his value with the bat, complementing his defensive prowess. An impressive on-base percentage (OBP) attests to his plate discipline and base-stealing acumen. Kenny Lofton’s offensive stats are very impressive. His consistent production as a top-of-lineup hitter and baserunner made him an extremely valuable asset to the teams on which he played.
It is through the use of these analytic measures that we make what we believe to be a compelling argument for Kenny Lofton’s enshrinement into the Baseball Hall of Fame.
The Nogales High School Sports Analytics Club, under the leadership of club advisor and AP Statistics teacher Ravi Dutt, includes Eric Figueroa, John Nunez, Renee Brown, and Pamela Carbajal.
Eric Figueroa is a senior at Nogales High School, driven by a profound passion for computer science and data science. As he prepares to embark on the next phase of his academic journey, he is poised to make a significant impact in the field of computer science and data science.
John Nunez is a junior at Nogales High School with a multifaceted passion for sports and education. He does analytics for school softball teams and is a valuable member of the NHS football team.
Renee Brown is a driven senior student at Nogales High School, with a clear vision of her future in the dynamic field of sports analytics. As she prepares to embark on the next chapter of her academic journey, she is poised to make a significant impact in the field of Sports Analytics.
Pamela Carbajal, the youngest member of the group, brings a vibrant energy and fresh perspective. A sophomore at Nogales High School and an IB candidate, she balances her passion for sports analytics with her academic pursuits. She has a keen interest in softball and baseball statistics.
2:30-3:00 p.m. MST
RP3: Is Less More? A Validation in Baseball Pitching Comparing a 4 vs 8 Camera Setup
James Wright
- Video: Click here to watch a video replay of James’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from James’s presentation (.pptx)
Background: The evolution of technology in sports has significantly enhanced our comprehension of human kinematics. Contemporary research in 3D Full-Body Biomechanical Markerless Motion Capture software emphasises validation and experimental studies to delineate research objectives (Ozkaya et al., 2018, Bullock et al., 2020). Despite a plethora of validation studies on the impact of reduced camera counts on kinematics (Dobos et al., 2022, Bench et al., 2023), the comparative efficacy of minimalistic approaches, specifically a four- versus eight-camera setup, remains unexplored. This gap necessitates a validation investigation to inform cost-benefit analyses for baseball biomechanics professionals, addressing the potential fallacy of overinvestment.
Objectives: This study aims to examine the spatial discrepancies between joint centres and rotations within two conditions: an eight-camera setup, and a reduced four-camera setup, both employing Simi NeMo AI to track a pitching movement. This study seeks to determine whether a four-camera setup yields non-significant differences compared to that of eight.
Methods: Using a within-subject repeated-measures design, 15 fastballs thrown with a ¾ release trajectory by a single participant were analysed. Data were recorded at 200 Hz using Baumer cameras and Simi Motion software and subsequently tracked utilising Simi NeMo AI. Data from each trial were normalised to 100 points utilising a linear interpolation method, filtered using a 4th-order Butterworth filter (12 Hz) and assigned typical pitching phases (represented as a % of the pitch). Non-normality of data was confirmed through the Kolmogorov-Smirnov test, with subsequent analysis including Wilcoxon signed-rank tests, Bland-Altman plots, and Intraclass Correlation Coefficients (ICC), utilising Python for all computations.
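The time-normalisation step can be illustrated with a short, dependency-free Python sketch that resamples one trial's signal to 100 evenly spaced points by linear interpolation. The function name `normalise_to_n_points` is hypothetical, and the Butterworth filtering and phase assignment from the abstract are not reproduced here.

```python
def normalise_to_n_points(signal, n_points=100):
    """Resample a 1-D signal to n_points using linear interpolation.

    The first and last samples are preserved, so every trial spans
    0-100% of the pitch regardless of its original frame count.
    """
    if len(signal) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(signal) - 1
    resampled = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)   # fractional index into the signal
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return resampled
```

Resampling every trial to the same length is what allows the four-camera and eight-camera joint trajectories to be compared point by point.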
Results: Primary quantitative analysis provided clear, mean non-significant statistical differences (p < .001 ± 0.005) within all trials along with high correlation scores (0.941 ± 0.021) between the four-camera and eight-camera setups for all joint centres and rotations. The largest between-trial difference was observed within trial 1; however, this trial still represented a mean ranked test alpha value of 0.017 and a correlation value of 0.914. These results clearly indicate that a four-camera solution, given an appropriate recording environment, provides valid data.
Discussion: These results provide a clear indication that cheaper, portable solutions of motion tracking software can be utilised within the field of biomechanics whilst still providing valid and accurate data. This study’s implications extend across the biomechanics field, suggesting that high data quality can be achieved in less controlled environments. Further research is advocated to explore the minimum requirements for valid data collection and specific applications in baseball.
Conclusion: This research lays the groundwork for advancing a minimalist approach in biomechanics, promoting the acquisition of high-quality data across various settings, a goal worth pursuing in the broader scientific community.
James Wright is a Lead US Application Engineer at Simi Reality Motion Systems. He holds a master’s of science in Sport and Exercise Science from the University of Bath.
3:00-3:30 p.m. MST
RP4: Refining Baseball Pitching Accuracy: Assessing the Impact of Immediate Feedback on Pitching Precision
Takafumi Hayashi, Takaaki Nara, and Takehiko Sano
- Video: Click here to watch a video replay of Takafumi’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Takafumi’s presentation (.pptx)
Introduction: Accuracy in baseball pitching is crucial for successful performance. Shinya et al. (2017) found that the arrival point of a pitched ball forms an elliptical pattern, with the variation in the minor-axis direction of the ellipse attributable to change in the timing of the ball release. Canavan (2021) reported that pitching errors increase with greater distance from the lead foot to the midline. While practice on an unstable footing with closed eyes has been suggested to improve control (Marsh et al., 2004), specific methods for improving pitching accuracy remain insufficiently explored. This study examined whether immediate feedback on pitching errors during pitching practice improves control accuracy among pitchers.
Methods: This was a randomized controlled trial, employing a two-way analysis of variance (ANOVA) with immediate feedback of pitching errors as a factor and the area of the 95% confidence ellipse of pitching errors as the outcome. The participants were amateur pitchers (n = 12).
Experimental protocol: For the intervention group, pitching practice sessions with immediate feedback on pitching errors were performed once a week, with 20 pitches each for a total of five sessions. Pitching errors were measured before and after the experimental period in both the intervention and control groups (10 pitches each). Additionally, the pitching velocity was recorded. The pitching errors were measured using the two-dimensional direct linear transformation method based on images captured using a high-speed camera.
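For readers unfamiliar with the outcome measure, the area of a 95% confidence ellipse can be sketched in Python as below. The abstract does not state how the ellipse was computed, so this assumes bivariate normal errors, the sample covariance matrix, and the chi-square quantile with two degrees of freedom; the function name `confidence_ellipse_area` is hypothetical.

```python
import math


def confidence_ellipse_area(points, level=0.95):
    """Area of the confidence ellipse for 2-D pitch-location errors.

    points: list of (x, y) errors, e.g. in cm.
    Assumption: bivariate normal errors, so the ellipse area is
    pi * q * sqrt(det(covariance)), where q is the chi-square quantile
    with 2 degrees of freedom, q = -2 * ln(1 - level).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    quantile = -2.0 * math.log(1.0 - level)
    # max() guards against a tiny negative determinant from rounding.
    return math.pi * quantile * math.sqrt(max(det, 0.0))
```

Because the area scales with the square of the spread, halving a pitcher's scatter in both directions would cut the ellipse area to a quarter.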
Results: Pitching errors: The two-way ANOVA revealed a notable interaction (F(1, 10) = 8.55, p = .015). A significant simple main effect was observed in the intervention group, and the 95% confidence ellipse area after the experiment was substantially smaller than before the experiment (from 11,344 ± 5,654 cm² to 6,483 ± 1,701 cm², p = .024). Comparatively, the 95% confidence elliptical area of the control group was also notably smaller than that of the intervention group (11,344 ± 5,654 cm² vs. 5,485 ± 2,293 cm², p = .040).
Pitching velocity: The two-way ANOVA revealed no considerable interaction (F(1, 10) = .06, p = .810). However, significant main effects were observed before and after the experiment, with substantially smaller values for pitching velocity post-experiment than before (p < .001).
Discussion: The findings of the two-way ANOVA of pitching errors suggest that immediate feedback on pitching errors improves control accuracy. Notably, the individual subject results showed that only one of the six subjects in the intervention group displayed no reduction in area, while three subjects exhibited a reduction in area from >14,000 cm² before the experiment to <10,000 cm² after the experiment. This implies that the impact of immediate feedback tends to be more pronounced in pitchers with lower control accuracy, hinting at the existence of a potential threshold where the effect manifests.
This study is expected to foster improvements in control, which is important for pitchers, and contribute to pitching coaching methodologies.
Takafumi Hayashi is a professor at the School of Health Sciences at Asahi University in Japan. He majored in coaching and obtained his Ph.D. from Keio University. As a pitching coach in college baseball, he has trained many professional baseball players and several first-round draft picks in Nippon Professional Baseball. During his playing days, he pitched for the Japanese national college baseball team and was the winning pitcher against the US college representative team.
Takaaki Nara is an Assistant Professor at the Faculty of Health and Sport Sciences at the University of Tsukuba in Japan. He obtained his Ph.D. in Physical Education from the University of Tsukuba. During his playing career, he served as a captain and pitcher and participated in the Koshien Tournament (Japan’s national high school baseball championship) and national collegiate baseball competitions. In addition, as a coach, he led a college baseball team all the way to the national tournament.
Takehiko Sano is an associate professor at the Keio University Graduate School of Health Management. He holds a Master of Science degree in Sports Administration from Georgia State University. Specializing in sports management and sports marketing, he has practical experience working at the J-League headquarters, Japan’s professional soccer league. He played a key role in the statistical analysis for this study.
3:30-4:00 p.m. MST
RP5: Do Pitchers Learn How to Avoid the Third-Time-Through-the-Order Penalty?
Rob Mains
- Video: Click here to watch a video replay of Rob’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Rob’s presentation (.pptx)
In 2022, my presentation, “The Third-Time-Through-The-Order Penalty Is Worse Than You Thought,” introduced a methodology for evaluating the third-time-through-the-order (TTTO) penalty. Rather than use the publicly available (via Baseball-Reference) data, which compares performance of all starting pitchers facing the opposing lineup the first, second, third, and fourth times, I added two refinements. First, I included only starts in which the pitcher faced at least 19 batters, i.e., faced the order three full times, discarding starts in which the pitcher didn’t last three times. Second, I divided the performance by batting order position, since a pitcher facing the first four batters of the opposing lineup a third time before leaving the game faced tougher hitters, on average, than he did when he also faced the fifth through ninth batters the first two times.
Analyzing all starts from 1969 through 2021, I found pitchers consistently performed worse (measured by OPS allowed when facing each lineup position a third time, compared to the average of the first two times) throughout the Division Play era. This finding refutes the idea that the TTTO penalty is a modern, sabermetrically initiated construct.
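The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's actual code: the function names, the input layout, and the choice to weight each lineup position by its third-time plate appearances are all assumptions made for the example.

```python
from statistics import mean


def slot_penalty(ops_first, ops_second, ops_third):
    """Relative change in OPS allowed the third time through the order,
    compared to the mean of the first two times, for one lineup position."""
    baseline = mean([ops_first, ops_second])
    return ops_third / baseline - 1.0


def weighted_penalty(slots):
    """Combine per-position penalties into one number.

    slots: list of (ops_first, ops_second, ops_third, third_time_pa) tuples.
    Weighting by third-time plate appearances is an assumption for this sketch.
    """
    total_pa = sum(pa for *_, pa in slots)
    return sum(slot_penalty(a, b, c) * pa for a, b, c, pa in slots) / total_pa
```

With this layout, a lineup position with a .700 OPS allowed the first two times and .770 the third time shows a penalty of about 10%.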
This fall, a listener to the Effectively Wild podcast said he’d recently heard John Smoltz opine that pitchers have to learn how to pitch through the order a third time. The listener wondered whether the TTTO penalty shifts over a pitcher’s career. I looked at the data for Smoltz and found that his penalty in the first half of his career (1988-1995, covering 231 starts) was worse (weighted average 13.0% higher OPS allowed the third time compared to the first two, compared to an MLB average of 11.7%) than in the second half (1996-2009 excluding 2001 when he started only five games, 245 starts, weighted average 8.5% penalty for Smoltz compared to 11.3%).
In this presentation, I expand the analysis beyond a single grumpy Hall of Fame pitcher. I will consider regular starting pitchers whose careers fell entirely between 1969 (the first year of my 2022 research) and 2014 (the last year in which the average start lasted six innings, after which it declined sharply) and compare their TTTO penalty in the first and second halves of their careers to determine whether they “learn” how to pitch a third time.
I expect two confounding variables. First, an analysis involving pitchers who regularly started games over multiple seasons necessarily introduces selection bias. Second, pitchers’ overall performance generally declines as they age; Smoltz’ OPS allowed the first two times through the order was .633 in the first half of his career and .647 in the second half. Nonetheless, I expect to be able to evaluate the veracity of Smoltz’ observation.
Rob Mains is a writer for Baseball Prospectus and manages the site’s Spanish content. He is a former equities analyst and is a two-time finalist for SABR Analytics Conference Research Awards. He is a SABR Analytics Certification course reviewer.
Sunday, March 10
9:00-9:30 a.m. MST
RP6: Noisier Judgments: In-Game Applications of Probability Surface-Based Analysis of Umpiring Variability
Emily-Anne Patt and James Stockton
- Video: Click here to watch a video replay of Emily-Anne’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Emily-Anne’s presentation (.pptx)
In “Noisy Judgments: A Probability Surface-Based Analysis of Umpiring Variability,” we established a prior probability surface representing the actual strike zone as called during an MLB game. We used this surface to evaluate changes in the actual strike zone over time and individual umpire performance, stress-testing the reliability of the model with established baseball facts related to batter and pitcher handedness and seasonal shifts.
In Noisier Judgments, we dive deeper into the practical in-game applications of the strike zone probability surface. A sensitivity study allows us to slice the data into multiple cross-sections and still be assured our surface is reliable. This allows us to evaluate game-specific factors such as count, leverage, team, and ballpark that could affect the size, shape, and position of the individual umpire’s strike zone during a game.
The season-by-season movement of the position of the strike zone was one of the distinct patterns detected by our strike zone probability surface. We revisit our analysis of pitcher and batter strike zone variations after normalizing the surfaces for the seasonal pattern. Controlling for seasonality allows us to better home in on the differences between players. Furthermore, we build on this player-specific analysis by creating a strike zone surface for each catcher. In doing so, we propose a new metric for evaluating a catcher’s pitch framing ability by measuring the proportion of strike calls they receive outside of the umpire’s usual strike zone.
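One way to picture the proposed framing metric is the short Python sketch below. It is an illustration only, not the presenters' definition: the function name, the input layout, and the 0.5 probability cutoff used to mark a pitch as outside the umpire's usual zone are assumptions.

```python
def framing_rate(pitches, threshold=0.5):
    """Share of called strikes among pitches outside the umpire's usual zone.

    pitches: iterable of (zone_prob, called_strike) pairs for one catcher, where
    zone_prob is the umpire surface's strike probability at the pitch location.
    The 0.5 threshold for "outside the usual zone" is a hypothetical choice.
    """
    outside = [called for prob, called in pitches if prob < threshold]
    if not outside:
        return None  # no qualifying pitches, so the rate is undefined
    return sum(outside) / len(outside)
```

A catcher's rate could then be compared against the rate for all catchers working with the same umpire.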
Emily-Anne Patt is the manager for quantitative intelligence and methodologies supporting security and resilience at Alphabet, Inc. based in Washington, DC. Her background is in econometrics and financial economics, with prior experience at the US Department of State, Federal Reserve Board of Governors, US Department of the Treasury, and as a data science consultant for US federal government clients.
9:30-10:00 a.m. MST
RP7: Changes in Minor League Umpire Tendencies With The Challenge and Automatic Ball-Strikes Systems
Jeremy Losak, Jason Maddox, and Jonah Soos
- Video: Click here to watch a video replay of Jeremy, Jason, and Jonah’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Jeremy, Jason, and Jonah’s presentation (.pptx)
Subjective umpire pitch calling in baseball has long been a topic of debate. While some argue for the charm and individuality umpires bring to the game, others advocate for consistency and accuracy through technological intervention. Major League Baseball (MLB) has experimented with systems like the Automatic Ball-Strike (ABS) and Challenge systems in the minor leagues to automate pitch calling and address inconsistencies. Our study delves into the impact of these systems on umpire performance and tendencies during the 2023 Triple-A season.
Drawing on existing literature, we contextualize our analysis within the broader landscape of umpire evaluation and the influence of technology on their decision-making. Leveraging pitch-level data from the MLB API and PITCHf/x system, we conducted a comprehensive multi-treatment analysis to assess umpire behaviors under different conditions. Our modeling approach, employing an xgboost model, allowed us to predict the probability of incorrect calls and evaluate umpire performance through misclassification rates.
Our findings reveal notable shifts in umpire accuracy and tendencies under the ABS and Challenge systems. Umpires demonstrated improved performance, particularly under the Challenge system, potentially influenced by feedback mechanisms and increased oversight. We observed changes in strike zone dimensions and variations in umpire behavior across different pitch counts, shedding light on the evolving dynamics between technology and traditional umpiring practices in baseball. These insights contribute to the ongoing discourse on umpire evaluation and the integration of technology in professional baseball, with implications for future applications and improvements in the sport.
Jeremy Losak is an Assistant Professor of Sport Analytics in the Department of Sport Management at Syracuse University’s David B. Falk College of Sport and Human Dynamics. His research focuses on the economics of sports, particularly baseball labor markets, attendance at sporting events, gambling markets, and college athletics. Previous SABR Analytics Conference research presentations include “Comparing Age Curves Across the MLB, KBO, and NPB” (2023), “Behavioral Biases in Daily Fantasy Baseball: The Case of the Hot Hand” (2022), “MLB Home Field Advantage Without Fans” (2021), and “What’s Hanging? An Empirical Definition And Defining Attributes For The Hanging Pitch” (2020).
Jason Maddox is an Assistant Professor of Sport Analytics in the Department of Sport Management at Syracuse University’s David B. Falk College of Sport and Human Dynamics. His research is focused on in-game performance evaluation and in-game strategy across many sports. He previously interned with the San Diego Padres in their Department of Research and Development.
Jonah Soos is a Syracuse University Sport Analytics undergraduate also pursuing an Applied Data Science master’s degree. He has focused his research efforts on analyzing umpire accuracy and strike-calling tendencies in Major and Minor League baseball, in addition to quantifying catcher and pitcher success holding runners, previously presenting his research at the 2023 NINE Spring Training Conference and 2023 Midwest Sports Analytics Meetings. He was part of overall winning teams for the Syracuse Basketball Analytics and Football Analytics Blitz competitions, and among the finalists in the first annual Cincinnati Reds Hackathon.
10:00-10:30 a.m. MST
RP8: Predicting Pitch Sequences Using Conditional At-Bat Modeling and Neural Networks
Connor Turner
- Video: Click here to watch a video replay of Connor’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Connor’s presentation (.pptx)
In baseball, being able to predict what pitch is coming next can give the hitter a major advantage. As such, teams are always looking for better ways to understand how the pitcher sequences their pitches, and analytics departments want to develop better models to predict these sequences. Neural network models can potentially be useful for this purpose; however, the results from such models thus far have been disappointing. In this presentation, a novel approach to predicting pitch sequences is introduced to improve on these earlier models.
Building upon previous work in at-bat modeling, conditional probabilities for each pitch type were derived given the previous pitch type, the current count, and the handedness of the opponent. These probabilities were calculated for both the pitcher and the hitter at each data point, allowing the resulting models to not only take the pitcher’s tendencies into consideration, but also how similar pitchers tend to attack the hitter in the same situation. Using Statcast pitch-by-pitch data, these conditional probabilities were calculated and used as inputs for two types of neural network models. Models were trained for each pitcher season in the dataset, and the average test accuracy was reported to compare each model in the study. Overall, the best-performing model averaged just under 60% accuracy on validation data, with a maximum accuracy of 87%. This is much better than a naïve classifier, an improvement over previous neural network models, and on par with the best machine-learning models, suggesting this approach may be useful for developing new pitch prediction models in the future.
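The conditional probabilities described above can be estimated with simple frequency counts. The Python sketch below is a minimal illustration under assumed inputs: the function name and the tuple layout are hypothetical, and the study itself may have computed or smoothed these probabilities differently before feeding them to the neural networks.

```python
from collections import Counter, defaultdict


def conditional_pitch_probs(pitches):
    """Estimate P(pitch type | previous pitch, count, opponent handedness).

    pitches: iterable of (prev_pitch, count, hand, pitch_type) tuples,
    a hypothetical layout for one pitcher's (or one hitter's) pitch log.
    """
    counts = defaultdict(Counter)
    for prev_pitch, count, hand, pitch_type in pitches:
        counts[(prev_pitch, count, hand)][pitch_type] += 1

    probs = {}
    for context, pitch_counts in counts.items():
        total = sum(pitch_counts.values())
        # Relative frequency of each pitch type within this context.
        probs[context] = {p: n / total for p, n in pitch_counts.items()}
    return probs
```

For example, if a pitcher threw three fastballs and one slider after a fastball in an 0-1 count to right-handed hitters, the estimate for that context would be 0.75 for the fastball and 0.25 for the slider.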
Connor Turner is a data science and machine learning researcher currently studying abroad in Sweden. He earned a bachelor’s degree in Quantitative Social Science from Dartmouth in 2020, and this summer he will be graduating from Linköping University with a master’s degree in Statistics and Machine Learning. In addition to his research and data science work, he also runs The Diamond, a baseball-centered YouTube channel with over 1,000,000 total views and nearly 10,000 subscribers. You can find him @connorbturner and @TheDiamondOnYT on Twitter. Previous SABR Presentation: “The Pinch-Hitter Problem: Using Markov Chains to Analyze Outcomes in Pitcher-Batter Matchups” (2021).
11:45 a.m.-12:15 p.m. MST
RP9: Weighted Bullpen Management Score Plus: Quantifying MLB Pitching Change Decision Making
Sean Sullivan
- Video: Click here to watch a video replay of Sean’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Sean’s presentation (.pptx)
It is well documented how the usage of data and analytics within Major League Baseball (MLB) has shifted how the game is played. This is no different for how teams manage their bullpens. In fact, the bullpen usage strategies deviated from the historical norm so much that in 2020, MLB instituted a Three-Batter Minimum rule for all pitchers. This was meant to curtail the incessant pitching substitutions that often led to teams using a pitcher for a single plate appearance. Even with the rule change, teams still utilize an array of information to facilitate bullpen choices. The goal of a pitching change is to limit runs, so this project sought to measure a pitching change decision through the lens of understanding how many runs a chosen pitcher would be expected to surrender and compare that to the runs that would be expected had the change never occurred or had the team chosen a different player from the bullpen.
This measurement, known as the Weighted Bullpen Management Score Plus (wBMS+), was generated by running a program that simulated the runs scored as the result of plate appearance matchups. wBMS+ is a composite score that breaks down a pitching change decision into two main components: whether the pitcher inserted was the best choice, and whether the inserted pitcher was left in too long. For the first component, for each pitching change decision, the plate appearance matchups that satisfied the Three-Batter Minimum rule were simulated for the inserted pitcher, the removed pitcher, and the available pitchers in the bullpen, and the results were compared and weighted by leverage to generate a performance metric. For the second component, only plate appearances that occurred after the Three-Batter Minimum rule was satisfied were considered; those specific matchups were simulated for the inserted pitcher and the available pitchers in the bullpen, and the results were likewise compared and weighted by leverage to generate a performance metric. The resulting composite score was normalized and indexed to create a simple-to-understand metric that shows how good or bad a team’s overall bullpen decision making was.
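The leverage-weighted comparison at the heart of each component can be pictured with the Python sketch below. It is not the wBMS+ formula, which the abstract does not spell out: the function name, the input layout, and the choice to compare the chosen pitcher only against the best available alternative are assumptions, and the simulated expected runs are taken as given inputs.

```python
def leverage_weighted_score(decisions):
    """Leverage-weighted average of runs saved versus the best alternative.

    decisions: list of (leverage, chosen_runs, alternative_runs) tuples, where
    chosen_runs is the simulated expected runs for the pitcher actually used and
    alternative_runs lists simulated expected runs for the other available
    pitchers. All names and the layout are hypothetical.
    """
    weighted_sum = 0.0
    total_leverage = 0.0
    for leverage, chosen_runs, alternative_runs in decisions:
        best_alternative = min(alternative_runs)
        # Positive when the chosen pitcher projects to allow fewer runs
        # than the best alternative; negative when a better option existed.
        weighted_sum += leverage * (best_alternative - chosen_runs)
        total_leverage += leverage
    return weighted_sum / total_leverage
```

Under this sketch, a negative score means that, on a leverage-weighted basis, a better option was sitting in the bullpen.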
There has been previous research on this subject and it helped set the foundation for my own approach. Tim Kniker developed Bullpen Management Above Random which sought to quantify how well a manager used their best pitchers in the highest leverage situations. Rob Arthur and Rian Watt also attempted to measure how well managers were doing at deploying their best pitchers in high impact situations with their Weighted Reliever Management Score. What differentiates my approach from theirs is the focus on simulating pitcher-batter matchups.
The expected contribution of this project is twofold. Primarily, it will serve as a resource for others, as the methodology will be made publicly available. Secondarily, all results will be made public, thus contributing insights that will hopefully aid debates regarding bullpen management decisions for years to come.
Sean Sullivan is a Data Scientist in the retail industry. He received his B.S. degree in Environmental Science from the University of Illinois at Urbana-Champaign and his M.S. degree in Data Science from DePaul University. Sean runs his blog, URAM Analytics, where he posts open roles in the sports data science and analytics space, shares and amplifies the work of other researchers, and posts his own original sports data science research projects. He previously spoke at the 2022 SABR Analytics Conference, where he presented “Exit Velocity Over Expected: An Evaluation of MLB Batted Ball Data.”
12:15-12:45 p.m. MST
RP10: Introducing Grid WAR: Rethinking WAR for Starting Pitchers
Ryan Brill and Abraham Wyner
- Video: Click here to watch a video replay of Ryan’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Ryan’s presentation (.pdf)
Industry-standard models of WAR for starting pitchers from FanGraphs and Baseball Reference all assume that season-long averages are sufficient statistics for a pitcher’s performance. This is wrong for many reasons, especially because WAR should not be linear with respect to any counting statistic (including R/9, xRA, FIP, or wOBA). Ignoring convexity undervalues exceptional games and overcharges blow-up games, in some cases causing pitchers to “lose” more than one game in a single outing.
To repair this defect, as well as many others, we devise a new measure, Grid WAR, which accurately estimates a starting pitcher’s WAR on a per-game basis. Grid WAR is a convex function of counting statistics, which diminishes the impact of “blow-up” games and up-weights exceptional games, raising the estimated value of pitchers like Sandy Koufax and Catfish Hunter who exhibit considerable game-by-game variance (e.g. mostly amazing with occasional blow-ups). In calculating WAR this way, we are also able to properly adjust for game-level variation in ballpark and opponent quality. Although Grid WAR is designed to accurately measure historical performance, it has predictive value insofar as a pitcher’s Grid WAR is better than FanGraphs’ FIP WAR at predicting future performance. Finally, at https://gridwar.xyz we host a Shiny app which displays the Grid WAR results of each MLB game since 1952, including career, season, and game level results, which updates automatically every morning.
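The convexity argument can be illustrated with a toy calculation. The Python sketch below is not the Grid WAR function: `win_value` is a hypothetical convex, decreasing curve chosen only to show why valuing each game separately differs from valuing a season-long average (Jensen's inequality).

```python
def win_value(runs_allowed):
    """Hypothetical convex, decreasing value of a start (NOT the Grid WAR curve)."""
    return 0.9 * 0.8 ** runs_allowed


def value_of_average(runs_by_game):
    """Season-average approach: value the pitcher's mean runs allowed."""
    return win_value(sum(runs_by_game) / len(runs_by_game))


def average_of_values(runs_by_game):
    """Per-game approach: value each start, then average the values."""
    return sum(win_value(r) for r in runs_by_game) / len(runs_by_game)


steady = [2, 2, 2, 2]    # same runs allowed every start
volatile = [0, 0, 0, 8]  # mostly excellent, one blow-up; same mean as steady
```

Both pitchers allow two runs per start on average, so the season-average approach rates them identically. The per-game approach rates the volatile pitcher higher, because under a convex curve the blow-up costs less than the three shutouts gain.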
Ryan Brill is an applied mathematics PhD student at the University of Pennsylvania advised by Professor Abraham Wyner. He is interested in statistics and probability and particularly enjoys working with sports data. He also loves Indian and Thai food and roots for the Dodgers and Lakers.
Abraham Wyner is a tenured Professor in Statistics at the University of Pennsylvania, and an expert at Probability Models and Statistics. His principal focus at Wharton has been research in Applied Probability, Information Theory and Statistical Learning. He is faculty co-director of the Wharton Sports Analytics and Business Initiative and a co-host of “Wharton Moneyball” on SiriusXM Business Radio. He also created the Wharton Moneyball Academy summer program. While he has consulted across many industries, he takes a particular liking to Major League Baseball.
For more information on the 2024 SABR Analytics Conference, visit SABR.org/analytics.