2023 SABR Analytics Conference Research Presentations
SABR and Sports Info Solutions are pleased to announce the list of research presentations for the SABR Analytics Conference on March 10-12, 2023, in Phoenix, Arizona.
Abstracts and presenter bios for each research presentation can be found below. Click on a link below to watch a video replay or download PowerPoint slides (where available).
Friday, March 10
2:15-2:45 p.m. MST
RP1: Comparing Age Curves Across the MLB, KBO, and NPB
John Asel and Jeremy Losak
- Video: Click here to watch a video replay of John and Jeremy’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from John and Jeremy’s presentation (.pptx)
Identifying how players age is critical in baseball, especially when teams look to forecast player talent into the future. While aging patterns among Major League Baseball (MLB) players are well-documented, to this point there is no research examining aging patterns for players in different leagues. In this study, we compute aging curves for batters in the KBO (Korean Baseball Organization), NPB (Nippon Professional Baseball), and MLB. We then compare these aging curves to identify if players in different leagues exhibit different aging behaviors.
There are reasons to believe aging patterns may be different. First, different skill sets are prioritized in different leagues. For example, Leeds et al. (2012) showed that NPB rewards small-ball performance factors more than MLB does. Second, there may exist genetic-driven physical differences between players in each league (for example, the average adult Japanese man is two inches shorter than the average adult American man). Third, game load differences may lead to quicker physical decline. For example, MLB seasons consist of 162 regular season games, compared to 143 games in NPB and 144 in KBO.
Our data come from Lahman (MLB) and Baseball-Reference (KBO and NPB). We focus our analysis from 1982 to 2022, as 1982 is the first season in which we have data for all three leagues. In total, we have 27,682 player-season pairs (16,694 in MLB, 6,487 in NPB, and 4,501 in KBO), covering 5,157 unique players (2,918 MLB, 1,356 NPB, 889 in KBO), which we use to estimate aging patterns with generalized additive models.
Results show mostly similar aging patterns between MLB and NPB. OPS peaks around 26.7 years old, K% peaks around 29.3 years old, BB% peaks during a player’s early 30s, and HR% peaks between ages 27 and 29 with the peak being more pronounced for MLB hitters. MLB players also show slightly greater longevity. KBO players, on the other hand, exhibit a different aging pattern. Compared to MLB and NPB players, KBO players enter the league closer to their career averages, and then decline more quickly following their peak (for OPS, BB%, and HR%). KBO players, on average, are also younger than players in the other leagues. This may highlight cross-cultural differences in training or development regimens that lead to different aging patterns.
John Asel is a Syracuse University Sport Analytics and Economics student. He has interned for Driveline and the Tampa Bay Rays, and won the 2022 Doug Pappas Award at SABR 50. He will be joining the Baltimore Orioles analytics department upon graduation.
Jeremy Losak is an Assistant Professor of Sport Analytics in the Department of Sport Management at Syracuse University’s David B. Falk College of Sport and Human Dynamics. His research focuses on the economics of sports, particularly baseball labor markets, attendance at sporting events, gambling markets, and college athletics. Previous SABR Analytics Conference research presentations include “Behavioral Biases in Daily Fantasy Baseball: The Case of the Hot Hand” (2022), “MLB Home Field Advantage Without Fans” (2021), and “What’s Hanging? An Empirical Definition And Defining Attributes For The Hanging Pitch” (2020).
2:45-3:15 p.m. MST
RP2: Effect of Previous Innings Played on Plate Appearances
Alex Oppel
- Video: Click here to watch a video replay of Alex’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Alex’s presentation (.pptx)
While the effect of rest and workload on pitcher performance is well acknowledged, the evaluation of rest, or lack thereof, on hitter performance is limited. Consequently, this study quantifies the effect of position player workload on various plate appearance outcomes. Baseball Reference game log data from the 2010s are used to calculate each batter’s innings played totals over the previous seven days. Binary logistic regressions using Retrosheet Event Files are built to model the probability of a hit, walk, strikeout, and home run occurring. Innings played totals over the past week are included in all four models to determine the effect of workload on batter performance.
Results indicate the effects of innings played over the previous seven days on walks, strikeouts, and home runs are insignificant; however, weak evidence exists to suggest a quadratic relationship for the effect of batter workload on hit probability. Holding all other variables constant, hit probability is maximized at 42 innings played, potentially suggesting batters could use an occasional day off to stay fresh.
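To illustrate how a quadratic workload term in a logistic regression implies a single peak, here is a minimal sketch. It assumes a model of the form logit(p) = b0 + b1·x + b2·x², where x is innings played over the previous seven days. The function names and the coefficient values are hypothetical, chosen only so that the peak lands at 42 innings; they are not the study's estimates.

```python
import math


def hit_probability(innings, b0, b1, b2):
    """Hit probability under a logistic model with a quadratic workload term."""
    z = b0 + b1 * innings + b2 * innings ** 2
    return 1.0 / (1.0 + math.exp(-z))


def peak_innings(b1, b2):
    """Workload that maximizes the log-odds (and therefore the probability).

    The vertex of b1*x + b2*x^2 is at x = -b1 / (2*b2); it is a maximum
    only when b2 is negative.
    """
    if b2 >= 0:
        raise ValueError("no interior maximum unless b2 < 0")
    return -b1 / (2.0 * b2)


# Hypothetical coefficients (NOT taken from the presentation).
B0, B1, B2 = -1.2, 0.0084, -0.0001

peak = peak_innings(B1, B2)
p_at_peak = hit_probability(peak, B0, B1, B2)
p_light = hit_probability(30, B0, B1, B2)
p_heavy = hit_probability(54, B0, B1, B2)
```

Because the logistic function is monotonic, the workload that maximizes the log-odds also maximizes the probability, so the peak depends only on the linear and quadratic coefficients.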
Originally from Muscatine, Iowa, Alex Oppel is majoring in Sport Analytics at Syracuse University, where he serves as president of the Baseball Statistics and Sabermetrics Club, assists the softball team with game strategy and analytics, and works as a research assistant for Sport Analytics faculty. He previously completed an internship as a data analyst for the Northern Colorado Owlz and aims to pursue a position in baseball analytics after graduation.
Saturday, March 11
9:00-9:30 a.m. MST
RP3: Predicting Hitting Power using Energy Flow Analysis in High School, Collegiate, and Professional Baseball Players
Arnel Aguinaldo
- Video: Click here to watch a video replay of Arnel’s presentation (YouTube)
- Slides: Click here to download the PowerPoint slideshow from Arnel’s presentation (.ppsx)
Hitting performance is largely dependent on the batter’s ability to produce a powerful swing with a kinetic chain through which mechanical energy flows between rotating body segments. Evidence in the current literature suggests that the energy generated, absorbed, and transferred by the net torques at the lead hip, back hip, and lumbosacral (L5S1) joints contribute to increased pitch velocity in baseball pitchers. However, it is unclear if energy flow patterns are similar during baseball hitting, particularly across different levels of baseball. Therefore, this analysis aimed to examine the relationship of energy generation, absorption, and transfer through the pelvis and bat velocity and to determine how they differ between high school, collegiate, and professional baseball hitters.
Kinematic and kinetic analyses were performed on high school (n=13), collegiate (n=13), and professional (n=11) baseball players using 3D motion capture and force platform technologies. Segment and joint torque powers were estimated using a conventional energy flow analysis that indicates the rates of energy generation, absorption, and transfer by joint torques. We employed a regularized regression analysis to determine the energy flow and ground reaction force predictors of bat speed, which was measured using an IMU-based bat swing sensor. High school, collegiate, and professional players exhibited bat speeds of 27.1 ± 2.7 m/s, 30.7 ± 1.9 m/s, and 32.3 ± 3.3 m/s, respectively, the difference between which was statistically significant (p < .001). While the rates of energy flow were similar across all levels of hitters, more experienced players exhibited higher levels of energy transfer at the lead hip during acceleration as well as of energy absorbed by the L5S1 joint during follow-through. Our model predicted bat speed with an RMSE of 1.1 m/s with roughly 60% of its variance explained by level, back leg impulse, back hip transfer, L5S1 generation and transfer during bat acceleration, and lead hip absorption and transfer during follow-through.
These findings reinforce the crucial role of rotations of the back hip and trunk during bat acceleration as well as the deceleration induced by the torques at the lead hip and L5S1 joints during follow-through. Hence, training strategies that enhance the muscular strength and mobility of the trunk and hips could potentially improve bat speed and overall power of the swing.
Arnel Aguinaldo is an Associate Professor of Kinesiology and Director of the Biomechanics and Pitching laboratories at Point Loma Nazarene University. He has earned an undergraduate degree in Bioengineering and a Ph.D. in Health and Human Performance and is a board-certified athletic trainer. Dr. Aguinaldo currently serves on the board of directors of the American Baseball Biomechanics Society and as a scientific advisor to Major League Baseball. His research aims to understand the biomechanical implications of injury risk and player performance in baseball pitching and hitting.
9:30-10:00 a.m. MST
RP4: The Next Frontier of Biomechanics in Baseball: Deception, Pitch Tipping, and Mechanical Evaluation
Alex Caravan and Kyle Wasserberger
- Video: Click here to watch a video replay of Alex and Kyle’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Alex and Kyle’s presentation (.pptx)
Professional baseball is inundated with more human movement data than ever before. Of particular interest are the potential use cases to 1) help explain why some players over- or underperform projections and 2) track player development over time. As early adopters of motion capture data in baseball, Driveline has gone through several iterations of biomechanics-driven player evaluation and development decision systems. Our presentation offers a brief overview of some of the ways Driveline incorporates biomechanics data for cross-sectional talent evaluation and longitudinal player development. We focus on previous work into pitch tipping, mechanical similarity scoring, and pitcher deception.
Alex Caravan is the Vice President of Business Operations at Driveline Baseball. Having recently transitioned from the company’s R&D side, Alex’s past and current responsibilities have involved building predictive performance-based models and research insight-based APIs that pipe into Driveline’s training, software and enterprise offerings. He has been with Driveline Baseball since January 2018 and runs a weekly podcast that focuses on the ever-changing research frontier of player development in baseball. He graduated from the University of California at Berkeley in 2015.
Kyle Wasserberger is the Principal Sport Scientist and Coordinator of Health and Performance Research at Driveline Baseball, where his responsibilities include incorporating player and coach feedback to streamline and scale how Driveline uses biomechanics data and collaborating with researchers in academia and industry to advance baseball sport science research. He holds a bachelor’s degree in exercise science from Calvin University as well as master’s and doctorate degrees from Auburn University.
10:00-10:30 a.m. MST
RP5: Impact of Stride Length on Pelvis Energy Flow: A Retrospective Analysis
Taylor La Salle
- Video: Click here to watch a video replay of Taylor’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Taylor’s presentation (.pdf)
Proper timing and sequencing of the kinetic chain is key to efficient pitching, with contributions from lower proximal segments impacting upper distal segments. Current literature on energy flow suggests that the hip joints of the pelvis and the lumbosacral (L5S1) joints contribute to improved pitching performance among pitchers. Energy flow between lower proximal segments and upper distal segments can be characterized as energy generated, absorbed, and transferred. Previous research has shown that pitchers who incorporated a shorter stride exhibited insufficient total body momentum in the intended throwing direction despite maintaining ball velocity compared to their longer stride pitches. Therefore, the purpose of this retrospective analysis is to measure energy flow through the pelvis among pitchers who maintained ball velocity despite altering their stride length.
In a randomized cross-over design, pitchers were instructed to pitch on flat ground employing either a 25% increase from their normal stride length (OS) or 25% decrease from their normal stride length (US). Kinematic and kinetic analyses were performed on high school and collegiate pitchers (n=15) using 3D motion capture synchronously with ground reaction force (GRF) technology. Segment and joint torque powers were calculated using a conventional energy flow analysis that utilizes joint torques to determine the rates of energy generation, absorption, and transfer. Back hip and lead hip were isolated to determine rates of energy flow of the pelvis with special consideration for the L5S1 joint. Paired t-tests were applied to determine energy flow differences between stride conditions.
While rates of energy generation, absorption and transfer were similar between stride lengths, we did observe some differences in GRF and energy at the hip. On the drive leg, we observed a greater amount of force applied towards home plate during OS compared to US, suggesting a greater transfer of momentum from lower body to upper body over time during OS. The stride leg produced a greater amount of braking force in the OS compared to the US. During the stride phase, we observed greater energy absorption at the back hip in the OS compared to US pitches. We also observed a greater amount of energy transfer at the L5S1 joint during arm acceleration in the OS pitches compared to US. These differences in forces from drive leg to stride leg as well as the energy differences at the back hip and L5S1 joint showcase changes that occur for pitchers with greater stride lengths or for an individual who is trying to lengthen their stride during pitching.
Taylor La Salle is the Lab Coordinator for the Point Loma Pitching Lab at Point Loma Nazarene University. He has earned a bachelor’s degree in Exercise Science at Utah Valley University and a master’s degree in Exercise Physiology at the University of Nebraska at Omaha, followed by a master’s degree in Kinesiology from Point Loma Nazarene University. With his background in exercise physiology and biomechanics combined with his passion for baseball, his research focuses on baseball player development.
2:30-3:00 p.m. MST
RP6: Evaluating Minor League Defense Using Detailed Contextual Data
Sarah Thompson
- Video: Click here to watch a video replay of Sarah’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Sarah’s presentation (.pptx)
As player tracking data and the ability to better evaluate players’ defensive abilities have proliferated at the major league level, the need for similar data and insights at the minor league level has increased in importance. Sports Info Solutions (SIS) has been collecting data on the minor leagues for a decade now, which has allowed for the increased understanding of defensive performance at that level via metrics like Defensive Runs Saved. Over that time we have continued to expand the data collection to include things like fielder positioning data, which has in turn allowed us to refine the metrics for defensive evaluation. This presentation will feature SIS’s latest research efforts to provide insight into minor league defense through the implementation of fielder positioning data, as well as how certain contextual elements of defensive play can help inform the evaluation of a player’s performance.
Sarah Thompson is a Research Analyst at Sports Info Solutions. She previously worked in the healthcare software industry and received her bachelor’s degree in Mathematics from Ursinus College and her master’s degree in Statistics from Villanova University.
3:00-3:30 p.m. MST
RP7: Digging Deeper into the Seam Shifted Wake Effect
John Garrett
- Video: Click here to watch a video replay of John’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from John’s presentation (.pptx)
Since 2019, Particle Image Velocimetry research has been conducted in the Utah State University Experimental Fluid Dynamics Laboratory to better understand the impact of seams on baseball aerodynamics. Early research clearly showed seams can alter the wake formation on the baseball, and at times the position of seams can shift the wake to occur earlier on one side of the ball than the other. The resulting asymmetric pressure distribution due to this Seam Shifted Wake (SSW) creates a force acting on the ball that could affect pitch movement. This was shown in lab experiments at USU and further proven by MLB’s Hawk-Eye system where pitch movement is evident in both the Magnus and non-Magnus directions. The SSW effect altering pitch movement has also been proven by 3D spin axis and break differences reported by previous iterations of Rapsodo and Trackman products.
Since SSW has the ability to create unexpected movement for pitchers, it is important to understand the underlying physics of why and when it happens. As with any research, the more we uncover, the more questions are revealed. This presentation will discuss research into the impacts that seam orientation, seam height, velocity, air density, and spin rate have on the physics of the SSW effect.
John Garrett is an Analytics Engineer at Rapsodo. He received his B.S. and M.S. degrees in Mechanical Engineering from Utah State University, graduating in 2022. He conducted research on baseball aerodynamics and the Seam Shifted Wake effect in the USU Experimental Fluid Dynamics Laboratory for three years as an Undergraduate and Graduate Research Assistant to Dr. Barton Smith. Before completing his M.S. degree, he served as a Baseball Data Intern with Major League Baseball.
3:30-4:00 p.m. MST
RP8: Pitchers Hitting: A Requiem
Rob Mains
- Video: Click here to watch a video replay of Rob’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Rob’s presentation (.pptx)
The 2022 season, the first full season with the designated hitter in both leagues, marked the end of pitchers batting. Several reasons were given for adopting the rule throughout MLB: reducing pitcher injuries, lengthening benches by removing the need for a battalion of pinch-hitters, and creating uniformity between the leagues as interleague play increases. But, obviously, the main motivation was to increase offense and thereby scoring, boosting the sport’s popularity and the revenues associated with it.
On the latter score, the change would appear to be a bit of a misfire. There were 4.34 runs scored per National League game with the universal DH in 2022 compared to 4.46 in 2021. That can be considered a victory, however, when contrasted to the much steeper decline, from 4.60 to 4.22, in the American League.
In this presentation, I will discuss how the elimination of pitcher hitting affected offense in the National League in 2022 and evaluate how it compared to the American League. In particular, I will examine how historical differences between the two leagues’ strategies affecting both hitters and pitchers evolved during the 48 seasons during which they played under separate rules and note how the elimination of pitchers hitting could accelerate some trends while reversing others. I will examine scoring patterns, pitcher usage, substitutions, and the role of the designated hitter.
Rob Mains is a writer for Baseball Prospectus. His “Veteran Presence” column runs twice a week. He is a former equities analyst and was a finalist for the 2018 SABR Analytics Conference Research Award for Historical Analysis/Commentary. He is a SABR Analytics Certification course reviewer.
Sunday, March 12
9:30-10:00 a.m. MST
RP9: How to Effectively Communicate Baseball Research
Rebekah Callari-Kaczmarczyk
- Video: Click here to watch a video replay of Rebekah’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Rebekah’s presentation (.pptx)
Communicating findings effectively is an essential element of disseminating good research. Effective research communication is understandable and convincing, but producing writing that fulfills these criteria can be challenging, especially for a newer analyst. Research and technical writing is an extensive field of study with a multitude of readily accessible books, research publications, and websites, but resources for the baseball analytics community specifically are lacking. In addition, many of the resources readily available to the public are based in opinion rather than research. In this presentation, I will discuss some of the main challenges of writing effectively about baseball analytics and will present a genre-based approach that can be used to make writing more understandable and convincing to fellow analysts and baseball decision-makers.
A genre is the communication of a discourse community (a group of people who communicate while working towards a common goal) and is typically defined by unique characteristics such as shared language patterns and vocabulary. Using these unique language characteristics marks an individual as part of the discourse community, making their discourse more understandable and convincing to other community members. The baseball analytics discourse community is composed of both researchers and readers from diverse backgrounds with varying levels and types of knowledge. For example, a student may have limited experience writing about their independent research and have received feedback from fellow researchers that their writing is disorganized or “not academic” enough while a statistician may have extensive writing experience in academia but struggle to communicate these findings in a way that is accessible to baseball leadership. Understanding the intended discourse community – what the readers know about the topic and what they expect – enables the writer to communicate in a convincing manner.
This presentation will provide research-based tools to help attendees clearly define their intended audience and produce the patterns or structures expected by that audience. I will review the typical structure of a research write-up and how the sections should vary depending on the intended discourse community. Although this will be most helpful to those new to writing about baseball analytics, this is relevant to anyone seeking to improve their written research communication.
Rebekah Callari-Kaczmarczyk is a Lead Software Engineer in Research and Development with the Philadelphia Phillies. Before transitioning to software engineering, she taught academic and technical communication to graduate students at Duke University. She has a certificate in Data Analytics and a master’s degree in Applied Linguistics: TESOL from Georgia State University.
10:00-10:30 a.m. MST
RP10: Using Seam Orientation Data from Optical Sensors for Pitch Modeling and Design
Glenn Healey
- Video: Click here to watch a video replay of Glenn’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Glenn’s presentation (.pptx)
Researchers at Utah State University have used Particle Image Velocimetry to show that a baseball’s seams can advance boundary layer separation which causes a Seam Shifted Wake (SSW) force on the ball. High-resolution optical ball-tracking sensors enable measurement of the orientation of a ball’s seams relative to the spin vector. This orientation can be characterized using various representations including spherical coordinates and Mollweide projections. We use a large set of seam orientation and pitch trajectory data acquired by the Yakkertech optical ball-tracking system to build a physical model for the dependence of the SSW force on seam orientation, pitch release parameters, and the atmospheric conditions. The result is a new aerodynamic model that includes all of the forces that are known to affect the trajectory of a baseball. We show that this model provides more accurate predictions for pitch trajectories than current models. The new model can be combined with intrinsic pitch values to enable the efficient design and optimization of pitches. We demonstrate the use of seam orientation to manipulate pitch shape using the BallR visualization environment. By quantifying the relationship between seam orientation, pitch trajectory, and pitch value, we also provide analysts with a new tool for evaluating, comparing, and forecasting the performance of pitchers.
Glenn Healey is a Professor of Electrical Engineering and Computer Science at the University of California, Irvine where he is director of the Computer Vision Laboratory. He received the B.S.E. degree in Computer Engineering from the University of Michigan and the M.S. degree in computer science, the M.S. degree in mathematics, and the Ph.D. degree in computer science from Stanford University. Dr. Healey’s work focuses on combining physics, statistical signal processing, and machine learning methods for the development of algorithms that extract information from large sets of data. He has been elected a Fellow of IEEE and SPIE.
10:30-11:00 a.m. MST
RP11: Quantifying In-Game Command: Observed Data Meets Intended Location
Ryan Reinsel
- Video: Click here to watch a video replay of Ryan’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Ryan’s presentation (.pptx)
Teams and analysts have been trying to find a way to quantify command for decades. These approaches have ranged from measuring from the middle of home plate or from where the catcher set up to using walk rate, edge percentage, and manual tagging at both the college and professional levels. Metrics have been created to measure how command works with called strikes in and out of the zone, but nothing has ever truly captured how well a pitcher can command his pitches during a game. With the advent of new data available, we can now know the intended location of each pitch. Knowing the plate height and side of both a pitcher’s intended location and the observed location allows new metrics to be added for each pitch, such as pitch miss distance and pitch miss direction.
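As a rough illustration of how pitch miss distance and pitch miss direction could be derived from an intended and an observed plate location, here is a minimal sketch. The abstract does not define these metrics precisely, so the definitions below (Euclidean distance, and an angle measured counterclockwise from the positive side axis), the `PlateLocation` type, the `pitch_miss` function, and the use of feet as the unit are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateLocation:
    """A location in the plane of home plate (units assumed to be feet)."""
    side: float    # horizontal position relative to the middle of the plate
    height: float  # vertical position above the ground


def pitch_miss(intended, observed):
    """Return (miss distance, miss direction in degrees).

    Direction is measured counterclockwise from the positive side axis:
    0 = missed toward positive side, 90 = missed up, 270 = missed down.
    """
    dx = observed.side - intended.side
    dy = observed.height - intended.height
    distance = math.hypot(dx, dy)
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    return distance, direction


# Hypothetical pitch: target at the knees over the middle, ball ends up
# 0.3 ft to the side and 0.4 ft higher than intended.
target = PlateLocation(side=0.0, height=2.5)
actual = PlateLocation(side=0.3, height=2.9)
miss_distance, miss_direction = pitch_miss(target, actual)
```

Aggregating these per-pitch values over an outing is one plausible route to game-level summaries, though the specific metrics named in the abstract may be computed differently.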
This presentation will explore how definitive location data opens up a new sector of command metrics that can be developed. From a game perspective we can look at Execution Rate, Sequential Command Grades, and overall Command Scores. We’ll also take a look at the relationship with ballistic pitch and hit data, examining not only how well pitchers can throw a pitch where they wanted to, but also whether it has the expected spin and movement profile. For example, seeing how a pitcher’s breaking ball movement and spin profile might change across the plate will provide explanation and new analysis for player development. At the game level, a starting pitcher who posts a command grade below league average yet takes a no-hitter into the 6th inning gives us new questions to ask about the connection between “Stuff” and actually hitting spots.
Defining in-game command with precise data gives the industry an opportunity to grade, improve, teach, and scout pitchers in a unique way. From identifying a pitcher’s command margin of error based on their current arsenal, to leveraging seam-shifted wake models to better build and simulate an optimal sequence, we’ll explore an untapped data avenue that is ready to be further researched.
Ryan Reinsel is Vice President of Innovation at BaseballCloud and the creative mind behind the baseball industry’s newest data visualization software, BallR and PitchR. Equipped with a Fine Arts degree and a passion for technology, Ryan is a proven innovator at creating contextual applications to drive player development forward at the professional and collegiate level. Prior to BaseballCloud, Ryan held positions in multiple sectors of baseball including a role in Research and Player Development for the Chicago Cubs at their Arizona complex.
11:45 a.m.-12:15 p.m. MST
RP12: Trends in MLB Pitch Tempo Since the Start of Pitch Tracking
Andy Andres
- Video: Click here to watch a video replay of Andy’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Andy’s presentation (.pptx)
This past year, MLB and Baseball Savant released pitch tempo data (the time between pitches) in advance of the new rules for the 2023 MLB Season. Statcast has recorded a time stamp for the start of the pitch when released, and Savant made the considered choice to include “takes” only when measuring pitch tempo, defined as “pitch release to pitch release.” Year-by-year league, team, and individual player trends since 2010, player service time and how it relates to recent measures of pitch tempo, and some early spring training data will be presented.
Andy Andres is on the faculty of Boston University where he has been teaching various natural sciences, mathematics, and data science courses for more than two decades. He developed and taught the first college course in Baseball Analysis and Sabermetrics at Tufts University where he was a Visiting Lecturer in Sabermetrics. He also initiated, designed, and taught the highly popular BU MOOC “Sabermetrics 101: Introduction to Baseball Analytics” to more than 70,000 registered students on the edx.org platform. His former students and mentees occupy front offices in the NBA, MLS, and MLB, where one was recently named General Manager. Andy is also a part-time MLB stringer and pitch clock operator at Fenway Park, a Data Analyst for BaseballHQ.com, and former Head Coach/Lead instructor for the MIT Science of Baseball Program. He has been a member of SABR since 2003.
12:15-12:45 p.m. MST
RP13: Noisy Judgments: A Probability Surface-Based Analysis of Umpiring Variability
Emily-Anne Patt and James Stockton
- Video: Click here to watch a video replay of Emily-Anne and James’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Emily-Anne and James’s presentation (.pdf)
In any repeated evaluation of events, we expect variability in expert assessment. Within a corpus of data spanning multiple experts’ evaluations across multiple events, this noise manifests as both interexpert and intraexpert variability in the assessed outcomes. Called balls and strikes in Major League Baseball provide a clear example of this noise in human judgment. To assess the prevalence, magnitude, and trends of this noise, we utilized 15 seasons of available PITCHf/x and Statcast data showing the locations of ~5.3 million individual pitches called either ball or strike. We generated a prior probability surface across the strike zone representing the average strike zone as called by the entire umpiring corps. This surface shows the actual behavior of these experts when making evaluations, providing a novel methodology for assessing trends and variability across MLB’s umpire corps. We use this surface to evaluate changes in the actual strike zone over time as well as individual umpire performance. We present details of the probability surface construction method along with initial results stemming from its application. We preview future applications of the best fit strike zone surface model for in-depth player performance. These detailed analyses are made possible by the carefully designed fitting methodology for the strike zone probability surface. Accounting for sampling error and providing a well-defined functional form for the optimizer allows the model to arrive at reliable results from smaller subsets of data than previous methods. Pitcher-Batter, Umpire-Pitcher, and similar specific match-ups are available for study.
Emily-Anne Patt is the manager for quantitative intelligence and methodologies supporting security and resilience at Alphabet, Inc. based in Washington, D.C. Her background is in econometrics and financial economics, with prior experience at the US Department of State, Federal Reserve Board of Governors, US Department of Treasury, and as a data science consultant for US federal government clients.
James Stockton, PhD, is the lead data scientist at Altamira Technologies supporting the United States Air Force Chief Data and AI Office based in northern Virginia. His background is in astronomy and astrophysics and he has worked as a data scientist in private industry, academia, and supporting federal customers in the IC/DoD space for the past 10 years.
12:45-1:15 p.m. MST
RP14: A Markov Chain Matchup Model Using Multi-variable Approach
Jeff Jin and Chris Zexin Chen
- Video: Click here to watch a video replay of Jeff and Chris’s presentation (YouTube)
- Slides: Click here to download PowerPoint slides from Jeff and Chris’s presentation (.pptx)
The Markov chain is a widely used analytical framework in baseball that can be used to estimate runs scored and optimize lineups. In 2021, a novel approach was introduced at SABR by Connor Turner, who used a Markov chain to simulate a hypothetical matchup between Mike Trout and Justin Verlander. Here, we would like to introduce a refined version of the Markov Chain matchup model (PA Markov Chain) as an expansion on Turner’s work.
The PA Markov Chain uses a player’s historical transition probabilities from one count to another to model the expected event probabilities of his future PA. A player’s tendency to move through a plate appearance is characterized by a probability matrix (T). We can then use T, through the Markovian process, to generate a stationary matrix (P), which represents the expected PA event probabilities of a certain player vs. the league. Turner proposed taking the weighted average of a pitcher’s transition matrix Tp and a batter’s transition matrix Tb as the new transition matrix Tpb, as if they were to face each other. We expanded on this method by first testing its validity using the Multi-Outcome Brier Score, then incorporating additional variables (e.g., platoon, park factors, temperature) before fitting machine learning models to improve the accuracy. Our current model achieved noticeable improvement against Turner’s baseline approach and is on par with the Log5 matchup model.
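The blending and scoring steps described above can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the authors' implementation: the three transient states ("start", "ahead", "behind") stand in for the real ball-strike count space, every probability is invented, the equal weight of 0.5 is arbitrary, and the expected PA event probabilities are read here as the chain's absorption probabilities. The function names `blend`, `outcome_probs`, and `multi_outcome_brier` are hypothetical.

```python
def blend(Tp, Tb, w=0.5):
    """Row-wise weighted average of pitcher (Tp) and batter (Tb) matrices."""
    blended = {}
    for state in Tp:
        dests = set(Tp[state]) | set(Tb[state])
        blended[state] = {
            d: w * Tp[state].get(d, 0.0) + (1.0 - w) * Tb[state].get(d, 0.0)
            for d in dests
        }
    return blended


def outcome_probs(T, start, tol=1e-12, max_steps=10_000):
    """Push probability mass through the chain until it is all absorbed.

    Any destination without its own row in T is treated as an absorbing
    plate-appearance outcome.
    """
    mass, absorbed = {start: 1.0}, {}
    for _ in range(max_steps):
        if sum(mass.values()) < tol:
            break
        nxt = {}
        for state, p in mass.items():
            for dest, q in T[state].items():
                bucket = nxt if dest in T else absorbed
                bucket[dest] = bucket.get(dest, 0.0) + p * q
        mass = nxt
    return absorbed


def multi_outcome_brier(pred, actual):
    """Squared error summed over outcomes for one plate appearance."""
    return sum((p - (1.0 if k == actual else 0.0)) ** 2 for k, p in pred.items())


# Toy matrices with invented probabilities (not real player data).
Tp = {
    "start": {"ahead": 0.35, "behind": 0.45, "in_play": 0.20},
    "ahead": {"walk": 0.35, "in_play": 0.40, "strikeout": 0.25},
    "behind": {"strikeout": 0.45, "in_play": 0.40, "walk": 0.15},
}
Tb = {
    "start": {"ahead": 0.45, "behind": 0.35, "in_play": 0.20},
    "ahead": {"walk": 0.45, "in_play": 0.40, "strikeout": 0.15},
    "behind": {"strikeout": 0.35, "in_play": 0.45, "walk": 0.20},
}

Tpb = blend(Tp, Tb, w=0.5)
matchup = outcome_probs(Tpb, "start")
score = multi_outcome_brier(matchup, "in_play")
```

Averaging the per-PA score over many observed plate appearances gives a single number for comparing models, with lower values indicating better-calibrated predictions.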
This work is still ongoing but the preliminary results are promising. We are currently in the process of incorporating additional variables and expect to make further improvements. This model will be beneficial to the analysis and prediction of matchup results. It can be further built into full-game Markov Chain simulation to perform team level analysis.
Jeff Jin is a recent graduate from the MS Sports Business program at NYU and the co-founder of NYU Sports Analytics and Technology Association. He is a self-taught sports analyst and data scientist whose recent work is featured in the presentation “Win With AI: Exposing Winning Strategies In Sports” at the 2023 MIT Sloan Sports Analytics Conference. He will be joining Cleat Street Capital as a Baseball Analyst.
Chris Zexin Chen is a current data science undergraduate at NYU and the co-founder of NYU Sports Analytics and Technology Association. He is a sports enthusiast with a strong interest in analytics and has experience in modeling basketball and baseball. He is part of the demo team for the presentation “Win With AI: Exposing Winning Strategies In Sports” at the 2023 MIT Sloan Sports Analytics Conference.
For more information on the 2023 SABR Analytics Conference, visit SABR.org/analytics/2023.