SABR and Baseball Info Solutions are pleased to announce the schedule for research presentations for the SABR Virtual Analytics Conference on March 11-14, 2021. Register now to join us! All baseball fans are welcome to attend.
Key Factors in Pitching Biomechanics for Fastball Consistency
Glenn Fleisig, Alek Diffendaffer, Jonathan Slowik
Two keys for successful baseball pitching are ball velocity and location. While previous biomechanical studies have shown correlations with ball velocity, the relationship between pitching biomechanics and accuracy has not been investigated. The purpose of this study was to identify which aspects of fastball pitching mechanics are related to consistency. Data were analyzed for 47 healthy baseball pitchers from a wide range of levels (2 youth, 28 high school, 14 collegiate, and 3 professional). Testing was conducted in a biomechanics laboratory, where each pitcher threw 10 full-effort fastballs the regulation distance from a mound to a home plate. Pitchers were instructed to aim at the middle of a strike zone target suspended above home plate. Three-dimensional pitching biomechanics were tracked with a 12-camera automated motion capture system (Motion Analysis Corporation, Santa Rosa, CA) at 240 frames/second. Ball location and velocity were measured with a PITCHf/x system (SMT, Fremont, CA). Twenty kinematic parameters were calculated from the motion capture data, while ball vertical and horizontal position at home plate were measured with PITCHf/x. Stepwise linear regression analysis using backward elimination was performed to assess the relationship between variability in the kinematic parameters and pitch location consistency.
The resulting model explained 58% of pitch location consistency and included variability of five kinematic parameters. Three of the parameters occurred at the instant of front foot contact (upper trunk tilt, shoulder abduction, and shoulder horizontal abduction) and the other two occurred when the arm was cocked back (maximum shoulder external rotation and shoulder horizontal adduction). Baseball biomechanists, analysts, and coaches can improve the consistency and performance of their pitchers by reducing variability of shoulder motion during the early portions of the pitch.
Dr. Glenn Fleisig is the Research Director of the American Sports Medicine Institute and the Founding President of the American Baseball Biomechanics Society. He is also an advisor to Major League Baseball, Little League Baseball, and USA Baseball. He earned his engineering degrees from MIT, Washington University, and UAB. Ranked by Expertscape as the top expert in the world on baseball science and medicine, Dr. Fleisig has published 200 scientific articles, delivered 350 presentations throughout the world, and has been interviewed for thousands of stories by the media.
Analyzing the Elements of Pitch Movement Using Hawk-Eye Data
Glenn Healey and Lequan Wang
The trajectory of a pitch is a function of the forces on the ball after it leaves a pitcher’s hand. These forces include gravity, drag, Magnus, and a side force that is theorized to result from an asymmetric flow separation known as a seam shifted wake. The Hawk-Eye system was introduced as MLB’s primary pitch-tracking technology in 2020 and allows separation of the effects of these forces. We develop a separation algorithm that utilizes a Magnus model that was derived from trajectory measurements for more than two million pitches augmented by measurements from weather sensors near the time and location of each pitch. The model quantifies the dependence of the Magnus force on the velocity and spin vectors as well as on changes in seam height on the order of a thousandth of an inch.
The separation process allows us to analyze several properties of the side force. We show that the side force coefficient which determines the force magnitude is bounded by an increasing linear function of the fraction of gyrospin for a pitch. This leads to the definition of side force efficiency which depends on the orientation of the seams over the pitch trajectory and is analogous to the spin efficiency for the Magnus force. We quantify the contributions of the various forces to pitch movement. The ability to model the dependence of movement on quantities that the pitcher controls can be combined with intrinsic pitch values to streamline pitch design. As a case in point, we present visualizations for pitchers who benefit the most and least from adding side force to a pitch.
Glenn Healey is a professor of electrical engineering and computer science at the University of California, Irvine where he is director of the computer vision laboratory. He received the B.S.E. degree in computer engineering from the University of Michigan and the M.S. degree in computer science, the M.S. degree in mathematics, and the Ph.D. degree in computer science from Stanford University. Dr. Healey’s research combines physics, statistical signal processing, and machine learning methods for the development of algorithms that extract information from large sets of data.
Lequan Wang is a Ph.D. student in Electrical Engineering and Computer Science at the University of California, Irvine. He received the B.S.E. degree in Electronic and Information Engineering from Northwestern Polytechnic University and the M.S. degree in Electrical Engineering from the University of California, Irvine.
MLB Home Field Advantage Without Fans
The phenomenon of a home field advantage (HFA) is universally accepted across most major sports at all levels (Pollard & Pollard 2005). While its existence is generally accepted, sources of that advantage have been difficult to detect, especially since many occur simultaneously. The literature identifies some potential causes, including crowd support, stadium and location familiarity, travel fatigue effects, referee bias, and psychological factors, among others.
The COVID-19 pandemic offers a unique natural experiment to isolate elements related to HFA. All 2020 regular season Major League Baseball (MLB) games were played in empty stadiums. Cancellations caused by individual team outbreaks were common, creating randomly assigned “breaks” in the season. For example, the St. Louis Cardinals had a stretch of 16 days between games with 12 postponements and two cancellations. Postponed games were made up with seven-inning doubleheaders instead of traditional nine-inning doubleheaders. Travel was reduced as teams only played games within their regional divisions. For example, the New York Mets, who play in the National League East, only played games against teams in the American League East and National League East.
We use this natural experiment to analyze the effect of fans, travel, workload, and “last licks” on HFA. The 2019 MLB season is our control group, and 2020 is our treatment group. Fischer & Haucap (2020a) did this for German soccer, but to our knowledge, this is the first paper to analyze HFA in the era of COVID-19 for a professional North American league.
We also examine HFA in betting markets. Under semi-strong-form market efficiency, all past and current information is included in the price. The difficulty in pricing HFA in 2020 is the lack of historical precedent for games without fans in the stands to guide this updated pricing.
Initial results show HFA appears to be lower in 2020 compared to 2019, but only early in the season, and that difference disappears as the season progresses. Our results also show that teams playing regional games in 2020 see the greatest reduction in HFA, but this reduction is not present for further travel distances. We show no statistically significant effects for batting second or fatigue, although we recognize that may be due to statistical noise and relatively small samples for 2020. Examining betting markets, differences in HFA are not priced in consensus money lines, especially for home underdogs, suggesting that relevant first-time shocks in prediction markets can create temporary inefficiencies.
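For readers less familiar with money lines, the following minimal sketch shows one common way to convert a pair of American money lines into implied win probabilities, which is the quantity that a market's pricing of HFA would shift. The function names and the example lines (-150 and +130) are illustrative assumptions, not values or methods taken from the study, and proportional rescaling is only one of several ways to remove the bookmaker's margin.

```python
def implied_probability(moneyline: int) -> float:
    """Convert an American money line to its raw implied win probability."""
    if moneyline < 0:
        # Favorite: risk |line| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win the line.
    return 100 / (moneyline + 100)


def no_vig_probabilities(home_line: int, away_line: int) -> tuple[float, float]:
    """Rescale both raw probabilities so they sum to 1.

    Assumption: the bookmaker's margin is spread proportionally
    across both sides of the bet.
    """
    home = implied_probability(home_line)
    away = implied_probability(away_line)
    total = home + away
    return home / total, away / total


if __name__ == "__main__":
    # Hypothetical lines: home favorite at -150, away underdog at +130.
    home, away = no_vig_probabilities(-150, 130)
    print(f"home {home:.3f}, away {away:.3f}")
```

Comparing the no-margin home probability with the observed home win rate is one simple way to check whether a change in HFA is reflected in the price.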
Jeremy Losak is an Assistant Professor in the Department of Sport Management at Syracuse University, serving as a faculty member for the sport analytics bachelor’s degree program. His research focuses on the economics of sports, particularly baseball labor markets, attendance at sporting events, and daily fantasy betting markets. He earned a Ph.D. in economics from Clemson University. Jeremy is assisted by Joseph Sabel, originally from Basking Ridge, New Jersey, who recently completed his undergraduate degree in sport analytics at Syracuse University and is currently a graduate student at Syracuse studying Applied Data Science.
Lessons Learned from the National League’s DH Experiment
In January 1973, following a season in which American League teams hit .239/.306/.343 and scored fewer than 3.5 runs per game, MLB voted to adopt the “designated pinch hitter” in the Junior Circuit. Teams thus had three months to prepare for the momentous change. Over the ensuing 47 seasons, American League teams developed strategies for the DH based on accumulated experience and available personnel. National League teams have used the DH only sporadically since the rule’s inception. The introduction of interleague play in 1997 meant that, for the first time, National League teams used the DH in regular-season road games against teams in the other league. However, the DH was still limited to ten games per team per year under the interleague scheduling that began in 2013. No team builds a roster around a ten-game event.
The pandemic forced a sharp change in 2020. In late June, shortly before Summer Camps opened, MLB announced implementation of a universal DH for the 2020 season. National League teams were faced with a dilemma that was a natural experiment for researchers: How best to build a roster featuring a DH from a team built for a non-DH league? In this research, I will review the strategies that teams utilized, their efficacy, and the results. Given insufficient time to simply copy the strategies of their American League counterparts, NL teams had to work with their existing personnel. I will evaluate their deployment of on-field talent and the outcomes that teams generated, and provide an update on the “DH Penalty” described by Tango, Lichtman, and Dolphin in The Book in 2007. I will also discuss how the success of some NL teams applies to teams in both leagues.
Rob Mains is a writer for Baseball Prospectus. His “Veteran Presence” column runs twice a week. He is a former equities analyst and was a finalist for the 2018 SABR Analytics Conference Research Award for Historical Analysis/Commentary.
What Is Hawkeye Teaching Us About Baseball Pitches?
The new Hawkeye baseball-tracking system, which was used for the first time in MLB during the 2020 season, opens the door to some exciting new opportunities in our quest to understand the flight of a baseball, specifically as it relates to pitching. As with the previous radar-based Trackman ball-tracking system, Hawkeye measures both the full trajectory and the spin rate of the pitched ball. But Hawkeye adds an additional piece of information: The direction in three dimensions of the spin axis of the spinning baseball, which allows new studies of pitch characteristics, particularly the movement. The conventional reason for movement is the so-called Magnus force on a spinning baseball. However, laboratory experiments have recently found that the orientation of the seams may result in non-Magnus movement, a phenomenon dubbed the “Seam Shifted Wake.” Such behavior would be a major paradigm shift in our understanding of baseball flight and would have important implications for the whole concept of “pitch design.”
In this talk the formalism for separating movement into Magnus and non-Magnus parts will be presented and applied to pitches from the 2020 season. Some general features of the data will be discussed along with some interesting and revealing examples from specific MLB pitchers. Knowledge of the spin axis also allows a precise separation of the spin into components: backspin/topspin, resulting in vertical movement; sidespin, resulting in horizontal movement; and gyrospin, resulting in no movement. This separation opens the door to studies of related topics, such as spin efficiency, the dependence of Magnus movement on spin rate and efficiency, the phenomenon of “late break,” and the dependence of drag on spin. As time permits, these topics will be presented.
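The spin separation described above can be sketched in a few lines. This is a minimal illustration, not the formalism presented in the talk: it assumes the spin vector and velocity vector are given in the same right-handed coordinate system with the third axis pointing up, treats gyrospin as the spin component along the direction of travel, and defines spin efficiency as the fraction of total spin that is perpendicular to the velocity. The function name, dictionary keys, and example numbers are hypothetical, and the signs that distinguish backspin from topspin (and one sidespin direction from the other) depend on a convention that is left unspecified here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def decompose_spin(spin, velocity):
    """Split a spin vector (rpm) into gyro, back/top, and side components.

    Assumes the velocity is not vertical, so that a horizontal axis
    perpendicular to the direction of travel exists.
    """
    v_hat = unit(velocity)
    # Horizontal axis perpendicular to the velocity: spin about this
    # axis is backspin or topspin and produces vertical movement.
    horizontal_axis = unit(cross((0.0, 0.0, 1.0), v_hat))
    # Axis perpendicular to both: spin about it is sidespin and
    # produces horizontal movement.
    upright_axis = cross(v_hat, horizontal_axis)

    gyro = dot(spin, v_hat)
    back_top = dot(spin, horizontal_axis)
    side = dot(spin, upright_axis)
    total = math.sqrt(dot(spin, spin))
    transverse = math.hypot(back_top, side)
    return {
        "gyro": gyro,
        "back_top": back_top,
        "side": side,
        "spin_efficiency": transverse / total,
        "gyro_fraction": abs(gyro) / total,
    }


if __name__ == "__main__":
    # Hypothetical pitch: 2000 rpm total spin, travelling along -y.
    result = decompose_spin(spin=(1200.0, -1600.0, 0.0),
                            velocity=(0.0, -130.0, 0.0))
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

Because the three axes are orthonormal, the squared components add up to the squared total spin, so a pitch with more gyrospin necessarily has lower spin efficiency.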
Alan Nathan is Professor Emeritus of Physics at the University of Illinois. After a long career doing experimental nuclear physics, he now does research on the physics of baseball. On this topic he has written many articles, both for academic journals and online baseball publications; he has given numerous lectures to a variety of different audiences; and he maintains an oft-visited website (baseball.physics.illinois.edu) that many people have found to be a useful resource. He is interviewed regularly by the media and has consulted for various organizations, including MLB, NCAA, USA Baseball, and several MLB clubs. He can be found on Twitter at @pobguy.
Predicting Pitching Arm Stress With Machine Learning Models
Kristen Faith Nicholson
Musculoskeletal injuries in baseball players are a persistent and significant problem. Pitching arm injuries are often attributed to excessive shoulder distraction force and elbow valgus torque. Over the past decade, research has attempted to elucidate the cause of these injuries. However, there is a lack of research investigating the relationship and influence of multiple variables on arm stress.
The purpose of this study was to identify which variables have the most influence on elbow valgus torque and shoulder distraction force using a machine learning approach. Gradient Boosting Machine models were created for elbow valgus torque and shoulder distraction force. Both models used the same predictor variables: pitch velocity and 17 pitching mechanics variables. The elbow valgus torque model reported highest influence for pitch velocity (relative influence: 28.4), maximum shoulder external rotation (9.23), lead leg maximum ground reaction force (6.6), shoulder abduction at foot strike (6.3), and maximum humeral rotation velocity (6.3). The shoulder distraction force model reported highest influence for pitch velocity (20.4), maximum humeral rotation velocity (9.6), shoulder abduction at foot strike (6.5), maximum shoulder external rotation (6.5), shoulder abduction at release (5.7), difference of time to maximum trunk rotation velocity to pelvis rotation velocity (5.6), and trunk forward flexion at release (5.5).
Pitch velocity was the most influential variable in both the elbow valgus torque and shoulder distraction force models. This suggests that the harder a pitcher throws, the more arm stress they will experience. However, both models also included pitching mechanics variables with substantial influence. The results of this study can be used to inform players, coaches, and trainers about which mechanical variables to focus on when the goal is to limit throwing arm stress.
Kristen Nicholson joined the Wake Forest pitching lab staff in October 2018 as the director of the lab and lead of biomechanics research. She earned her undergraduate degree in Mathematical Science from Clemson University and her M.S. and Ph.D. in Biomechanics and Movement Science from the University of Delaware. Dr. Nicholson wrote her dissertation on a mathematical model for measuring scapular motion and became an expert in upper extremity biomechanics. Dr. Nicholson has a particular interest in the development of non-invasive tools and methods for assessing kinematics and implementing scapular kinematics in baseball pitching.
Association Between Hip Rotational Range of Motion and Pitching Mechanics in High School Baseball Pitchers
Hillary Plummer
The repetitive nature of the pitching motion leads to adaptations in the hip joint tissues, often manifesting in altered hip range of motion (ROM). In professional pitchers, limited stride leg hip total arc of motion is associated with lower trunk separation velocity. These altered mechanics may result in greater forces at the shoulder and elbow and contribute to injury.
The purpose of this study was to examine the association between hip rotational ROM and lower extremity and trunk mechanics during pitching in high school athletes. Twenty-five high school baseball pitchers volunteered (15.9 ± 1.1 years; 180.4 ± 5.5 cm; 75.4 ± 9.3 kg). Kinematic data were analyzed for the mean of three 4-seam fastballs. Bilateral hip internal rotation (IR) and external rotation (ER) ROM was measured with the participants seated. Total arc of motion was calculated (IR+ER). Mechanics examined at stride foot contact included pelvis rotation, trunk rotation, stride length, stride leg knee flexion, stride foot angle and position. Pelvis and trunk angular velocity were examined as the maximum value from foot contact to maximum shoulder ER. Pearson correlation coefficients were calculated between drive leg and stride leg hip ROM and pitching mechanics. Mean ball velocity was 70.1 ± 4.6 mph. No significant correlations were observed between drive leg hip ROM and pitching mechanics. Trunk rotation and total arc of motion in the drive leg had a trend towards significance (r=0.39, p=0.053). No significant correlations were observed between stride leg hip ROM and pitching mechanics.
These results differ from previous work in professional pitchers which found hip ROM was related to trunk separation velocity and pelvis orientation. This may indicate adaptations only appear in older pitchers who have accumulated more total pitching volume and had more time to elicit musculoskeletal adaptations. Trunk rotation and drive leg total arc of motion approached significance and may have been significant with a larger sample size. As ROM increased, trunk rotation towards home plate also increased. Therefore, pitchers with decreased ROM may be in a more closed trunk position which could indicate altered rotational timing and inefficient segmental sequencing of the kinetic chain.
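As a rough check on why a correlation of r=0.39 with 25 pitchers falls just short of the conventional 5% threshold, the sketch below converts a Pearson correlation into its t statistic, t = r·sqrt(n−2)/sqrt(1−r²), and compares it with the two-sided 5% critical value for 23 degrees of freedom (about 2.069, from a standard t table). The helper names are illustrative, and the study's own software and exact procedure are not stated in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-sided 5% critical value of Student's t with 23 degrees of
# freedom, taken from a standard table (not computed here).
T_CRITICAL_DF23 = 2.069

if __name__ == "__main__":
    t = t_statistic(r=0.39, n=25)
    print(f"t = {t:.3f}, critical value = {T_CRITICAL_DF23}")
    print("significant at 5%:", abs(t) > T_CRITICAL_DF23)
```

The t statistic comes out just below the critical value, which is consistent with the reported p=0.053 and with the authors' remark that a larger sample might have reached significance.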
Hillary Plummer received her Ph.D. in Kinesiology from Auburn University and is currently doing a research fellowship with the US Department of Defense. The focus of her research is to identify deficits in modifiable physical factors that prognosticate upper extremity injuries in overhead athletes. The long-term goal of this work is to identify athletes who may be at an increased risk of injury, and to establish a foundation for prevention programs aimed at reducing upper extremity injury.
Modeling Injury Risk Using In-Depth Injury Data
Joe Rosales & John Shirley
Injuries are an unfortunate part of sports that can derail a team’s season and a player’s career. They are usually unforeseen incidents due to their relatively rare occurrence and the fact that some are truly random events. But by understanding the risk factors involved for each individual player, we can model injury risk with some measure of accuracy.
With that in mind, Sports Info Solutions has developed multiple injury risk machine learning models that use each player’s age, body type, usage, playing style, and SIS’s uniquely in-depth injury database to calculate their risk of injury over time during the season and for specific body regions. Separate models were created for pitchers and hitters, and then broken down further by specific body regions and time intervals such as a week, a month, and two months. The development of these models helps improve the inferences one can make when assessing players through a combination of on-field performance metrics and more accurate injury risk analysis.
Joe Rosales is a Lead Research Analyst and Strategic Advisor at Sports Info Solutions. He has worked for SIS in a full-time role since 2013, one highlight of which was being named a co-winner of the MIT Sloan Sports Analytics Conference Research Competition for the development of SIS’s Strike Zone Runs Saved pitch framing methodology.
John Shirley is a Research Associate at Sports Info Solutions. He initially joined SIS as a Video Scout in 2017. He advanced into roles as a Senior Video Scout and R&D Intern while also completing his Master’s degree from Northwestern. He became a full-time member of the R&D staff in 2019 and works on both football and baseball related projects.
Explainable AI for Baseball Predictions and Strategies
Joshua Silver and Tate Huffman
Over the last decade, Major League Baseball has dramatically increased the number of metrics it captures through the use of Statcast, a tracking technology that allows for the collection and analysis of massive amounts of baseball data. Our challenge is to transform this data into sophisticated analyses that are useful, understandable and shareable.
Our first attempt to address this challenge is to answer a fundamental question in baseball: How can we predict the outcome of a batter vs. pitcher matchup? To do so, we introduce a neural-network-based model, Singlearity-PA (pronounced single-arity-P-A). We trained Singlearity-PA using the last nine years of Statcast data. We show that our model scales to incorporate more than 90 different statistics (including batter and pitcher historical statistics, head-to-head statistics, park factors, and weather) to generate predictions that are far more accurate than those of existing methods. Singlearity-PA predicts outcomes that include singles, doubles, home runs, and strikeouts, as well as context-sensitive outcomes, such as double plays, intentional walks, and sacrifice flies.
Now that we’ve developed an accurate model, we’re challenged with making the rationale for its predictions understandable to human beings. To do so, we use the SHAP method to create Shapley values. Shapley values have a deep history in economics, game theory, and AI, but have rarely been applied to baseball. They allow us to provide a set of simple charts, graphs, and metrics that make it easy for non-AI experts to understand the basis of the predictions. This gives people, such as team managers, the ability to run more sophisticated models while still being able to defend their decisions to management, fans, players, and the press.
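To make the idea of a Shapley value concrete, here is a minimal sketch that computes exact Shapley values for a toy prediction function by averaging each feature's marginal contribution over every ordering of the features. It is not the SHAP library and not Singlearity-PA: the feature names, the baseline of 0.320, and the individual effects are invented for illustration, and real models with many features rely on approximations rather than full enumeration.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values by enumerating all feature orderings.

    `value` maps a frozenset of known features to a prediction.
    """
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        known = frozenset()
        for feature in order:
            with_feature = known | {feature}
            # Marginal contribution of revealing this feature next.
            totals[feature] += value(with_feature) - value(known)
            known = with_feature
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_on_base_prediction(known):
    """Hypothetical on-base probability given the features revealed so far."""
    prediction = 0.320  # assumed league-average baseline
    if "batter_skill" in known:
        prediction += 0.030
    if "pitcher_skill" in known:
        prediction -= 0.020
    if "platoon_edge" in known:
        prediction += 0.010
    # Assumed interaction: the platoon edge helps a skilled batter more.
    if "batter_skill" in known and "platoon_edge" in known:
        prediction += 0.010
    return prediction


if __name__ == "__main__":
    names = ["batter_skill", "pitcher_skill", "platoon_edge"]
    for name, phi in shapley_values(names, toy_on_base_prediction).items():
        print(f"{name}: {phi:+.4f}")
```

The values always sum to the gap between the full prediction and the baseline, which is what lets them be presented as a simple additive breakdown; in this toy case the interaction term is split evenly between the two features that share it.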
Building a sophisticated plate-appearance predictor, with transparent reasoning, now opens up vast opportunities for creating new strategies and simulations for the game. As an example, we’ve built a tool based on Singlearity-PA and Markov chains. The tool accurately predicts runs scored per inning by taking into account specific batters and pitchers (unlike earlier models, such as RE24, that are batter and pitcher agnostic).
We’ve built a user-friendly demo of Singlearity-PA and have released open-source code for Markov chains and other baseball simulations in both R and Python that will allow both amateurs and experts to explore baseball strategies.
Joshua Silver is the founder of Singlearity, a baseball analytics platform company which creates software tools and models to optimize baseball strategies. Prior to founding Singlearity, Josh spent 25 years in Silicon Valley working in high-tech software. In his most recent role, he led the team that built the AI platform and tools for Amazon’s Alexa voice assistant. Josh is a contributor to Baseball Prospectus.
Tate Huffman is a senior majoring in applied math and economics at Harvard University. He is a member of and contributor to the Harvard Sports Analysis Collective.
Seam Shifted Wake in MLB Pitches
Barton Smith
For the last two years, fluid dynamics researchers at Utah State University have been measuring airflow over pitched baseballs to learn more about their aerodynamics. These experiments have demonstrated that baseball seams near the center of the ball can modify the wake of the ball and change its direction. This effect, termed Seam Shifted Wake (SSW), depends on the ball’s orientation relative to its axis. Depending on the orientation, the SSW force can be in nearly any direction. Recently, MLB’s Hawkeye system has revealed that there is, indeed, a force in addition to Magnus force, gravity and drag acting on baseball pitches, and this force fits our SSW model. In most cases, the force is predominantly along the ball axis (90 degrees from the Magnus force), but it is also common to find it adding to or subtracting from the Magnus force. Most MLB sinkers and about half of MLB changeups have a significant SSW force. This paper will emphasize unique SSW effects on sliders.
Barton Smith has a BS in Mechanical Engineering from Michigan State University and a PhD in Mechanical Engineering (Fluid Dynamics) from Georgia Tech, graduating in 1999. He has been a professor of Mechanical and Aerospace Engineering at Utah State University in Logan, Utah for almost 20 years. He began researching baseball aerodynamics two years ago.
Contextual Influences On Neural Activity to Pitches and Feedback: Psychology and Performance at the Plate
Recently, researchers have expanded their interests into hitters’ neural activity. Studies have examined neural activity both during pitches and in response to performance feedback between pitches. This research provides valuable insights into hitters’ psychology and behavior, including their expectations, attentional focus and control, and ability to learn and correct decisions.
The current study expands on this foundational research to examine contextual influences on these measures. Two groups of collegiate baseball players completed a computerized video task assessing whether thrown pitches were balls or strikes. Players were given umpire feedback on the accuracy of their choice following each pitch and their neural activity was recorded throughout the task. One group of players (low-pressure) was just participating in the research on a voluntary basis. The other group of players (high-pressure) knew that all of their study data, including measures of their neural activity and task performance, would be given to their coaches, increasing their stress and pressure to perform on the task.
Results showed that the high-pressure group responded less accurately and more quickly than the low-pressure group, indicating hurried responses. Further, high-pressure hitters showed decreased proactive control compared to low-pressure hitters. Additionally, significant relationships were present between low-pressure hitters’ neural activity to feedback and their performance in the task. These relationships were not present in the high-pressure hitters. This finding suggests that low-pressure hitters were better able to associate information received in their feedback to their processing and performance in the task. The combined results indicate that contexts surrounding performance not only influence behavior, but also influence patterns of neural activity to pitches and feedback. This neural activity indexes many psychological processes that underlie task performance as well as self-regulatory learning and decision-making efforts to improve performance. Further, these findings show that neural activity can be used to objectively measure a player’s sensitivities to contextual influences (pressure, stress, anxiety, etc.) that are known to impact on-field performance, but are often difficult to monitor or uncover as they occur during performance.
Implications and uses for this research include assisting in player evaluations and player development processes. When combined with advanced analytic data as well as physiological data, this new level of psychological data and measurement could provide new insights into performance modeling and player development protocols.
Jason Themanson is a Professor in the Department of Psychology and the Neuroscience Program at Illinois Wesleyan University. He received his B.S. in Psychology from the University of Illinois, his M.A. in Social Psychology from the University of Connecticut, and his Ph.D. in Kinesiology (with an emphasis in exercise psychology) from the University of Illinois. Dr. Themanson’s research utilizes both neural and behavioral measures to examine cognitive processes related to learning, decision-making, and control during task performance.
The Pinch-Hitter Problem: Using Markov Chains to Analyze Outcomes in Pitcher-Batter Matchups
In baseball, consistently winning the matchup between pitcher and batter is integral to winning games. This intuitive conclusion drives most strategy and decision making in the game, as teams are constantly looking to gain an advantage in these matchups using pinch-hitters, relief pitchers, and lineup changes. If managers and teams are able to better understand the probabilities of achieving a certain outcome from a certain matchup, it will greatly help them in making these important decisions.
In this presentation, a new model of predicting the outcomes of an at-bat is developed using a Markov chain, where each state in the chain represents a certain count or a certain outcome. Statcast pitch-by-pitch data scraped from MLB’s online database is used to plug real-world data for pitchers and batters into the model and simulate how their expected performance changes along with the count. By combining the transition matrices for a given hitter and pitcher, this model gives the expected probabilities of outcomes in a hypothetical at-bat between the two. This model is useful in multiple ways, namely in that it allows teams and researchers to determine which counts are particularly strong for a certain player, take a closer look at performance splits (e.g. home/away, vs. RHP/LHP), and calculate expected statistics for a player (e.g. wOBA, OPS) by count or by matchup. With this tool at their disposal, managers can hopefully take a lot of uncertainty out of their decisions during the game.
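A count-based Markov chain of the kind described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the presenter's model: each count has four per-pitch probabilities (ball, non-foul strike, foul, ball in play), the absorbing outcomes are collapsed to walk, strikeout, and ball in play, and the example probabilities are invented and held constant across counts. How a hitter's and a pitcher's transition matrices are combined is not specified in the abstract and is not attempted here.

```python
from functools import lru_cache

OUTCOMES = ("walk", "strikeout", "in_play")


def outcome_probabilities(pitch_probs, start=(0, 0)):
    """Probability of each absorbing outcome from a starting count.

    `pitch_probs` maps (balls, strikes) to a tuple
    (p_ball, p_strike, p_foul, p_in_play) that sums to 1.
    Assumes p_foul < 1 in every two-strike count.
    """

    @lru_cache(maxsize=None)
    def solve(balls, strikes):
        if balls == 4:
            return (1.0, 0.0, 0.0)
        if strikes == 3:
            return (0.0, 1.0, 0.0)
        p_ball, p_strike, p_foul, p_in_play = pitch_probs[(balls, strikes)]
        after_ball = solve(balls + 1, strikes)
        after_strike = solve(balls, strikes + 1)
        put_in_play = (0.0, 0.0, 1.0)
        if strikes < 2:
            # A foul before two strikes counts as a strike.
            return tuple(
                p_ball * b + (p_strike + p_foul) * s + p_in_play * ip
                for b, s, ip in zip(after_ball, after_strike, put_in_play)
            )
        # A two-strike foul returns to the same count; solving the
        # self-loop amounts to dividing by (1 - p_foul).
        return tuple(
            (p_ball * b + p_strike * s + p_in_play * ip) / (1.0 - p_foul)
            for b, s, ip in zip(after_ball, after_strike, put_in_play)
        )

    return dict(zip(OUTCOMES, solve(*start)))


if __name__ == "__main__":
    # Hypothetical per-pitch probabilities, identical in every count.
    same = (0.36, 0.28, 0.18, 0.18)
    probs = {(b, s): same for b in range(4) for s in range(3)}
    for count in [(0, 0), (3, 0), (0, 2)]:
        result = outcome_probabilities(probs, start=count)
        print(count, {k: round(v, 3) for k, v in result.items()})
```

Starting the chain from different counts shows how the expected outcome mix shifts as the count moves in the hitter's or pitcher's favor, which is the kind of comparison the abstract describes.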
Connor Turner is the founder of The Diamond (readthediamond.com), where he posts original baseball commentary, research, analysis, and videos. He recently graduated from Dartmouth College with a degree in Quantitative Social Science and a minor in Economics. You can find him @connorbturner and @TheDiamondUS on Twitter.
The 2020 Major League Baseball: And Now, For Something Completely Different
From 2015 through the 2019 postseason, the Major League baseball underwent several changes, each of which can be traced back to manufacturing and economic decisions. It was found that the 2020 ball had again changed, in a manner unprecedented and possibly unique. Balls used in games were externally consistent and unremarkable, comparable to those from before 2019.
However, examination of the wound “centers” revealed they were made to two distinct sets of manufacturing parameters — one that matched samples going back to at least 2002, and one that was lighter, with measurable weight changes to two separate yarn layers. Statcast 2020 home run data showed further evidence of two baseballs, with one having lower drag than previous seasons. Deciphering Rawlings’ inventory batch codes made it possible to determine the manufacturing timeline: both types were made as part of the 2020 production cycle, with the new parameters introduced three months in and discontinued after four months. MLB has confirmed that Rawlings implemented experimental changes over the period in question, with the goal of lowering the coefficient of restitution (COR) to be more centered within their specifications. However, because of the nature of the changes, it would have been difficult to determine the aggregate effect on drag without a Statcast-sized sample.
Since the 2021 baseball is expected to have the new manufacturing parameters, regular season play may see a counterintuitive combination of lower COR and lower drag, with slower exit velocities potentially producing longer home runs.
Dr. Meredith Wills is a Data Scientist for SportsMEDIA Technology (SMT). She also does independent research on the composition of Major League baseballs, and her findings, published in The Athletic and Sports Illustrated, have shed new light on the offensive changes in recent seasons. In her spare time, Wills is a knitting designer, partnering with both the Baseball Hall of Fame and the Negro Leagues Baseball Museum to create reproductions of vintage baseball sweaters. She has a B.A. in Astronomy & Astrophysics from Harvard University, and an M.S. and Ph.D. in Physics from Montana State University-Bozeman.
Is the 3rd Time Through the Order Effect Real? Correcting for Lineup Order and Pitcher Quality Selection Bias
Previous research has claimed that the pitcher has a disadvantage that increases markedly the third time through the batting order (TTO). This TTO penalty has become a consideration in deciding when to pull starting pitchers. In recent seasons, starters are pulled earlier, usually midway through the third cycle. Since higher quality batters dominate the earlier lineup positions, there is an imbalance in batter quality composition in the starter’s third cycle, not present in the first or second times through. Conversely, while nearly all starters begin the third TTO, better pitchers are more likely to complete the 3rd time through the lineup.
With data from Retrosheet, we analyzed each MLB season from 2010 through 2019 independently, jointly, and in groups. The data set includes over 1.18 million plate appearances, and the primary outcome measure was weighted on-base average (wOBA). We conducted a rigorous statistical analysis that controls for batter quality, pitcher quality, handedness match, home field advantage, pitch count, park, and league. We tested the research hypothesis that a “3TTO” effect would produce a measurable “regression discontinuity” in batter performance as the game progresses against the null hypothesis that pitcher fatigue would cause a smooth decline in pitcher effectiveness as the number of batters faced increases.
Contrary to accepted sabermetric wisdom, across 10 years of data and after adjusting for confounders, there is no evidence of a regression discontinuity when the lineup turns over for either the second or third time. We do not see a 3TTO discontinuity. Although batters are better in their third TTO, the increase is gradual and steady. Indeed, starting pitchers are by far most effective the first TTO, with slight indications that they “settle in” and improve over the first few plate appearances. But then pitcher performance gradually and smoothly worsens, beginning with the bottom third of the order. We attribute this change to pitcher fatigue (although there may be other causes). Of course, the effect is an average, and individual batters and pitchers may have very different profiles.
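The idea of testing for a regression discontinuity can be illustrated with a minimal sketch: regress the outcome on the number of batters faced plus an indicator that switches on at the start of the third time through the order, then inspect the coefficient on that indicator. The code below is a simplified assumption of how such a test could look, not the authors' model. It uses synthetic, noise-free wOBA values, omits every confounder the abstract lists, and the function names and the cutoff of batter 19 are illustrative choices.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_jump(batters_faced, woba, cutoff=19):
    """Least-squares fit of woba = intercept + slope * t + jump * 1[t >= cutoff]."""
    rows = [[1.0, float(t), 1.0 if t >= cutoff else 0.0] for t in batters_faced]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, woba)) for i in range(k)]
    intercept, slope, jump = solve_linear_system(xtx, xty)
    return intercept, slope, jump


if __name__ == "__main__":
    t = list(range(1, 28))  # batters faced: three full trips through the order
    # Synthetic curves (made-up numbers): smooth decline vs. decline plus a step.
    smooth = [0.310 + 0.001 * x for x in t]
    stepped = [y + (0.015 if x >= 19 else 0.0) for x, y in zip(t, smooth)]
    print("jump estimate, smooth curve: ", fit_jump(t, smooth)[2])
    print("jump estimate, stepped curve:", fit_jump(t, stepped)[2])
```

A jump estimate near zero is what a smooth fatigue curve would produce, while a genuine 3TTO penalty would show up as a clearly nonzero jump.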
Adi Wyner is a tenured Professor in Statistics at the University of Pennsylvania, and an expert in Probability Models and Statistics. His principal focus at Wharton has been research in Applied Probability, Information Theory and Statistical Learning. He is faculty co-director of the Wharton Sports Analytics and Business Initiative and a co-host of “Wharton Moneyball” on SiriusXM Business Radio. Prof. Wyner created the Wharton Moneyball Academy summer program. While Adi has consulted across many industries, he takes a particular liking to Major League Baseball.
Student Track presentations
The following presentations by students will also be delivered during the 2021 SABR Virtual Analytics Conference:
Using Kelly Criterion to Build MLB Draft Rankings
In the past decade, quantitative methods used to evaluate prospects for the First Year Player Draft have improved dramatically. The result has been increased accuracy of the models that predict how a prospect will perform. However, valuing the resulting distribution of outcomes is a more complex problem. A common approach is to calculate an expected value for each player based on the distribution of projected WAR output and the corresponding WAR-to-dollar values for each part of the distribution. This is a logical place to start, but in edge cases where two players have similar expected values but different individual distributions, this valuation method can create false confidence that the two players are the same. One of the major edge cases is deciding between a “toolsy” but riskier high schooler and a college performer with less perceived upside but a higher floor. From a qualitative standpoint, scouts are able to add context to their reports by adding a qualifier such as “risky” or “safe,” and adjust their dollar evaluation based on that. Quantitatively, it is more difficult.
To solve this issue and to provide a clearer overall picture of the value for all prospects, I propose using a formula well known to finance professionals and advantage players, the Kelly Criterion. The Kelly Criterion takes in a set of probabilities for each outcome and the corresponding payoffs of reaching that outcome, and returns the fraction of the bonus pool that a team should “wager” on a player. This optimal wager size (represented as the signing bonus) maximizes the expected growth of the bonus pool. For the edge case described above, it returns two different signing bonus sizes, which gives a clearer picture of each player’s distribution. This presentation will provide an intuitive understanding of the Kelly Criterion, show how it can be used to solve difficult edge cases in prospect valuation, and discuss the similarities and differences of expected value and expected growth.
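As a rough illustration of the idea, the sketch below computes a Kelly fraction for a discrete set of outcomes by maximizing the expected log growth, E[log(1 + f * r)], where r is the net return per unit wagered. The two player profiles, their probabilities, and their payoffs are hypothetical numbers chosen so that both have the same expected value. How the presentation actually maps projected WAR to payoffs is not described in the abstract and is not assumed here.

```python
def expected_return(outcomes):
    """Expected net return per unit wagered for (probability, net_return) pairs."""
    return sum(p * r for p, r in outcomes)


def kelly_fraction(outcomes, tol=1e-10):
    """Fraction f in [0, 1] maximizing the expected value of log(1 + f * r)."""
    def growth_slope(f):
        # Derivative of expected log growth with respect to f (decreasing in f).
        return sum(p * r / (1.0 + f * r) for p, r in outcomes)

    if growth_slope(0.0) <= 0.0:
        return 0.0                       # no positive edge: wager nothing
    worst = min(r for _, r in outcomes)
    lo, hi = 0.0, 1.0
    if worst < 0.0:
        # Stay just below the fraction at which the worst outcome wipes out the stake.
        hi = min(1.0, -1.0 / worst - 1e-9)
    if growth_slope(hi) >= 0.0:
        return hi                        # growth still rising at the cap
    while hi - lo > tol:                 # bisection on the sign of the slope
        mid = (lo + hi) / 2.0
        if growth_slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical profiles as (probability, net return per unit wagered).
RISKY_HIGH_SCHOOLER = [(0.8, -1.0), (0.2, 9.0)]   # usually busts, rare star
SAFE_COLLEGE_BAT = [(0.5, -1.0), (0.5, 3.0)]      # busts less, lower ceiling

if __name__ == "__main__":
    for name, profile in [("risky", RISKY_HIGH_SCHOOLER), ("safe", SAFE_COLLEGE_BAT)]:
        print(name, expected_return(profile), round(kelly_fraction(profile), 4))
```

Both made-up profiles share the same expected return, yet the Kelly fraction is smaller for the riskier one. This is the distinction between expected value and expected growth that the abstract describes.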
David Gerth is a junior at Indiana University-Bloomington studying economics and mathematics, with research interests broadly in macroeconomics. He has assisted the Evansville Otters with player acquisition and is currently an associate scout with the New York Mets.
Modeling SwStr% and Batted Ball Profile from the KBO to MLB
This project models plate discipline, measured by BB%, K%, and BB/K, from the Korea Baseball Organization (KBO) to Major League Baseball (MLB), with a special focus on Kim Ha-seong and Na Sung-bum, the two KBO hitters posted following the 2020 season. The two have drastically different approaches at the plate and allow for an in-depth exploration of KBO to MLB plate discipline. Through a backwards stepwise regression and k-means clustering, this project produced predicted MLB plate discipline stats for Kim and Na. One challenge that has faced similar analyses in the past is the lack of swing-level data for the KBO. For this project, 30,000 pitches, along with their locations and results, were manually tracked over the 2020 season, yielding a dataset that allows for an in-depth look at a player’s approach and how it may translate to MLB. MLB is becoming increasingly diverse, and MLB teams must accurately project performance from league to league. This project is the first step toward making KBO pitch-level data available and using it to predict MLB performance.
Ben Howell is a Baseball Research and Development Intern at the University of Texas at Austin, where he studies Sport Management and Economics. He collected data from the 2020 Korea Baseball Organization season and created the KBO Wizard to host pitch-type and result data. He will be joining the San Diego Padres as a Baseball Research and Development Intern for the 2021 summer.
The Whiff Effect: Are Pitchers More Likely to Repeat the Previous Pitch After A Swing-And-Miss?
A pitcher stands on the mound and throws a nasty slider, causing the batter to swing and miss wildly. After getting the whiff, is the pitcher more likely to go back to the slider on the next pitch than if he had gotten a called strike? A ball? A foul ball? If the pitcher does throw the slider again, is the batter more likely to swing-and-miss than he would be if he’d fouled off the last pitch?
In this research, we attempt to answer two questions: are pitchers more likely to repeat their previous pitch after inducing a swing-and-miss? And, if they are, how do batters perform against these repeated post-whiff pitches? We investigate these questions using 2019 MLB pitch data from Baseball Savant. After controlling for count, pitch type, and pitcher whiff rates, we find that yes, pitchers do repeat pitches at an increased rate after getting a swing-and-miss. Then, against these pitches, it initially appears that batters perform worse than they do against non-post-whiff-repeat pitches. However, after applying our controls, we find that the relationship is mediated, with a bizarre exception on fastballs. We also explore individual pitcher tendencies and whether the same pitchers show high “whiff effects” year-over-year.
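The first question can be sketched as a simple conditional rate: among consecutive pitches within a plate appearance, how often does the pitcher repeat the previous pitch type, grouped by the previous pitch's result? The code below is a minimal, uncontrolled version of that calculation. The toy data, the pitch-type labels, and the result labels are invented placeholders, and it does not apply the controls for count, pitch type, or pitcher whiff rate that the research describes.

```python
from collections import defaultdict


def repeat_rate_by_result(plate_appearances):
    """Share of next pitches that repeat the previous pitch type, by previous result.

    Each plate appearance is an ordered list of (pitch_type, result) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # result -> [repeats, opportunities]
    for pitches in plate_appearances:
        # Pair each pitch with the one that follows it in the same plate appearance.
        for (prev_type, prev_result), (next_type, _) in zip(pitches, pitches[1:]):
            counts[prev_result][1] += 1
            if next_type == prev_type:
                counts[prev_result][0] += 1
    return {result: reps / opps for result, (reps, opps) in counts.items()}


# Invented toy data: three plate appearances with placeholder labels.
TOY_PLATE_APPEARANCES = [
    [("SL", "swinging_strike"), ("SL", "ball"), ("FF", "foul"), ("SL", "swinging_strike")],
    [("FF", "called_strike"), ("CU", "swinging_strike"), ("CU", "swinging_strike"), ("FF", "ball")],
    [("FF", "ball"), ("FF", "called_strike"), ("SL", "foul")],
]

if __name__ == "__main__":
    for result, rate in sorted(repeat_rate_by_result(TOY_PLATE_APPEARANCES).items()):
        print(f"{result}: {rate:.2f}")
```

A raw comparison like this can mislead, since the count and the pitch type both influence what is thrown next. The controls named in the abstract address exactly that problem.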
These findings can serve as a jumping off point for future research into pitch sequencing and independence of events throughout a baseball game. This research approached the problem at a granular, pitch-to-pitch level, but future research might look at a larger scale. For example, if a pitcher gives up a home run on his curveball in the first inning, is he less likely to throw it later than he would be if he gets two strikeouts on it early? Should he abandon the curveball if it gets beat up early, or should he stick with it?
Nate Rowan is a junior at Baylor University, where he is majoring in statistics and sociology. He has worked on data projects in wastewater treatment and food insecurity, and loves working on any problem that can use data to improve people’s everyday lives. Nate has also presented his sports analytics research at the UConn Sports Analytics Symposium and has participated in the NFL Big Data Bowl and Sports Info Solutions 2020 Football Analytics Challenge.
For more information on the 2021 SABR Virtual Analytics Conference, or to register, visit SABR.org/analytics.