2016 SABR Analytics Conference: Research Presentations – Society for American Baseball Research

Here are the research presentation abstracts and presenter bios for the fifth annual SABR Analytics Conference, which was held March 10-12, 2016, at the Hyatt Regency Phoenix in Phoenix, Arizona. Click on a link below to listen to the research presentation or view the presentation slides.

The full conference schedule is available at SABR.org/analytics/schedule/2016.

4:00-5:00 p.m., Thursday, March 10
RP1 and RP2 took place back-to-back in a single session.

RP1: How The Brain Hits, And Where We See It On The Field
Jason Sherwin

Audio: Click here to listen to Jason Sherwin’s presentation (MP3; 32:06)
Slides: Click here to view slides from Jason Sherwin’s presentation (PDF)

Hitting a baseball is described as one of the “hardest things to do in sports.” While much vision research has focused on this difficult visual task, the neural correlates of hitting a baseball have been only minimally studied. This is because of the technical difficulty of precisely measuring the relevant brain response and then decoding the performance-relevant aspects of that signal. deCervo launched in 2014 as a company that offered a solution to these and related problems of high-speed decision-making, such as hitting a thrown baseball. Founded by Ph.D.s Jordan Muraskin and Jason Sherwin, deCervo grew out of the Columbia University School of Engineering and Applied Sciences, where the founders realized that they could utilize rapid decisions’ neural correlates as biomarkers of expertise. To date, deCervo has worked with four NCAA Division I Baseball teams and seven Major League Baseball teams, having collected over 30 million brain measurements from professional and collegiate players. deCervo has also just begun to bring this capability to youth players in limited test markets. In this talk, Dr. Sherwin will cover both in-depth neuroscience findings and extensions into professional practice via deCervo. In particular, he will discuss the relationships seen between deCervo Baseball Profiles and on-field performance metrics, such as plate discipline.

Jason Sherwin, Ph.D., is Chief Executive Officer and co-founder of deCervo. Previously, he was Research Professor of Visual Neuroscience at the State University of New York. Before that, he held dual appointments as a post-doctoral research scientist at Columbia University in the City of New York and as an Oak Ridge Associated Universities post-doctoral fellow at the U.S. Army Research Laboratory. His research during this time covered perceptual decision-making in real-world environments. Ultimately, this work led to establishing deCervo with co-founder and Chief Technology Officer Jordan Muraskin based on research and inventions they had pioneered at Columbia.

RP2: Splitting Range, Positioning, and Throwing in Defense
Scott Spratt

Audio: Click here to listen to Scott Spratt’s presentation (MP3; 25:13)
Slides: Click here to view slides from Scott Spratt’s presentation (PPT)

Modern defensive metrics have previously been limited in their capacity to separate individual components of a fielder’s ability to convert batted balls into outs. A shortstop might make a play on a batted ball that historically was rarely made because he showed a lot of range, because he made a great throw, or because he was well-positioned in the first place. Thanks to newer data from Baseball Info Solutions such as infielder starting position coordinates, we can now systematically break out individual components of a defensive play for the first time.

The process begins by estimating a batted ball’s likelihood of being converted for an out given different sets of prior information and at different moments in time. From there, we can distribute the credit/penalty between a player’s range, positioning, and throwing abilities for every groundball throughout the past three seasons. We find that many players with strong defensive reputations excel in the range category, but their positioning and throwing often limit their overall contributions. On the other hand, some fielders are among baseball’s most effective by complementing their mediocre range with excellent positioning and machine-like reliability in making accurate throws to first base.
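The credit split described above can be pictured with a minimal sketch. It is not the Baseball Info Solutions method: it simply assumes that an out probability is available at three successive moments and credits each component with the change in that probability. The function name, the stage definitions, and the sample probabilities are all hypothetical.

```python
def split_credit(p_baseline, p_positioned, p_fielded, out_made):
    """Split one groundball's credit into positioning, range and throwing.

    Each probability is an estimated chance of an out given the
    information available at that moment (hypothetical stages):
      p_baseline   - batted-ball trajectory only, average positioning
      p_positioned - trajectory plus the fielder's actual starting spot
      p_fielded    - after the fielder has (or has not) reached the ball
    out_made is 1 if the out was recorded, 0 otherwise.
    """
    return {
        "positioning": p_positioned - p_baseline,
        "range": p_fielded - p_positioned,
        "throwing": out_made - p_fielded,
    }


# Hypothetical play: a hard grounder that the shortstop converts.
credit = split_credit(0.30, 0.45, 0.85, 1)
total = sum(credit.values())  # equals out_made - p_baseline
```

Because the three differences telescope, the components always sum to the play's total value above or below the baseline expectation, so no credit is lost or double counted.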

Scott Spratt is a Research Analyst for Baseball Info Solutions. He writes for ESPN Insider and FanGraphs and co-hosts the Off the Charts Football Podcast with Aaron Schatz. He is a Sloan Sports Conference Research Paper Competition and FSWA award winner.

11:00 a.m.-12:00 p.m., Friday, March 11
RP3 and RP4 took place back-to-back in a single session.

RP3: Measuring Performance Better and Sooner
Jonathan Judge

Audio: Click here to listen to Jonathan Judge’s presentation (MP3; 12:54)
Slides: Click here to view slides from Jonathan Judge’s presentation (PPT)

Everyday baseball statistics measure outcomes, not actual player contributions. With modern software, we can and should do better. Through proper mixed modeling, analysts can peg player value sooner, and account dynamically for quality of opponent, stadium effect, and virtually any other statistic of interest. Moreover, modeling not only provides a better answer, but further provides confidence intervals around that answer. These measures of certainty allow better analysis and empower management to make more informed decisions, sooner.

Jonathan Judge has a degree in piano performance from the Lawrence University Conservatory of Music and a law degree from the University of Wisconsin. He is a trial lawyer specializing in the defense and regulation of consumer products. He is a senior member of the Stats Team at Baseball Prospectus, and has been heavily involved in the rollout of mixed modeling to drive a new generation of baseball statistics. He believes that analytics can play an important role in driving better legal decisions.

RP4: Quantifying the Impact of Injuries on Playing Time and Performance
Joe Rosales

Audio: Click here to listen to Joe Rosales’ presentation (MP3; 35:40)
Slides: Click here to view slides from Joe Rosales’ presentation (PPT)

Up to now, one of the barriers to being able to do research on how injuries might affect how a player performs or how a player ages has been lack of information. So, in 2015, Baseball Info Solutions began collecting detailed information on all game events in Major League Baseball that have a physical impact on a player, whether that event resulted in a trip to the Disabled List or not. This includes everything from a hitter simply fouling a ball off his foot to a pitcher requiring Tommy John surgery. While one year of data is not yet enough to necessarily draw definitive conclusions, we can begin to examine the data to see what trends stand out and where the most fertile areas of research may be. For example, when we look at injuries by defensive position, we now have clear evidence that catchers are involved in the most injury events, by far. This is primarily because of all the foul balls they take off the mask and other parts of their bodies. Given the greater understanding that medical professionals are gaining regarding the effects of repeated head trauma on people, this type of data could help illuminate which players may be at greater risk than others. This presentation will begin to explore these and other such injury-related topics.

Joe Rosales is a Research Analyst for Baseball Info Solutions. He is a New England native and found his way to BIS after internships in baseball operations with the Boston Red Sox, Pittsburgh Pirates, and New York Mets. He is also a winner of the MIT Sloan Sports Analytics Conference Research Competition for the development of BIS’s Strike Zone Runs Saved pitch framing methodology.

3:30-4:30 p.m., Friday, March 11
Presented by Baseball Info Solutions, RP5 and RP6 took place back-to-back in a single session.

RP5: Hidden Gold on the Diamond? The Contribution of the Relative Age Effect to Talent Estimation Errors of High School Players in the June MLB Draft
Robert Brustad

Audio: Click here to listen to Robert Brustad’s presentation (MP3; 34:05)
Slides: Click here to view slides from Robert Brustad’s presentation (PPT)

Identifying and projecting talent of high school players is complicated by meaningful age differences that exist among players within the same draft class. The “relative age effect” (RAE) refers to the tendency for older athletes within any competitive age cohort to appear more “talented” than younger athletes when current performance differences can be partially attributed to additional maturation and accumulated practice favoring older players. This study examined relative age influences on the probability of draft selection and eventual return value of high school players with the assumptions that a) relatively older players are more likely to be selected but b) relatively younger players have a higher ceiling and provide greater potential return. All high school draft selections in the first twenty rounds of the 2005 through 2012 MLB June drafts were included. The first analysis included those players born within the 12-month age range corresponding with the age of “typical” high school senior year players. Chi-square frequency analysis revealed a significant difference that favored the selection of older players. Return value was then assessed through subsequent accumulated MLB WAR values of three age-on-draft-day groups. Significant differences existed in return value favoring the youngest group of players (mean draft-day age = 17.97 yrs.), who outperformed the average (mean age = 18.40 yrs.) and oldest players (mean age = 18.79 yrs.) by 2.36 and 2.44 times, respectively. This advantage was also present for seven of the eight individual drafts. Maturational considerations involved in projecting talent will be further discussed.
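A chi-square frequency analysis of this kind can be sketched as follows. This is an illustration only: the counts are hypothetical, not the study's data, and it assumes that births are spread evenly across the four quarters of the 12-month window, which the study may not have assumed.

```python
def chi_square_uniform(observed):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical counts of drafted players by birth quarter within the
# 12-month window, oldest quarter first. Not the study's data.
drafted_by_quarter = [130, 110, 90, 70]
stat = chi_square_uniform(drafted_by_quarter)

# Critical value of the chi-square distribution with 3 degrees of
# freedom at the 0.05 significance level.
CRITICAL_3DF_05 = 7.815
significant = stat > CRITICAL_3DF_05
```

With these made-up counts the statistic is 20.0, well above the critical value, which is the pattern a relative age effect favoring older players would produce.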

Robert Brustad is Professor in the School of Sport and Exercise Science at the University of Northern Colorado and former Editor of the Journal of Sport & Exercise Psychology. His focus is on the design of talent identification and talent development systems in sport with primary interest devoted to the role of physical and psychological maturation and development on sport performance. He has consulted with various professional sport organizations and the United States Olympic Committee.

RP6: True wOBA: Estimation of True Talent Level for Batters
Scott Powers and Eli Shayer

Audio: Click here to listen to Scott Powers and Eli Shayer’s presentation (MP3; 27:21)
Slides: Click here to view slides from Scott Powers and Eli Shayer’s presentation (PDF)

Estimating the run value of plate appearance outcomes for batters is a solved problem. What has received relatively little attention in sabermetric literature is the problem of estimating the frequency with which a batter will produce each of these outcomes. In an industry where all sample sizes are finite, it is insufficient to distinguish only between a “small sample” and one that is large enough. Dodgers infielder Corey Seager posted a wOBA of .421 in 113 Major League plate appearances in 2015. On a level playing field, how does this compare with Pirates infielder Jung-ho Kang’s .356 wOBA over 456 PA? This is a simplified example of the questions that baseball executives need to address when making roster decisions in the face of uncertainty.

Our proposal, True wOBA, uses ridge multinomial regression to estimate batters’ true skill levels while simultaneously adjusting for sample size, park effects, opponent quality and luck factors, like BABIP. True wOBA is closely related to a simplified version of Deserved Run Average (Judge et al. 2015) for batters, with the novel contribution that outcome probabilities are estimated under the restriction that they sum to one, which is not feasible under the framework of linear mixed effects models. The focus of our work is espousing the idea of leveraging the interplay between population variance and sample variance for different statistics, which is not new (The Book, Tango et al. 2007) but is also not used in mainstream baseball analysis as much as we would like.
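The sample-size adjustment at the heart of this idea can be illustrated with plain regression to the mean, in the spirit of The Book. This is a much simpler stand-in, not the ridge multinomial regression behind True wOBA, and it ignores park, opponent, and BABIP adjustments. The league average and the regression constant `K` below are assumed values chosen for illustration.

```python
def regress_to_mean(observed, pa, league_mean, k):
    """Shrink an observed rate toward the league mean.

    k is the number of plate appearances at which the estimate sits
    halfway between the observed rate and the league mean.
    """
    weight = pa / (pa + k)
    return weight * observed + (1 - weight) * league_mean


LEAGUE_WOBA = 0.313  # assumed league average, for illustration only
K = 220              # hypothetical regression constant

# Observed 2015 lines quoted in the abstract.
seager = regress_to_mean(0.421, 113, LEAGUE_WOBA, K)
kang = regress_to_mean(0.356, 456, LEAGUE_WOBA, K)
```

Both estimates move toward the league mean, but Seager's moves much further because it rests on about a quarter as many plate appearances. How the two players rank afterward depends entirely on the assumed constant, which is why estimating the amount of shrinkage from the data matters.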

Scott Powers is a Ph.D. student in statistics at Stanford University, where he is co-president of the Stanford Sports Analytics Club. He works as an analytics consultant to the Oakland Athletics and as a data analyst for the professional soccer club AZ Alkmaar in the Dutch Eredivisie. Scott is involved with the Stanford Club Baseball team as a former catcher and current coach, and he is also a setter on the Stanford Men’s Club Volleyball team.

Eli Shayer is a sophomore at Stanford University studying Mathematical and Computational Science, originally from Anchorage, Alaska. At school, Eli is the Technology Officer of the Stanford Sports Analytics Club. In the last several months he placed first in the engineer division of the TruMedia Baseball Hackathon and led his team from the Stanford Sports Analytics Club to a second-place finish in the Graphicacy Major League Data Challenge.

9:45-10:45 a.m., Saturday, March 12
RP7 and RP8 took place back-to-back in a single session.

RP7: Hot & Cold Streaks Using Batted Ball Velocities
Rob Arthur

Audio: Click here to listen to Rob Arthur’s presentation (MP3; 19:23)
Slides: Click here to view slides from Rob Arthur’s presentation (PPT)

Hot streaks have been the subject of considerable controversy in statistical research, both in baseball and other sports. Although early studies appeared to debunk them conclusively, more recent analyses have found some evidence that hot streaks may be real. New batted ball velocity data offers the cleanest and best opportunity yet to search for a signal of streakiness in baseball. To do so, I employed a statistical approach called a Hidden Markov Model, which attempts to infer which of two states (“hot” or “cold”) a hitter is in based on their batted ball exit velocities. After adjusting for opposing pitchers, park effects, and a number of other factors, I examined batted ball velocities of about 100 hitters. I found that some hitters showed strong statistical evidence of streakiness, according to a likelihood ratio test. The difference between some hitters when they were “hot” and “cold” was up to 10-15 mph of exit velocity, which equates to an increase of several hundred points in OPS. Notably, I found no similar signal of hot streaks in pitchers. My results indicate that at least some hitters in MLB can go through hot streaks which significantly elevate their offensive performance, and that sabermetric researchers should re-evaluate the existence of streakiness in baseball.
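The comparison between a two-state model and a single-state model can be sketched with the forward algorithm. This is a minimal illustration, not the presenter's code: the exit velocities are hypothetical, the adjustments for pitchers and parks are omitted, and the model parameters are set by hand rather than fitted, so the resulting statistic is not a valid test result.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def hmm_log_likelihood(obs, means, sd, stay_prob):
    """Log-likelihood of obs under a two-state Gaussian HMM.

    Uses the scaled forward algorithm. Both states share one standard
    deviation, start with equal probability, and persist from one
    batted ball to the next with probability stay_prob.
    """
    trans = [[stay_prob, 1 - stay_prob], [1 - stay_prob, stay_prob]]
    alpha = [0.5 * normal_pdf(obs[0], m, sd) for m in means]
    log_lik = 0.0
    for x in obs[1:]:
        # Rescale to avoid underflow, keeping the log of the scale.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        # Propagate through the transition matrix, then weight by
        # how well each state explains the new observation.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(2))
            * normal_pdf(x, means[j], sd)
            for j in range(2)
        ]
    return log_lik + math.log(sum(alpha))


# Hypothetical exit velocities (mph): a "cold" run, then a "hot" run.
velocities = [82, 84, 81, 83, 85, 80, 96, 98, 95, 97, 99, 94]
overall = sum(velocities) / len(velocities)

# Identical state means collapse the HMM to a single-state model.
one_state = hmm_log_likelihood(velocities, (overall, overall), 7.2, 0.5)
two_state = hmm_log_likelihood(velocities, (82.5, 96.5), 3.0, 0.9)
lr_statistic = 2 * (two_state - one_state)
```

In a real analysis both models would be fitted by maximum likelihood (for example with the Baum-Welch algorithm) before the likelihood ratio is computed for each hitter.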

Rob Arthur is a freelance journalist, consultant, and researcher based in Chicago. He has contributed to Baseball Prospectus, FiveThirtyEight, and ESPN The Magazine, among others. When he’s not thinking about baseball, he works as a postdoctoral fellow studying cancer at the University of Chicago Medical Center.

RP8: Solving DIPS by Deconstructing BABIP
Brian Cartwright

Audio: Click here to listen to Brian Cartwright’s presentation (MP3; 26:55)
Slides: Click here to view slides from Brian Cartwright’s presentation (PPT)

Batting average on balls in play is a ‘noisy’ stat because it is an aggregation of several rate stats, each with its own properties. Using a binary decision tree to classify each batted ball by ground or air; infield, outfield or over the fence, we’ll examine how the launch angles and exit velocities produced by batters and allowed by pitchers shape the results within each of these batted ball types. This combination of advanced remote sensing metrics with publicly available play-by-play data provides valuable insight to interpret the batting, pitching and fielding from college, foreign and minor leagues where the advanced metrics are not available.

Brian Cartwright is the developer of the Oliver projections that have appeared at The Hardball Times and FanGraphs since 2009. He was a finalist in the Baseball Prospectus Idol competition and has been studying play by play, projections and defensive metrics for more than 35 years. When away from baseball he applies data science to photogrammetry to manage Obstruction Evaluation / Airport Airspace Analysis for the largest geospatial firm in North America.

12:15-12:45 p.m., Saturday, March 12
Presented by Baseball Info Solutions

RP9: Component Predictions of Batting and Pitching Measures
Jim Albert

Audio: Click here to listen to Jim Albert’s presentation (MP3; 18:25)
Slides: Click here to view slides from Jim Albert’s presentation (PDF)

To predict batting measures for the following season, it is well known that it is desirable to shrink or adjust a batter’s season average towards the average for all players. This result was demonstrated forty years ago by Efron and Morris (1975), and recent demonstrations of these improved predictions are described in Albert (2004) and Tango, Lichtman, and Dolphin (2007). The degree of shrinkage depends on the type of batting measure (Albert, 2004): the variability of some measures, such as strikeout and home run rates, is more influenced by the different talents of the batters, while the variability of other measures, such as batting average and batting average on balls in play, is more influenced by chance variation. Bickel (2004) and Bickel (2003) demonstrated how a batting average can be decomposed as a function of other rates such as strikeout rate, home run rate, and hit-in-play rates. This research uses these decompositions to develop improved predictions of batting averages. The strategy is to estimate groups of component rates (such as strikeout rates, home run rates, and hit-in-play rates) separately, and aggregate the component estimates to get new predictions of batting average. A similar strategy can be used to predict on-base percentages, and to predict fielding-independent pitching (FIP) measures for pitchers. For pitchers, the method is to separately estimate strikeout rates, home run rates, walk rates, and hit-in-play rates for a group of pitchers, and then aggregate these component estimates to develop “better” estimates of FIP ability. By using batting data for all non-pitchers in a given season to predict batting measures in the following season, it is demonstrated that the new method tends (over a fifty-season period) to provide better predictions of batting average and on-base percentages than standard shrinkage methods.
In addition to providing better predictions, the component estimates provide better insight into the specific batting and pitching talents of hitters and pitchers. We provide insight into the situations where one would anticipate seeing the best improvement of these component predictions of batting and pitching performance. In awarding long-term contracts, baseball teams need to make informed predictions about the performance of players in future seasons, and this research should be helpful in the development of better predictions.
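The component strategy can be sketched in a few lines. This is one decomposition consistent with the description above, not necessarily the exact one used in the research, and the season line, group means, and shrinkage constants are all hypothetical.

```python
def shrink(rate, n, group_mean, k):
    """Shrink an observed rate toward the group mean; larger k shrinks more."""
    return (n * rate + k * group_mean) / (n + k)


def batting_average(so_rate, hr_rate, hip_rate):
    """Rebuild AVG from conditional component rates.

    so_rate  = SO / AB
    hr_rate  = HR / (AB - SO)
    hip_rate = (H - HR) / (AB - SO - HR)
    """
    return (1 - so_rate) * (hr_rate + (1 - hr_rate) * hip_rate)


# Hypothetical season line: 500 AB, 100 SO, 20 HR, 140 H.
ab, so, hr, h = 500, 100, 20, 140
so_rate = so / ab
hr_rate = hr / (ab - so)
hip_rate = (h - hr) / (ab - so - hr)
observed_avg = batting_average(so_rate, hr_rate, hip_rate)  # equals h / ab

# Hypothetical group means and shrinkage constants: a small k for the
# talent-driven rates, a large k for the chance-driven hit-in-play rate.
predicted_avg = batting_average(
    shrink(so_rate, ab, 0.20, 50),
    shrink(hr_rate, ab - so, 0.04, 100),
    shrink(hip_rate, ab - so - hr, 0.30, 600),
)
```

Shrinking each component by its own amount is what distinguishes this from applying a single shrinkage factor to the batting average itself.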

Jim Albert is a statistics professor at Bowling Green State University. He is past editor of the Journal of Quantitative Analysis of Sports and has been active in the Section on Statistics in Sports of the American Statistical Association. He has written three baseball books, Curve Ball (with Jay Bennett), Teaching Statistics Using Baseball, and Analyzing Baseball Data with R (with Max Marchi). He grew up in the Philadelphia area, is a longtime member of SABR, and is a lifelong Phillies fan.

3:30-4:30 p.m., Saturday, March 12
Presented by Baseball Info Solutions, RP10 and RP11 took place back-to-back in a single session.

RP10: The Stolen Base
Lindsay Parr

Audio: Click here to listen to Lindsay Parr’s presentation (MP3; 24:19)
Slides: Click here to view slides from Lindsay Parr’s presentation (PDF)

The stolen base is an integral part of the game of baseball. Because players are frequently in situations where they could attempt to steal a base, it is important to determine when a player should try to steal in order to obtain more wins per season for his team. We used a sample of games during the 2012 and 2013 Major League Baseball seasons to see how often players stole in given scenarios based on number of outs, pickoff attempts, runs until the end of the inning, left- or right-handed batter/pitcher, run differential, and inning. New stolen base strategies were created using the percentage of opportunities attempted and the percentage of successful attempts for each scenario in the sample, a formula introduced by Bill James for batter/pitcher match-up, and run expectancy. After writing a program in R to simulate baseball games with the ability to change the stolen base strategy, we compared new strategies to the current strategy to see if they would increase each Major League Baseball team’s average number of wins per season. We found that under a strategy in which a team steals 80% of the time when an attempt increases its run expectancy and 20% of the time when it does not, the average number of wins per season increases for a vast majority of teams compared with the current strategy.
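Two of the ingredients named above can be sketched briefly: a run-expectancy check for a steal attempt, and Bill James's log5 formula, which is presumably the match-up formula meant here (that identification is an assumption). The run expectancy values are illustrative, not taken from the study, and the sketch is in Python rather than the R used for the simulation.

```python
def break_even_rate(re_start, re_success, re_fail):
    """Success rate at which an attempt leaves run expectancy unchanged."""
    return (re_start - re_fail) / (re_success - re_fail)


def log5(batter_rate, pitcher_rate, league_rate):
    """Bill James's log5 estimate for a batter/pitcher match-up."""
    num = batter_rate * pitcher_rate / league_rate
    den = num + (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
    return num / den


def increases_run_expectancy(success_prob, threshold):
    """True when the expected run value of attempting exceeds staying put."""
    return success_prob > threshold


# Illustrative run expectancies (assumed values): runner on first with
# 0 outs, runner on second with 0 outs, bases empty with 1 out.
RE_FIRST_0_OUT = 0.86
RE_SECOND_0_OUT = 1.10
RE_EMPTY_1_OUT = 0.26

threshold = break_even_rate(RE_FIRST_0_OUT, RE_SECOND_0_OUT, RE_EMPTY_1_OUT)
attempt = increases_run_expectancy(0.75, threshold)
```

With these assumed values the break-even success rate is about 71%, so a runner who succeeds 75% of the time raises his team's run expectancy by attempting. A simulated strategy would then decide how often to send him in each case.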

Lindsay Parr is working towards her Ph.D. in Applied Mathematics and Statistics at the Colorado School of Mines in Golden, Colorado. She is a Graduate Teaching Fellow who is currently teaching undergraduate Probability and Statistics at her university. The research that she is presenting was conducted under her advisor, Dr. William Navidi. She is an avid sports fan, especially of baseball and hockey. Her dream is to work on the statistical side of baseball.

RP11: Unintended Consequences of Rising Strikeout Rates
Rob Mains

Audio: Click here to listen to Rob Mains’ presentation (MP3; 30:37)
Slides: Click here to view slides from Rob Mains’ presentation (PPT)

The main impact of rising strikeout rates has been well-reported: More whiffs and fewer walks mean fewer balls in play, fewer baserunners, and fewer runs scored. In this project, I researched how rising strikeout rates may affect other outcomes.

The first conclusion is that, unsurprisingly, pitchers are increasingly ahead in the count. The data illustrate that 2014 and 2015 were probably (I use the qualifier because of the limited availability of pitch data) the first two seasons in modern baseball history in which more plate appearances ended with the pitcher ahead in the count than with the batter ahead.

The second conclusion is that this trend has led to a statistically significant increase in wild pitches and batters hit by pitches and a decrease in sacrifice flies. (Stolen base attempts are more successful with the pitcher ahead as well, though that could be a result of managerial strategy.)

The hypothesis, supported by pitch location heatmaps, is that when a pitcher is ahead in the count, he focuses on the edges of the strike zone, where a miss can become a hit batter, a wild pitch, or a slower catcher pop time, while the batter is likely to shorten his swing, resulting in fewer well-hit outfield fly balls with runners on third and fewer than two outs. The conclusion is that increased strikeouts have yielded an increase in hit batters and wild pitches and a decrease in sacrifice flies.

Rob Mains is a retired Wall Street equities analyst, now analyzing baseball. He has contributed to FanGraphs Community, writes for BanishedToThePen.com, and blogs about the Pittsburgh Pirates at OnTheFieldOfPlay.com. He is a SABR member and a Retrosheet volunteer.

For more coverage of the 2016 SABR Analytics Conference, visit SABR.org/analytics/2016.
