Artificial Intelligence, Machine Learning, and the Bright Future of Baseball
This article was written by Brian Hall
This article was published in The National Pastime: The Future According to Baseball (2021)
Using machine learning, AlphaGo taught itself the game of Go, and in 2016 beat 18-time world champion Lee Sedol. Pictured here is Go professional Michael Redmond providing a play-by-play commentary on the AlphaGo/Sedol match. (DEEPMIND)
Baseball is a sport steeped in tradition, but relatively new and rapidly developing technologies are already affecting the game and will continue to shape its future: artificial intelligence (“AI”) and machine learning. We are living in the midst of a technological revolution powered by AI and machine learning. Six out of seven Americans carry an AI-powered assistant with them everywhere: their smartphone.1 With Google Translate, we have a universal communication device that 15 years ago we thought we could see only on Star Trek. AI and machine learning have many other uses in industry with far-reaching impacts on our society, such as the coronavirus vaccines that were developed in record time, aided by new techniques powered by machine learning. Few people could have predicted how quickly and seamlessly AI and machine learning would become part of our lives. I suspect we will also underestimate how quickly they will be integrated into baseball and other sports.
For a baseball team, poor decision-making can lead to costly mistakes in terms of dollars, performance, and wins. Knowing this, teams have turned to the science of decision-making, which views decisions as the product of an interplay between two parts of the mind: “System 1,” a person’s quick, instinctual response, and “System 2,” the rational analysis produced by slower, thoughtful reasoning.2 When a person is asked to complete a task requiring cognitive effort, that person may inadvertently rely on an instinctual response (System 1) rather than expend the mental energy needed to make the best rational decision (System 2).3 Naturally, sabermetricians are champions of System 2 thinking. Sabermetrics, with its goal of better, quantitatively supported decision-making, has also led to large amounts of new data to analyze. However, the human mind and traditional statistical techniques can only process new information up to a point before being overwhelmed.4 In contrast, large amounts of data are rocket fuel for AI and machine learning, and the larger the dataset the better.5 Baseball has created new sources of data (such as Statcast) and statistics (spin rate, barrel, etc.). Seven terabytes (7 TB) of data are gathered by MLB at each game.6 Machine learning and AI can detect patterns in these otherwise overwhelming mountains of new data, and teams will be able to use these insights to improve their decision-making.
What kind of decisions are teams looking for AI and machine learning to help with? With the growth of player data, these tools can be used to try to crack the nut of predicting player performance.7 It is hard enough to try to predict a player’s future performance with years of high school, college, and minor league stats. But what about adding more contemporary statistics such as exit velocity, barrels, launch angle, spin rate, OAA, extension and arm strength? And how does that interact with a player’s age, weight, build, running speed and other measures of athletic ability? To analyze this information, machine learning models can be built using the data of past players. Based on what the computer learns from these past players during this “training” phase, it can build a model to predict how well players will perform in their careers. The accuracy of these models is then judged by setting aside some prospects that the computer has not seen before, and having the computer predict how bright the future is for these players. If these prospects are players from the past, with the results of these careers already known, they can be used as “test” data to judge how well the machine learning model performed in predicting the future performance of these players. The models that do well can be used by teams to better project the careers of current prospects, while those models that do poorly in the “test” phase will be discarded.
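The “training” and “test” phases described above can be sketched in a few lines of Python. This is a minimal illustration, not any team's actual method: the players are synthetic, the single feature (exit velocity), the `career_value` outcome, and the relationship between them are all invented for the example, and the two candidate models (a fitted line and a predict-the-average baseline) stand in for the far richer models a real project would compare.

```python
import random
from statistics import mean

def make_synthetic_players(n, seed=0):
    """Hypothetical past players as (exit_velocity_mph, career_value). Not real data."""
    rng = random.Random(seed)
    players = []
    for _ in range(n):
        exit_velo = rng.uniform(80.0, 95.0)
        # Assumed relationship plus noise, invented purely for illustration.
        career_value = 2.0 * (exit_velo - 80.0) + rng.gauss(0.0, 3.0)
        players.append((exit_velo, career_value))
    return players

def fit_line(train):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def fit_constant(train):
    """Baseline model: ignore the feature and predict the training average."""
    return 0.0, mean(y for _, y in train)

def mean_abs_error(model, players):
    """Average size of the prediction error on a set of players."""
    slope, intercept = model
    return mean(abs((slope * x + intercept) - y) for x, y in players)

players = make_synthetic_players(200)
random.Random(1).shuffle(players)
# Training phase uses 150 players; the other 50 are held out as unseen "test" data.
train, test = players[:150], players[150:]

candidates = {"linear": fit_line(train), "baseline": fit_constant(train)}
test_errors = {name: mean_abs_error(m, test) for name, m in candidates.items()}
best_model = min(test_errors, key=test_errors.get)  # keep the model that tests best
print(test_errors, "->", best_model)
```

The key design choice is that the test players never influence the fitted models, so the test error is an honest estimate of how each model would fare on prospects whose careers are still unknown.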
The batter and pitcher match-up has always been a chess match, but each side can now be aided by insights produced by an AI assistant. It is becoming more common to see players in the dugout intently studying analytics on their iPads when new pitchers come into the game. Within two decades, expect wearable devices to become widely adopted, with perhaps “smart” glasses or contact lenses conveniently providing such information. Such advice could include the probability of the location and type of the next pitch.8 Elon Musk, through his Neuralink project, expects to take this further by developing implantable brain-machine interfaces. While it may seem outlandish today, it may one day be more common (and more cost-efficient) to upgrade our brain chips than to upgrade computers or phones, which require external displays. If information is no longer accessed through an iPad on the bench, but through wearables or implants, AI could be paired with these devices to help provide advice. Of course, MLB would need to decide whether players would be allowed to access information while in a game. When making such a decision, it is important to remember that ultimately it is up to the player to throw the pitch, make contact with the bat, catch the ball and throw it. An AI assistant would not enhance a player’s physical abilities, only aid the player’s judgment.
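In its simplest form, the probability of the next pitch type could be estimated from how often a pitcher has thrown each pitch in a given count. The sketch below does only that, using a frequency table. It is not the machine learning approach of the cited pitch-prediction research, and the `pitch_log` entries are hypothetical values invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical pitch log: (balls, strikes, pitch_type). Invented for illustration.
pitch_log = [
    (0, 0, "fastball"), (0, 0, "fastball"), (0, 0, "slider"),
    (0, 2, "slider"), (0, 2, "slider"), (0, 2, "fastball"), (0, 2, "curveball"),
    (3, 1, "fastball"), (3, 1, "fastball"), (3, 1, "fastball"), (3, 1, "slider"),
]

def pitch_probabilities(log):
    """Estimate P(pitch type | count) from observed frequencies."""
    counts = defaultdict(Counter)
    for balls, strikes, pitch_type in log:
        counts[(balls, strikes)][pitch_type] += 1
    return {
        count: {pitch: n / sum(tally.values()) for pitch, n in tally.items()}
        for count, tally in counts.items()
    }

probs = pitch_probabilities(pitch_log)
print(probs[(3, 1)])  # {'fastball': 0.75, 'slider': 0.25}
```

A real system would condition on far more than the count (the batter, the previous pitch, the game situation), which is where machine learning models earn their keep over a simple table like this one.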
Recent advances in machine learning have also aided pitchers in training, detecting patterns in pitching mechanics—previously unobservable to the human eye—to avoid arm stress as well as potentially increase a pitcher’s velocity, movement, or spin rate.9 There are hundreds of biomechanical variables that could influence pitcher arm health. Consider the linking of the joints that start at the ankle, move to the knee, hips, shoulders and elbow before ending at the wrist (the “kinetic chain”). Previously, it was difficult to determine which of hundreds of variations in body movement were responsible for creating arm stress. However, with a machine learning approach, the causes of arm stress can be better identified without needing to sacrifice performance, such as arm velocity.10 Given this, expect pitcher velocities to continue trending upwards.
Even data on the neural activity of batters and pitchers are being used to measure, among other things, the impact of pressure on performance.11 For example, by measuring electrical activity in a batter’s brain during the time between the pitcher releasing the ball and the ball crossing the plate, researchers are starting to quantify how cognitive activity is correlated with on-field metrics (OBP, SLG, and OPS). In the coming years, it should be possible to use machine learning to detect more granular patterns in cognitive activity. This would allow for the creation of new player statistics that are not a measure of the player’s external performance, but a measure of a player’s inner world. For example, a statistic could be created to measure how well a batter or pitcher’s brain handles stress or how quickly a batter’s brain can react and adapt to different pitches. These new “brain” statistics could be valuable in identifying the progression of talented players and could become an important statistic to help predict which players are likely to succeed.
Previously, player injuries could be seen as largely unpredictable and a result of bad luck. Recent work relying on data on player age, body type, usage and playing style has attempted to better identify the factors that affect a given player’s risk of injury.12 European soccer currently appears to have taken the lead in creating new sources of player data for use with AI injury prediction models. Using machine learning, researchers were able to improve their predictions of a soccer player’s injury risk with data on average playing time, position, age, body mass, quality of sleep, fatigue, and various neuromuscular factors such as joint range of motion, balance, hip adduction strength, core stability, and knee flexion.13 If soccer teams are successful in reducing injury rates by utilizing these techniques to decrease the playing time of players flagged as higher risk, expect baseball teams to develop and adopt similar techniques.14 By detecting patterns in injuries that could be avoided by decreasing playing time or the training workload, whether for a couple of hours or days, AI may be able to help fans avoid the heartbreak of losing their favorite player early in a season to injury.
Umpiring could be helped as well. The home plate umpire has the tough job of determining the position of a pitch down to a fraction of an inch when the ball is being hurled ever faster and with more movement. One recent study found that umpires made the wrong call on average 30% of the time when a batter had two strikes.15 MLB has been experimenting in the Atlantic League with “Robo-Umps,” an Automatic Ball-Strike system that employs TrackMan’s 3D Doppler radar system to determine the position of the ball.16 While this works for calling balls and strikes, a human eye is still needed for checked swings and calling runners safe or out. In the past ten years, advances in Deep Learning, a machine learning technique widely used in computer vision, have allowed machines to see the world and make decisions with ever-increasing accuracy.17 To further improve Robo-Umps in the coming years, Deep Learning technologies could allow computers that are processing video of the game in real time to determine whether a player has foul-tipped, checked a swing, or is safe or out. This should reduce errors in calls and speed up the game, perhaps by putting an end to heated conversations between umpires and players/coaches. The Robo-Ump in the digital cloud won’t have the ears to hear the complaints.
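Once a tracking system has measured where the ball crossed the plate, the ball-strike call itself reduces to a geometry check. The sketch below is a simplified, assumed version of that check and not the actual Automatic Ball-Strike logic: it treats the zone as a flat two-dimensional rectangle at the front of the plate, takes the batter's zone top and bottom as given inputs, uses an approximate ball radius, and slightly overcounts pitches that clip only the very corner of the zone.

```python
PLATE_HALF_WIDTH_IN = 17.0 / 2  # home plate is 17 inches wide
BALL_RADIUS_IN = 1.45           # approximate; a baseball is about 2.9 inches across

def call_pitch(x_in, z_in, zone_bottom_in, zone_top_in):
    """Call a pitch from the tracked center of the ball.

    x_in: horizontal offset from the center of the plate, in inches.
    z_in: height of the ball above the ground, in inches.
    zone_bottom_in, zone_top_in: hypothetical per-batter zone limits, in inches.
    A pitch is a strike if any part of the ball touches the zone, so the
    rectangle is widened by the ball's radius on every side.
    """
    inside_horizontally = abs(x_in) <= PLATE_HALF_WIDTH_IN + BALL_RADIUS_IN
    inside_vertically = (
        zone_bottom_in - BALL_RADIUS_IN <= z_in <= zone_top_in + BALL_RADIUS_IN
    )
    return "strike" if inside_horizontally and inside_vertically else "ball"

# Example batter with an assumed zone from 18 to 42 inches off the ground.
print(call_pitch(0.0, 30.0, 18.0, 42.0))   # strike: middle of the zone
print(call_pitch(9.5, 30.0, 18.0, 42.0))   # strike: edge of the ball nicks the plate
print(call_pitch(10.5, 30.0, 18.0, 42.0))  # ball: fully off the plate
```

The hard part in practice is the measurement, not this comparison: the call is only as good as the tracked position and the zone limits fed into it.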
Technology changes rapidly, making it difficult to predict further into the future of baseball, AI and machine learning, but there are clues when one looks at breakthroughs that have occurred in other games. In 1997, IBM’s Deep Blue became the first computer to defeat a world chess champion, using early artificial intelligence algorithms.18 More recently, in 2016, AlphaGo, a program developed by Google’s DeepMind, defeated world champion Lee Sedol after learning to play go with the help of a modern type of artificial intelligence called reinforcement learning. That this feat could occur so quickly surprised many, as go was seen as far more complex than chess, and AlphaGo employed novel moves that go champions had never seen before, despite the game’s closely studied ~2,500 years of history.19 Today, if you must win a chess or go game against an opponent, you could always consult AI for the best moves, moves better than any human has been able to make for several years now. The game of baseball, similar to the games of chess and go, can be broken into a series of strategic data-driven decisions made by a team’s management and its players. One day a savvy baseball team will rely on AI in those must-win games.
But even if this information were available, the question arises as to how to communicate it in the middle of a game to players on the field. This is tricky because AI can also easily intercept and understand a team’s signs. If you have your doubts, just watch a video by Mark Rober, a popular YouTube content creator, entitled “Stealing Baseball Signs with a Phone (Machine Learning).”20 The video demonstrates how simple it is to use a phone to crack the code behind baseball signs. With signs so easy to decipher with machine learning, a future scandal involving a team using this technology may make us long for the simple days of trash cans. A machine learning cheating scandal could also push MLB to revise its rules on how to communicate information between coaches and players on the field and from catchers to pitchers. If rules are changed, and electronic communication becomes permitted, it would make it much easier for pitch-by-pitch insights generated from AI to be deployed to players.
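To see why signs are so easy to decode, consider a deliberately simple sketch. It is not the method used in the video cited above, and it uses plain elimination instead of machine learning: given observed sign sequences labeled with whether the runner then stole, it looks for a back-to-back pair of signs that appears every time the runner goes and never when he stays. The sign names and the `observations` are hypothetical.

```python
def adjacent_pairs(sequence):
    """All back-to-back sign pairs in one sequence, e.g. ("cap", "ear")."""
    return set(zip(sequence, sequence[1:]))

def candidate_steal_codes(observations):
    """Pairs seen in every steal sequence and in no non-steal sequence."""
    steal_sets = [adjacent_pairs(signs) for signs, stole in observations if stole]
    other_pairs = set()
    for signs, stole in observations:
        if not stole:
            other_pairs |= adjacent_pairs(signs)
    if not steal_sets:
        return set()
    return set.intersection(*steal_sets) - other_pairs

# Hypothetical observations: (signs flashed by the coach, did the runner steal?)
observations = [
    (["nose", "cap", "ear", "belt"], True),
    (["cap", "ear", "chin", "nose"], True),
    (["ear", "cap", "belt", "chin"], False),
    (["chin", "cap", "ear"], True),
    (["cap", "nose", "ear", "belt"], False),
]
print(candidate_steal_codes(observations))  # {('cap', 'ear')}
```

With only five observations the code is already narrowed to a single candidate, which hints at how quickly a learning algorithm fed dozens of labeled sequences could crack a more elaborate scheme.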
The groundbreaking AI and machine learning technologies in our phones and our homes have been produced by some of the best tech companies in the world, whose idea of top talent is superstar professors and graduates recruited from a handful of elite universities. Sports teams have another sort of star talent that they primarily need to spend their resources on, and the tech and financial industries can snap up the best AI and machine learning talent in the world. This poses a problem for sports teams that want to capitalize on AI and machine learning, as there are few people in the world who can create innovative products based on the mathematics and software engineering underlying AI and machine learning. But that hasn’t stopped some fans, whose love of the game has motivated them to create innovative new applications in baseball. And, if SABR is any guide, baseball has a brighter future when fans get involved in baseball research. By 2040, we will have taught AIs much about baseball, and it will be exciting to see what they will teach us in return.
BRIAN HALL recently joined the NYU Tisch Institute for Global Sport, where his work focuses on AI, machine learning, and their application to the sports world. At NYU, he teaches “Artificial Intelligence and Machine Learning,” an on-demand course that is open to professionals and sports fans alike. Prior to joining the NYU faculty, Brian served as Vice-President at Two Sigma Investments, a New York City-based hedge fund primarily known for its use of artificial intelligence, machine learning, and distributed computing in its trading strategies.
Notes
1. Pew Research Center, “Mobile Fact Sheet,” April 7, 2021, accessed June 16, 2021. https://www.pewresearch.org/internet/fact-sheet/mobile.
2. Daniel Kahneman, Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011; Joe Lemire, “This Book is Not About Baseball. But Baseball Teams Swear by It,” The New York Times, February 24, 2021.
3. Lemire, “This Book is Not About Baseball.”
4. Donald E. Farrar and Robert R. Glauber, “Multicollinearity in Regression Analysis: The Problem Revisited,” The Review of Economics and Statistics, vol. 49, no. 1, 1967: 92-107; Kahneman, Thinking, Fast and Slow.
5. On the other side of the coin, for simple problems with smaller datasets, traditional statistics may be more useful than machine learning or AI.
6. Barb Darrow, “Live from Fenway Park: a behind the scenes look at MLB’s Statcast,” Fortune, September 4, 2015. https://fortune.com/2015/09/04/mlb-statcast-data.
7. Arlo Lyle, “Baseball Prediction Using Ensemble Learning,” PhD thesis, University of Georgia, 2007.
8. Phuong Hoang, Michael Hamilton, Hien Tran, Joseph Murray, Corey Stafford, Lori Layne, and David Padget, “Applying Machine Learning Techniques to Baseball Pitch Prediction,” International Conference on Pattern Recognition Applications and Methods, 2014.
9. Kristen Faith Nicholson, “Predicting Pitching Arm Stress With Machine Learning Models,” SABR Analytics Conference: RP18, 2021. SABRVideos YouTube channel: https://www.youtube.com/watch?v=krZRI6o6wkw.
10. Nicholson, “Predicting Pitching Arm Stress.”
11. Jason Themanson, “Contextual Influences On Neural Activity to Pitches and Feedback: Psychology and Performance at the Plate,” SABR Analytics Conference: RP8, 2021. SABRVideos YouTube channel: https://www.youtube.com/watch?v=gjWtOTlzjD8.
12. Matt Manocherian and John Shirley, “Modeling Injury Risk Using In-Depth Injury Data,” SABR Analytics Conference: RP2, 2021. SABRVideos YouTube channel: https://www.youtube.com/watch?v=g_iPVOY6TiU; Georgios Kakavas, Nikolaos Malliaropoulos, Ricard Pruna, and Nicola Maffulli, “Artificial Intelligence: A tool for sports trauma prediction,” Injury, August 19, 2019. https://doi.org/10.1016/j.injury.2019.08.033.
13. Alejandro Lopez-Valenciano, Francisco Ayala, et al., “A Preventive Model for Muscle Injuries: A Novel Approach based on Learning Algorithms,” Medicine & Science in Sports & Exercise, May 2018, Volume 50, Issue 5: 915-27. https://pubmed.ncbi.nlm.nih.gov/29283933.
14. Mark Ogden, “Soccer looks to AI for an edge: Could an algorithm really predict injuries?” ESPN.com, February 4, 2011. https://www.espn.com/soccer/blog-espn-fc-united/story/4306701/soccer-looks-to-ai-for-an-edge-could-an-algorithm-really-predict-injuries.
15. Mark T. Williams, “MLB Umpires Missed 34,294 Ball-Strike Calls in 2018. Bring on Robo-umps?” BU Today, April 8, 2019. https://www.bu.edu/articles/2019/mlb-umpires-strike-zone-accuracy.
16. Katherine Acquavella, “Robot umpires: How it (sic) works and its effect on players and managers in the Atlantic League, plus what’s to come,” CBS Sports, August 27, 2019. https://www.cbssports.com/mlb/news/robot-umpires-how-it-works-and-its-effect-on-players-and-managers-in-the-atlantic-league-plus-whats-to-come.
17. Anne Bonner, “The Complete Beginner’s Guide to Deep Learning: Convolutional Neural Networks and Image Classification,” Towards Data Science, February 2, 2019. https://towardsdatascience.com/wtf-is-image-classification-8e78a8235acb.
18. Larry Greenemeier, “20 Years after Deep Blue: How AI Has Advanced Since Conquering Chess,” Scientific American, June 2, 2017. https://www.scientificamerican.com/article/20-years-after-deep-blue-how-ai-has-advanced-since-conquering-chess.
19. AlphaGo. Directed by Greg Kohs. Moxie Pictures, 2017. Film.
20. “Stealing Baseball Signs with a Phone (Machine Learning).” YouTube, uploaded by Mark Rober, June 30, 2019. https://www.youtube.com/watch?v=PmlRbfSavbI&ab_channel=MarkRober.