A Guide to Sabermetric Research

Editor's note: This is the single-page version of Phil Birnbaum's "A Guide to Sabermetric Research." For the full version, visit SABR.org/sabermetrics.


We're often asked, "I'd like to know more about sabermetrics, but where do I begin?"

Longtime SABR member Phil Birnbaum has authored a Guide to Sabermetric Research to help answer your questions. We're pleased to publish it at SABR.org/sabermetrics.

Birnbaum is the editor of the SABR Statistical Analysis Committee newsletter, "By the Numbers", and he can be found writing on various topics at his blog, Sabermetric Research.

First, let's go over some basics:

  • What is sabermetrics? As originally defined by Bill James in 1980, sabermetrics is "the search for objective knowledge about baseball". James coined the term in part to honor the Society for American Baseball Research.
  • Who invented sabermetrics? Statistical analysis has been around as long as baseball has been played competitively. Long before Moneyball became a worldwide phenomenon in the 21st century and before Bill James' baseball writings gained mainstream popularity in the 1980s, Hall of Fame manager Earl Weaver was using index cards to fine-tune his platooning system and pitching changes with the Baltimore Orioles in the 1960s, while Branch Rickey hired statistician Allan Roth in the 1940s to evaluate player performance with the Brooklyn Dodgers. A generation before that, Baseball Magazine editor F.C. Lane was creating new statistical methods to measure offensive production, culminating in his classic book of essays, Batting. In the mid-19th century, Henry Chadwick is credited with developing the box score and his tabulation of hits, home runs and total bases led to the formulation of metrics such as batting average and slugging percentage.
  • SABR or sabermetrics? With more than 6,000 members around the world, SABR is a membership organization made up of passionate and knowledgeable baseball fans with a variety of interests, one of them being statistical analysis. SABR members Bill James, Pete Palmer and Dick Cramer co-founded SABR's Statistical Analysis Committee in 1974 and helped popularize the study of sabermetrics. The term "sabermetrics" itself is in the public domain and is generally used to describe any mathematical or statistical study of baseball.

Sabermetric researchers often use statistical analysis to question traditional measures of baseball evaluation such as batting average and pitcher wins. Early on, James' theories were largely mocked (or ignored) by the baseball establishment, but as Joe Posnanski wrote in "The Ballad of Bill James", over time his work started to be recognized. Time Magazine once named him one of the 100 most influential people in the world. The Boston Red Sox hired him in 2003 and subsequently won two World Series. James is still asking relevant questions today at billjamesonline.com, and so are legions of his disciples such as Rob Neyer, baseball editor at FoxSports.com; Birnbaum; and all the great writers at Baseball Analysts, Baseball Prospectus, Beyond the Box Score, FanGraphs, The Hardball Times and other sites.

Want a primer on sabermetrics? Check out the FanGraphs Library for down-to-earth explanations of advanced metrics such as wOBA (weighted on-base average), FIP (fielding-independent pitching) and WAR (wins above replacement), written by Steve Slowinski. SABR members can also read cutting-edge articles on statistical analysis in every issue of the Baseball Research Journal, such as "The Many Flavors of DIPS: A History and Overview", by Dan Basco and Michael Davies. We've got a full list of resources on our Related Links page at the end of this section.

Be sure to check out the annual SABR Analytics Conference, where we bring together the top minds of the baseball analytic community under one roof to discuss, debate and share insightful ways to analyze and examine the great game of baseball.

Whether you're just starting out or you'd like a refresher course, whether you're a numbers wizard or you consider yourself math-phobic, we hope you'll find Phil Birnbaum's Guide to Sabermetric Research informative and interesting.



The Basics


Sabermetrics was first introduced to a wide public in 1982, with the first mass-market publication of the Bill James Baseball Abstract. For sabermetricians of my generation, this was the very first sentence about sabermetrics that we ever read:

“If you sometimes get the feeling between here and the back cover that you are coming in on the middle of a discussion, it is because you are.”

That is: Bill James and a handful of colleagues, mostly SABR members, had been working on a body of knowledge for a few years. There was an established, although private, literature of sabermetrics, and part of James’ task was to explain what had already been discovered, and how.

That was a number of people “who could congregate peacefully in the restrooms in the left field bleachers of Yankee Stadium,” working for a few years, without computers or formal publication. Still, those few researchers had built a considerable base of knowledge that we had to be caught up on.

Imagine, then, the situation today. Sabermetrics has been in full force since the mid-1970s. “By The Numbers,” the SABR Statistical Analysis Committee newsletter, has been publishing since the late 1980s. Before that, there was Baseball Analyst, Bill James’ own sabermetrics journal in the 1980s. With the advent of Rotisserie/Fantasy Baseball, an industry of professional sabermetrics research sprang up. Publications like Baseball Prospectus and Baseball Forecaster do their own proprietary research and publish some of it in their annual books.

And, most importantly, in the past few years, “amateur” sabermetrics has found its stride and, in my opinion, taken over the lead. In the past half-decade, a vast number of researchers have published to websites and blogs, giving us serious, state-of-the-art results that are instantly seen by thousands in the community, who often build on the findings and take them further.

Five years ago, I would have argued that the main outlets for sabermetric research were print publications, and that a few books and websites could bring you reasonably well up to date on what sabermetricians had learned over the years. But, now, things have moved so fast that it’s hard to keep up, especially with articles and papers and studies spread all over the web.

It’s a little like the software industry. In the 1990s, almost all software came shrink-wrapped from retail stores, and most of it was by big industry players, such as Microsoft and IBM. Today that still exists, but with open-source, shareware, file sharing and hundreds of third-party iPhone apps created every year … well, now it takes some effort to keep track.

Still, the basics haven’t changed that much. As with any science, the earliest discovered principles tend to be the most fundamental, and, over time, an unwritten consensus emerges about which findings are most important. So I’m going to do my best here to give you a short reading list of “classical” sabermetrics, a way to get a good feel for what sabermetrics has been up to over the past few decades.

The Bill James Baseball Abstract, 1982

This work is three decades old and counting, and it’s getting harder to find. Still, it remains the best place to learn what sabermetrics is, how it works, and how sabermetricians think.

That’s all attributable to Bill James himself. Not only did he make most of the discoveries in the book (there were other sabermetricians active at the time, but James was well over 90% of the field), but his writing style makes the explanations effortless. Anything by Bill James is a joy to read.

If you can’t find the 1982 edition, try whatever other years you can find. Generally, the earlier the year, the more space is devoted to the basics.

Mathletics (Princeton University Press, 2009)

Wayne Winston, a professor and consultant to the NBA’s Dallas Mavericks, wrote this 2009 summary of sabermetrics findings in various sports. The baseball section comprises seventeen basic explanations of various sabermetric principles, such as runs created, streakiness and momentum, pitcher evaluation, and situational strategy.

There’s no original baseball research in Mathletics, but if you want a quick and concise introduction to some of the basic findings in the field, this is the book to get.

The Hidden Game of Baseball (Doubleday, 1984)

This book, by sabermetrician Pete Palmer and baseball historian John Thorn, is considered by many to be the “bible” of sabermetrics. I’d consider it a complement to the Bill James books.

While James developed some methods and formulas by trial and error, Palmer mines historical data and shows the theoretical underpinnings of the methods he uses. If you like a more mathematical approach to sabermetrics, this is the work that lays the foundation.

Thorn and Palmer’s book will tell you, for instance, that a leadoff double helps the team by an average of .614 runs. How do they know that? Well, they looked at many years of play-by-play data, and they found that .454 runs are scored in the average inning. But, with a runner on second and nobody out, an average of 1.068 runs were scored in the rest of the inning. And so, the double is worth the difference between the two situations, which is .614 runs.
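The subtraction above amounts to a lookup in what is usually called a run expectancy table. Here is a minimal Python sketch that uses only the two values quoted from Thorn and Palmer. The state labels and the function name are illustrative assumptions, and a full table would cover all 24 base-out states.

```python
# Average runs scored in the rest of the inning from a given (bases, outs) state.
# Only the two values quoted in the text are included here.
RUN_EXPECTANCY = {
    ("bases empty", 0): 0.454,
    ("runner on second", 0): 1.068,
}

def event_value(before, after, runs_scored_on_play=0.0):
    """Run value of an event: change in run expectancy plus runs that scored."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored_on_play

leadoff_double = event_value(("bases empty", 0), ("runner on second", 0))
print(f"Leadoff double: {leadoff_double:+.3f} runs")
```

The same function would price any other event once the table holds the states before and after it.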

While the Bill James Baseball Abstract is the philosopher and theoretician of sabermetric thought, The Hidden Game of Baseball is its engineering department.

The Book (Potomac, 2007)

A collaboration by three exceptional sabermetricians, The Book studies over 100 different questions on baseball strategy. While it does cover topics that have previously been studied by others, it does so, usually, with much greater rigor. For instance, when looking at player performance in various situations, The Book will often correct for park, home/road, the identity of the opposing pitcher, and the ball/strike count. As a result, its conclusions are very detailed and very well-considered.

The Book is intended more for fans with a hardcore interest in sabermetrics and strategy issues. It’s included here because it has been so influential among current researchers, and you will see its ways of thinking, especially as described in Chapter 1, repeatedly surface in emerging research.

If the three books above make up the reading list for Sabermetrics 101, then The Book could be the text for Sabermetrics 301 or 401.


Asking the Right Questions


“Sabermetrics” is the search for objective knowledge about baseball through analysis of the statistical record. At its most basic, the evidence is just simple observation and counting. For instance, in early 2010, sabermetrician Dave Allen wondered if better hitters get fewer good pitches to hit. He looked at twenty of the best hitters in baseball, and twenty of the worst. He found that, at almost every ball-strike count, the better hitters were thrown fewer strikes and fewer fastballs. For instance, on 0-0, the worst hitters got 66% fastballs, while the best hitters saw only about 63%.

Not every question in sabermetrics is that simple. One of the most controversial questions in baseball statistical analysis is that of clutch hitting. Do some hitters have the ability to “turn it up” when the game is on the line, and perform better than usual? Do other hitters have the opposite tendency, hitting better when it doesn’t matter as much?

Many studies have been done on the topic, starting as many as 30 years ago. In 1977, Dick Cramer analyzed batting records from 1969 and 1970, and found only a very slight tendency for “clutch” players to repeat their performance in subsequent seasons. In 1990, Pete Palmer (PDF, page 6) studied clutch hitting over multiple seasons, and found that there were almost exactly as many apparent “clutch” and “choke” hitters as you would expect by luck, if clutch hitting skill didn’t exist at all.

A few years later, Tom Ruane repeated a version of Palmer’s study, using a larger dataset, and got roughly the same result.

Finally, in 2006, Andy Dolphin did a more sophisticated mathematical analysis, and found evidence for a very, very slight variation among players in clutch performance. But he concluded that it’s impossible to discover, with any degree of accuracy, which players are which, and that “for all practical purposes,” players should be expected to hit no better or worse in the clutch than their normal performance would suggest.
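The "as many as you would expect by luck" comparison can be made concrete with a simulation. The sketch below is not the method of any of the studies above. It only illustrates the idea: give every simulated hitter the same true average in all situations, then count how many still look like clutch or choke hitters. The sample sizes, the .270 average and the two-standard-error cutoff are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def share_flagged_by_luck(n_players=500, clutch_ab=100, other_ab=450,
                          true_avg=0.270, seed=0):
    """Simulate hitters with NO clutch skill and return the share who still
    look like clutch or choke hitters (gap of more than 2 standard errors)."""
    rng = random.Random(seed)
    # Standard error of the gap between two batting averages under pure luck.
    std_err = sqrt(true_avg * (1 - true_avg) * (1 / clutch_ab + 1 / other_ab))
    flagged = 0
    for _ in range(n_players):
        clutch_hits = sum(rng.random() < true_avg for _ in range(clutch_ab))
        other_hits = sum(rng.random() < true_avg for _ in range(other_ab))
        gap = clutch_hits / clutch_ab - other_hits / other_ab
        if abs(gap) > 2 * std_err:
            flagged += 1
    return flagged / n_players

print(f"{share_flagged_by_luck():.1%} of skill-free hitters look clutch or choke")
```

Under a normal approximation, roughly one hitter in twenty should be flagged by luck alone. A real study would compare the count of flagged players in actual data against that kind of baseline.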

Sabermetrics is a science, which means that it follows the scientific method. Conclusions must be based on evidence and logic, and any conclusions can be re-evaluated or overturned if new, contradictory, evidence turns up. Right now, the evidence suggests that clutch skill is a very minor factor in player performance, if indeed it exists at all. It’s certainly possible that some future sabermetrician will find differing data, or a flaw in the previous studies, and force us to change our minds on the question. But if we do change our minds, it will be because of empirical evidence for the other side.


A Primer on Statistics


For the typical fan, sabermetrics doesn’t represent anything as theoretical as scientific inquiry. Rather, sabermetrics is associated with new and unfamiliar statistics. OPS is the most famous of those new stats. It’s gone from a nearly unknown statistic in the early 80s, to barely used a decade ago, to mainstream now (it even appears on Topps baseball cards). There have also been stats like Linear Weights, Runs Created, Extrapolated Runs, WAR, and so on.

I’d still argue that sabermetrics isn’t really about those statistics; rather, the statistics have been proven to be useful based on evidence that sabermetricians have uncovered. “Runs Created,” for instance, is a statistic that was created by Bill James in the late 1970s. James’ thinking went this way: a team’s job on offense is to score runs – the more runs, the better. Suppose you didn’t know how many runs a team scored, and wanted to make an estimate, based on its batting line. For instance, here’s a real team batting line:

  G    AB     H   2B  3B   HR   BB   SO   AVG
161  5517  1451  234  22  214  604  908  .263


How many runs would you guess that team scored that year? If I made you guess, you’d probably look over a few years of team statistics, try to find some team that was reasonably close, and use that as a baseline. You might find a team that hit .267 with less power, and scored 788 runs. You’d figure, “well, this team hit only .263, but they had a few more home runs, so I guess maybe they’d cancel out, so I’d guess the same 788 runs. But, wait, this team had about 20 more walks than the other team, so maybe I should bump up my estimate to 800 or something.”

What Bill James probably did was work through logic like that, and, after some trial and error, come up with the Runs Created (RC) formula. That statistic is intended to provide a formal way of estimating how a batting line translates into runs. In its most basic form, RC looks like:

Runs Created = (TB) (H+BB) / (AB+BB)

If you plug the numbers in from the above batting line, you get

Runs Created = (2371) (2055) / (6121)

which gives 796 runs.

As it turns out, that was the batting line for the 1985 Baltimore Orioles, who actually scored 818 runs. The estimate is off by 22 runs, or less than 3 percent, which is very good, a little better than typical.
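For readers who would rather check the arithmetic in code, here is a short Python sketch of the basic formula applied to that batting line. The function and argument names are my own choices; only the formula and the numbers come from the text.

```python
def total_bases(h, doubles, triples, hr):
    # Every hit counts one base; extra-base hits add their extra bases.
    return h + doubles + 2 * triples + 3 * hr

def basic_runs_created(ab, h, doubles, triples, hr, bb):
    """Basic Runs Created: (TB) (H+BB) / (AB+BB)."""
    tb = total_bases(h, doubles, triples, hr)
    return tb * (h + bb) / (ab + bb)

# Batting line of the 1985 Baltimore Orioles, as given in the text.
orioles_1985 = dict(ab=5517, h=1451, doubles=234, triples=22, hr=214, bb=604)

estimate = basic_runs_created(**orioles_1985)
actual = 818
print(f"Estimated {estimate:.0f}, actual {actual}, error {actual - estimate:.0f}")
```

Total bases work out to 2371, times on base to 2055, and plate appearances (AB+BB) to 6121, matching the figures plugged in above.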

Why is Runs Created important? Why do we need RC if we already know the Orioles scored 818 runs? Well, knowing that there is a predictable relationship between a batting line and runs is useful when we don’t know how many runs we actually have. For instance, we can use RC on an individual player’s batting line. Here’s Albert Pujols in 2009:

  G   AB    H  2B  3B  HR   BB  SO   AVG
160  568  186  45   1  47  115  64  .327


Using the basic RC formula, we can estimate that if a given major league team had a batting line like Pujols did, it would score about 149 runs. That batting line would comprise about 15 games, which gives about 10 runs per game.

What we can conclude, then, is that if you put together a lineup of nine Albert Pujols clones, on average they’d score 10 runs per game. That’s a huge total – the average MLB team scores somewhere between 4.5 and 5.0.

We can compare Pujols to Joe Mauer, or Adam Lind, or Alex Rodriguez, to help inform our conclusions on how much each contributed to his team, or even to our arguments about which player deserves the MVP award.

Runs Created is one of the most famous of the statistics used to evaluate offense. Others include Pete Palmer’s "Linear Weights," Jim Furtado’s “Extrapolated Runs,” and David Smyth’s “Base Runs.” All are very good estimators. But which is the best? Well, that depends. No estimator is perfect, and all have their strengths and weaknesses.

One way to compare the various estimators is to test them for accuracy. Apply them to the last (say) fifty years of baseball, which should give you well over 1,000 team-seasons. Have them each estimate runs for every one of those teams, and see which ones do the best.
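One common way to score such a test is root-mean-square error (RMSE): the typical size of an estimator's miss, in runs. The sketch below assumes that choice of yardstick. Its data list holds only the one team-season quoted in this guide, and the second estimator is a deliberately crude, hypothetical strawman (not a published statistic) included only to show the mechanics of a comparison.

```python
from math import sqrt

def basic_runs_created(line):
    tb = line["H"] + line["2B"] + 2 * line["3B"] + 3 * line["HR"]
    return tb * (line["H"] + line["BB"]) / (line["AB"] + line["BB"])

def half_of_baserunners(line):
    # Hypothetical strawman: pretend half of all hits and walks become runs.
    return 0.5 * (line["H"] + line["BB"])

def rmse(estimator, team_seasons):
    """Root-mean-square error of an estimator against actual runs (key "R")."""
    errors = [estimator(t) - t["R"] for t in team_seasons]
    return sqrt(sum(e * e for e in errors) / len(errors))

# A real test would load many team-seasons; this is the single line from the text.
team_seasons = [
    {"team": "1985 BAL", "AB": 5517, "H": 1451, "2B": 234, "3B": 22,
     "HR": 214, "BB": 604, "R": 818},
]

for name, estimator in [("Basic RC", basic_runs_created),
                        ("Strawman", half_of_baserunners)]:
    print(f"{name}: RMSE = {rmse(estimator, team_seasons):.1f} runs")
```

With one team-season the result means little; the point is that every estimator is fed the same inputs and judged by the same measure.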


Offensive Statistics — A Caution


What does all this have to do with how to do baseball research? Well, it brings me to my first suggestion: if you’re just starting out, you might want to consider researching something other than coming up with new ways to evaluate player offense.

It’s just that it’s been done to death. I’ve listed four different statistics that evaluate offenses, and there are even more than those. All of them are pretty good, and all of them are pushing the limits of how accurate a statistic can possibly be.

Now, I’m not saying that there’s no way you’ll do better. Maybe 20 years ago, I would have thought that there was no way to beat Linear Weights and Runs Created, but then David Smyth invented Base Runs, which, by some measures, is the best yet. My advice is not that you can’t do better, but rather that your research effort may yield more fruit if applied elsewhere.

But, on the other hand, evaluating players is fun. And if this area of sabermetrics is something that you find most interesting, then go ahead! But if you come up with a new statistic, you will be expected to come up with hard evidence that yours works better than any that are already out there. It’s not enough to argue theoretically why it should work — you have to prove it does.

There’s a sabermetric adage: Just because a statistic has Babe Ruth on top and Mario Mendoza on the bottom, that doesn’t mean it’s accurately measuring what it’s supposed to measure.

So, as you work on your new statistic, keep these points in mind:

  • It’s possible to get more and more accurate by including more and more information. The basic version of Runs Created includes only six data items: AB, H, 2B, 3B, HR and BB. Obviously, you can get more accurate if you include SB and CS, and HBP, and SF, and other information. Indeed, some of the other statistics already include those categories, so when you compare your statistic to others, make sure you use the equivalent version, to ensure you’re comparing apples to apples. If you show that your statistic that includes 20 categories is more accurate than a statistic that includes only six categories, that’s not necessarily a breakthrough.
  • It is possible to get very accurate if you include “situational” statistics that give information about when the various events happened. For instance, if you were to add “batting average with runners in scoring position,” you’d increase the accuracy of your estimates quite a bit. But you wouldn’t necessarily increase your statistic’s usefulness.
  • If you’re trying to show how various factors lead to runs scored, you can’t include categories that are based on how many runs actually scored! For instance, you can do a lot better than Runs Created if you include “runners left on base.” In fact, (H + BB – CS – DP – runners left on) is almost exactly equal to runs! That’s because it’s almost equal to (runners reaching base – runners who didn’t score), which is exactly the definition of runs.

After keeping all this in mind, if you do come up with a statistic that you can demonstrate is more accurate than its counterparts, you’ll have something of very high interest to the sabermetric community. But, again, as I said, you have an uphill climb. This is the one area of sabermetrics that has had the most effort poured into it over the past three or four decades, and a better mousetrap will not be easy to invent.

A similar caution applies to any new statistic, especially one that’s supposed to evaluate or rank players or teams in some dimension. If your new stat is trying to estimate something that can be measured, show how well it does that, especially compared to any other stats that are out there. And if it’s trying to estimate something ethereal, like “consistency” or “durability,” something that doesn’t have a real definition, how do you know that you’re measuring it the best way possible? There’s nothing wrong with a statistic like that — Bill James has “speed score,” which estimates the fuzzy notion of a player’s “baseball speed” — but be aware that those kinds of things are rough tools, not strong empirical findings.


What To Research


In sabermetrics, as in probably any other discipline, there’s no official list of topics to research. Most sabermetricians just study what they’re interested in. Often, ideas for subjects come up during conversations with other fans. You’ll be talking baseball over a beer, and someone will say, “well, I’m worried about the Indians next year … they went 7-25 in September and October, and that’s probably a bad sign of things to come.”

And you think, hmmm, I wonder if that’s true, that a bad September is likely to be a negative indicator for next year’s performance? And, suddenly, you have a topic to study.

Another common source for ideas is baseball broadcasters – they’ll make some claim on the air, without giving evidence, and you spot an opportunity to check if what they say is true. Bill James used to do this a lot.

Or, you might be reading a certain study on one of the many sabermetric internet sites, and someone makes a suggestion in the comments – or, the study raises a question in your mind that you think it would be interesting to investigate.

If you’re just starting out, my suggestion would be to start fairly simple. One possibility is to find a bunch of old Bill James Abstracts, and read through them (which I recommend you do anyway, if you’re new to sabermetrics). Those books are full of little studies that Bill James throws in when a question occurs to him, and those might lead you to related questions that you can test. Even repeating one of Bill’s studies with more current data can be useful.

For instance, in the 1982 Bill James Baseball Abstract (Ballantine, 1982), Bill lists the average attendance for every starting pitcher in the major leagues, and finds that the only pitcher who reliably seemed to draw fans, in 1981, was rookie phenom Fernando Valenzuela. It immediately occurred to me: is it still true that the starting pitcher doesn’t affect attendance? I’d love to see a similar study for recent years*. I’d also love to see someone take this a bit further. Bill just eyeballed the data before concluding that there didn’t seem to be an effect. But might there be a small effect that you’d find if you looked harder? You might check whether the better pitchers tended to draw more fans than the worse pitchers, after adjusting for day, weather, and opponent. Maybe there’s a small effect, but maybe there’s not.

* UPDATE: it turns out that someone has followed up Bill's study! In an excellent piece in The Hardball Times 2012 Baseball Annual, Max Marchi looked at all pitchers since 1947, adjusted for overall trends, and found many great starters who drew in the fans. Nolan Ryan was the career leader, with 641,000 estimated extra tickets sold, while Mark Fidrych had the highest season average, with a total of around 300,000 tickets over three years.

The nice thing about using the Bill James Abstracts for ideas is that Bill tends to use straightforward techniques that don’t require any formal statistical expertise. His techniques may not be formal enough for, say, academic journals, but they’re excellent nonetheless, and they have enabled Bill James to teach us more about baseball than any other sabermetrician.

Of course, if you do have some expertise in statistical techniques, that will help too. For the attendance study, you might run a regression to predict attendance based on team, day of the week, opponent, and starting pitcher’s quality. But, even if you don’t use a formal statistical technique (and, for the record, I think in all of Bill James’s work, he’s used linear regression maybe twice), with a bit of creativity you can usually still figure out what’s going on.
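Short of a regression, one informal way to adjust is to compare each game's attendance with a baseline for similar games. The sketch below does that for day of the week only. The records are made up purely for illustration, and the names are hypothetical; a real study would use actual game logs and adjust for opponent and weather as well.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration only: (weekday, starting pitcher, attendance).
games = [
    ("Tue", "Starter A", 22000), ("Tue", "Starter B", 18000),
    ("Fri", "Starter A", 30000), ("Fri", "Starter B", 26000),
    ("Sat", "Starter A", 34000), ("Sat", "Starter B", 30000),
]

def attendance_above_baseline(games):
    """Average attendance for each starter, relative to the weekday average."""
    by_day = defaultdict(list)
    for weekday, _, attendance in games:
        by_day[weekday].append(attendance)
    baseline = {day: mean(values) for day, values in by_day.items()}

    by_starter = defaultdict(list)
    for weekday, starter, attendance in games:
        by_starter[starter].append(attendance - baseline[weekday])
    return {starter: mean(diffs) for starter, diffs in by_starter.items()}

for starter, extra in attendance_above_baseline(games).items():
    print(f"{starter}: {extra:+.0f} fans per start versus a typical game that day")
```

Comparing against the weekday baseline keeps a pitcher who happens to start on Saturdays from getting credit for the Saturday crowd.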

Once you’ve settled on a question, you have to figure out how you’re going to work your way to an answer. That’ll be difficult without some knowledge of sabermetrics. There’s no field of human knowledge where you can just jump in without some basic understanding of how the field works and what’s already been done.

Indeed, if there were only one piece of advice I was allowed to give to aspiring researchers, it would be: learn some sabermetrics first. As my friend John Matthew IV said, “If you were interested in astronomy, you would read at least a few books before trying to predict the path of a comet.”

And so: know some of the sabermetric canon. The reading list in “The Basics,” earlier in this guide, is my outline of what “Sabermetrics 101” might cover.

Also, before you start working on your problem, you’re going to want to check whether others have worked on the problem before. Maybe they’ve already done the exact same thing you’re planning to do. Maybe they’ve gone only part of the way, and you can expand on what they’ve done. And maybe they’ve thought of some things that you haven’t, or maybe you won’t agree with how they did it.

In any case, no matter how knowledgeable you are in sabermetrics, nobody is aware of everything. Before you start, you’ll want to search the literature, to see what progress has already been made on your problem. We’ll talk about that a bit later too.


Literature Search


So you’re at the point where you have a research idea in mind. Your next step, then, is to find any previous work that’s already been done on your topic.

In academia, there’s a conventional wisdom on how to do a literature search, and a lot of it involves indexes to scholarly journals that cover your topic. In sabermetrics, however, it’s not quite so simple — much of the best research is published online, on any one of hundreds of websites, without a formal peer-review process to separate the good from the flawed.

So, as much as we might wish there were a step-by-step process for finding existing work, the reality is that it becomes a bit of a seat-of-the-pants thing. Some suggestions, though, for how to proceed:

1. Scan the research repositories

While most sabermetric work of recent vintage is web-published, there are still several more formal repositories of studies. The advantage of those is that, if they’re all at one specific website, you can search them online by using any normal search engine (such as Google), but using the “advanced search” feature to ask for results only from that one site.

Some specific places to look:

  • Every back issue of SABR’s “By The Numbers” is available. There is a repository at the SABR website and at my own website, www.philbirnbaum.com.
  • In the 1980s, Bill James edited the “Baseball Analyst,” a sabermetrics newsletter that went out to what I think were only a few dozen subscribers. In 2012, SABR published those online for the first time at sabr.org/research/baseball-analyst-archives.
  • Tom Tango, one of the leading active sabermetric researchers today, has some of his own studies at his website, tangotiger.net.
  • Tango and his co-authors of “The Book” have set up a wiki, an open-source encyclopedia of sabermetric subjects. There has been some talk of abandoning the project, but, at time of writing, it’s still active at tangotiger.net/wiki/index.php?title=Main_Page.
  • Charlie Pavitt, a SABR member and regular contributor to “By the Numbers,” has compiled a bibliography of published sabermetric papers. It’s dedicated to only the more formal publication outlets, so it’s missing a large part of the recent explosion in web research. Still, it’s a worthy source. A description can be found here and the bibliography itself can be found here.
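The single-site search described above can also be done by typing the `site:` operator directly into the query. Here is a small Python sketch that builds such search links for two of the sites named in the list. The helper name and the example phrase are my own, and the URL layout is simply the ordinary Google search address.

```python
from urllib.parse import urlencode

def site_search_url(site, phrase):
    """Build a Google search URL restricted to one website."""
    query = f'site:{site} "{phrase}"'
    return "https://www.google.com/search?" + urlencode({"q": query})

for site in ("sabr.org", "tangotiger.net"):
    print(site_search_url(site, "clutch hitting"))
```

Putting the phrase in quotation marks asks for that exact wording, which helps when the topic is a common pair of words.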

2. Search the biggest websites dedicated to sabermetric research.

My advice would be to start by searching "The Book" blog. There, Tom Tango reviews, or at least mentions, a large proportion of the most significant studies. Also, the site has, in my opinion, the densest population of knowledgeable commenters; almost always, you learn more from the comment discussion than from the studies themselves. Comments do show up in the searches, I believe.

From there, consider these other sites:

In 2010, Beyond the Box Score held a poll to vote for the best sabermetric websites and studies. All the nominee websites are worth a look and a search, and can be found at http://www.beyondtheboxscore.com/2010/1/21/1263306/your-btb-sabermetric-award-voting.

3. Ask.

Perhaps the best way to find research on a certain topic is to ask around. There are various places to ask, but, before doing so, please spend some time looking for yourself. That’s just a courtesy to those from whom you are requesting assistance. I have had people e-mail me about finding research on topic X, when they could have found what they’re looking for by doing the simplest search for “X” on Google.

People are generally very willing to help when you show what you’ve tried, and you let them know what you’ve found so far.

Places to ask:

  • One good bet is to write to authors of studies on topics that are close to yours. If you’re thinking of doing a study on how accurate scouts are when they evaluate pitchers, and you find a study on how accurate scouts are when they evaluate batters … well, the author is probably as interested in the subject as you are, and is likely to be able to help. Even if the answer is, “sorry, I don’t know of anything,” that’s a sign that your topic may indeed be a fresh one.
  • Most websites allow comments on the studies they publish. If there’s a topic that’s similar in some ways, post a comment asking about your topic.
  • Ask on e-mail forums. SABR has SABR-L, which is probably a bit too general for many detailed sabermetric inquiries, but still worth a shot. A better place is the Yahoo group “statisticalanalysis,” which is free to join for SABR members with an interest in sabermetrics.
  • Finally, you can try specific people. I don’t mind an occasional inquiry, and I’m sure many others are happy to answer too. If you’re stuck, you can always try writing to someone who you know is an active researcher. Many of the sabermetric websites have links to contact authors. Sabermetrician John Doe may not have published anything that touches on your specific topic, but if he publishes a column every week and a research paper once a month, you wouldn’t be out of line occasionally sending a courteous request for assistance.


How to Find Raw Data


Back in the beginning days of sabermetrics, data was hard to come by. Some things weren’t too bad — if you wanted to know Bill Terry’s batting average in 1933, there were two encyclopedias, Macmillan and Neft/Cohen, that would tell you. But if you wanted more esoteric statistics, like Joe Morgan’s career performance with the bases loaded, you were out of luck.

When Bill James started writing his self-published Baseball Abstracts back in the late 1970s, he had to compile situational statistics himself, from the daily box scores, without a computer. At the time, Bill marketed his book as “featuring 18 categories of statistical information that you just can’t get anywhere else.”

James found that he had to keep compiling those stats even into the 1980s; famously, in his 1981 book, he reprinted a letter from the Chicago Cubs refusing to provide him with such “intelligence-type” stats.

Now, of course, things are different. There is no shortage of almost any kind of data. My four favorites — in rough order of increasing detail — are:

MLB's website provides copious statistical data, sortable and printable, updated instantly as games progress. But that stuff can be found elsewhere. The main attraction of the MLB website is that it provides PITCHf/x data. That is, for every pitch thrown by any pitcher in MLB, it will tell you the type of pitch, where it crossed the plate, and how much it broke vertically and horizontally. As a result, and not surprisingly, much of the groundbreaking research these days has to do with pitch analysis.

Easily the best source for precalculated historical statistics is Baseball-Reference.com (B-R). That site has pretty much rendered printed baseball encyclopedias obsolete. Not only do you get the regular Bill-Terry’s-batting-average data, but you also get a large selection of sabermetric stats, breakdowns by dozens of different criteria (left/right, day/night, April/September, and so on), and the ability to manipulate the data in ways that other websites don’t allow. You can also do absurdly specific searches. Want to know Joe Morgan’s longest consecutive streak of games where he came to the plate at least twice? The answer: 235 games. (If you want the details, you have to subscribe, but the overwhelming majority of the information on the site can be had for free.)

For those of us who want to do more complicated things, Baseball Reference, awesome as it is, just isn’t enough. We need the raw data on our own computers, so we can manipulate it in ways that B-R never thought of. There are two main sources of raw data: the Lahman Database and Retrosheet.

The Lahman Database can be obtained for free at seanlahman.com/baseball-archive/statistics, the website of its creator, Sean Lahman. It’s basically a standard Baseball Encyclopedia in downloadable form. You can get it in text form, for loading into Excel, but, more importantly, it also comes in relational database format (Microsoft Access). If you’re familiar with Access and with SQL database queries, you know how convenient it is to use it to do powerful, specific data searches quickly. (If you’re not familiar with SQL, there have been a few tutorials on sabermetric sites recently.)
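To give a flavor of what such a query looks like, here is a minimal sketch in Python using the built-in sqlite3 module rather than Access. The table and column names (Batting, playerID, yearID, AB, H) follow the Lahman schema, but the player IDs and numbers are invented for illustration, and the helper name batting_leaders is hypothetical.

```python
import sqlite3

# In-memory stand-in for the Lahman "Batting" table. Column names follow
# the Lahman schema; the rows themselves are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batting (playerID TEXT, yearID INTEGER, AB INTEGER, H INTEGER)"
)
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?, ?)",
    [
        ("playera01", 1933, 500, 160),
        ("playerb01", 1933, 450, 126),
        ("playerc01", 1933, 40, 18),   # too few at-bats to qualify
        ("playera01", 1934, 520, 150),
    ],
)


def batting_leaders(conn, year, min_ab=100):
    """Return (playerID, batting average) pairs for one season, best first."""
    # SUM() is used because a player who changes teams mid-season can have
    # more than one row for the same year.
    query = """
        SELECT playerID,
               ROUND(CAST(SUM(H) AS REAL) / SUM(AB), 3) AS bat_avg
        FROM Batting
        WHERE yearID = ?
        GROUP BY playerID
        HAVING SUM(AB) >= ?
        ORDER BY bat_avg DESC
    """
    return conn.execute(query, (year, min_ab)).fetchall()


for player_id, bat_avg in batting_leaders(conn, 1933):
    print(player_id, bat_avg)
```

With the real database loaded, the same query shape would work against the full Batting table; only the connection line would change.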

Anyway, the Lahman Database has every player’s standard batting and pitching line for every year. It’s got managers, birthdates, awards, all-star games, and other good stuff. Its limitation is that data is available only as full-season totals: if you want to know how Eddie Murray hit in July 1979, there’s no way the Lahman Database will tell you. For that, you have to turn to Retrosheet.

Retrosheet is, basically, a miracle. It’s the result of a small army of volunteers, combing historical sources to try to re-create the play-by-play of every game in baseball history and digitizing it for download and analysis. I can’t begin to imagine how difficult it is to find all that information, to reconstruct the top of the 6th inning of the Cardinals/Phillies game of April 29, 1953. But they did. (D. Rice grounded out (shortstop to first); Presko popped to first in foul territory; Hemus popped to first in foul territory.)

You can also see the entire career of any player, game by game. You can see the standings and results from any date in baseball history. You can see a coach’s career, which teams he coached for and what he coached, and even how many times he was ejected.

You can see this stuff online, or, if you have computer data-manipulation skills, you can download it and work with it yourself. You can load the data into Excel and write macros to manipulate it. Or, you can write programs to analyze it; I use Visual Basic, but any language will do. There’s a 2006 book called “Baseball Hacks” (O’Reilly), which explains how to use a computer language called “R” to download and analyze Retrosheet data (and, actually, lots of other baseball data that can be found on the internet).

Not all of baseball history is available on Retrosheet — yet. The volunteers are still working on it, though. (Want to help? Click here for details.) For now, you can see game-by-game summaries from 1871 on. You can see box scores for more than 90 percent of games since 1916. And, if you want full play-by-play data, it’s available for any game after 1952, and a large number of games before that. Some years even include pitch-by-pitch data, in terms of ball, strike, foul.
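As a rough illustration of working with the downloaded play-by-play files, the sketch below reads "play" records in the style of a Retrosheet event file and tallies the pitch codes. It assumes the usual comma-separated layout (play, inning, 0 for visitors or 1 for home, batter ID, count, pitch sequence, event) and the common pitch codes B, C, S, F and X. The sample lines, player IDs and function names are invented for illustration.

```python
import csv
from collections import Counter
from io import StringIO

# Lines in the style of a Retrosheet event file. The batter IDs and pitch
# sequences are invented; real files contain many other record types too.
SAMPLE = """\
play,6,0,playa001,12,CBFX,63
play,6,0,playb001,01,CX,3
play,6,0,playc001,00,X,3
"""

# Common pitch codes; anything else is lumped under "other".
PITCH_LABELS = {
    "B": "ball",
    "C": "called strike",
    "S": "swinging strike",
    "F": "foul",
    "X": "in play",
}


def parse_plays(text):
    """Turn 'play' records into dictionaries, skipping every other record."""
    plays = []
    for row in csv.reader(StringIO(text)):
        if len(row) == 7 and row[0] == "play":
            _, inning, side, batter, _count, pitches, event = row
            plays.append({
                "inning": int(inning),
                "home_batting": side == "1",
                "batter": batter,
                "pitches": pitches,
                "event": event,
            })
    return plays


def pitch_counts(plays):
    """Count pitches by type across all parsed plays."""
    counts = Counter()
    for play in plays:
        for code in play["pitches"]:
            counts[PITCH_LABELS.get(code, "other")] += 1
    return counts


plays = parse_plays(SAMPLE)
print(pitch_counts(plays))
```

Reading a real file would mean replacing SAMPLE with the file's contents; seasons without pitch-by-pitch data simply have an empty pitch field.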

The result of literally tens of thousands of hours of volunteer labor, Retrosheet is the greatest sabermetric resource ever.


Computer-Aided Research


Before the 1990s, a significant proportion of sabermetric research was done without the benefit of computers — or at least, without the kind of computer power and software we have today. A great deal of statistical information had to be compiled by hand, or typed by hand into spreadsheets. As a result, many studies used only a small amount of data, in order to keep the workload manageable.

Things are different now, of course, and it’s harder to study new areas without the benefit of a computer and a good baseball database. That’s because a lot of the low-hanging fruit has been picked, and we’re now looking for more and more subtle effects. In 1977, Dick Cramer’s clutch hitting study consisted of only two years of batting average data, entered by hand. From that, he was able to find that clutch hitting consistency was next to nothing. But that was only two years’ data, not enough for a definitive conclusion. It took others, with more sophisticated computers and existing baseball databases, to refine that result to the level of understanding we have today.
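One common way to test this kind of consistency is to check whether players who looked "clutch" in one season also looked clutch in the next, by correlating the two seasons. The sketch below shows that general idea only; it is not a reconstruction of Cramer's method, and the numbers are hypothetical clutch-minus-overall batting average differences for the same five hitters.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: each position is the same hitter in both seasons.
season_one = [0.020, -0.015, 0.005, -0.010, 0.000]
season_two = [-0.005, 0.010, 0.015, -0.020, 0.000]

r = pearson(season_one, season_two)
print(f"year-to-year correlation: {r:.2f}")
```

A correlation near zero would suggest little carryover from one season to the next, but with only a handful of players and two seasons the estimate is very noisy, which is why larger databases matter.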

In an early 2000s essay on this topic, Neal Traven wrote, “the computer is almost an obligatory … tool for sabermetric research.” That holds even more today.

It’s unfortunately true that you’re going to need a certain level of computer skill in order to take a huge mountain of baseball data and squeeze conclusions out of it.


Where To Publish


The nice thing about living in the second decade of the 21st century is that it’s ridiculously easy to get your research out to the sabermetric community.

If you prefer something more formal than a website, you can start with “By the Numbers,” SABR’s Statistical Analysis Committee newsletter. The advantage of BTN is that you have a semi-formal citation (publication, date and page number) that might be of some use to you, such as on a resume. Also, you retain full rights to your work, so you can publish it elsewhere any time afterwards.

SABR also publishes the Baseball Research Journal (BRJ) twice a year. Your work will get more exposure there, as the book is sent out to all SABR members in hard copy form. Many authors prefer to be published in a “real” book such as this one, over alternatives such as newsletters or websites. Click here for tips on submitting an article to the BRJ and other SABR publications.

However, the advantage of publishing to a website is that you get instant feedback and discussion. Articles that are particularly interesting or controversial will attract dozens or hundreds of comments, be linked to on other websites, and can very quickly attract the attention of many, many active researchers.

One of the most significant findings in sabermetrics, Voros McCracken’s DIPS theory, was published online in 2001 at Baseball Prospectus (where it still lives today). The fact that it didn’t appear in a formal paper publication didn’t stop it and its author from becoming legendary.

If you’re interested in online publication, many of the independent sabermetric websites are happy to consider worthy submissions. As a bonus, both BTN and BRJ are willing to accept research that has been published on the web — so if you go that route, you can have the best of both worlds.




When you set out to disseminate your research, be ready to take comments and criticisms. The logic of a statistical argument isn’t always straightforward, and it’s quite possible that your study doesn’t show what you think it shows. Even though you won’t necessarily get the formal, blind-refereed, peer review that you would in academia, the informal peer review process of the sabermetric community is still very effective in spotting flaws in an analysis, so be ready for constructive criticism as you shop your paper around.

You’ll also get some criticism that you don’t agree with. That’s how life is: People disagree on what a result means or how a methodology works. It’s important to engage readers who disagree with you, even if you think they’re wrong. There’s nothing more frustrating than a researcher who posts a piece and refuses to discuss criticisms of it. Be civil and meet questions head on.

Of course, not everyone is logical and reasonable and will agree that you’re right (even if you are, which you might not be). At some point, the discussion may fail to advance any further, and you might have to agree to disagree. That’s OK. A nice bonus is that a public discussion on the internet will attract others, and you probably won’t have to do all the defending yourself.

Finally, there are always idiots online who just don’t get it, or who will insult you for the sake of insulting you. Just keep a thick skin.




Sabermetrics has a mixed reputation in the outside world. In mainstream sportswriting, it’s sometimes seen as something nerds do from their parents’ basements, something real sportswriters don’t need because they see all the games and know all the players. In academia, it’s not always respected as serious research, because it often doesn’t fit into any specific established discipline (although economists are starting to get involved), because it often doesn’t use enough fancy math, and because it’s “only” about baseball. And it used to be that in baseball itself, sabermetrics was not perceived to be anything that would be of use to the insiders of a major-league team.

But the situation in MLB is changing, perhaps due to Moneyball (Norton, 2003), Michael Lewis’ story of how Billy Beane’s Oakland Athletics used sabermetrics to build a winning team on the cheap. In 2003, the Red Sox hired Bill James. Since then, other teams have hired statistical analysts and begun advertising for similar positions.

Still, the serious study of baseball through its statistics isn’t taken all that seriously outside of the Moneyball crowd. Over the past couple of years, there have been several university professors who have had their schools issue a press release when they came up with something sabermetric. Usually, those academic studies aren’t any more worthy of special recognition than many other studies published on the Internet at the same time. But I guess baseball is a subject that many consider less serious than, say, sociology, so the idea that people study it in earnest becomes a bit of a novelty.

Even if the wider world doesn’t see sabermetrics as completely serious, its practitioners do. In one recent university press release, the professor expresses his interest in someday getting his “dream job” doing sabermetric consulting for a major-league team. That’s something a lot of sabermetricians would be interested in, obviously. Many have already gotten there, in recent years.

But there will probably always be more sabermetricians than employment opportunities. For most of us, the motivation for sabermetrics is not the glamour of having an inside job with a baseball team, but just our interest in baseball. And scientific curiosity is a big factor too. Because of the abundance of cheap data, its relative neglect by the academic community, and the fact that the science is so young, sabermetrics is perhaps the best serious field where part-time researchers can routinely make the most significant discoveries. And there’s a certain thrill in creating new knowledge, discovering something that nobody knew before.

And if the thrill of scientific discovery isn’t enough, the fact that those discoveries are about baseball — for many, our favorite subject on earth — is icing on the cake.


Related Links


Here are some related links to help you learn more about sabermetrics: