3.2 million MLB pitches, all classified and available online

From Dan Brooks at The Hardball Times on February 3, 2012:

Long-term PITCHf/x data has always been difficult to find online. There are several existing sources available: FanGraphs has some of it, but not everything you might want (I’m sure they will tomorrow, just for that). Texas Leaguer has had a fantastic tool up for quite some time now, but still, there are places it lacks functionality. Josh Kalk used to have a wonderful website, but he moved on to the Rays. BrooksBaseball has never really had seasonal data per se, despite having data that spans each season.

We think that, generally, there are several reasons the data has been difficult to find in a long-term format. First, there is the technical limitation: the PITCHf/x dataset is large, on the order of millions of pitches. This means that dynamic solutions (and most PITCHf/x systems are dynamic) must have very good caching systems, well-written databases, and powerful hosting solutions.

But beyond raw computing, which we can solve using some combination of duct tape and Moore’s law, there are really two critical issues that are unfortunately intertwined. ... Quality issues propagate through various parts of the system. So, you really want a very qualified human to do the tagging. But, good luck convincing THT's Harry Pavlidis [a SABR member] or Lucas Apostoleris to tag three and a half million pitches, because that would be insane.

What’s that you say? They’ve actually done that?!

By that, I mean individually tagged every pitch. This isn’t a very efficient solution, but it escapes the problems above by putting a human hand on the classification problem. When the cameras capture internally consistent data with park-specific quirks, Harry can find and adjust for those quirks and tag the pitches correctly. The raw numbers in the data aren’t changed, but the labels are, solving at least part of the problem that exists in the dataset. It’s not a perfect solution, but it allows us to present you with an enormous database of properly tagged, seasonal PITCHf/x data.


We really, really hope you enjoy this—we think it will be a great addition to the baseball resources on the Internet. Please don’t hesitate to contact us by leaving comments on this article, by leaving messages on the forums, or dropping us an email.

Read the full article here:

Use the PITCHf/x tool here:

This page was last updated February 3, 2012 at 4:50 pm MST.

