Carleton: Mining the meaning in batter-pitcher matchups

From Russell Carleton at Baseball Prospectus on December 2, 2014:

There was one relief pitcher whom former Mariners designated hitter Edgar Martinez really loved facing. In the 23 times he faced him, he recorded 11 hits (three doubles and two home runs) and four walks, for a slash line of .579/.652/1.053. What wretched quad-A filler guy did Martinez light up like this? Some guy named Mariano Rivera.
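The slash line above can be checked from the counting stats in the paragraph. The sketch below is only a sanity check, and it assumes things the excerpt does not state: no triples, no hit-by-pitches, and no sacrifices, so that every plate appearance that was not a walk counts as an at-bat. The function name `slash_line` is hypothetical.

```python
def slash_line(pa, hits, doubles, triples, home_runs, walks):
    """Return (AVG, OBP, SLG) from counting stats.

    Assumption: no HBP, sac flies, or sac bunts, so AB = PA - BB
    and OBP = (H + BB) / PA.
    """
    at_bats = pa - walks
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    avg = hits / at_bats
    obp = (hits + walks) / pa
    slg = total_bases / at_bats
    return avg, obp, slg


# Martinez vs. Rivera as quoted above; triples=0 is an assumption.
avg, obp, slg = slash_line(pa=23, hits=11, doubles=3, triples=0,
                           home_runs=2, walks=4)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")  # prints 0.579/0.652/1.053
```

Under those assumptions the 23 plate appearances reduce to 19 at-bats and 20 total bases, which reproduces .579/.652/1.053.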

The batter-versus-pitcher matchup numbers are a staple of baseball broadcasts, and like a staple, they hurt when they’ve been driven into your head over and over. I think that at this point, it’s well understood that the fact that Smith is 2-for-4 lifetime against Jones does not mean that there’s a 50 percent chance of Smith getting a hit in his next at-bat against Jones. We don’t have the sample size to justify any conclusion at all on whether Smith has some special insight on Jones (or Jones on Smith). He might, but statistically, we need to say “Sorry, can’t help you either way.”
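To put a rough number on "can't help you either way," here is a minimal sketch of a 95 percent Wilson score interval around a 2-for-4 line. The Wilson interval is a choice made for this illustration, not a method from the article, and it assumes each at-bat is an independent trial with the same hit probability, which real matchups only approximate. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return center - half_width, center + half_width


# Smith is 2-for-4 lifetime against Jones.
low, high = wilson_interval(successes=2, trials=4)
print(f"{low:.3f} to {high:.3f}")  # prints 0.150 to 0.850
```

An interval running from about .150 to .850 covers nearly every batting average a major-league hitter could plausibly have, which is another way of saying that four at-bats tell us almost nothing.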

While an individual batter-pitcher matchup isn’t likely to produce enough of a sample to make any sort of meaningful inferences, there’s a school of thought that says maybe we could somehow cluster pitchers together. We can look at Smith’s performance against these 20 pitchers who share some meaningful characteristic to see how he has performed. This is essentially what we do with splits. We just assume that because Smith is a lefty, he is at some disadvantage against this particular lefty. That’s at once silly and likely true. We know that platoon effects persist over time, and we know that dividing pitchers into lefties and righties is easy (hi there, Pat Venditte!) and gives us big enough buckets that we can say meaningful things. But there’s probably more to it than just the geometry of left/right and much more that we need to account for.

Beyond just the platoon effect, how can we cluster pitchers (or batters, for that matter) together in a way that makes sense and where we could take “Smith is 2-for-4 lifetime against Jones” from fun fact to actually meaningful information? There’s been some amount of work around how different hitters fare against different pitch types. For example, there are stats on how well different hitters have fared against fastballs (vs. curves vs. changeups). If he’s done well against fastballs, then maybe teams should shy away from throwing him heat. PITCHf/x makes finding those sorts of splits a matter of a few clicks on Brooks Baseball.

I’d like to suggest a different way of thinking about the problem. It’s mostly theoretical at this point, but all the good stuff starts out theoretical.

Read the full article at Baseball Prospectus (subscription required).

Originally published: December 2, 2014. Last Updated: December 2, 2014.