Gennaro: Clustering pitchers by similarity

From SABR President Vince Gennaro at Diamond Dollars on April 22, 2013:

About six weeks ago, I presented some of my latest research at the SABR Analytics Conference in Phoenix. The analysis focused on identifying pitchers who are similar to one another, grouping them into clusters, and determining how hitters have performed against the various clusters. I worked closely with George Ng, a data scientist at YarcData, and made use of YarcData's sophisticated uRiKA hardware appliance, which specializes in graph analytics.
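As a rough illustration of the last step, the sketch below pools a hitter's at-bats against every pitcher in a cluster instead of against a single pitcher. The cluster labels, pitcher and batter names, and at-bat records are hypothetical placeholders. Batting average is only one possible summary. This is not the method or data used in the research described here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical cluster assignments: pitcher -> cluster label.
# These labels are invented for illustration only.
PITCHER_CLUSTER: Dict[str, str] = {
    "Pitcher A": "cluster_1",
    "Pitcher B": "cluster_1",
    "Pitcher C": "cluster_2",
}

# Hypothetical at-bat records: (batter, pitcher, got_hit).
AT_BATS: List[Tuple[str, str, bool]] = [
    ("Batter X", "Pitcher A", True),
    ("Batter X", "Pitcher A", False),
    ("Batter X", "Pitcher B", True),
    ("Batter X", "Pitcher B", False),
    ("Batter X", "Pitcher C", False),
]


def average_by_cluster(batter: str) -> Dict[str, Tuple[int, int, float]]:
    """Return {cluster: (hits, at_bats, batting_average)} for one batter."""
    hits: Dict[str, int] = defaultdict(int)
    at_bats: Dict[str, int] = defaultdict(int)
    for b, pitcher, got_hit in AT_BATS:
        # Skip other batters and pitchers with no cluster assignment.
        if b != batter or pitcher not in PITCHER_CLUSTER:
            continue
        cluster = PITCHER_CLUSTER[pitcher]
        at_bats[cluster] += 1
        hits[cluster] += int(got_hit)
    return {c: (hits[c], at_bats[c], hits[c] / at_bats[c]) for c in at_bats}


if __name__ == "__main__":
    for cluster, (h, ab, avg) in sorted(average_by_cluster("Batter X").items()):
        print(f"{cluster}: {h} for {ab} ({avg:.3f})")
```

Pooling by cluster lets at-bats against Pitcher A and Pitcher B count toward the same comparison, so each hitter-versus-cluster figure rests on more plate appearances than any single batter-pitcher matchup.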

The intent of the project is to develop an alternative to the relatively uninformative one-on-one batter-pitcher matchup data that teams tend to use to inform their lineup, pinch-hitting, and bullpen matchup decisions. There are numerous problems with relying on one-on-one batter-pitcher history, including small sample sizes and data that is old and stale. Is it relevant that Derek Jeter’s career stats vs. Roy Halladay include a 4-for-10 in 1999?

The process of creating pitcher clusters begins with determining the attributes that define “similarity” between pitchers. I chose to tackle this issue from the batter’s perspective. In other words, what criteria would hitters use to “type” a pitcher? I framed those criteria as questions and matched each one with Pitch f/x attributes. The framework, which includes about 12 different attributes, is detailed in the chart below. In keeping with the approach of judging similarity from the hitter’s perspective, I segmented the data for each pitcher by left-handed vs. right-handed hitters. In other words, Jered Weaver wasn’t profiled just once on these attributes; he was profiled twice, once against left-handed hitters and once against right-handed hitters.
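One way to make the handedness split concrete is to build a separate attribute profile for each (pitcher, batter hand) pair, convert each attribute to a z-score, and compare profiles by distance. The sketch below assumes three invented attributes (fastball velocity, fastball horizontal break, off-speed share) and made-up values. The actual framework used about 12 Pitch f/x attributes, and the original work ran as graph analytics on uRiKA. The z-score and Euclidean nearest-neighbor lookup here is a swapped-in stand-in, not the author's method, and it only finds the single most similar profile rather than forming full clusters.

```python
import math
from typing import Dict, List, Tuple

# Key: (pitcher, batter_hand). All names and values are hypothetical.
# Attributes (invented): fastball mph, fastball horizontal break (in), off-speed share.
ProfileKey = Tuple[str, str]
PROFILES: Dict[ProfileKey, List[float]] = {
    ("Pitcher A", "L"): [92.0, 6.0, 0.30],
    ("Pitcher A", "R"): [92.5, 7.0, 0.20],
    ("Pitcher B", "L"): [91.5, 6.5, 0.32],
    ("Pitcher B", "R"): [95.0, 4.0, 0.10],
    ("Pitcher C", "L"): [86.0, 2.0, 0.55],
    ("Pitcher C", "R"): [91.0, 7.5, 0.22],
}


def standardize(profiles: Dict[ProfileKey, List[float]]) -> Dict[ProfileKey, List[float]]:
    """Convert each attribute to a z-score so no single unit dominates distance."""
    keys = list(profiles)
    n_attrs = len(next(iter(profiles.values())))
    out = {k: [0.0] * n_attrs for k in keys}
    for j in range(n_attrs):
        col = [profiles[k][j] for k in keys]
        mean = sum(col) / len(col)
        # Population standard deviation; fall back to 1.0 if an attribute is constant.
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        for k in keys:
            out[k][j] = (profiles[k][j] - mean) / sd
    return out


def most_similar(target: ProfileKey, profiles: Dict[ProfileKey, List[float]]) -> ProfileKey:
    """Nearest other pitcher facing hitters of the same hand (Euclidean distance)."""
    z = standardize(profiles)
    candidates = [k for k in z if k[1] == target[1] and k[0] != target[0]]
    return min(candidates, key=lambda k: math.dist(z[k], z[target]))


if __name__ == "__main__":
    for hand in ("L", "R"):
        print(hand, most_similar(("Pitcher A", hand), PROFILES))
```

Because the batter hand is part of the key, a pitcher can resemble one pitcher against left-handed hitters and a different one against right-handed hitters. With the invented values above, Pitcher A's nearest match is Pitcher B against lefties but Pitcher C against righties.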

Read the full article here:

Originally published: April 22, 2013. Last Updated: April 22, 2013.