Gennaro: Clustering pitchers by similarity, part 2

From SABR President Vince Gennaro at Diamond Dollars on June 3, 2013:

In my last post, I discussed one of my latest research projects: clustering pitchers by their similarities. The problem I'm trying to address is finding an alternative to what is possibly the worst use of quantitative analysis in baseball, which is evaluating batter-pitcher matchups based on career historical performance data between one batter and one pitcher. Instead, I'm trying to identify groups of pitchers that are likely to induce similar offensive performance from a single batter. If we can find a cluster of pitchers who present a similar challenge to a hitter, then we can enlarge the sample size of batter-pitcher "results" and, at the same time, shorten the timeframe over which we are measuring performance. For example, against right-handed hitters, my analysis suggests that lefty pitchers Barry Zito, Mark Buehrle, Paul Maholm, Zach Duke, Chris Narveson, Eric Stults, Joe Saunders and Jason Vargas (among others) are "similar." This similarity is based on the profiling factors listed in the previous post, including pitch repertoire, release points, most common two-pitch sequences, the portion of the strike zone the pitcher favors, and so on.
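To make the sample-size argument concrete, here is a minimal sketch of pooling one batter's results across every pitcher in a cluster. All counts and the `pitcher_a`/`pitcher_b`/`pitcher_c` names are hypothetical, and the "on-base rate" is a simplified times-on-base divided by plate appearances, not the official OBP formula. Each individual matchup is tiny, but the pooled sample is several times larger.

```python
from dataclasses import dataclass


@dataclass
class MatchupLine:
    """One batter's results against one pitcher (hypothetical counts)."""
    pitcher: str
    plate_appearances: int
    times_on_base: int


def pooled_on_base_rate(lines: list[MatchupLine]) -> tuple[int, float]:
    """Pool a batter's results across every pitcher in a cluster.

    Returns the combined plate appearances and the on-base rate computed
    over the pooled sample (not an average of per-pitcher rates).
    Simplification: times on base / plate appearances.
    """
    total_pa = sum(line.plate_appearances for line in lines)
    total_on_base = sum(line.times_on_base for line in lines)
    if total_pa == 0:
        raise ValueError("no plate appearances to pool")
    return total_pa, total_on_base / total_pa


# Hypothetical batter vs. three pitchers assumed to share a cluster.
cluster = [
    MatchupLine("pitcher_a", 9, 2),
    MatchupLine("pitcher_b", 14, 5),
    MatchupLine("pitcher_c", 12, 4),
]

pa, rate = pooled_on_base_rate(cluster)
print(pa, round(rate, 3))  # 35 0.314
```

Pooling the raw counts, rather than averaging three per-pitcher rates, weights each matchup by how many plate appearances it actually contributed.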

Below is a visual mapping of pitcher clusters. Each node represents a pitcher, and each line between two pitchers represents a "connection," meaning their similarity meets a defined minimum threshold. This graph includes only LHPs, and the clusters reflect their similarity when facing right-handed hitters only.
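The graph described above (nodes for pitchers, edges where similarity clears a threshold) can be sketched as follows. This is an illustration under stated assumptions, not the method used in the analysis: the post does not specify the similarity measure, so cosine similarity is swapped in here; the three-element profiles (imagine pitch-mix shares already scaled to comparable ranges), the pitcher names, and the 0.9 threshold are all hypothetical. Clusters are taken as connected components of the thresholded graph.

```python
import math
from collections import deque


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two profile vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("profile vectors must be non-zero")
    return dot / (norm_u * norm_v)


def similarity_graph(profiles: dict[str, list[float]],
                     threshold: float) -> dict[str, set[str]]:
    """Add an edge between two pitchers when similarity >= threshold."""
    names = list(profiles)
    graph: dict[str, set[str]] = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(profiles[a], profiles[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def clusters(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group pitchers into connected components via breadth-first search."""
    seen: set[str] = set()
    groups: list[list[str]] = []
    for start in graph:
        if start in seen:
            continue
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(component))
    return groups


# Hypothetical, pre-scaled profiles for four LHPs vs. right-handed hitters.
profiles = {
    "pitcher_a": [0.60, 0.30, 0.10],
    "pitcher_b": [0.55, 0.35, 0.10],
    "pitcher_c": [0.10, 0.20, 0.70],
    "pitcher_d": [0.15, 0.15, 0.70],
}

graph = similarity_graph(profiles, threshold=0.9)
print(clusters(graph))
# [['pitcher_a', 'pitcher_b'], ['pitcher_c', 'pitcher_d']]
```

One design consequence of using connected components: clustering is transitive, so two pitchers that are not directly connected still land in the same cluster if a chain of above-threshold links joins them. Raising the threshold prunes edges and tends to split large clusters apart.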

