The driving force behind modern baseball statistical analysis, often called sabermetrics, is to replace subjective evaluation with objective analysis. Because the human eye has an inherent bias, the theory goes, opinions based on observation are more likely to be flawed. That’s a very reasonable approach. In fact, we see the same thing in fields like criminal investigation. Ask any detective whether they’d rather have an eyewitness or a DNA sample, and they’ll usually opt for the latter. The problem with many of the advanced metrics in use today, however, is that their application has not been fully vetted. In fact, rather ironically, many rely on a hidden layer of observation.
The area in which sabermetrics encounters the most bias is the evaluation of fielding. For years, this aspect of the game has defied the establishment of meaningful and reliable statistics. Recently, however, a whole host of measures have been created. Most notable among these, and most often referenced, are UZR, developed by Mitchel Lichtman, and the +/- system, created by John Dewan (both metrics are currently available on FanGraphs).
In a recent post on The Hardball Times (h/t Rob Neyer), JT Jordan took a look at the discrepancies that exist between the two metrics. In his analysis, he finds that the differences stem mostly from sample sizes and calculation methods. However, what I think is more concerning about each statistic is the underlying data component (more on that later).
In a similar vein, Colin Wyers of Baseball Prospectus raised some questions about the validity of batted-ball data (trajectory and location). Two companies currently compile this information: MLBAM, which uses stringers (or observers) at every game, and Baseball Info Solutions (BIS), which uses a video system. According to Wyers, the method of data collection has had a meaningful impact on the eventual conclusions.
Several years back I worked for ESPN/SportsTicker, when it was responsible for compiling the game data that MLBAM now tracks in house. One of my duties was to interact with the stringers in the press box. To say that some were not exactly scientific in their approach would be kind.
Of course, that doesn’t mean the video-based approach of BIS (both UZR and +/- use this data) is beyond bias. In order to eliminate any discrepancies, we’d have to ensure that all of the cameras were similarly situated at every ballpark. Otherwise, a new source of error would be introduced. Consider what would happen if an overhead camera were used to evaluate pitch location. Even the slightest camera adjustment has the potential to yield a different view. With 30 different stadiums requiring 30 different setups, you can see the potential for error. The same holds true for the PITCHf/x data compiled by MLBAM.
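To put a rough number on how little it takes, here’s a back-of-the-envelope sketch. The distance and misalignment figures below are hypothetical, chosen purely for illustration, but the geometry is simple: a camera’s angular error translates into a lateral shift in the apparent location it records, proportional to its distance from the target.

```python
import math

def apparent_shift_inches(distance_ft, misalignment_deg):
    """Lateral shift (in inches) in an apparent location caused by a
    small angular misalignment of a camera at the given distance.
    Small-angle geometry: shift = distance * tan(angle)."""
    return distance_ft * math.tan(math.radians(misalignment_deg)) * 12

# Hypothetical example: a camera roughly 60 ft from home plate,
# misaligned by just half a degree.
shift = apparent_shift_inches(60, 0.5)
```

Under those assumed numbers, a half-degree error moves the apparent location by more than six inches, enough to turn a borderline strike into a ball, or to shuffle a batted ball into an adjacent fielding zone.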
Data validity also affects a variety of other advanced metrics, such as WAR (wins above replacement), that rely on similar applications of potentially flawed underlying information. These, too, need to be considered in that context.
So, am I suggesting that we throw out all of these measurements and return to basics like Batting Average, Wins, and Fielding Percentage (themselves somewhat subjective due to the role of the official scorer)? Of course not. However, I do think we need to keep in mind the limitations of the “science” that currently exists. UZR, WAR, etc. are not baseball’s equivalent of DNA. At least not yet. All are worthwhile tools and an increasing part of the discussion, but they are not the end of it.