

A caution when using the new MLB StatCast exit velocity data: a significant number of exit velocity readings are missing from the database


StatCast exit velocity data will be wonderful long term, but it currently has one major flaw: tons of data is missing.

MLB introduced StatCast this year. It is a tracking technology that gathers and displays the location and movement of the ball and the players (read a more detailed writeup of StatCast here). The one major StatCast measure that MLB has released to the public is exit velocity, which measures how fast the ball comes off a hitter's bat. This is a great development for the baseball public, because exit velocity is one of the key measures that progressive front offices use in player evaluation.

It makes sense that the harder a ball is hit, the more likely it is to go for a hit, and through June 17, higher exit velocities have gone along with higher batting averages. This, from the excellent Daren Willman, who runs the incredible website BaseballSavant.com:


One of my favorite features of the new StatCast data is the set of exit velocity charts, which show a hitter's peaks and troughs by week. This data could help explain slumps, or show whether a hitter is being affected by playing through a nagging injury. Most recently, I used Wilmer Flores' average exit velocity chart to show a significant uptick in his exit velocity since he moved to 2B. I offered this as some support for the idea that Flores will now be a more productive hitter because he's playing a more comfortable defensive position, something his manager Terry Collins has suggested and that his positional splits throughout his short MLB career have also reflected.

Unfortunately, a Twitter user named Andrew (@DerpyMets) pointed out in a reply to that tweet that a significant number of exit velocity readings are missing from the public StatCast database, which potentially invalidates the Flores example.


Andrew has loaded the entire database into SQL and has been analyzing it, and he estimates that 33-40% of this season's exit velocity readings are missing. As an example, of the batted balls in Sunday's games, 570 have exit velocity readings, while 218 do not.
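Andrew's exact schema isn't public, so what follows is only a minimal sketch, assuming a hypothetical `batted_balls` table with one row per batted ball and a NULL `exit_velocity` whenever no reading was recorded. It uses Python's built-in `sqlite3` module to count the missing share, loading Sunday's quoted counts (570 with a reading, 218 without) with a placeholder speed, since only NULL versus non-NULL matters for this count.

```python
import sqlite3

# Hypothetical schema: one row per batted ball; exit_velocity is NULL when
# no reading was recorded. Table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batted_balls (game_date TEXT, exit_velocity REAL)")

# Reproduce Sunday's quoted counts: 570 balls with a reading, 218 without.
# 95.0 mph is a placeholder value; only NULL vs. non-NULL matters here.
conn.executemany(
    "INSERT INTO batted_balls VALUES (?, ?)",
    [("sunday", 95.0)] * 570 + [("sunday", None)] * 218,
)

# COUNT(*) counts every row, while COUNT(exit_velocity) skips NULLs.
total, with_reading = conn.execute(
    "SELECT COUNT(*), COUNT(exit_velocity) FROM batted_balls WHERE game_date = ?",
    ("sunday",),
).fetchone()

missing = total - with_reading
share_missing = missing / total
print(f"{missing} of {total} batted balls missing a reading ({share_missing:.1%})")
```

For Sunday this works out to 218 of 788 batted balls, or about 27.7%. A single day's share can differ from the season-long 33-40% estimate.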

This seems like a major problem. What good is an average exit velocity chart if 33-40% of a player's batted balls are missing from the database? If the missing balls differ systematically from the recorded ones (for example, if weakly hit balls are more often left unrecorded), the average reading could shift significantly, either up or down, and provide misleading information.
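To make the averaging concern concrete, here is a minimal Python sketch that reports an average exit velocity alongside the share of batted balls that actually have a reading. The function name `average_with_coverage` and the six-ball week are hypothetical, invented for illustration rather than drawn from StatCast.

```python
from statistics import mean
from typing import Optional, Sequence


def average_with_coverage(
    readings: Sequence[Optional[float]],
) -> tuple[Optional[float], float]:
    """Mean exit velocity over recorded readings, plus the share of batted balls recorded.

    None marks a batted ball with no exit velocity reading.
    """
    if not readings:
        return None, 0.0
    recorded = [v for v in readings if v is not None]
    coverage = len(recorded) / len(readings)
    avg = mean(recorded) if recorded else None
    return avg, coverage


# Hypothetical week of six batted balls (mph); two have no reading.
week = [100.0, 95.0, None, 90.0, None, 85.0]
avg, coverage = average_with_coverage(week)
print(f"observed average {avg:.1f} mph from {coverage:.0%} of batted balls")

# Suppose, hypothetically, the two missing balls were weak contact.
filled_in = [100.0, 95.0, 70.0, 90.0, 75.0, 85.0]
print(f"full average {mean(filled_in):.1f} mph")
```

In this made-up case, the recorded readings average 92.5 mph from 67% of batted balls, but if the two missing balls were weak contact, the full average would be about 85.8 mph. Reporting coverage next to the average makes it easier to see when a chart rests on incomplete data.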

It isn't clear exactly why so much data is missing. Andrew isn't sure, but he theorizes that MLB may want to limit the public data for now. A different theory is that the system is new and has issues that will need to be ironed out over time, similar to when PitchFX was first released in 2007.

The bottom line is that the current public exit velocity data is incomplete, and while the limited data still holds some value, it's probably best to avoid relying on average exit velocity readings until the issues with the technology are ironed out.