Thursday, March 28, 2013

Why the Stats Say Martin Truex Jr and Jimmie Johnson are More Similar Than You Think

By digging deeper into NASCAR's Loop Data statistics, we can run new kinds of analysis and measure driver performance from a smarter angle.

One way of presenting this information is by visualizing a correlation matrix.

Very quickly: the correlation between a pair of drivers measures how closely their performances move together. Highly correlated drivers tend to perform well at the same tracks and poorly at the same tracks; they go up and down together. Low-correlation drivers have performances that are unrelated to each other. A correlation matrix simply puts the correlation between every pair of drivers into one grid.
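To make the idea concrete, here is a minimal Python sketch of a correlation matrix. It assumes each driver's statistic (for example, fastest laps) is stored as a list of per-race values in the same race order. It uses the standard Pearson correlation; the post does not say which correlation measure was used, so that choice is an assumption, and the function names, driver names, and numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of per-race values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if a driver's values never change.
    return cov / sqrt(var_x * var_y)


def correlation_matrix(stats):
    """stats maps driver -> per-race values, aligned so index i is the same race.

    Returns matrix[a][b], the correlation between drivers a and b.
    The diagonal matrix[a][a] is always 1.0 (up to rounding).
    """
    drivers = sorted(stats)
    return {a: {b: pearson(stats[a], stats[b]) for b in drivers} for a in drivers}


if __name__ == "__main__":
    # Hypothetical fastest-lap counts over four races (not real Loop Data).
    stats = {
        "Driver A": [10, 2, 8, 0],
        "Driver B": [12, 3, 9, 1],
        "Driver C": [0, 9, 1, 7],
    }
    for a, row in correlation_matrix(stats).items():
        print(a, {b: round(r, 2) for b, r in row.items()})
```

In this toy data, Driver A and Driver B rise and fall together (correlation close to 1), while Driver C moves the opposite way (strongly negative).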

I spent time analyzing driver performances throughout the entire history of NASCAR's Loop Data statistics (starting in 2005). By using correlation matrices, we can quickly see which pairs of drivers have similar profiles in their results from track to track.

The most amazing insight that comes out of this research is how well Martin Truex, Jr. compares with certain championship drivers.

In the charts below, the darker boxes mean the two drivers that intersect have a high correlation. The diagonal line of solid purple boxes just means that a driver is 100% correlated with himself.

One note: In the charts below, we include all drivers that ran a minimum of 250 races during 2005-2012.
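The 250-race cutoff is simple to apply before building the matrix. The sketch below assumes results are stored per driver as a dictionary keyed by a race identifier, and, as an assumption not stated in the post, compares two drivers only on the races both of them started. The function names and data are hypothetical.

```python
def eligible_drivers(results, min_races=250):
    """Keep only drivers with at least min_races starts.

    results maps driver -> {race_id: value}; race_id can be any sortable key.
    """
    return {d: races for d, races in results.items() if len(races) >= min_races}


def shared_races(results, a, b):
    """Aligned value lists for the races that both drivers a and b started."""
    common = sorted(set(results[a]) & set(results[b]))
    return [results[a][r] for r in common], [results[b][r] for r in common]


if __name__ == "__main__":
    # Hypothetical data keyed by (season, race number); threshold lowered for the demo.
    results = {
        "Driver A": {(2005, 1): 3, (2005, 2): 0, (2005, 3): 5},
        "Driver B": {(2005, 2): 1, (2005, 3): 4},
        "Driver C": {(2005, 1): 2},
    }
    kept = eligible_drivers(results, min_races=2)
    print(sorted(kept))  # ['Driver A', 'Driver B']
    print(shared_races(kept, "Driver A", "Driver B"))  # ([0, 5], [1, 4])
```

The aligned lists returned by `shared_races` are what a pairwise correlation would then be computed on.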

Look below at the correlation matrix for accumulating Fastest Laps during a race:

[Image caption: Greg Biffle and Matt Kenseth]
Focus on the squares shaded in dark blue. These are the pairs of drivers with the highest correlations:

1) Greg Biffle and Matt Kenseth have a high correlation: the data suggests they tend to be fast at the same tracks and slow at the same tracks. This makes sense because both drove for Roush Racing, and they likely benefited from sharing the same equipment, crew chief notes, and setups.

2) Similarly, teammates Jeff Gordon and Dale Earnhardt, Jr. are another highly correlated pair (notice their dark intersecting square). Both drive for Hendrick, so it makes sense that their fastest laps would be highly correlated. Also from Hendrick, the Jeff Gordon / Jimmie Johnson pairing shows a high correlation. Both Hendrick pairs suggest that shared equipment and teamwork cause these drivers to rise and fall together: when one driver has a lot of speed, his teammates tend to as well.

3) Ryan Newman and Martin Truex, Jr. also show a high correlation. They tend to accumulate fastest laps at the same tracks, which is interesting because they drive for different teams. Is there something about their driving styles that explains why they perform similarly?
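Picking out the darkest off-diagonal squares is the same as sorting every distinct pair of drivers by its correlation. A minimal sketch, assuming the matrix is a dictionary of dictionaries indexed as matrix[a][b]; `top_pairs` is a hypothetical helper, and the numbers are made up rather than read from the charts.

```python
from itertools import combinations


def top_pairs(matrix, k=3):
    """Return the k most correlated distinct driver pairs as (a, b, r) tuples.

    matrix[a][b] is the correlation between drivers a and b; each pair is
    counted once and the diagonal (a driver with himself) is skipped.
    """
    pairs = [(a, b, matrix[a][b]) for a, b in combinations(sorted(matrix), 2)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical correlations, not values from the Loop Data charts.
    matrix = {
        "A": {"A": 1.0, "B": 0.8, "C": 0.1},
        "B": {"A": 0.8, "B": 1.0, "C": 0.3},
        "C": {"A": 0.1, "B": 0.3, "C": 1.0},
    }
    print(top_pairs(matrix, k=2))  # [('A', 'B', 0.8), ('B', 'C', 0.3)]
```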

The next chart focuses on Laps Led per race:
We see one pair of dark squares that sticks out: a high correlation between Ryan Newman and Martin Truex, Jr. Let's think about some reasons why this could be true:

  • They have a similar driving style.
  • They prefer the same types of tracks.
  • Their crew chiefs have a similar style of setup.


Finally, our last chart is Pass Differential per race:
For those of you who don't know, the pass differential stat counts green-flag passes for position: it adds the number of times a driver passed others and subtracts the number of times other drivers passed him. A driver's pass differential can therefore be positive or negative at the end of a race.
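As a sketch of the arithmetic, assume we have a log of green-flag passes for position recorded as hypothetical (passer, passed) pairs; the post does not describe NASCAR's raw data format, so this representation and the function name are assumptions.

```python
from collections import Counter


def pass_differentials(pass_events):
    """Pass differential per driver from green-flag passes for position.

    pass_events is an iterable of (passer, passed) tuples. Each pass adds one
    to the passer's total and subtracts one from the passed driver's total.
    """
    made, lost = Counter(), Counter()
    for passer, passed in pass_events:
        made[passer] += 1
        lost[passed] += 1
    # Counter returns 0 for missing keys, so every driver seen gets a value.
    return {d: made[d] - lost[d] for d in set(made) | set(lost)}


if __name__ == "__main__":
    # Hypothetical green-flag passes in one race.
    events = [("Driver A", "Driver B"), ("Driver A", "Driver C"), ("Driver B", "Driver A")]
    print(sorted(pass_differentials(events).items()))
    # [('Driver A', 1), ('Driver B', 0), ('Driver C', -1)]
```

In this simplified model the differentials across the whole field add up to zero, because every pass counts +1 for one driver and -1 for another.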

[Image caption: Jimmie Johnson and Martin Truex, Jr.]
Martin Truex, Jr. has a high correlation with both Jimmie Johnson and Tony Stewart.

We know Stewart and Johnson are both champions, but what is it about Truex and his driving skill that gives him a similar profile? Truex does not share a team or equipment with Johnson or Stewart, so equipment is unlikely to be the reason. Truex doesn't win as often as Stewart and Johnson, but the data shows that his profile of passing cars is very similar to theirs.

The data suggests a few theories:
  1. Truex is a very similar driver to these champions, more so than we realize.
  2. If Truex were in the same equipment as Johnson and Stewart, he could match their results.
  3. Perhaps Truex is a championship-caliber driver like Stewart and Johnson, and we will see him get that result in time, if he can benefit from good luck, fast equipment, and the right circumstances.
  4. Or email me your theories at 36races@gmail.com and I can throw some math at it for next time.

If you want to get crazy with correlation matrices, you could use them in multiple ways:

Is it the Car or the Driver? We can better answer this question by looking at how drivers within the same team are correlated with each other, and how these correlations shift when drivers change teams.
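One way to start on that question, sketched under assumptions: given a correlation matrix and a hypothetical mapping from each driver to his team for a single period, compare the average correlation between teammates with the average correlation between drivers on different teams. Tracking how the numbers shift after a team change would mean repeating this for each period, which the sketch leaves out. The function name and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def team_correlation_summary(matrix, team_of):
    """Average correlation for teammate pairs versus non-teammate pairs.

    matrix[a][b] is the correlation between drivers a and b, and team_of
    maps each driver to a team name. statistics.mean raises StatisticsError
    if either group is empty.
    """
    within, across = [], []
    for a, b in combinations(sorted(matrix), 2):
        group = within if team_of[a] == team_of[b] else across
        group.append(matrix[a][b])
    return mean(within), mean(across)


if __name__ == "__main__":
    # Hypothetical drivers, teams, and correlations.
    matrix = {
        "A": {"A": 1.0, "B": 0.7, "C": 0.2, "D": 0.1},
        "B": {"A": 0.7, "B": 1.0, "C": 0.3, "D": 0.2},
        "C": {"A": 0.2, "B": 0.3, "C": 1.0, "D": 0.6},
        "D": {"A": 0.1, "B": 0.2, "C": 0.6, "D": 1.0},
    }
    team_of = {"A": "Team 1", "B": "Team 1", "C": "Team 2", "D": "Team 2"}
    print(team_correlation_summary(matrix, team_of))
```

If teammates are consistently more correlated than non-teammates, that points toward the car; if the gap is small, the driver may matter more.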

Hiring Drivers for Your Fantasy Team: Each week, do you want to load up on drivers who will perform similarly (taking the risk that they will all do badly), or pick drivers who hedge each other (so one driver's success can offset another's failure)? Correlation matrices give you a way to attack that problem and customize your team.
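The hedging idea follows from the standard formula for the variance of a sum of two random quantities. A minimal sketch, assuming each driver's weekly fantasy points have a known standard deviation and the two drivers have correlation `corr`; all inputs and the function name are hypothetical.

```python
from math import sqrt


def combined_std(std_a, std_b, corr):
    """Standard deviation of the sum of two drivers' weekly fantasy points.

    Uses Var(A + B) = Var(A) + Var(B) + 2 * corr * std(A) * std(B).
    """
    variance = std_a ** 2 + std_b ** 2 + 2 * corr * std_a * std_b
    # Guard against a tiny negative value from floating-point rounding.
    return sqrt(max(0.0, variance))


if __name__ == "__main__":
    # Two hypothetical drivers whose weekly points each vary by 10.
    for corr in (1.0, 0.0, -0.5):
        print(corr, round(combined_std(10, 10, corr), 2))
```

With a correlation of 1.0 the pair's combined spread is 20 points, at 0.0 it drops to about 14.14, and at -0.5 it falls to 10: the less correlated the drivers, the less likely they are to have a bad week at the same time.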

Hiring Drivers for Your REAL Team: If you are a team owner, you can use driver correlations to analyze how drivers perform and better judge who has breakout potential. This works for Cup free agents as well as for minor-leaguers moving up the ranks. You can look beyond finishing position and match a driver's performance against "benchmark" drivers you would like him to emulate. You can also figure out which drivers have a driving style that fits your equipment and setups.
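Matching prospects against a benchmark driver amounts to reading one row of the correlation matrix. A sketch, assuming a matrix indexed as matrix[a][b] that covers the prospects of interest; the helper name, drivers, and values are hypothetical.

```python
def closest_to_benchmark(matrix, benchmark, k=3):
    """Drivers most correlated with a benchmark driver, highest first.

    matrix[a][b] is the correlation between drivers a and b. Returns up to k
    (driver, correlation) tuples, excluding the benchmark himself.
    """
    candidates = [(d, r) for d, r in matrix[benchmark].items() if d != benchmark]
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical prospects compared against a hypothetical benchmark driver.
    matrix = {
        "Benchmark": {"Benchmark": 1.0, "Prospect 1": 0.4, "Prospect 2": 0.75, "Prospect 3": 0.1},
    }
    print(closest_to_benchmark(matrix, "Benchmark", k=2))
    # [('Prospect 2', 0.75), ('Prospect 1', 0.4)]
```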