Tuesday, March 19, 2013

The Stability of NASCAR Stats: Which numbers are luck and which are real?

If you are trying to predict how a driver will do over time, and you only have past data to work with, how do you know which stats are going to be the most accurate predictors of future performance? What are the most stable factors year-in and year-out?

And conversely, how do you know which performance measures are the least helpful predictors? Which ones are most susceptible to noise, randomness, and luck?

By running a linear regression on each of the major performance stats, comparing a driver's number in one season with the same number in the next, we can calculate a slope parameter for each statistic and order the stats by their predictive power. The data covers NASCAR's "modern era" (since 1972) and includes only drivers who started at least 25 races in consecutive years.
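For readers who want to try this at home, here is a minimal sketch of how a calculation like this could be set up. It is not the exact code behind the chart. It assumes each stat in one season is regressed on the same stat from the previous season, and that a higher slope means a more stable stat. The field names (driver, year, starts, avg_start, wins), the function names, and the sample numbers are all made up for illustration.

```python
from collections import defaultdict


def consecutive_pairs(seasons, stat, min_starts=25):
    """Collect (this_year, next_year) values of `stat` for every driver
    with at least `min_starts` starts in both of two consecutive seasons."""
    by_driver = defaultdict(dict)
    for row in seasons:
        if row["starts"] >= min_starts:
            by_driver[row["driver"]][row["year"]] = row[stat]
    pairs = []
    for years in by_driver.values():
        for year, value in years.items():
            if year + 1 in years:
                pairs.append((value, years[year + 1]))
    return pairs


def ols_slope(pairs):
    """Ordinary least-squares slope of next-year value on this-year value."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


def rank_by_stability(seasons, stats, min_starts=25):
    """Return (stat, slope) tuples, highest slope (assumed most stable) first."""
    slopes = [(s, ols_slope(consecutive_pairs(seasons, s, min_starts)))
              for s in stats]
    return sorted(slopes, key=lambda item: item[1], reverse=True)


# Made-up sample data, NOT real NASCAR results.
example_seasons = [
    {"driver": "A", "year": 2001, "starts": 36, "avg_start": 10, "wins": 0},
    {"driver": "A", "year": 2002, "starts": 36, "avg_start": 8, "wins": 2},
    {"driver": "B", "year": 2001, "starts": 36, "avg_start": 20, "wins": 4},
    {"driver": "B", "year": 2002, "starts": 36, "avg_start": 13, "wins": 2},
    {"driver": "C", "year": 2001, "starts": 30, "avg_start": 30, "wins": 8},
    {"driver": "C", "year": 2002, "starts": 30, "avg_start": 18, "wins": 2},
    # Driver D is dropped: only 10 starts in 2001.
    {"driver": "D", "year": 2001, "starts": 10, "avg_start": 5, "wins": 0},
    {"driver": "D", "year": 2002, "starts": 36, "avg_start": 40, "wins": 1},
]

ranking = rank_by_stability(example_seasons, ["wins", "avg_start"])
for stat, slope in ranking:
    print(stat, round(slope, 3))
```

One caveat with this sketch: because each stat is regressed on its own earlier value, the slopes are roughly comparable across stats, but a correlation coefficient would be another reasonable way to rank them.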

In this chart, we see all the major performance stats, ranked by their stability across consecutive seasons. Most stable at the top, least stable at the bottom. The most consistent stats from year to year are Average Start and Lead Lap Finishes, while the least consistent stats are Wins and Poles.

Here are some interesting conclusions we can infer from the chart:

1) The two measures of starting position are at the two extremes of the chart: Average starting position is the most stable measurement over time, but poles per year is the least stable stat. A driver's average starting position this year will be very close to their average starting position last year. But poles vary widely from year to year and are much tougher to predict. What does that mean in racing terms? A driver's starting ability, averaged over the course of a season and a career, is well-defined: it is a property of who that driver is and what their driving style is. They are who they are, and you see that year after year. Poles, however, are more about randomness: The margins at the top of the qualifying leaderboard are so close that luck plays a large part in who gets the pole. As we have seen before, pole winners most often do not win races anyway, because the factors that go into winning a pole are generally unrelated to those involved in winning the race.

2) Winning races is the second hardest-to-predict measure. Any fan will know this is true, as win numbers can change drastically from year to year. (Remember when Carl Edwards had 9 wins in 2008 and then 0 in 2009? Or when Mark Martin had 5 wins in 2009 after several years of 0?) There are many lucky winners (think about lucky fuel mileage gambles), and of course a plethora of drivers who unluckily lost races they "coulda, shoulda, woulda" won.

3) Crashes, failures and bad luck are a major source of randomness. Notice that Laps Completed and Races Running at the Finish are near the bottom of the list. Both of these are related to the concept of keeping your car clean and getting it to the finish line in one piece. Think about crashes, engine failures, flat tires, and getting pulled into accidents caused by others. Drivers can have a good year where everything goes their way, and a bad year where they seem to hit everything around them. Of course these stats are going to be hard to forecast: effectively you are trying to predict how many accidents a driver will have, which is very hard to do when most of these events are out of their control.

Alright, so how can I use this table?
  • If you are in a fantasy league, think about which past stats are actually going to be the most useful for you to forecast future performance. Wins and poles don't really help you that much.
  • If you are in the media and discussing driver performance, perhaps Lead Lap Finishes is a stat to consider as something that can be repeated over time.
  • If you are an owner or sponsor looking to hire a new driver, remember to be careful when considering that driver's past performance statistics. Think about those stats where the driver is doing well: Are they repeatable over time (higher on the table), or perhaps just the result of some good luck (lower on the table)?
Readers, what else do you see in here?