Thursday, February 28, 2013

Phoenix Preview Video

Here I am on Bloomberg Sports with Jamal Salmon discussing:

1) Yesterday's post about Danica Patrick
2) a preview of this weekend's Phoenix race (with a bonus chart below)

Wednesday, February 27, 2013

Measuring Danica Patrick after 59 Races

Question & Data
Let's try to assess Danica's Sprint Cup prospects by comparing the start of her career with other notable drivers. Which drivers have had performances that most closely resemble hers?

She has run very few Cup races, so let's use her much larger Nationwide dataset to form our predictions. Danica has competed in 59 Nationwide races. We can compare that to the first 59 Nationwide races run by several other drivers. This will help us forecast what to expect from her.

As a comparison set, let's consider the entire top 15 in 2012 Sprint Cup points, plus 5 other drivers with varying degrees of success (such as Regan Smith and Josh Wise) or an IndyCar background (like Sam Hornish).

The Chart
Figure 1
What Does This Chart Mean?
Figure 1 is a box-and-whisker plot of the performance of 21 drivers after their first 59 Nationwide races. Each driver's median finish is the little red line. The middle half of a driver's finishes fall inside the blue box. A driver's best finish is the left-most point in that driver's row.

The chart is sorted so the drivers with the best median finish are at top. Notice how the little red line moves to the right as we go down the driver list. Out of 59 races, that red line is their 30th best finish (right in the middle of 59 races).
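For readers who want to reproduce the numbers behind a box-and-whisker plot like this one, here is a minimal Python sketch. The finishing positions are made up for illustration and are not any driver's real results, and the function name `box_stats` is my own invention.

```python
import statistics

def box_stats(finishes):
    """Return (best, q1, median, q3, worst) for a list of finishing positions."""
    ordered = sorted(finishes)
    # Quartiles mark the edges of the box; the median is the red line.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    median = statistics.median(ordered)
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data: 59 made-up finishes in a 43-car field.
finishes = [(7 * i) % 43 + 1 for i in range(59)]

best, q1, median, q3, worst = box_stats(finishes)

# With 59 races, the median is the 30th best finish (index 29 when sorted).
thirtieth_best = sorted(finishes)[29]
print(best, q1, median, q3, worst, thirtieth_best)
```

Because 59 is odd, the median is an actual race result rather than an average of two results, which is why it lines up exactly with the 30th best finish.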

The Good Sign for Danica
Danica's median finish is 19th, which compares poorly with the group overall, but is in similar territory to Kasey Kahne and Tony Stewart, who became stars.

Remember that sometimes the median finish can be misleading, as you see Sam Hornish and David Stremme up reasonably high on the chart. Obviously they were not able to convert that success into productive Cup careers.

According to this chart, Danica has her work cut out to be a true star, but the examples set by Stewart and Kahne (former open-wheelers like Danica) show that there are potential paths to success.

The Concerning Sign for Danica
Past research (to be posted later) suggests the best predictor of future success is the ability to win races early in one's career (not top 5s or top 10s, but wins only). It's easier for a driver to build consistency later if they have proven their ability to win early on.

Let's consider that point when looking at the best finish for these same drivers in their first 59 Nationwide races:

1 Brad Keselowski
1 Carl Edwards
1 Clint Bowyer
1 Dale Earnhardt, Jr.
1 Denny Hamlin
1 Greg Biffle
1 Jeff Gordon
1 Jimmie Johnson
1 Kasey Kahne
1 Kevin Harvick
1 Kyle Busch
1 Martin Truex, Jr.
1 Matt Kenseth
1 Ryan Newman
1 Sam Hornish, Jr.
1 Tony Stewart
2 David Stremme
3 David Ragan
4 Danica Patrick
5 Josh Wise
13 Regan Smith

Notice the big difference here between the drivers who won a race and the drivers who didn't. The other non-winners (Stremme, Ragan, Wise and Smith) have not made an impact in Cup. The concern for Patrick is that without a single win in her first 59 Nationwide races, she could find herself with unimpressive Cup results, similar to these four drivers.

Both of the main factors above, median finish and early-career wins, are stacked against Danica, which statistically suggests she will not be a future Chase-caliber driver. Hopefully she can prove this analysis wrong. If she finds a way to on-track success, she will have created a new path that past statistics would not have predicted.

Saturday, February 23, 2013

Video of Daytona 500 predictions

Here I am discussing the Daytona 500 with Bloomberg Sports Host Rob Shaw.

Wednesday, February 20, 2013

Creating a Formula for "Expected Wins"...or How Brad Keselowski Won the Championship

Guess who is the all-time leader in statistically winning more than expected?

Your answer: 2012 Champion Brad Keselowski.

Among modern drivers, he has outperformed what his laps led would predict by more than anyone else. Consider his very first win, at Talladega: he led only one lap in that race, the final one. It was his first career lap led, and it got him a race win. That's the most extreme way of outperforming expectations.

The Question
"How many race wins should I expect from any given driver?" Based on how a driver performs during a race, what expectation should we place on them being able to win the race? And can we quantify this over an entire career? Could I use this number to help better predict long-term performance, and measure who is over-performing or under-performing?

The Inspiration
Bill James's "Pythagorean expectation", originally applied to baseball, is a way to predict how many wins a team would get based on how many runs it scored vs. how many runs it gave up. This equation was later modified for sports like football and basketball.
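As a quick illustration, here is a minimal Python sketch of the classic form of the Pythagorean expectation, which squares runs scored and runs allowed. The team totals are hypothetical, and the function name and the 162-game season length are my own choices for the example.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x), with x = 2."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)

# Hypothetical team: 800 runs scored and 700 runs allowed in a 162-game season.
expected_pct = pythagorean_win_pct(800, 700)
expected_wins = expected_pct * 162
print(round(expected_pct, 3), round(expected_wins, 1))
```

A team that scores exactly as many runs as it allows comes out at 50%, which is the sanity check the formula is built around.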

To update this approach from stick-and-ball sports (where each game has one winner and one loser) to racing (where there is only one winner but multiple losers), we have to make a couple of tweaks.

The Racing Formula
Expected Win Percentage = Laps Led / Laps Competed

The idea here is that if a driver leads 5% of their career laps, then we expect them to win 5% of their races. If you take this example to the extreme, a driver who led 0 laps would win 0 races, and a driver who leads 100% of their laps would win 100% of their races.
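Here is a minimal Python sketch of that formula. The driver totals are hypothetical, the function names are mine, and multiplying the percentage by career starts to get a number of expected wins is my assumption about how the percentage turns into a win count.

```python
def expected_win_pct(laps_led, laps_competed):
    """Expected Win Percentage = Laps Led / Laps Competed."""
    if laps_competed <= 0:
        raise ValueError("laps_competed must be positive")
    return laps_led / laps_competed

def expected_wins(laps_led, laps_competed, starts):
    """Assumed conversion: expected win percentage times career starts."""
    return expected_win_pct(laps_led, laps_competed) * starts

# Hypothetical driver: 100 starts, 30,000 laps competed, 1,500 laps led.
pct = expected_win_pct(1500, 30000)          # 5% of career laps led
wins = expected_wins(1500, 30000, 100)       # about 5 expected wins
print(round(pct, 3), round(wins, 1))
```

The two extremes behave as described: zero laps led gives an expectation of zero wins, and leading every lap gives an expectation of winning every race.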

When we test this formula going back nearly 50 years to all race winners who had a minimum of 25 career starts, we get a very good r-squared of 83% (across 93 qualifying drivers).

In the chart above, points above the red line mean these drivers won more races than expected. Drivers below the red line won fewer races than their laps led would have suggested. These are drivers that led a ton of laps but didn't have the right stuff at the end of the race.
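To make the over/under idea concrete, here is a rough Python sketch. The four drivers and their percentages are invented, and I am assuming r-squared means the squared correlation between expected and actual win percentage. The gap is measured against the formula's expectation itself, which may not be exactly the red line in the chart.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)

# Hypothetical drivers: (name, expected win pct from laps led, actual win pct).
drivers = [
    ("Driver A", 0.02, 0.05),
    ("Driver B", 0.05, 0.05),
    ("Driver C", 0.08, 0.06),
    ("Driver D", 0.10, 0.11),
]

expected = [e for _, e, _ in drivers]
actual = [a for _, _, a in drivers]
fit = r_squared(expected, actual)

# Positive gap: won more than laps led suggest. Negative gap: won less.
over_under = {name: a - e for name, e, a in drivers}
print(round(fit, 2), over_under)
```

Sorting `over_under` from largest to smallest would give a ranking of over-performers and under-performers from the same inputs.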

The Implications
Drivers who are able to win more races than their laps led expectation are finding ways to win without dominating races. They might just lead the final few laps, due to smart strategy, being patient, working with their crews, or not burning up their equipment with unnecessarily hard racing earlier in the race.

Drivers who lead many laps but don't win as many races are too fast, too furious. They have the speed but don't know how to consistently convert it into an actual win.

People should consider this wins expectation statistic when looking at the drivers who can make the most out of nothing. The over-performing drivers might be better picks for fantasy racing, as they find ways to win when others burn out. In the actual (non-fantasy) racing business, these drivers might be better picks for owners looking to hire new talent, since these drivers can help make your team better rather than burn out your equipment. Additionally, you could measure crew chief performance by calculating their rate of squeezing out unexpected wins, no matter who drives for them.

Recent examples
As I mentioned above, Brad Keselowski is the all-time leader in winning more than expected. See Table 1 below.

We also see Jimmie Johnson and Kevin Harvick on that list, two more drivers who seem to have a knack for coming out of nowhere to get wins.

And on the flip side, Kyle Busch is the active driver who has most underperformed his in-race abilities. This is no surprise to anybody who watches the races: he dominates the early and middle parts of races, but so often loses at the end.

Table 1: All-time best and worst drivers at winning races vs laps expectation

Friday, February 8, 2013

How important is a Pole Position really?

After a hiatus, I am back at 36 Races with a fresh web look, and some new data to look at for the 2013 season.

To start things off right, let's talk about the importance of starting positions, and how relevant they are to finishing position.

I looked at every race in NASCAR Sprint Cup history that had 43 cars (remember that in the old days you didn't always have 43 cars in a race), and plotted every start/finish relationship.

Most importantly, I split it up by each starting position.
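The split by starting position can be sketched in a few lines of Python. The (start, finish) pairs below are hypothetical placeholders rather than real race results, and `finish_distribution` is a name I made up for this example.

```python
from collections import Counter

def finish_distribution(results, start_position):
    """Share of finishes at each position for cars that started in start_position."""
    finishes = [finish for start, finish in results if start == start_position]
    counts = Counter(finishes)
    total = len(finishes)
    # An empty selection yields an empty dict, so no division by zero occurs.
    return {pos: counts[pos] / total for pos in sorted(counts)}

# Hypothetical (start, finish) pairs, one per car per race.
results = [
    (1, 1), (1, 1), (1, 2), (1, 5), (1, 20),
    (2, 1), (2, 3),
    (3, 30),
]

pole = finish_distribution(results, 1)
print(pole)
```

Running the same function for every starting position from 1 to 43 gives one histogram per starting spot.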

For example, here is the finish for every driver who started first.

Not surprisingly, if you start first, your most likely finish is first.  And then second, then third, then fourth, then fifth. There is a nice curve that defines the top 10.  

But look what happens after that.

The chances of finishing between 10th and 16th are equal.  You see the flat red line there.

And after 16th place, the line drops by half, followed by another flat line from 16th all the way to last place.

This is surprising, right?  I am sure you didn't expect that.  I certainly didn't.

I was expecting a nice even curve all the way down, not a short curve followed by straight lines.

What does this tell us? That if a pole-winner can't finish in the top 15, their finishing position is completely random: all positions from 16th to 43rd are equally likely.

Think about what this means for your fantasy racing leagues, and for media stories about driver performance. What is the real value of a pole starting place? It is definitely correlated with top 10 and top 15 finishes, but beyond that it means nothing. The chances of finishing 16th, 26th, 36th or 43rd are all the same.

And the chances of finishing 10th or 15th are the same.

Are finishing positions much, much more random than we think?

We'll explore more start/finish relationships in upcoming posts. For a preview, here is the finish histogram for every starting position, 1-43:
