Beyond the wins, poles, podiums, points & glory ... race lap time analysis

ZakspeedYakspeed

NeverUnderestimateThePredictabilityOfStupidity
Valued Member
In another attempt to try to differentiate drivers and their performance from simply looking at wins, poles, podiums, points and glory, I have spent some time trying to analyze their performance from the perspective of their race lap times. What started out as a simple data collection exercise for my own personal F1 database gratification morphed into something like a wave of infected humans from 28 Days Later charging at me from all directions … several large EXCEL files and a slew of output firing off in all directions that my poor small brain couldn’t quite understand (not an unusual occurrence per the wife of ZakYak).

To set the scene … in simple terms:


Starting from Brazil 1984 at the now sadly defunct Jacarepagua circuit to last weekend’s Monza shootout I have collected race lap times for each driver in each race:

  • 30 seasons
  • 502 races
  • 533,089 completed laps
  • 514,536 completed laps used for analysis

Laps excluded from analysis are:

  • All Safety Car period laps
Every lap for every driver that was deemed under caution is excluded

  • Any "Safety Car" affected laps
I.e. looking at the completed lap times in and around safety car periods, you might see one, two or several cars posting a lap time significantly slower than what would be considered true race pace … that entire lap has been excluded for all drivers

  • Any lap time that is more than 45 seconds slower than the driver's median lap time for that race
To remove as many outliers as possible, without removing laps that could be considered to be a “normal pit-stop” lap over the last 30 seasons. One of the data sets that I do not have full confidence in is a complete list of pit-stop data, pit-stop times and the laps these pit-stops were taken. If we cannot accurately remove all pit-stop data, we can at least have a crack at normalizing it … whilst removing all the outliers … e.g. the 4 min lap limping around to the pits on a puncture or with a rearranged front wing, etc.

  • Any lap time set by cars that were Disqualified either during or after an event

  • Any lap time set by cars that were Excluded (from the result)
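
The 45-second rule above could be sketched roughly as follows. This is only an illustration, not the actual EXCEL workings: the record layout `(driver, lap_number, lap_time_seconds)`, the function name `drop_outlier_laps` and the sample times are all made up, and it assumes ">45 seconds" means more than 45 seconds slower than the driver's own median for that race.

```python
from statistics import median

# Hypothetical record layout: (driver, lap_number, lap_time_seconds)
def drop_outlier_laps(laps, threshold=45.0):
    """Keep laps no more than `threshold` seconds slower than each driver's median."""
    by_driver = {}
    for driver, _lap_no, secs in laps:
        by_driver.setdefault(driver, []).append(secs)
    medians = {d: median(t) for d, t in by_driver.items()}
    return [(d, n, s) for d, n, s in laps if s - medians[d] <= threshold]

# Made-up lap times for one driver in one race
race = [
    ("A", 1, 92.0), ("A", 2, 90.5), ("A", 3, 90.1),
    ("A", 4, 112.3),   # pit-stop lap, about 21 s off the median: kept
    ("A", 5, 240.0),   # limping back on a puncture: dropped
    ("A", 6, 90.3),
]
clean = drop_outlier_laps(race)
```

Passing a smaller `threshold` would also strip most pit-stop laps, at the cost of clipping genuine slow laps in mixed wet/dry races.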


After the ‘cleaning’ above we arrive at a data set for each race that has taken out the ‘obvious’ non racing laps. We have complete race lap data for each driver who is able to finish the race, but we can also include data to account for “he was leading the race comfortably until Huub Rothengatter skidded across the track backwards at the Variante Alta and skewered him with his Zakspeed”.


With this data we can:

  • Calculate the median, mean and standard deviation for each race
  • Calculate the median, mean and standard deviation for each driver in each race
  • Calculate the number of laps faster or slower than the driver mean for each race
  • RANK each driver by median, mean and standard deviation for each race
  • COMPARE each driver's median, mean and standard deviation to the race median, mean and standard deviation
  • And a whole lot more stuff that my small brain is still trying to figure out …
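
As a rough sketch of the per-driver calculations listed above (not the actual spreadsheet formulas), something like the following would do the job. The names `driver_stats` and `rank_by` and the lap times are hypothetical, and it assumes a lower value ranks better for all three statistics.

```python
from statistics import mean, median, stdev

def driver_stats(times):
    """Median, mean, standard deviation and faster/slower-than-mean lap counts."""
    m = mean(times)
    return {
        "median": median(times),
        "mean": m,
        "stdev": stdev(times),
        "faster": sum(t < m for t in times),
        "slower": sum(t > m for t in times),
    }

def rank_by(stats, key):
    """Rank drivers 1..n on one statistic, lowest value first."""
    ordered = sorted(stats, key=lambda d: stats[d][key])
    return {d: i + 1 for i, d in enumerate(ordered)}

# Made-up cleaned lap times (seconds) for two drivers in one race
race = {
    "A": [90.0, 90.2, 90.4, 95.0],
    "B": [91.0, 91.1, 91.2, 91.3],
}
stats = {d: driver_stats(t) for d, t in race.items()}
median_rank = rank_by(stats, "median")
stdev_rank = rank_by(stats, "stdev")
```

Here driver A has the better median but the worse standard deviation, which is the kind of split the rankings are meant to expose.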

So bearing in mind that there is a boatload of data to monkey with … and the time spent on this could well be directly proportionate to me working (literally) on Elbow’s “cocktail called grounds for divorce”… I want to start by posting some general stuff … to which I will start adding some interesting stuff over the next few weeks / months …

To start ... by season, the race winners ranking by Median, Mean and Standard Deviation ...
 

Attachments

  • ZY_Median_Mean_StdDev.xls
    66 KB · Views: 223
Here is a 2013 Season to date summary by driver ...

Two tabs in file ...

First has the driver listing, averages data to date, with rankings from best to worst (1 to 22) ... I have overlaid three different "ranking" calculations ... Rank 1, Rank 2 and Rank 3 ... you can monkey around with the % splits in the green shaded section... it will change the rankings below

Second sheet contains the 12 races to date and the rankings of the median, mean and standard deviation for each driver for each race ... if a driver bombed out on the first lap you will see a yellow box containing no data point ...
 

Attachments

  • ZY_2013 Season_Ranking.xls
    68.5 KB · Views: 194
I suppose what would be interesting is converting all the race points from 1950 to the current points arrangement, so we could more accurately compare different drivers. But that would be a massive job - maybe a CTA project?
 
I believe people on CTA have already done that, at least I remember someone working out which championship winners would have changed under different points systems.
 
Fantastic task to take on. I wish you all the best with it!

I have long intended to do some analysis of the "consistency" of drivers by controlling for fuel, tyre wear and trying to see how closely their lap time progression matches a theoretical optimum - a least-squares regression line or somesuch. Never quite find the time, however, but this dataset is the perfect starting point.
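
One possible shape for that consistency idea, as a sketch only: fit a straight line through a stint's lap times and measure how far the laps scatter around it. The function names and the sample stints are invented, and a straight line is an assumed stand-in for the "theoretical optimum"; it makes no attempt to model fuel or tyre wear separately.

```python
def fit_line(ys):
    """Ordinary least-squares fit of ys against lap index 0..n-1."""
    n = len(ys)
    xs = range(n)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def residual_spread(ys):
    """Root-mean-square deviation of the laps from their fitted trend line."""
    slope, intercept = fit_line(ys)
    resid = [y - (intercept + slope * x) for x, y in enumerate(ys)]
    return (sum(r * r for r in resid) / len(ys)) ** 0.5

# Made-up stints: both improve by 0.05 s per lap, one wobbles by 0.4 s either way
steady = [92.0 - 0.05 * i for i in range(10)]
ragged = [t + (0.4 if i % 2 else -0.4) for i, t in enumerate(steady)]
```

A smaller `residual_spread` would then read as a more consistent stint, independent of how quick the stint was overall.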
 
Thanks Galahad ... it certainly grew larger than the Kraken after I started ... every time I open up the files I think of something else to measure / compare / contrast ...

Now if my lottery numbers come up this weekend ... :whistle: I might find some extra time during the week ...
 
A couple of graphs looking at some historical trends ...

Looking at "racing laps" ... and over the course of each season how many laps were faster than the driver's mean (average) at each race and how many were slower than the driver's mean (average) at each race ...

The red shading indicates refueling being banned ... the green indicates refueling as active ... if you look at the last four seasons ... 2010 to 2013 ... the step change down from 2009 is not unexpected i.e. not that far from the historical average ... a solid improvement in 2011 ... and then perhaps the effect of the Pirelli tyre construction / tyre life / deg, etc ...

ZY_1984to2013_FasterSlower.jpg


The next really hammers home how reliability has changed over the past 30 seasons ... to reach the 100% line every car in every race would have to complete every lap ... either by racing ... or by safety car ... with no disqualifications, exclusions, etc ... as close to improbable as Mark Webber unretiring and driving a third Ferrari next year ...

The gap you start to see between racing laps and completed laps from 1994 highlights the impact of the safety car on F1 ... but for mine the sheer incredible increase in reliability ... coupled with a reduction in field/team size drives the "worm" onwards and upwards to the top right ... what looks like a strange occurrence in 1984 is the Tyrrell Racing Organisation's DQ from 1984 for a variety of offences ... suggested by some as politically motivated ... nevertheless a large number of completed laps were not tagged as racing laps ...

ZY_1984to2013_Avail,Completed,RacingLaps.jpg
 
The second graph is really nice, roughly speaking it looks like a 10% improvement in reliability per decade (or 1% per year I guess).

The first plot I'm struggling to understand the meaning of. It shows the percentage of laps faster than the mean, so if the lap times were normally distributed you would expect it to come out as 50% right? Assuming constant tyre performance throughout the race with no refuelling I think 50% is what you'd expect, with lap times improving by a constant amount each lap due to fuel burn off.

All the data points are above 50%, I guess this is due to having some slow lap times skewing the mean. Are they pit stops or something or is there a different interpretation in terms of tyre wear, fuel loads etc, that's beyond me?
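
A quick sanity check of that reasoning, with invented numbers: a stint that improves by a constant amount each lap gives exactly 50% of laps faster than the mean, and a single slow lap drags the mean up and pushes the share above 50%. The function name and the 25-second pit-stop penalty are assumptions for illustration.

```python
from statistics import mean

def share_faster_than_mean(times):
    """Fraction of laps strictly faster than the mean lap time."""
    m = mean(times)
    return sum(t < m for t in times) / len(times)

# 20 laps improving by a constant 0.1 s per lap as fuel burns off
stint = [95.0 - 0.1 * lap for lap in range(20)]
even = share_faster_than_mean(stint)

# Same stint with one lap made 25 s slower, standing in for a pit stop
with_stop = stint.copy()
with_stop[10] += 25.0
skewed = share_faster_than_mean(with_stop)
```

With these numbers `even` is one half, while in `with_stop` every lap except the pit-stop lap ends up below the inflated mean.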
 
sushifiesta ... I have not taken out the pit stop laps (lack of reliable & complete data over the period) ... the "slower laps" would predominantly include pit stop laps and each race's first lap (if it is considered a racing lap) ... the frequency of pitstops per race could also affect the calc ... the current spec Pirelli which have to be nursed could also increase the number of "slower than mean" laps in the last couple of seasons ... agree with you ... there are a lot of variables ... hard to pinpoint ...

Have attached a graph by sum of laps faster / slower by year (the data behind the stacked column above) ... maybe this better expresses the past few seasons ... maybe not ...

ZY_1984to2013_FasterSlower_LapSum.jpg
 
ZY_2013 Median Lap time - MF Teams.jpg
ZY_2013 Median Lap time - FR Teams.jpg



For the 2013 season ... comparing the "team" (i.e. the average of both drivers' median lap times for each race) versus the optimal median lap time for each race ... expressed in percentages (%)

One for the front running teams ... another for the midfield teams ...
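
A minimal sketch of how such a team-versus-optimal percentage might be computed. It assumes the optimal is the fastest single driver's median (as clarified further down the thread) and that the gap is expressed as a percentage above that time; the function name, driver labels, team labels and times are all hypothetical.

```python
def team_gap_percent(driver_medians, teams):
    """Percent gap of each team's averaged median to the fastest driver median."""
    optimal = min(driver_medians.values())
    gaps = {}
    for team, drivers in teams.items():
        team_median = sum(driver_medians[d] for d in drivers) / len(drivers)
        gaps[team] = 100.0 * (team_median - optimal) / optimal
    return gaps

# Made-up median lap times in seconds, for illustration only
medians = {"D1": 100.0, "D2": 101.0, "D3": 100.5, "D4": 102.5}
teams = {"Team X": ["D1", "D2"], "Team Y": ["D3", "D4"]}
gaps = team_gap_percent(medians, teams)
```

Under these assumptions no team sits exactly on the optimal line unless both of its drivers match the fastest median, since the slower team mate always pulls the team average up.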

=> From Montreal ... RBR have only been headed twice ... Nurburgring and Hungaroring by Lotus ...
=> The Lotus (in)consistency is quite telling when compared to the RBR consistency ...
=> The gaps from RBR to the next best for the last three races underline what we see on TV (what a surprise eh!) ...
=> McLaren up to Montreal matches the Whitmarsh EKG as Checo bangs wheels with JB ...
=> Three winners whose "team" was not the fastest that race ... all non RBR ...
=> Kimi @ Albert Park (3rd behind Ferrari & RBR ... one less tyre stop)
=> Nico @ Silverstone (2nd to RBR ... safety cars and exploding tyres ... )
=> Lewis @ Hungaroring (a distant 3rd behind Lotus and RBR ... he simply drove the wheels off that car that day)


=> Force India pulling themselves back up ...
=> Williams haven't always been the caboose of the midfield runners ... but that doesn't take into account the Maldonado Imperative ...
 
Can you explain what the optimum median lap time in the second plot above is (percentage for the front runners)? I thought it would be the fastest median lap time for that race expressed as 100% but then shouldn't the point for the team with the best median time at each race lie on the optimum line? The optimum time in the last graph I understand because there is one point on the optimum line for each race.

EDIT: Oh, is that optimal line for the fastest driver rather than fastest team?
 
In the post with the one chart above... the optimal is the "team" with the fastest median lap time ... eg Singapore ... SV of RBR ... expressed as 0.000 seconds ...
 
:whistle:And after reading your question ... ahem ... your EDIT is correct ... the optimal in plot with the % for front runners and mid-fielders is the driver fastest median ... not team ...
 
to mine ... if the trend for the front runners for the last three races holds ... or even stabilizes some ... then my money is on 5 SV wins ... and MW taking down his 3rd Brazilian GP by 1.5 seconds after SV had to manage KERS issues, missing gears, 4 'errant' cylinders and a broken suspension to "limp it" home ...
 
Top stuff zak :thumbsup:

We had to do a lot of adjustments and filtering when we did the tyre analysis a couple of years ago. With pitstops, we had the Pirelli data by then, along with all the FIA PDFs so it was easy, but we could also find pitstops by finding two "off laps" next to each other. Sometimes the inlap is noticeably slower, sometimes it is the outlap, depending on where the timing beam was. Always though, there are two laps which are different from those around them. You can write the formula to find these, I'm sure! You are obviously an excel junkie too :)
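
The "two off laps next to each other" idea might look something like this in code. It is a sketch under stated assumptions: "off" is taken to mean more than `margin` seconds slower than the driver's median, the 3-second margin is an arbitrary placeholder, and the function name and lap times are invented.

```python
from statistics import median

def find_pit_stops(times, margin=3.0):
    """Return indices of the first lap in each pair of consecutive 'off' laps."""
    base = median(times)
    off = [t - base > margin for t in times]
    stops = []
    i = 0
    while i < len(times) - 1:
        if off[i] and off[i + 1]:
            stops.append(i)   # index of the in-lap
            i += 2            # skip the out-lap so it is not counted twice
        else:
            i += 1
    return stops

# Made-up stint: laps 3 and 4 (0-based) look like an in-lap and out-lap;
# lap 8 is a one-off slow lap (traffic, a mistake) and is ignored
laps = [90.2, 90.0, 90.1, 96.5, 104.0, 90.3, 90.1, 89.9, 95.0, 90.0]
stops = find_pit_stops(laps)
```

Requiring a pair is what separates a stop from an isolated slow lap, although a safety car period or a rain shower would still need tagging separately.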

For this type of analysis though, you might want to think of median lap times rather than mean. I think what you are trying to find is "who was faster" taking out incidents that prevent being fast always equalling a win. Using median or percentiles might give you a clearer view and filter all sorts of other oddities too (including pitstops).

mjo was asking about converting championship points from one era to the next. That should be quite possible if you have actual finishing positions using an index(match()) formula. Not sure if that is really your direction though, zak?
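
For anyone not working in Excel, the same lookup idea can be sketched as a simple table of points by finishing position. The 25-18-15-12-10-8-6-4-2-1 scale is the one in use since 2010; the function name and sample results are made up, and the sketch ignores dropped scores, half-points races and shared drives, all of which a real conversion would have to handle.

```python
# Points for positions 1-10 under the scheme in use since 2010
CURRENT_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]

def rescore(results, points=CURRENT_POINTS):
    """Total points per driver; each race maps driver -> finishing position or None."""
    totals = {}
    for race in results:
        for driver, pos in race.items():
            totals.setdefault(driver, 0)
            if pos is not None and 1 <= pos <= len(points):
                totals[driver] += points[pos - 1]
    return totals

# Made-up finishing positions for two races (None = not classified)
results = [
    {"A": 1, "B": 2, "C": 11},
    {"A": None, "B": 1, "C": 3},
]
totals = rescore(results)
```

Swapping in a different `points` list would rescore the same results under another era's system.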
 
:embarrassed:jez101

Excel junkie ...yes ....sad but true!!

Agreed on the pitstops and lap time deltas that spring from the data ... when I was looking at how to incorporate the pitstops with the older data, you can see what clearly look like the pit stops for each driver ... but I have this thing about being more certain than not ... that said I might run a cut of the data and drop my ">45 seconds" exclusion to "> 10 seconds-ish" exclusion to try to clean all pitstop data out ... but I would also need to tag the wet/dry and/or dry/wet races which would throw it off ...

I agree re. medians ... they are so much more descriptive than means ... although why spoil a vacant set of columns when there are just some easy formulas to run over the data ... you are correct on one of the main purposes ... it is to try to decipher who is fast / good without using points / poles / wins / "the car you are driving" ... there are a number of different ways to do this ... I am trying to work through some options and variabilities when I get the time ... there are some good examples to test to try to compare and contrast with a "relative" performance picture of the car / field ... eg. Senna in the Toleman in 1984 ...

the points conversion exercise that mjo raised ... not the direction of this exercise ... I think I have something with each race tagged for about five different points regimes ... but I have not run any WDC calcs ... if time allows I will have a look ...
 