
Data correction reveals the "least action" principle, or why humans are the best optimization engines.

Updated: Feb 4

The need for accurate data


As you probably know by now, we've decided to create the best analysis software for trail running, and we called it Montis, as it turned out to be an Artificial Intelligence based agent. What we mean by analysis is not just looking at data points to check speed or heart rate at a particular location, but inferring whether the data point at that location represents progress or regression compared to a previous one at a different place, or compared to the expected performance.


That in itself requires accurate data, much more granular than the activity averages, segments, or even 1 km splits we see in similar analysis software in the field.


We argue that data correction is the foundation of all data analysis. As important as it is, it also carries real risk: it can distort data that was captured in a context no longer available at the point of correction.


To understand this better, let's take a reference case.


Reference case


While Garmin and others do a very good job at fusing data from multiple sensors on the device (the local and temporal context; read the sources of error below), they give no impression of reworking the data once it is uploaded to their servers, checking it against a global context and correcting it for the best match. The errors are then propagated to the various platforms this data is uploaded to. Please read on keeping in mind that Garmin is one of the best at data fusion, and hence accuracy, so this is not at all a campaign against them. Even with their best intention to provide the most accurate data, it is still not good enough at this point for the analysis we need without further work on our side.


Below you can see an easy run I did back in December at one of the local parks, chosen to visualize the concept of data correction in the easiest way possible. The objective of the run was to hold a pace of 5:20-5:30 min/km over 12 laps on asphalt with slopes of +/- 1%, which does not significantly alter the efficiency of the run (measured in meters per heart beat), while matching the lap times as well. The heart rate I needed was around 150 bpm +/- 5 bpm, and only at the end did I have to go up to 160 bpm to hold the pace, which I overshot a bit in the last km, making it the fastest at 5:14. I was in my worst shape ever in December, so this is a reminder to work less on Spectro.Life and get back my previous fitness :) .


The above parameters stand as the ground truth for the timing information; let's not put the heart rate info up for debate at this point.


The first thing we can observe in the chart below is the elevation error coming from my Garmin Fenix 6 Pro, most probably due to a change in barometric pressure: it was just after a rain, so we can see that even slight variations in weather can influence the elevation measurement, which obviously influences the 3D distance. We should also note that the 2-3 m elevation differences were picked up correctly every lap.


Now, the main reason for this picture is not the elevation but the speed, and inherently the Grade Adjusted Pace (GAP), or effort as we call it, which is the speed on flat that is the metabolic equivalent of the speed on a slope. As you can see, already in Garmin the speed varies from 4:30 to 6:30 min/km, a deviation of a minute up and down from the 5:20-5:30 ground truth, and this variation is inherited in Strava as well without correction. Obviously this is not good enough for serious analysis of instant data, though it can work well when we take km averages as the base. It is interesting to observe that the actual fastest pace of 5:14 at the end, which was unintentional, is not reflected; instead, surges show up at the beginning and in the middle.

What is more disturbing is the inheritance of this error in the GAP calculation, which the +/- 1% slope reduces only slightly, to a variation from 4:40 to 6:00. A short and steady metabolic effort should obviously not be reflected as a variation of 1 minute and 20 seconds, especially in ideal conditions with regard to the technicality of the terrain.
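To make the GAP idea concrete, here is a minimal sketch. It is not the Montis model; as a stand-in it uses the running energy cost polynomial from Minetti et al. (2002), so the numbers are illustrative only:

```python
# Minimal Grade Adjusted Pace (GAP) sketch. Stand-in cost model:
# Minetti et al. (2002), energy cost of running in J/kg/m vs. grade.
def running_cost(grade: float) -> float:
    i = grade
    return (155.4 * i**5 - 30.4 * i**4 - 43.3 * i**3
            + 46.3 * i**2 + 19.5 * i + 3.6)

def gap_speed(speed_ms: float, grade: float) -> float:
    """Flat speed with the same metabolic power as speed_ms on grade."""
    return speed_ms * running_cost(grade) / running_cost(0.0)

def pace(speed_ms: float) -> str:
    secs = 1000.0 / speed_ms
    return f"{int(secs // 60)}:{int(secs % 60):02d} min/km"

# 5:25 min/km on a +1% grade maps to roughly a 5:07 min/km flat
# equivalent under this particular cost model.
print(pace(gap_speed(1000.0 / 325.0, 0.01)))
```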


Speed and elevation errors are picked up by the watch and used without correction in analysis software

To read a little bit more about the sources of error, expand the panels below. Now let us show you the kind of correction we apply to improve the trust in this data.


As you can see, the reconstructed efficiency, which we call Speed Indexed Efficiency, is almost a flat line (when de-trended), showing how the human body and mind apply exactly the "least action" principle described by Euler-Lagrange, optimizing energy consumption in order to reach their objectives. You can read more about this principle just below the sources of error once you expand the panel.


This optimization regulates both demand and supply, that is, speed and heart rate, to match the demand set by the objective and by adversities like the technicality of the terrain, weather conditions and others.


We use this principle, along with fusing data we trust more, to correct the data we trust less.


In the following chart the left axis represents the efficiency in meters per beat and also shows the slope on the unit interval (so +/- 1% is +/- 0.01), while the right axes, from left to right, show heart rate in bpm, speed and effort in m/s, speed index on the unit interval, and elevation in meters.
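For reference, the meters-per-beat efficiency on that left axis can be derived from the raw streams as below; this is our shorthand, not a public API:

```python
# Efficiency in meters per heart beat: speed in m/s divided by
# heart rate converted to beats per second.
def efficiency_m_per_beat(speed_ms: float, hr_bpm: float) -> float:
    return speed_ms / (hr_bpm / 60.0)

# 5:25 min/km (~3.08 m/s) at 150 bpm -> about 1.23 m/beat
print(efficiency_m_per_beat(1000.0 / 325.0, 150.0))
```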


Reconstructed efficiency shows how human body regulates demand and supply to achieve optimal performance

In the above chart we can see a 3% drop in Speed Index (read more by expanding the panel below), a value of 0.97, which cannot be explained by speed and heart rate changes according to the running model used. This is normal, as I was using a benchmark model; my individualized one had drifted even further, given my fitness condition, and this 3% speed index drop can be attributed to that, not to the technicality of the terrain in this case.


Going further, I'll explain how we use these principles for data correction and how you can also utilize the outcome of the process to measure progress in your training.


Read more about the topics:

Sources of errors in activity data

What is "least action" principle?

What is Speed Index?


We absolutely need this data correction: understanding performance on each slope is not a matter of averages, and the maximum aerobic work a runner can do to optimize his/her race performance is also strongly affected by the area under the speed indexed effort curve.
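To illustrate that last point, the area under a sampled effort curve can be approximated with the trapezoidal rule; the series below is made up and the variable names are ours:

```python
import numpy as np

# Trapezoidal approximation of the area under a speed indexed effort
# curve; timestamps in seconds, effort in m/s, so the area comes out
# in meters (illustrative values, not real activity data).
t = np.array([0.0, 60.0, 120.0, 180.0, 240.0])      # timestamps, s
effort = np.array([3.00, 3.10, 3.05, 2.95, 3.00])   # effort, m/s
area = np.sum(0.5 * (effort[1:] + effort[:-1]) * np.diff(t))
print(f"area under the effort curve: {area:.0f} m")
```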


"All models are wrong, but some are useful"


This holds true for measured data as well, which is inherently more or less inaccurate. That does not keep us from trusting it; the only question is how much we trust it. The objective of data correction is to trust the data more at the end of the process than just before it.


In other words, if we see from the data that a + b = c and a = c - d, then we must conclude that b = d. If we see distance data, timing data and speed data, we need an appropriate measure in a larger context where we can determine the validity of the data, and use that context to adjust the data and improve our trust in it. Practically, in activity data we have a number of measurement points carrying speed, timestamp, distance and many other fields, to simplify a bit. If we multiply the speed data by the difference in timestamps and add it all up, the result should come very close to the measured cumulative distance, in other words the length of the track. The smaller this difference is at the end, the higher our trust in the instantaneous speed measurements.


Now there is a catch to this. The question is whether stops during the track are accounted for by the watch manufacturer, or generally by the source of the data. Based on what we see, most of the big players do account for stops, but some of them (let's not give names) don't bring the instant speed low enough for it to count as a stop, and we have seen errors as high as 5% between the integrated speed, speed * timedelta (1), and the measured distance (2). That is not small.
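A minimal sketch of this consistency check, including a stop threshold; the 0.3 m/s value and the function name are illustrative, not taken from any vendor:

```python
import numpy as np

def distance_consistency(speed_ms, t_s, measured_total_m, stop_thresh=0.3):
    """Integrate instantaneous speed over time and compare against the
    measured cumulative distance; returns the relative error (1) vs (2).
    Speeds below stop_thresh (m/s) are clamped to zero so that stops
    count as stops."""
    v = np.asarray(speed_ms, dtype=float).copy()
    v[v < stop_thresh] = 0.0
    dt = np.diff(np.asarray(t_s, dtype=float))
    integrated_m = np.sum(0.5 * (v[1:] + v[:-1]) * dt)  # trapezoidal sum
    return abs(integrated_m - measured_total_m) / measured_total_m
```

The smaller the returned error, the more we trust the speed stream; as noted above, we have seen sources where it reaches 5%.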


Hopefully you realize by now that data measurement is much more about trust than about exact science, and every piece of data is corrected with other data in a way that makes sense, i.e. that increases the probability that the new data point has higher quality than the initial one.


So what we initially thought was that we could perform correction based on Kalman filters, guarded at one end by the drift from the GPS-based total distance and at the other by the idea that efficiency variations are not abrupt over a span of 2-3 minutes, obviously taking into account all the cross verifications of the data points. This approach helped us improve the data and consequently trust it more, but it still did not answer all our hypotheses, and quite important variations remained in the efficiency data, which pushed us to search for more.


GPS distance and speed data correction based on Kalman filters
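The production filter is more involved, but the core of the idea can be sketched as a constant-velocity Kalman filter over the state [distance, speed], fusing the cumulative GPS distance with the instantaneous speed stream; the noise parameters below are illustrative:

```python
import numpy as np

def kalman_speed_distance(dist_m, speed_ms, t_s,
                          q=0.5, r_dist=25.0, r_speed=0.36):
    x = np.array([dist_m[0], speed_ms[0]])     # state: [distance, speed]
    P = np.eye(2) * 10.0                       # state covariance
    H = np.eye(2)                              # both states are measured
    R = np.diag([r_dist, r_speed])             # measurement noise
    out = [x.copy()]
    for k in range(1, len(t_s)):
        dt = t_s[k] - t_s[k - 1]
        F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity model
        Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                          [dt**2 / 2, dt]])    # process noise
        x = F @ x                              # predict
        P = F @ P @ F.T + Q
        z = np.array([dist_m[k], speed_ms[k]]) # measurement
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
        x = x + K @ (z - H @ x)                # update
        P = (np.eye(2) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)                       # filtered [distance, speed]
```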

Then we opened up the physics manual and started using the good old energy conservation principles combined with Lagrangian mechanics (see the "least action" principle above), for the general case where we are not dealing with a closed system but an open one, which can receive energy from the surroundings (the runner ingests CHO) and can transfer losses to the outside environment (heat, friction, etc.).
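Schematically, the balance for such an open system can be written as follows; this is our notation, a sketch rather than a formula from any specific source:

```latex
\frac{dE_{\text{body}}}{dt} = P_{\text{intake}} - P_{\text{mechanical}} - P_{\text{heat}} - P_{\text{friction}}
```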


As the running model is based on a normalized heart rate / speed curve, everything that can't be explained by heart rate / speed variation needs to come from an adversity factor. Yes, in this context even the drift from the running model is an adversity factor, not just the technicality of the terrain.
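A minimal sketch of that residual idea; the linear HR-to-speed relation below is a made-up stand-in for the runner's normalized curve:

```python
# Whatever the heart rate / speed model cannot explain is attributed
# to an adversity factor (terrain, weather, model drift).
def model_speed_from_hr(hr_bpm: float) -> float:
    return 0.05 * (hr_bpm - 90.0)       # toy linear model, m/s

def speed_index(observed_speed_ms: float, hr_bpm: float) -> float:
    """1.0 when the model fully explains the speed; 0.97 means a 3%
    shortfall attributed to adversity."""
    return observed_speed_ms / model_speed_from_hr(hr_bpm)

print(speed_index(2.91, 150.0))         # -> 0.97 with this toy model
```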


This started to make a lot of sense: all the runners, all the runs, all the data from various watches and tracks, together with knowledge about weather and track conditions, started to line up and show that indeed the speed index corrected efficiency is the quantity whose variation is minimized during the run, and the runner does this somehow unconsciously, based on "feelings". We can safely say that, taking this large number of parameters into consideration, the human body and mind are absolutely mind-blowing in terms of optimization capabilities.


Data correction exemplified by real races


Let's take a look at both the competitive amateur and elite levels.


The first example is an 85 km race finished in 1st place by Karcsi from our team at Piros 85, Hungary.

We can see that the reconstructed (speed indexed) efficiency is not a straight line, but it has far less variation, staying mostly in the 1.3 - 1.5 m/b range, while the efficiency based on effort and heart rate varies between 0.75 and 1.5 m/b; the speed index stays firmly in the 85-100% range, dropping to 65% at only one point. All the data has its trend restored as well.

Speed Indexed Effort is not displayed here for "simplicity" (if we can call a chart like this simple), but it is one of the most important outcomes in terms of the Maximum Aerobic Work calculation.


Winning performance at Piros 85 by Karoly Csaki from the Spectro.Life Team

We presented Montis with the winning performance by Raul Butaci at the 129 km Transgrancanaria Classic, so it makes sense to present our data correction in the context of that performance as well. It is fascinating to see how, after 40 km, Raul switched gears and started to hunt down everybody ahead to win the race. By that time his speed indexed efficiency was in the range of 1.5-1.6 m/b; he then moved up to the 1.7-1.8 m/b range, and even higher on the last downhill.


Winning performance at TGC 2024 by Raul Butaci

Implications beyond data correction


Using the speed index numbers, one can not only plan better for a race but actually understand progress in terms of adaptations during training. For example, someone who has just transitioned from road running to trail running should gradually see the speed indices grow as he adapts to the technicality of the terrain and improves his own technique. Similarly, someone coming out of a long base period during the winter has quite high aerobic fitness, but may not be well adapted to the cooling demands of the spring period, so the speed indices start to drop; he should be able to see them climb up again when the adaptation is achieved and he is ready to race.


The examples above can obviously be extrapolated to regular training periods as well, triggering the generation of a new running model when the speed indices deviate and don't just reflect the obvious adversities like the technicality of the terrain.


If we look at the principle of how the body regulates supply and demand to reach optimal performance, then whatever performance KPI we come up with needs to be reflected in the light of energy conservation as well.


For example, take the calculated run/walk optimal grade: the grade at which walking the hill becomes metabolically more efficient than running it.

If this number was calibrated under mixed conditions, it makes sense that on asphalt, where the grip is higher, the body will need a lower heart rate for the same speed at the same slope, so it will instruct the runner's "feeling" to move to a steeper slope to match the new break-even point, the expected feeling. That is the optimization we are talking about: the body thinks in "feelings", not in meters per hour. Similarly, when the grip drops, as on sand, rolling rocks and the like, the same feeling is reached at lower slopes, just as it is during heat, when extra cooling is required to match the same performance.
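The break-even shift can be illustrated with a toy computation; the cost curves below are hypothetical, not a published or calibrated model:

```python
import numpy as np

# Toy per-meter cost curves for running and walking vs. grade (J/kg/m,
# hypothetical). The run/walk optimal grade is where walking becomes
# cheaper; shifting either curve (grip, heat, sand) moves that point.
def cost_run(grade):
    return 3.6 + 30.0 * grade

def cost_walk(grade):
    return 4.5 + 24.0 * grade

grades = np.linspace(0.0, 0.40, 401)
cheaper = cost_walk(grades) < cost_run(grades)
breakeven = grades[np.argmax(cheaper)]  # first grade where walking wins
print(f"break-even grade ~ {breakeven:.0%}")  # ~15% with these toy curves
```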


Now, with all these shifts up and down, does it still make sense to make calculations regarding thresholds, performance and all these moving targets? Absolutely. While the laws of physics are not always as straightforward as Newton's F = m*a, energy conservation and other important principles still apply, and these state that one cannot spend more than his fitness allows, combined with the energy he can successfully ingest. He can still try to use this energy in the most efficient way possible.


Thanks for staying with me till the end! 🙏 I'd love to hear your thoughts on the subject.


We'll be continuing with the Vertical Ascent and Descent topic. Subscribe to our Facebook and Instagram pages to learn when the new article is out 🚀


Happy 🏃‍♂️!
