I have recently taken an interest in statistical models of various sorts, where the statistics can be in terms of model anthropometry or in terms of the tasks performed by the model. Two very gifted students of mine, Brian Vendelbo Kloster and Kristoffer Iversen, developed a statistical model of running. Before I write about that model, I would like to give you some background for the whole idea.

## CAE

I perceive musculoskeletal modelling as a Computer-Aided Engineering (CAE) tool. More prominent representatives of CAE technology are finite element analysis for solid mechanics or heat transmission, or finite volume models of flow problems. These technologies are used to create virtual models of products before the products exist, and they have enabled much of the technological progress we have seen in recent decades. Mobile phones, modern cars and planes, and large wind turbines are just a few examples of products that would not exist in the absence of CAE.

So, what’s wrong with musculoskeletal modelling? Only the fact that most of our models require experimental input, usually in the form of motion capture data. If I want to model a complex motion like a baseball pitch or ingress into a car, then I need to feed measured motion data into my model to make it move realistically.

We have good interfaces for importing mocap data, so what’s the problem?

Well, if the model needs experimental input, then it can really only model something that happened in the past, i.e. the experiment. And the whole point of CAE tools is to create models of virtual products or situations, i.e. things that have not necessarily happened yet. It is fair to say that models are only real CAE models if they can predict the future.

## ADL models

This is where a new class of musculoskeletal models comes into the picture. These models are called ADL Models, where ADL means Activity of Daily Living and can be any recognizable movement or working task that humans perform. The idea is that, if the model already knows in general how to perform the task it was developed for, then it needs only a little more input to do the task in a certain way, and this input could even be dependent on other circumstances or statistically varying within a range.

It is really much easier to explain if we use an example. Let us look at the running model that Kristoffer and Brian developed.

## Running model

Watch any crowd of running people and you will quickly realize that different people run in very different ways. The running style also depends a great deal on whether we are sprinting or running a marathon. Despite these differences, running is a clearly distinguishable motion and we can recognize it easily when we see it. So there might be more similarity than difference between the various styles. The aforementioned two students and I decided to create an ADL model of running.

Running is also a complex motion, so it is not going to happen as a model unless we have some mocap data to begin with. Brian and Kristoffer collected 143 C3D files from various sources, and finally 90 of them turned out to produce reasonable running models. Some of the problems with the remaining models were too much marker dropout or too little of the running cycle recorded.

We then processed all of the models through AnyBody, resulting in the following:

- Anthropometric data for each subject, i.e. lengths of the segments. This comes automatically from the system’s parameter identification when it processes the marker data.
- Anatomical joint angle variations for the running cycle for each subject.

Now we could recreate each subject’s anthropometry and running pattern in the system and proceed to do some statistics on the motion patterns. We initially thought that it would not be too hard, but it turned out that there were more steps to the process than we had imagined.

## Data reduction and fixing

In the true spirit of ADL models, we aimed to make a parametric model of running, so that we can recreate all sorts of running in a single model. Just one single running model is driven by thousands and thousands of numbers, so we need a vast amount of data reduction. We are going to do this by principal component analysis (PCA) but, in such cases, it is always wise to reduce complexity by smart decisions from the beginning.

The first such decision was to reduce the complexity of each joint angle variation by approximating it with an analytical function using a small number of parameters. Fourier series are the obvious choice because running is a cyclic motion. It turned out that all joint angle movements could be approximated precisely by just a few terms of the series, at most 5, and in most cases far fewer. So each joint angle movement was now represented by no more than a handful of numbers.
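To make the curve-fitting step concrete, here is a minimal sketch of a least-squares fit of a truncated Fourier series to one joint angle over one running cycle. It is an illustration only, not the actual processing pipeline: the function names `fit_fourier` and `eval_fourier`, the coefficient layout, and the synthetic "knee angle" are all assumptions made for this example.

```python
import numpy as np

def fit_fourier(phase, angle, n_terms):
    """Least-squares fit of a truncated Fourier series to one joint angle.

    phase : sample positions as a fraction of the running cycle, in [0, 1)
    angle : joint angle at each sample
    Returns coefficients laid out as [a0, a1, b1, ..., an, bn] (assumed layout).
    """
    cols = [np.ones_like(phase)]
    for k in range(1, n_terms + 1):
        cols.append(np.cos(2 * np.pi * k * phase))
        cols.append(np.sin(2 * np.pi * k * phase))
    design = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(design, angle, rcond=None)
    return coeffs

def eval_fourier(coeffs, phase):
    """Evaluate the series described by [a0, a1, b1, ..., an, bn]."""
    result = np.full_like(phase, coeffs[0], dtype=float)
    n_terms = (len(coeffs) - 1) // 2
    for k in range(1, n_terms + 1):
        result += coeffs[2 * k - 1] * np.cos(2 * np.pi * k * phase)
        result += coeffs[2 * k] * np.sin(2 * np.pi * k * phase)
    return result

# Synthetic joint angle (hypothetical data, in degrees) over one cycle.
phase = np.linspace(0.0, 1.0, 200, endpoint=False)
knee = 40 + 25 * np.cos(2 * np.pi * phase) + 10 * np.sin(4 * np.pi * phase)

coeffs = fit_fourier(phase, knee, n_terms=5)
max_error = np.max(np.abs(eval_fourier(coeffs, phase) - knee))
```

With 5 terms, the 200 samples of the curve collapse to 11 coefficients, and the same evaluation function can resample the motion at any point of the cycle.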

Converting the data to Fourier series carries some additional opportunities with it. Several of the mocap trials contained less than a full running cycle, and the trials came from different labs with the subjects running in different coordinate systems. Also, some were running on treadmills, and some were running overground. With the Fourier series, it is relatively easy to transfer all subjects into the same coordinate system, make the motions symmetrical between right and left sides, and convert all of them to treadmill running. Of course, this means that the data set does not allow for investigation of asymmetry in running. Finally, we made sure that all movement functions for each trial had the same basic frequency. We refer to this whole process as data fixing.
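One reason this kind of fixing is convenient in Fourier form is that a half-cycle time shift is just a sign change on the odd harmonics, since $\cos(x - k\pi) = (-1)^k \cos x$ and $\sin(x - k\pi) = (-1)^k \sin x$. The sketch below uses that identity to enforce left/right symmetry. Whether the symmetry was actually imposed this way is not stated here, so treat the averaging scheme, the function names, and the coefficient layout `[a0, a1, b1, ..., an, bn]` as assumptions.

```python
import numpy as np

def shift_half_cycle(coeffs):
    """Coefficients of f(phase - 0.5), given [a0, a1, b1, ..., an, bn] of f.

    A half-cycle shift flips the sign of every odd harmonic and leaves
    the mean and the even harmonics unchanged.
    """
    shifted = np.array(coeffs, dtype=float)
    n_terms = (len(shifted) - 1) // 2
    for k in range(1, n_terms + 1):
        sign = -1.0 if k % 2 else 1.0
        shifted[2 * k - 1] *= sign
        shifted[2 * k] *= sign
    return shifted

def symmetrize(right, left):
    """Assumed scheme: average the right side with the half-cycle-shifted
    left side, then regenerate the left side from that average."""
    right_sym = 0.5 * (np.asarray(right, dtype=float) + shift_half_cycle(left))
    left_sym = shift_half_cycle(right_sym)
    return right_sym, left_sym

# Hypothetical right and left hip coefficients with a small asymmetry.
right_hip = np.array([10.0, 3.0, 1.0, 2.0, -1.0])
left_hip = shift_half_cycle(right_hip) + np.array([0.4, -0.2, 0.1, 0.0, 0.3])
right_sym, left_sym = symmetrize(right_hip, left_hip)
```

After this step the left side is, by construction, the right side delayed by half a cycle, which is exactly why asymmetry can no longer be studied in the resulting data set.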

If you are choking on your coffee now because you think we are messing too much with empirical data, then please remember that the point of this exercise is not to reproduce how any particular person is running, but rather to obtain a data set that spans the parameter space of running.

## Ensuring foot contact

We now have parametric models of different people with different sizes running in different ways. For the further use of the model it is important to make sure that each model obtains proper ground contact with the feet. This might not automatically be the case in a parametric model, because the model is driven from the pelvis and outwards. If we, for instance, make the model shorter, then the feet might not reach the ground.

So we recorded the foot motions for each subject, performed another curve fit, and parameterized these curves such that the feet would always touch the ground in the stance phase.

## PCA

We now have a big table in which each row represents a running trial, and each column is a Fourier coefficient for the trial. The running style might actually depend on anthropometry; it is not unreasonable to suspect that people with longer legs tend to take longer steps. So we added the segment dimensions of the subject to each trial as additional columns.

Despite all the reductions, we were left with 197 parameters describing the running trials. We could go ahead and start playing around with each of those parameters to see how they influence the model. However, this would not be statistically sound for a couple of related reasons:

- There is no way that 90 trials can adequately span a space of 197 parameters. We would need many more trials if we wanted the trial space to support 197 uncorrelated parameters.
- The parameters are statistically correlated with each other. For instance, the running speed and step length are known to correlate with the elevation of the foot in the forward swing. So random variations of parameters are likely to create absurd motions that do not exist in reality.

Principal Component Analysis is the go-to method to figure out how many independent parameters we need to describe the running motion. So we ran the table of trial parameters through PCA and found that the first three components accounted for 50% of the variance in the data set, and 90% of the variance could be explained by just 12 components. This is illustrated in the figure below.

Let me briefly explain the nature of PCA to those not familiar with the technique: Each of the principal components is a vector in the space of the original parameters, i.e. a weighted combination of them; it designates a principal direction in the data set. The principal components are uncorrelated, so we can vary each one independently, i.e. travel along its direction in the parameter space. PCA also tells us how far it is reasonable to vary each component, because it gives us the standard deviation in each component direction.
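For those who prefer code to words, the decomposition can be sketched in a few lines. This is a minimal illustration on a synthetic table with the same shape as ours (90 trials, 197 parameters), not the real data, and it only centres the columns; whether and how to scale parameters with different units is a separate choice that this sketch does not address. The helper name `pca` and the synthetic data are assumptions.

```python
import numpy as np

def pca(table):
    """PCA of a (trials x parameters) table via SVD of the centred data.

    Returns the mean parameter vector, the principal directions (one per
    row, orthonormal) and the variance along each direction.
    """
    mean = table.mean(axis=0)
    centered = table - mean
    _, sing, components = np.linalg.svd(centered, full_matrices=False)
    variances = sing**2 / (table.shape[0] - 1)
    return mean, components, variances

# Synthetic stand-in: 197 correlated parameters driven by 12 hidden factors.
rng = np.random.default_rng(0)
n_trials, n_params, n_latent = 90, 197, 12
latent = rng.normal(size=(n_trials, n_latent)) * np.linspace(5, 1, n_latent)
mixing = rng.normal(size=(n_latent, n_params))
table = latent @ mixing + 0.1 * rng.normal(size=(n_trials, n_params))

mean, components, variances = pca(table)

# Cumulative share of the total variance explained by the first k components.
explained = np.cumsum(variances) / variances.sum()

# Smallest number of components that reaches 90% of the variance.
n_for_90 = int(np.searchsorted(explained, 0.90) + 1)
```

Note that centring 90 trials leaves at most 89 directions with non-zero variance, which is another way of seeing that 90 trials cannot span 197 independent parameters.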

Obviously, the first component is the most interesting in the sense that it accounts for almost 30% of the variation. Let us begin the exploration by taking a look at the average running pattern. This pattern is found exactly in the centre of the parameter space of the running trials. I think you will agree that the analysis has reproduced what appears to be a mainstream running motion.

We now take the first principal component and displace it by two standard deviations in the positive direction. This seems to produce a running pattern that is much less intensive than the average. This guy or girl is really jogging.

As expected, changing the first principal component two standard deviations in the opposite direction creates a fast, intensive running motion.
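Displacing a component by a number of standard deviations amounts to adding a scaled copy of its direction vector to the average parameter vector. The sketch below shows the arithmetic on a small synthetic table; `synthesize` and `pca` are hypothetical helper names, and the data are random numbers rather than running trials. Keep in mind that the sign of a principal component is arbitrary, so which direction comes out as "slow" depends on the PCA implementation.

```python
import numpy as np

def pca(table):
    """Mean, orthonormal principal directions (rows) and their variances."""
    mean = table.mean(axis=0)
    _, sing, components = np.linalg.svd(table - mean, full_matrices=False)
    return mean, components, sing**2 / (table.shape[0] - 1)

def synthesize(mean, components, variances, z_scores):
    """Parameter vector displaced from the mean along the leading components.

    z_scores[i] is the displacement along component i, measured in
    standard deviations; components not listed stay at zero.
    """
    z = np.zeros(len(variances))
    z[:len(z_scores)] = z_scores
    return mean + (z * np.sqrt(variances)) @ components

# Small synthetic table: 90 "trials", 20 correlated "parameters".
rng = np.random.default_rng(1)
table = rng.normal(size=(90, 20)) @ rng.normal(size=(20, 20))
mean, components, variances = pca(table)

average = synthesize(mean, components, variances, [])
plus_two = synthesize(mean, components, variances, [2.0])    # +2 SD on PC 1
minus_two = synthesize(mean, components, variances, [-2.0])  # -2 SD on PC 1
```

In the real model, each synthesized parameter vector would be split back into Fourier coefficients and segment dimensions and then used to drive the running model.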

We can compare the slow and the fast running by overlaying a couple of keyframes. First we look at the stride:

…and then at the elevation of the heel in the forward swing:

We can see that, as expected, the strides are longer and the heel elevation is higher for fast running. There are also some surprising elements that may indicate that we have to adjust the data processing a bit. It looks like none of the models fully extends the knee. This could mean that the assumed marker locations on the models need adjustment, which would have to be investigated further.

## Outlook

We still have a lot of data mining left to do to figure out the physiological significance of the principal components. There is also much work remaining on automation of the data processing. Ideally, we want to create a C3D database of running trials that we can just add new trials to, with the whole processing repeated automatically. Right now, the curve fitting and coordinate system transformations still need some manual intervention.

The applications of models like these are almost endless. With the parameter space we could:

- Identify plausible full running patterns for individuals about whom we only know a few things like their size and stride length.
- Add kinetics in the form of ground reaction force prediction, which we know that we can do reliably in AnyBody.
- Compute muscle and joint loads as a function of virtual running styles.
- Offer modellers the opportunity to investigate running without experimental input and ask the model what-if questions.
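As an illustration of the first idea in the list, one possible way to fill in a full parameter vector from a few known values is the conditional mean of a multivariate Gaussian fitted to the trial table. This is only one candidate technique, chosen here for its simplicity; the Gaussian assumption, the function name `conditional_mean`, and the toy data are all assumptions of this sketch.

```python
import numpy as np

def conditional_mean(table, known_idx, known_values):
    """Most likely full parameter vector given a few known entries,
    assuming the rows of `table` follow a multivariate Gaussian.

    unknown = mean_u + cov_uk @ inv(cov_kk) @ (known - mean_k)
    """
    n_params = table.shape[1]
    mean = table.mean(axis=0)
    cov = np.cov(table, rowvar=False)
    unknown_idx = np.setdiff1d(np.arange(n_params), known_idx)

    cov_kk = cov[np.ix_(known_idx, known_idx)]
    cov_uk = cov[np.ix_(unknown_idx, known_idx)]
    delta = np.asarray(known_values, dtype=float) - mean[known_idx]

    full = np.empty(n_params)
    full[known_idx] = known_values
    full[unknown_idx] = mean[unknown_idx] + cov_uk @ np.linalg.solve(cov_kk, delta)
    return full

# Toy table: columns 0 and 1 play the role of "size" and "stride length";
# column 2 depends on them exactly, column 3 is unrelated noise.
rng = np.random.default_rng(2)
size = rng.normal(1.75, 0.1, 90)
stride = rng.normal(1.2, 0.2, 90)
table = np.column_stack([size, stride, 2 * size + 3 * stride + 1,
                         rng.normal(size=90)])

estimate = conditional_mean(table, known_idx=[0, 1], known_values=[1.90, 1.40])
```

With only a few known parameters and 90 trials, the small covariance block being inverted is well conditioned; the estimate for the remaining parameters is essentially a linear regression on the known ones.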