It generalizes the regression model in the sense that a continuous trajectory of shape changes is estimated. This average progression is not estimated from the repeated observation of a single object (or "subject"), but from several distinct subjects longitudinally observed. Two central differences between the longitudinal atlas and regression models must be noted.
A long-term scenario of shape changes is estimated from short-term individual windows of observation. Taking the example of brain aging, three subjects observed from ages 50, 60, 70 to 60, 70, 80 respectively are enough to infer the average aging pattern over the full period 50 - 80. By contrast, the regression model would have required data from a single subject followed from age 50 to 80.
No explicit common time-line is required. Taking the example of brain decay under the progression of Alzheimer's disease, which may start developing at any age and progress at any pace, the longitudinal atlas model will jointly estimate an average pattern of progression and individual time-warp functions that align each individual time-line with the average one.
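The first difference above can be illustrated with a minimal sketch that merges short individual observation windows into the long-term period they jointly cover. The function name `merge_windows` is hypothetical and this is not part of Deformetrica; it only illustrates the coverage argument of the brain aging example.

```python
def merge_windows(windows):
    """Merge overlapping or touching (start, end) age windows."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            # The window touches or overlaps the previous one: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(window) for window in merged]


# Three subjects, each followed for ten years only.
subject_windows = [(50, 60), (60, 70), (70, 80)]
print(merge_windows(subject_windows))  # [(50, 80)]
```

The three ten-year windows jointly span the full 50 - 80 period, which is the range over which an average scenario can be inferred.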
It generalizes the Bayesian atlas model in the sense that the distribution of geometrical differences between individuals is estimated in terms of an average plus parameters of variability (generalized mean-variance analysis). Two central differences between the longitudinal atlas and Bayesian atlas models must be noted.
Not all modes of geometrical variability are treated in the same manner:
time-correlated geometrical variability is fully captured by the average scenario described above,
the remaining geometrical variability is captured with an approach similar to the Bayesian atlas model.
Only a limited number of modes of this remaining "static" geometrical variability are actually captured, similarly to the principal geodesic analysis model (generalized independent component analysis).
Longitudinal atlas models should be initialized by the dedicated pipeline, and estimated with the MCMC-SAEM algorithm from large longitudinal data sets.
Estimating a longitudinal atlas is computationally much more costly than any other Deformetrica model. For large high-dimensional data sets, the estimation can take weeks. Specific efforts are currently being deployed to accelerate this procedure.
The number-of-sources, which controls the number of geometrical modes of variability to capture in addition to the time-correlated variability. It is somewhat similar to the latent-space-dimension parameter, dedicated to the principal geodesic analysis model. If not specified, the number-of-sources is arbitrarily set to 4.
The random-seed, which is used in this tutorial for reproducibility. Indeed, the MCMC-SAEM algorithm is a stochastic procedure.
The data set should be longitudinal, i.e. composed of several subjects observed at several visits each.
The number of visits per subject does not need to be fixed, and the visit ages do not need to be regular or aligned.
Figure: the considered longitudinal data set of 10 "starmen". The starman subjects have between 3 and 12 visits, with unaligned ages.
The shared dynamic movement across all subjects is the subtle raising of the left arm.
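The requirements on the data set can be checked before launching any estimation. The sketch below is a hypothetical in-memory representation, not the Deformetrica data_set.xml format: the class names, the file names and the `min_visits=2` threshold (one reading of "several visits each") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    age: float       # age of the subject at this visit
    filename: str    # hypothetical path to the observed shape


@dataclass
class Subject:
    subject_id: str
    visits: list = field(default_factory=list)


def subjects_with_too_few_visits(subjects, min_visits=2):
    """Return the ids of subjects that are not longitudinally observed."""
    return [s.subject_id for s in subjects if len(s.visits) < min_visits]


# Visit counts and ages may differ from one subject to the next.
subjects = [
    Subject("s1", [Visit(61.2, "s1_v1.vtk"), Visit(63.0, "s1_v2.vtk"), Visit(66.9, "s1_v3.vtk")]),
    Subject("s2", [Visit(70.5, "s2_v1.vtk"), Visit(71.1, "s2_v2.vtk")]),
]
print(subjects_with_too_few_visits(subjects))  # []
```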
The MCMC-SAEM optimization method should be selected. The sample-every-n-mcmc-iters parameter controls the number of MCMC-sampling transition kernels that are chained at each iteration. Its default value is 10.
Pre-processing: dedicated initialization pipeline
Initialize command line
Initializing the longitudinal atlas model before its estimation with the MCMC-SAEM algorithm is strongly advised. A dedicated initialization pipeline has been developed, and can be launched with the command: deformetrica initialize model.xml data_set.xml -p optimization_parameters.xml. For the proposed example examples/longitudinal_atlas/landmark/2d/starmen, this procedure takes a few minutes.
This initialization pipeline takes advantage of the other Deformetrica models to compute initial parameters that will be stored in a folder called data. As a secondary output, the initialized_model.xml file is created from the original model.xml. Temporary files are stored in a folder called preprocessing. The results of the last initialization step are stored in the subfolder preprocessing/5_longitudinal_atlas_with_gradient_ascent, and offer a good glimpse of the final outputs that can be hoped for. Their structure is presented in the next section.
Processing: estimation with the MCMC-SAEM algorithm
Estimate command line
Once the longitudinal atlas model is initialized, its estimation can be launched with the command: deformetrica estimate initialized_model.xml data_set.xml -p optimization_parameters.xml. For the proposed example examples/longitudinal_atlas/landmark/2d/starmen, this procedure takes about 20 minutes.
In Deformetrica, the MCMC-SAEM algorithm alternately:
draws random samples of the individual parameters (time-shifts, accelerations, and sources) with an MCMC procedure,
updates the scalar population parameters (reference time, and time-shift, acceleration and noise variances) with analytical formulas,
updates the high-dimensional population parameters (template, control points, momenta, modulation matrix) with a gradient descent.
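The alternation listed above can be sketched on a deliberately simple toy model. This is not Deformetrica's implementation: the model (y_i = z_i + noise, with z_i drawn around a population mean mu, all variances fixed to 1), the function name and every setting are assumptions chosen for illustration. The toy has no high-dimensional parameter, so the gradient descent step is omitted and only the sampling and analytical update steps appear.

```python
import math
import random


def toy_mcmc_saem(observations, n_iters=300, burn_in=100, proposal_std=1.0, seed=0):
    """Estimate mu in the toy model z_i ~ N(mu, 1), y_i ~ N(z_i, 1)."""
    rng = random.Random(seed)
    mu = 0.0                    # population parameter
    z = list(observations)      # individual parameters, one per subject
    stat = 0.0                  # running approximation of the sufficient statistic

    def log_joint(y_i, z_i, mu_):
        # log p(y_i | z_i) + log p(z_i | mu), up to an additive constant
        return -0.5 * (y_i - z_i) ** 2 - 0.5 * (z_i - mu_) ** 2

    for k in range(n_iters):
        # 1) Simulation: one Metropolis-Hastings move per individual parameter.
        for i, y_i in enumerate(observations):
            candidate = z[i] + rng.gauss(0.0, proposal_std)
            log_ratio = log_joint(y_i, candidate, mu) - log_joint(y_i, z[i], mu)
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                z[i] = candidate
        # 2) Stochastic approximation: constant step, then decreasing step.
        step = 1.0 if k < burn_in else 1.0 / (k - burn_in + 1)
        stat += step * (sum(z) / len(z) - stat)
        # 3) Maximization: closed-form update of the population parameter.
        mu = stat
    return mu


data_rng = random.Random(1)
ys = [data_rng.gauss(3.0, math.sqrt(2.0)) for _ in range(200)]
print(toy_mcmc_saem(ys), sum(ys) / len(ys))  # the two values should be close
```

In this toy model the maximum likelihood estimate of mu is the empirical mean of the observations, which gives a simple way to check that the stochastic procedure behaves as expected.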
Structure of the results
The results are available in the created output folder.
The submanifold of shapes best adapted to the data set is encoded by the control points and template parameters.
The estimated average progression is encoded by the momenta parameter. The geodesic flow files offer a convenient way to visualize this estimated trajectory.
The temporal variability of this progression is encoded by the reference time parameter and by the acceleration and time-shift standard deviation parameters. Heuristically approximated individual parameters further specify how each subject compares to the average progression.
The acceleration encodes whether the individual sequence is slower (< 1) or faster (> 1) than the average.
The onset age encodes whether the individual sequence is earlier (< reference time) or later (> reference time) than the average.
The geometrical (or "spatial") variability of this progression is encoded by the modulation matrix parameter. Heuristically approximated source individual parameters further specify how each subject geometrically differs from the average progression.
Individual trajectories are seen as (time-reparametrized) geometric shifts of the average progression.
Individual geometric shifts are seen as the superposition of several modes of geometrical variability (of total number equal to the user-specified parameter number-of-sources).
The geometric mode files offer a convenient way to visualize those modes of geometrical variability.
The sources individual parameter gives the contribution of each mode for the considered subject.
The reconstruction files give a convenient way to visualize the heuristically estimated fit of the longitudinal model to the data set.
Figure: Estimated population-average scenario of geometrical changes with time. Note that the left arm movement is captured, while the static geometrical variability of the data set is averaged into a representative template.
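The superposition of geometrical modes described above can be read as a linear combination: each column of the modulation matrix is one mode, and the sources of a subject weight those modes. The sketch below assumes this linear form; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
def space_shift(modulation_matrix, sources):
    """Combine the modes (columns of the matrix) with the subject's sources."""
    n_sources = len(sources)
    if any(len(row) != n_sources for row in modulation_matrix):
        raise ValueError("each row must have one entry per source")
    return [sum(a * s for a, s in zip(row, sources)) for row in modulation_matrix]


# Two modes (number-of-sources = 2) acting on a 3-dimensional parameter.
modulation_matrix = [
    [1.0, 0.0],
    [0.0, 2.0],
    [1.0, 1.0],
]
sources = [0.5, -1.0]  # contribution of each mode for one subject
print(space_shift(modulation_matrix, sources))  # [0.5, -2.0, -0.5]
```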
Accelerating the estimation
Estimating a longitudinal model with the MCMC-SAEM algorithm can take a long time. We gather here hints on how to deal with this computational burden, distinguishing purely optimization tricks from adapted modeling approaches.
Consider relying on the <state-file> tag to re-start an interrupted computation from the last saved state. Consider regularly saving intermediate results, by manually setting the <save-every-n-iters> tag.
Consider using a multi-processing computing scheme thanks to the tag <number-of-processes>.
Consider tweaking the <cuda-mode> tag to make a maximal usage of available GPUs (while still avoiding memory overflows). See the related performance wiki section.
Consider tweaking the <sample-every-n-mcmc-iters> tag towards lower values (default is 10). This might however degrade the quality of the estimated parameters.
Consider lowering the <concentration-of-timepoints> tag (default value is 10), which controls the time resolution of the estimated average progression.
Consider lowering the <number-of-timepoints> tag (default value is 11), which controls the allowed complexity of the spatial trajectory shifting operation.
Consider increasing the deformation <kernel-width> tag (no default value, must be specified by the user), which controls the characteristic scale of the geometrical variability to capture.
In the case of mesh data, consider using the "landmark" <deformable-object-type> (or <attachment-type>) if there is point-to-point correspondence. If "current" or "varifold" attachment types are necessary, consider increasing the corresponding <kernel-width> tag (no default value, must be specified by the user), which controls the characteristic scale of the geometrical noise to eliminate.
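The optimization-related tags mentioned in this section can be gathered in one place. The sketch below uses Python's standard library to write them as an XML fragment. The tag names come from the hints above, but the root element name, the state file path and all values are assumptions for illustration; check them against an existing optimization_parameters.xml before use.

```python
import xml.etree.ElementTree as ET


def build_optimization_parameters(settings, root_tag="optimization-parameters"):
    """Serialize a flat {tag: value} mapping as an XML fragment."""
    root = ET.Element(root_tag)  # root tag name is an assumption
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


settings = {
    "state-file": "saved_state.p",    # hypothetical path
    "save-every-n-iters": 50,         # illustrative value
    "number-of-processes": 4,         # illustrative value
    "sample-every-n-mcmc-iters": 5,   # lower than the default of 10
}
xml_text = build_optimization_parameters(settings)
print(xml_text)
```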
Post-processing: longitudinal registration
The longitudinal atlas model is hierarchical: individual trajectories are seen as spatio-temporal perturbations of a population trajectory. The parameters of the longitudinal model split in two categories.
On the one hand are the population parameters, also called fixed parameters. Those eight parameters can be further split according to their dimensionality.
Half are high-dimensional parameters: the template, the control points, the momenta and the modulation matrix.
Half are scalar parameters: the reference time, the acceleration, time-shift and noise standard deviations (or variances).
Those parameters are estimated by the MCMC-SAEM algorithm, in the so-called calibration phase.
On the other hand are the individual parameters, also called random parameters. The spatio-temporal transformation that warps the population-average trajectory of shape changes into some given individual progression is parametrized by a few scalar parameters.
Two parametrize the temporal warp: the acceleration (or "acceleration factor") and the onset age (or "time-shift").
A user-fixed number of scalars (set by the <number-of-sources> tag) parametrizes the spatial (or "geometrical") warp: the sources.
Those parameters are not estimated by the MCMC-SAEM algorithm, which only offers an approximation of their optimal value thanks to an internal heuristic. The longitudinal registration <model-type> described in this section is dedicated to their estimation. We call this operation the personalization phase. This functionality runs much faster than the core MCMC-SAEM-based estimation.
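The role of the two temporal individual parameters can be illustrated with a minimal sketch of an affine time warp. The exact formula is an assumption, chosen to be consistent with the description of the acceleration and onset age given in this page; the function name and values are hypothetical.

```python
def warp_age(age, acceleration, onset_age, reference_time):
    """Map an individual age onto the time-line of the average progression.

    Assumed affine form: reference_time + acceleration * (age - onset_age).
    """
    return reference_time + acceleration * (age - onset_age)


reference_time = 70.0
# A subject progressing twice as fast as the average, with an early onset.
print(warp_age(66.0, acceleration=2.0, onset_age=65.0, reference_time=reference_time))  # 72.0
# With acceleration 1 and onset age equal to the reference time, ages are unchanged.
print(warp_age(66.0, acceleration=1.0, onset_age=70.0, reference_time=reference_time))  # 66.0
```

Under this assumed form, a subject whose onset age is below the reference time reaches the reference stage of the average progression at a younger age, which matches the "earlier" reading of the onset age.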
Finalize command line
A utility command line creates the finalized_model.xml file (see below) from the estimated longitudinal atlas output: deformetrica finalize model.xml. This operation only writes an XML file, and is virtually instantaneous.
Note that the <model-type> tag has been set to "LongitudinalRegistration".
Estimate the longitudinal registration model(s)
The estimation of the longitudinal registration model, which actually consists in solving one independent sub-problem per individual, can be launched with the command: deformetrica estimate finalized_model.xml data_set.xml -o output_registration.
Note that we propose here to specify a new output folder (named "output_registration") in order to avoid overwriting the longitudinal atlas estimation results.
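The whole workflow (initialization, calibration, finalization, personalization) can be chained from a script. The sketch below only assembles the command lines quoted in this page and prints them; the helper names are hypothetical, and the commands are executed only when `dry_run=False`, which assumes that the deformetrica executable is available on the PATH.

```python
import subprocess


def pipeline_commands(model="model.xml", data_set="data_set.xml",
                      optimization="optimization_parameters.xml",
                      registration_output="output_registration"):
    """Return the four command lines of the workflow, in order."""
    return [
        ["deformetrica", "initialize", model, data_set, "-p", optimization],
        ["deformetrica", "estimate", "initialized_model.xml", data_set, "-p", optimization],
        ["deformetrica", "finalize", model],
        # finalized_model.xml is the file written by the finalize step.
        ["deformetrica", "estimate", "finalized_model.xml", data_set, "-o", registration_output],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


commands = pipeline_commands()
run_pipeline(commands, dry_run=True)
```

Keeping a separate output folder for the last step preserves the calibration results, as noted above.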
Structure of the results
The produced "output_registration" folder has the same structure as the previously created "output" folder. Population parameters remained constant, while the individual ones have been estimated.
The fit of the longitudinal atlas model to the longitudinally-registered data set (here identical to the calibration data set) can finally be visually assessed.
Figure: Superimposed original data points (in black) and associated reconstructions (in red), estimated by the longitudinal registration model.