
# Methods implemented in NeuroElf

Whenever a program is used for data analysis, it is important for the community at large to understand which algorithms were used in the analysis. While NeuroElf is mostly written to make algorithms accessible (the user-friendliness aspect), it is equally relevant to ascertain that the methods implemented in any program have been accepted by the scientific community as “useful and reliable” (to achieve the intended goal) and are, as much as possible, free of errors, both in the algorithm itself and in its specific implementation in the given program.

As an example, the alphasim button (NeuroElf GUI access to the `alphasim.m` function) initially asked the user for the estimated smoothness of the data and presented `6mm` as the default value. This default was motivated by the fact that, at the lab where I work, the smoothing operation during the preprocessing stage is configured with a 6mm Gaussian kernel. However, the correct number to use ought to be an estimate of the spatial smoothness of the residual, because that determines how likely it is that, by chance, a cluster of a given size will be encountered in a statistical map (at any given uncorrected threshold). This issue has since been addressed!

## List of methods (overview)

The following list gives an overview of the methods of analysis and parameter estimation that are implemented in NeuroElf (as far as they exceed basic operations, such as plain averaging across a dimension, or auxiliary functions used for string manipulation, file in-/output, extended array operations, etc.):

### Cluster size threshold estimation (alphasim)

Cluster size threshold estimation can be used to account for the fact that a regular whole-brain map is made up of multiple (partially) independent tests. One common way is to simply adapt the statistical threshold by dividing the desired false-positive rate (typically 5 per cent = 0.05) by (an estimate of) the number of independent tests. However, this can be too stringent in cases where larger swaths of cortex (neurocomputational network nodes) respond to an experimental manipulation, but below the then required detection threshold.

Instead of ensuring significance solely by applying a voxel-wise corrected statistical threshold, it is possible to estimate how large the clusters are that appear at random in a given search space, given the smoothness of the residual. That is, the alpha-rate (false positives among performed tests) can be estimated by simulating statistical maps of the desired kind and then selecting the cluster size threshold such that at most 5 per cent of maps (with the residual exhibiting the same smoothness) would show a false positive cluster. Together, the resulting pair of uncorrected statistical threshold and cluster size threshold corrects a whole-brain map to a family-wise-error corrected threshold of the desired strength (again usually 0.05). This algorithm is

• implemented in function `alphasim.m`
• accepts a mask (sub-space specification)
• can be applied to surface statistics (given the mesh vertices and topology, as well as an estimate of the smoothness)
• allows estimating the cluster size threshold for fully independent components of a conjunction analysis
• as a still experimental feature, allows applying a shift in the Z-distribution to account for shifts in the observed distribution of a statistical map (e.g. by-chance global “signal” in a covariate regression)
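
The simulation idea described above can be illustrated with a minimal sketch. It is not the code of `alphasim.m`: the 2D grid, the `[1, 2, 1]/4` smoothing kernel, the 4-connectivity rule, and all function and parameter names below are assumptions chosen to keep the example short and dependency-free.

```python
import random
from statistics import pstdev

def smooth(img):
    """One pass of a separable [1, 2, 1]/4 kernel with wrap-around edges."""
    n = len(img)
    rows = [[(r[j - 1] + 2 * r[j] + r[(j + 1) % n]) / 4 for j in range(n)]
            for r in img]
    return [[(rows[i - 1][j] + 2 * rows[i][j] + rows[(i + 1) % n][j]) / 4
             for j in range(n)] for i in range(n)]

def max_cluster_size(mask):
    """Size of the largest 4-connected cluster of True cells (no wrap-around)."""
    n = len(mask)
    seen = [[False] * n for _ in range(n)]
    best = 0
    for i in range(n):
        for j in range(n):
            if not mask[i][j] or seen[i][j]:
                continue
            stack, size = [(i, j)], 0
            seen[i][j] = True
            while stack:
                a, b = stack.pop()
                size += 1
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= c < n and 0 <= d < n and mask[c][d] and not seen[c][d]:
                        seen[c][d] = True
                        stack.append((c, d))
            best = max(best, size)
    return best

def cluster_size_threshold(n=24, z_thresh=1.645, alpha=0.05, n_sims=200, seed=0):
    """Simulate smooth noise maps and return (k, list of maximum cluster sizes)."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        img = smooth([[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)])
        sd = pstdev([v for row in img for v in row])  # rescale to unit variance
        mask = [[v / sd > z_thresh for v in row] for row in img]
        maxima.append(max_cluster_size(mask))
    # smallest k such that at most alpha of the noise maps hold a cluster >= k
    k = 1
    while sum(m >= k for m in maxima) / n_sims > alpha:
        k += 1
    return k, maxima

k, maxima = cluster_size_threshold()
print("cluster size threshold:", k)
```

A realistic simulation would use the 3D search space (mask), a smoothness estimate taken from the residual, and many more iterations; the sketch only shows how the cluster size threshold follows from the distribution of maximum cluster sizes in noise maps.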

### Cluster table generation

Cluster tables are often presented in publications describing analyses where whole-brain mapping was performed, i.e. the attempt to localize the spatial nodes within cortex that subserve a specific function. This function is

• implemented in a combination of an M-file, `clustervol.m`, and a compiled MEX-file, coded in `clustercoordsc.c`
• the M-file provides a command-line interface with rich options for output formatting, converting coordinates, thresholding volumes, etc.
• the C/MEX-file provides the actual clustering of the binary (thresholded and masked) volume into separate spatial nodes

Once a (thresholded) map has been segregated into separate clusters (such that voxels of one cluster do not “touch” voxels of another cluster), clusters of considerable size (e.g. more than 100 voxels) sometimes exhibit additional “local maxima”, i.e. moving outwards from the overall maximum, the spatial gradient is negative at first but becomes positive again. To detect this, a 3D watershed algorithm has been implemented in the function `splitclustercoords.m`.
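
The clustering step (separating a thresholded volume into clusters that do not touch) can be sketched as a flood fill. This is an illustration only, not the code of `clustercoordsc.c`: the face-connectivity rule (6 neighbours), the nested-list volume, and the names `label_clusters` and `peak` are assumptions made for the example.

```python
def label_clusters(volume, threshold):
    """Return clusters (lists of (x, y, z) tuples) of voxels >= threshold."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    # face-connected neighbours only (an assumption; other rules are possible)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    clusters = []
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if volume[x][y][z] < threshold or (x, y, z) in seen:
                    continue
                seen.add((x, y, z))
                stack, cluster = [(x, y, z)], []
                while stack:
                    v = stack.pop()
                    cluster.append(v)
                    for d in offsets:
                        nb = (v[0] + d[0], v[1] + d[1], v[2] + d[2])
                        if (0 <= nb[0] < nx and 0 <= nb[1] < ny and 0 <= nb[2] < nz
                                and nb not in seen
                                and volume[nb[0]][nb[1]][nb[2]] >= threshold):
                            seen.add(nb)
                            stack.append(nb)
                clusters.append(cluster)
    # largest cluster first, as is common in cluster tables
    return sorted(clusters, key=len, reverse=True)

def peak(volume, cluster):
    """Coordinate of the maximum value within one cluster."""
    return max(cluster, key=lambda v: volume[v[0]][v[1]][v[2]])

# toy 4 x 4 x 4 volume with two separate supra-threshold blobs
vol = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
vol[0][0][0], vol[0][0][1], vol[3][3][3] = 3.0, 5.0, 4.0
clusters = label_clusters(vol, threshold=2.5)
for c in clusters:
    print(len(c), "voxels, peak at", peak(vol, c))
```

Splitting a large cluster at its local maxima (the watershed step) is not covered by this sketch.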

### Conjunction analysis (minimum t-statistic)

A conjunction analysis can be informative when, across the brain, the overlap of two statistical tests is of interest. The most stringent test that can be applied is that of requiring that, in each considered voxel, both tests must be significant at the desired level. This functionality is

• implemented in the function `conjval` for statistics of the same kind and with the same D.F. parameter, where a higher value means greater significance
• implemented in the function `conjvalp` for p-values (and possibly other statistics for which lower values mean greater significance; also accepts negative values)
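
The principle behind both functions can be sketched in a few lines: where higher values mean greater significance, the conjunction keeps the smaller of the two values in each voxel, and for p-values it keeps the larger one. The function names below are hypothetical, and the handling of negative values mentioned above is deliberately left out.

```python
def conj_min(stat_a, stat_b):
    """Voxel-wise conjunction for statistics where higher = more significant."""
    return [min(a, b) for a, b in zip(stat_a, stat_b)]

def conj_max_p(p_a, p_b):
    """Voxel-wise conjunction for p-values (lower = more significant)."""
    return [max(a, b) for a, b in zip(p_a, p_b)]

t1 = [3.2, 1.1, 4.0]  # t-values of test 1 in three voxels
t2 = [2.5, 3.9, 0.4]  # t-values of test 2 (same D.F.) in the same voxels
print(conj_min(t1, t2))
print(conj_max_p([0.01, 0.2], [0.03, 0.001]))
```

A voxel then passes the conjunction only if this combined value passes the threshold, which is equivalent to both tests passing it.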

### Mediation analysis

Mediation analysis as a whole can be described as the estimation (and test) of separate path coefficients, a and b, as well as their product, a*b, such that the “transmission” of an existing effect between an independent/explanatory variable, X, and an outcome variable, Y, is accomplished via one or several mediators, Mi. The analysis includes a test for significance of the a*b product term (as well as the individual path coefficients), and also allows covariates to be specified. It is

• implemented in function `mediationpset.m`, where the pset indicates that the function returns path coefficients (p), standard errors (se), and t-statistics (t)
• options are: a*b product testing via bootstrapping or Sobel test, and robust regression
• supports multi-dimensional (imaging) data for X, M, and Y
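
As a small worked example of the a*b product test, the first-order Sobel statistic can be computed from the two path coefficients and their standard errors. This is a sketch of the textbook formula and not taken from `mediationpset.m`, which may use a different variant; the function names and the example numbers are made up for illustration.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """First-order Sobel z-statistic for the indirect effect a*b."""
    return (a * b) / math.sqrt(b * b * se_a * se_a + a * a * se_b * se_b)

def two_sided_p(z):
    """Two-sided p-value of a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# hypothetical path estimates: X -> M (a) and M -> Y controlling for X (b)
z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 3), two_sided_p(z))
```

The bootstrapping option replaces this normal approximation by resampling cases and recomputing a*b on each resample.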

An example would be, on the level of a between-subject effect, that a randomly assigned condition (X, e.g. strategy to apply to stimuli) has an effect on outcome (Y, e.g. appetite for a specific type of stimulus or difference in appetite for two kinds of stimuli) via a specific brain region (or network of regions) that works as a mediator (Mi, e.g. pre-frontal control regions). For a within-subjects design, a test could be whether, on any given trial, the response in pre-frontal cortex during an instructional cue (strategy stimulus) has an effect on outcome (self-reported craving for depicted food) via another brain region. In that case, either X (which brain region has an influence on the “craving center” of the brain) or M (which brain region is influenced by the “control region” of the brain) could be “searched for”…

### Multi-level kernel density analysis (MKDA / meta analysis)

Multi-level kernel density analysis tries to determine whether “peak coordinates” reported in previously published papers (given a selection criterion, such as publications concerned with a specific psychological construct, e.g. fear or working memory) occur in specific spatial locations (spatial specificity) significantly more often than warranted by chance. Pooling several publications in this way reduces the influence of any single publication on the “knowledge” of spatial distributions of activation patterns.

### Ordinary least-squares (OLS) regression

Ordinary least-squares (OLS) regression is the most generic way of applying the General Linear Model (GLM) so as to estimate “effect sizes”. Given the different applications, there are several functions implementing forms of this regression:

• the most general implementation is done in the `calcbetas.m` function
• to assess the significance of the regression (single beta or computed contrasts), the `glmtstat.m` function must be used
• a special implementation is contained in the `rbalign.m` (rigid-body alignment) function that uses the GLM framework to estimate motion parameters
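
The estimation and test steps listed above can be sketched as follows. The example solves the normal equations and computes a contrast t-statistic with plain Python lists; it is an assumption-laden illustration, not the code of `calcbetas.m` or `glmtstat.m`, and the helper names `solve`, `ols`, and `contrast_t` are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A assumed full rank)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Return (betas, residual variance, X'X) for the model y = X b + e."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    betas = solve(XtX, Xty)
    resid = [v - sum(bi * xi for bi, xi in zip(betas, r)) for r, v in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    return betas, sigma2, XtX

def contrast_t(c, betas, sigma2, XtX):
    """t = c'b / sqrt(sigma2 * c' (X'X)^-1 c)."""
    v = solve(XtX, c)  # (X'X)^-1 c without forming the inverse explicitly
    var = sigma2 * sum(ci * vi for ci, vi in zip(c, v))
    return sum(ci * bi for ci, bi in zip(c, betas)) / math.sqrt(var)

# toy data: intercept plus one regressor, y = 1 + 2 x plus small noise
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
X = [[1.0, float(x)] for x in range(6)]
y = [1.0 + 2.0 * x + e for x, e in zip(range(6), noise)]
betas, sigma2, XtX = ols(X, y)
print(betas, contrast_t([0.0, 1.0], betas, sigma2, XtX))
```

For mass-univariate data, the same design matrix is applied to every voxel, so the terms that depend only on X need to be computed just once.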

An additional small number of function files also perform some flavor of linear regression, but those are not applied to functional imaging data (e.g. the function `regress_coords.m` can be used to determine the transformation required to minimize the error between two sets of coordinates after a rigid-body transform).

### Robust regression

Robust regression, in NeuroElf, is the estimation of regression parameters using an iteratively-reweighted-least-squares approach where outliers are “detected” using the bi-square weighting function. It is

• implemented for a single univariate regression (e.g. a time course, T, regressed on a design matrix, X) in function `fitrobustbisquare.m`
• implemented for a common design matrix (X) on mass-univariate data (e.g. fMRI imaging on the first or second level) in function `fitrobustbisquare_img.m`
• is used in the GLM computation routine for first-level data (MDM::ComputeGLM) as well as from the NeuroElf GUI's contrast manager functionality when robust regression is selected
• implemented for individual design matrices (Xi, where the third dimension is the number of cases) on data (e.g. for a whole-brain robust mediation) in function `fitrobustbisquare_multi.m`
• this is used by the NeuroElf GUI's mediation interface when robust regression is selected
• after the regression, to compute t-statistics from the output (beta values and sample weights), the function `robustt.m` can be used (correcting for the loss in degrees of freedom)
• a special case arises when, in a correlation, both the “dependent” and the “independent” variable may contain outliers, in which case `robcorrcoef.m` should be used, which treats each of the two variables in turn as the explanatory variable as a quick check
• and to compare means between groups, the two simplified functions `robustnsamplet.m` and `robustnsamplet_img.m` are available as well 
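
The iteratively-reweighted-least-squares idea can be sketched for the simplest case, a straight-line fit. This is a hedged illustration and not the code of `fitrobustbisquare.m`: the tuning constant 4.685, the scale estimate (median absolute residual divided by 0.6745), the omission of any leverage adjustment, and the function names are assumptions made for the example.

```python
from statistics import median

def wls_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def bisquare_fit(x, y, tune=4.685, n_iter=50, tol=1e-8):
    """Return (intercept, slope, weights) after bisquare reweighting."""
    w = [1.0] * len(x)
    b0, b1 = wls_line(x, y, w)  # start from the ordinary least-squares fit
    for _ in range(n_iter):
        r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # robust scale estimate; the lower bound avoids a division by zero
        s = max(median(abs(ri) for ri in r) / 0.6745, 1e-12)
        u = [ri / (tune * s) for ri in r]
        # bisquare weights: smooth down-weighting, zero beyond the cutoff
        w = [(1 - ui * ui) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        nb0, nb1 = wls_line(x, y, w)
        converged = abs(nb0 - b0) + abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if converged:
            break
    return b0, b1, w

# toy data: y = 1 + 2 x with small noise and one gross outlier at the end
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[-1] += 30.0
ols_b0, ols_b1 = wls_line(x, y, [1.0] * len(x))
rob_b0, rob_b1, weights = bisquare_fit(x, y)
print("OLS slope:", ols_b1, "robust slope:", rob_b1)
```

Because the weights reduce the effective number of observations, t-statistics computed afterwards need a degrees-of-freedom correction, which is what the list above describes for `robustt.m`.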