Scripting of the processing stream (example)

Motivation

While it is quite useful for educational purposes to perform the individual steps manually, on a daily basis it is much more convenient (and less error prone) to have one or several scripts do the “work” and to only take care of their configuration.

Requirements

For this example, the following programs must be installed and information available:

  • bash, find, mkdir, mv, rsync, ssh, tar (come with most OSes that have a terminal or shell)
  • Matlab v7.0 or higher, SPM5/SPM8, and NeuroElf v0.9a or higher
  • location of subject data at the scanning facility (server)
  • local folder and knowledge about desired structure
  • onsets in a tabular structure where one column codes the subject and negative differences between onset times code the run separation (for an example, see below)

The scripts presented on this page can be downloaded (click on the filename in the tab above each script). For the bash scripts, please use the following command to make them executable:

chmod 755 SCANUnit_TASK_script.bash

To allow all users to execute the scripts, simply place them into a non-personal folder (e.g. /usr/local/bin).
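
For example, a system-wide installation could look like this (a sketch only; the filenames and target folder are examples, and administrator rights may be required):

# copy the downloaded scripts into a folder on every user's PATH (example names)
sudo cp SCANUnit_import_subject.bash SCANUnit_subject_foldernames.bash /usr/local/bin/
sudo chmod 755 /usr/local/bin/SCANUnit_*.bash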

For Matlab scripts, simply make sure they are located on your Matlab path.
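
For example (a sketch; the folder name is an assumption, use whichever folder holds the .m files):

% add the folder containing the Matlab scripts to the path (example location)
addpath('/usr/local/share/SCANUnit_matlab');
% store the path permanently for future sessions
savepath;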

Sub-components

Since part of this scripted approach requires the shell, the work is split across multiple scripts. Part of the data handling is performed by shell commands (data retrieval, archiving), whereas the processing itself is implemented via SPM's job manager. The following steps are covered:

  • data import (from a remote SSH-accessible server)
  • correct renaming and creation of folders (into existing study structure)
  • archiving (possibly configured as a regularly executed job via crontab)
  • fMRI quality assessment (per run)
  • preprocessing (SPM5 or SPM8)
  • conversion into BrainVoyager QX's VTC format
  • semi-automatic logfile parsing (PRT generation)
  • computing an RFX-GLM of all available subjects (upon request)

Data import

SCANUnit_import_subject.bash
#!/bin/bash
 
# settings ; PLEASE ADAPT
LOCAL_FOLDER=/Users/XXX/Desktop/CPU
REMOTE_FOLDER=/export/data/scanner/a_to_c/CPU
REMOTE_HOST=host.domain.columbia.edu
REMOTE_USER=xxx
 
# ensure there are arguments
if [[ $# -eq 0 ]] ; then
    echo "No subjects requested."
    echo "USAGE (import one subject): $0 subject1"
    echo "USAGE (import several subjects): $0 subject1 subject2 subject3 ..."
    echo "USAGE (import several subjects, pattern based): $0 subj\*patt"
    echo "---"
    echo "This script uses ssh. You will be asked to enter"
    echo "the password of user $REMOTE_USER on host $REMOTE_HOST."
    echo "---"
    echo "Data will be stored in $LOCAL_FOLDER."
    exit 1;
  fi
 
# change into local folder
cd $LOCAL_FOLDER
 
# retrieve requested files
ssh -l $REMOTE_USER $REMOTE_HOST "cd $REMOTE_FOLDER ; tar -cf - $@" | tar -xf -
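
A typical call then looks like one of these (subject names are examples; note the escaped wildcard for pattern-based imports):

SCANUnit_import_subject.bash CPU001 CPU002
SCANUnit_import_subject.bash CPU0\*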

Folder naming and creation

SCANUnit_subject_foldernames.bash
#!/bin/bash
 
# settings ; PLEASE ADAPT
LOCAL_FOLDER=/Users/XXX/Desktop/CPU
SUBJ_PATTERN=CPU*
SUBFOLDERS="functional onsets structural"
RAWFOLDER=raw
 
# Note: the SUBFOLDERS variable can be easily adapted to create
# several levels of folders:
# SUBFOLDERS="functional functional/raw functional/prepro ..."
 
# for every folder found (matching the pattern)
find $LOCAL_FOLDER -mindepth 1 -maxdepth 1 -type d -name "$SUBJ_PATTERN" | while read subject_folder ; do
 
    # make sure all subfolders we need exist
    for sub in $SUBFOLDERS ; do
        mkdir $subject_folder/$sub 2>/dev/null
      done
 
    # rename original folder, if existent, to raw
    find $subject_folder -mindepth 1 -maxdepth 1 -type d -name "$SUBJ_PATTERN" -exec mv {} $subject_folder/$RAWFOLDER \;
  done
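
After running the script, each subject's folder should roughly look like this (a sketch, assuming the default SUBFOLDERS setting and one original scanner folder per subject):

CPU001/
    functional/
    onsets/
    raw/           (the renamed original scanner folder)
    structural/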

Archiving

For this, there is really no need for a script; simply let rsync do the work:

rsync_example.sh
rsync -aut /Users/XXX/Desktop/CPU /Volumes/CPU/ImagingData

This can easily be added as a line to the crontab (edit it via crontab -e):

rsync_crontab_snip.txt
30 2 * * * /usr/bin/rsync -aut /Users/XXX/Desktop/CPU /Volumes/CPU/ImagingData

To look up the format of the crontab file (the five leading fields are minute, hour, day of month, month, and day of week; the line above therefore runs every night at 2:30), enter man 5 crontab in the Terminal.

Note: on Mac OS X, to ensure that the backup volume is always available, add it to the user's login items in System Preferences, and store the password for that network volume in your keychain!

DICOM import

This part of the scripted process is already covered on its own page; see image file conversion for more information.

Preprocessing

Using the spm5_preprojobs function, we can easily set up a small script that preprocesses all (remaining) subjects:

SCANUnit_CPU_prepro.m
% folder location
subjfolder = '/Users/XXX/Desktop/CPU';
subjpattern = 'CPU*';
funcfolder = 'functional';
anatfolder = 'structural';
 
% settings (for spm5_preprojobs)
funcpattern = 'r*340';
anatpattern = '';
usestruct = true;
runjobs = true;
disdaqs = 5;
smoothmm = 6;
sliceorder = 'aio';
tr = 2;
 
% complete settings
if ~isempty(funcpattern)
    ffp = [funcfolder '/' funcpattern];
else
    ffp = funcfolder;
end
if ~isempty(anatpattern)
    anatpattern = [anatfolder '/' anatpattern];
else
    anatpattern = anatfolder;
end
ppopts = struct( ...
    'fun2str', usestruct, ...
    'jobrun', runjobs, ...
    'skip', disdaqs, ...
    'smk', smoothmm, ...
    'sto', sliceorder, ...
    'sttr', tr);
 
% find subjects
subjects = findfiles(subjfolder, subjpattern, 'depth=1', 'dirs=1');
 
% for each subject
for sc = 1:numel(subjects)
 
    % determine whether the subject needs processing
    rps = findfiles([subjects{sc} '/' funcfolder], 'rp*.txt');
    if ~isempty(rps)
        continue;
    end
 
    % get subject ID
    [basefld, subjid] = fileparts(subjects{sc});
    subjid(subjid == '_') = [];
 
    % run fMRI quality on all runs
    runs = findfiles([subjects{sc} '/' funcfolder], funcpattern, 'dirs=1', 'depth=1');
    for rc = 1:numel(runs)
        q = fmriquality(findfiles(runs{rc}, {'*.img', '*.nii'}));
        save(sprintf('%s/fmriquality.mat', runs{rc}), 'q');
        clear q;
    end
 
    % run preprocessing
    [j, jh, ppfiles] = spm5_preprojobs(subjects{sc}, ffp, anatpattern, ppopts);
 
    % for each run, create one VTC
    for rc = 1:numel(ppfiles)
        [basefld, runid] = fileparts(fileparts(ppfiles{rc}{1}));
        vtc = importvtcfromanalyze(ppfiles{rc});
        vtc.TR = round(1000 * tr);
        vtc.SaveAs(sprintf('%s/%s/%s_%s.vtc', subjects{sc}, funcfolder, subjid, runid));
        vtc.ClearObject;
    end
end
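
To inspect one of the stored quality assessments later, the MAT file can simply be loaded back into Matlab (the path below is hypothetical):

% load the per-run quality data saved by the script above (example path)
qc = load('/Users/XXX/Desktop/CPU/CPU001/functional/r01_340/fmriquality.mat');
% display the stored quality structure
disp(qc.q);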

Semi-automatic logfile parsing and PRT creation

The onsets have supposedly already been transformed into a tabular structure (e.g. in Microsoft Excel), so that they look like this:

SUBJ_ID   INST_on   CUE1_on   CUE2_on   INST_of   CUE1_of   CUE2_of
20216     3         nan       nan       5003      nan       nan
20216     nan       3003      nan       nan       18003     nan
20216     18000     nan       nan       23000     nan       nan
20216     nan       nan       21000     nan       nan       26000
20216     74057     nan       nan       79057     nan       nan
20216     nan       77057     nan       nan       92057     nan
20216     92068     nan       nan       97068     nan       nan
20216     nan       nan       95067     nan       nan       100067
20216     148112    nan       nan       153112    nan       nan
20216     nan       151112    nan       nan       166112    nan
20216     166122    nan       nan       171122    nan       nan
20216     nan       nan       169122    nan       nan       174122
20216     1         nan       nan       5001      nan       nan
20216     nan       3001      nan       nan       18001     nan
20216     17999     nan       nan       22999     nan       nan
20216     nan       nan       21011     nan       nan       26011
20216     74056     nan       nan       79056     nan       nan
20216     nan       77056     nan       nan       92056     nan
20216     92066     nan       nan       97066     nan       nan
20216     nan       nan       95065     nan       nan       100065
20216     148110    nan       nan       153110    nan       nan
20216     nan       151110    nan       nan       166110    nan
20216     166120    nan       nan       171120    nan       nan
20216     nan       nan       169120    nan       nan       174120

In this table, the first column codes the subject ID (used to detect where the next subject begins; in this sample there is only one subject). The next three columns code the onsets and the final three columns the offsets of the three conditions used in this experiment (INST = instruction, CUE1 = cue type 1, CUE2 = cue type 2). Note how the onset in row 13 (1) is smaller than the preceding onset in row 12 (169122): this negative difference is what marks the start of the second run.

The following script then creates a set of PRT files from this table (after the table has been copied into Matlab as a numeric matrix):

SCANUnit_CPU_prtfromtable.m
% make sure the onset table is available in variable ot !!
 
% configuration
conds = {'INST', 'CUE1', 'CUE2'};
condcol = [255, 0, 0; 0, 255, 0; 0, 0, 255];
 
% find unique subject IDs
subjid = unique(ot(:, 1));
 
% for each subject ID
for sc = 1:numel(subjid)
 
    % get this subject's sub-table
    sot = ot(ot(:, 1) == subjid(sc), :);
 
    % set NaNs to zero (for now)
    sot(isnan(sot)) = 0;
 
    % collapse the three onset columns into one onset per trial (NaNs are 0)
    onsets = sum(sot(:, 2:2+numel(conds)-1), 2);
 
    % get table with NaNs (again) for condition detection
    sot = ot(ot(:, 1) == subjid(sc), :);
 
    % find run-separators
    runseps = 1 + [0; find(diff(onsets) < 0)];
    runseps(end+1) = numel(onsets) + 1;
 
    % for each run
    for rc = 1:numel(runseps)-1
 
        % get part of table we need
        rot = sot(runseps(rc):runseps(rc+1)-1, 2:end);
 
        % create new PRT
        prt = xff('new:prt');
 
        % for each condition
        for cc = 1:numel(conds)
 
            % add to PRT
            prt.AddCond(conds{cc}, rot(~isnan(rot(:, cc)), [cc, cc + numel(conds)]), condcol(cc, :));
        end
 
        % save PRT
        prt.SaveAs(sprintf('%d_%d.prt', subjid(sc), rc));
 
        % clear object
        prt.ClearObject;
    end
end

Note: the PRT files will be written into the present working directory (unless the SaveAs line is altered accordingly), and they then need to be moved or copied into their respective subjects' folders!
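
A small shell loop can take care of this; the following is only a sketch, assuming the PRTs were written as SUBJID_RUN.prt into the current directory and that each subject's folder is named CPU followed by the subject ID:

# move each SUBJID_RUN.prt into the matching subject's onsets folder (sketch)
for prt in *_*.prt ; do
    subj=${prt%%_*}
    mv $prt /Users/XXX/Desktop/CPU/CPU$subj/onsets/
  done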

Computing an RFX-GLM with VTCs and PRTs

Once all files are in place, the Protocol (PRT) and VTC files can be used to compute an RFX-GLM; please follow the instructions and example given on the mdm.ComputeGLM method reference page.
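
As a rough sketch only (the MDM field names and the argument-less ComputeGLM call should be verified against that reference page; paths and patterns are examples), the computation could look like this:

% collect matching VTC and PRT files (assumes one PRT per VTC, in the same order)
vtcs = findfiles('/Users/XXX/Desktop/CPU', '*.vtc');
prts = findfiles('/Users/XXX/Desktop/CPU', '*.prt');
 
% pair the files in an MDM object and request a random-effects GLM
mdm = xff('new:mdm');
mdm.XTC_RTC = [vtcs(:), prts(:)];
mdm.RFX_GLM = 1;
 
% compute and save the GLM
glm = mdm.ComputeGLM;
glm.SaveAs('/Users/XXX/Desktop/CPU/CPU_RFX.glm');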
