Selxavg3-sess: Difference between revisions

From CCN Wiki
Jump to navigation Jump to search
 
(24 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== Before you begin ==
== Before you begin ==
Ensure that your BOLD data has the correct timing info. See [[Freesurfer_BOLD_files |Freesurfer BOLD files]] for information on changing the TR for your data. If your BOLD data files are not named 'f.nii', you likely have missed a step.
Ensure that your BOLD data has the correct timing info. See [[Freesurfer_BOLD_files |Freesurfer BOLD files]] for information on changing the TR for your data. If your BOLD data files are not named 'f.nii', you likely have missed a step. Also ensure that your subject folders contain a subjectname file.


== Running selxavg3-sess ==
== Running selxavg3-sess ==
This command runs a GLM regression on the BOLD time series for each run separately. This creates a 3D file that stores a statistic for each voxel for each experimental run that indicates how well the activity for that voxel conformed with a predicted time series associated with each of your experimental conditions. Assuming you have run all the previous steps in the [[FreeSurfer#FS-FAST_Setup_Overview | FS-FAST overview]], there will be one, two or three analysis directories in your $SUBJECTS_DIR, generated by [[Configure_mkanalysis-sess|mkanalysis-sess]]. These directories will contain all the information to perform surface-space analyses in your left/right or mni305 (voxel-space, for subcortical regions which are not included in the surface models) models. If you have configured contrasts using [[Configure_mkcontrast-sess|mkcontrast-sess]], then ''selxavg3-sess'' will perform the statistical condition contrasts you had specified in that step.
This command runs a GLM regression on the BOLD time series for each run separately. This creates a 3D file that stores a statistic for each voxel for each experimental run that indicates how well the activity for that voxel conformed with a predicted time series associated with each of your experimental conditions. Assuming you have run all the previous steps in the [[FreeSurfer#FS-FAST_Setup_Overview | FS-FAST overview]], there will be one, two or three analysis directories in your $SUBJECTS_DIR, generated by [[Configure_mkanalysis-sess|mkanalysis-sess]]. These directories will contain all the information to perform surface-space analyses in your left/right or mni305 (voxel-space, for subcortical regions which are not included in the surface models) models. If you have configured contrasts using [[Configure_mkcontrast-sess|mkcontrast-sess]], then ''selxavg3-sess'' will perform the statistical condition contrasts you had specified in that step.
On workstation 3, selxavg3-sess takes 40 minutes to perform a GLM for 12 runs on a single surface.


=== Batch mode ===
=== Batch mode ===
Line 14: Line 16:


== Example ==
== Example ==
Here, we are going to run selxavg3-sess on a single participant (<code>FS_0001</code>) for the analysis configuration found in the folder <code>$SUBJECTS_DIR/LDT.sm6.lh</code>
  selxavg3-sess -s FS_0001 -analysis LDT.sm6.lh
  selxavg3-sess -s FS_0001 -analysis LDT.sm6.lh
Note that the analysis folder indicated in this example is configured only for the left hemisphere. You would most likely want to run this command a second time, and specify the configuration for the right hemisphere. In this way,  you will have run the analysis for both cortical hemispheres. A third execution of the command would be required to run analyses for subcortical regions found in MNI305.
==Scripting==
Because you will be running selxavg3-sess once for each of the surfaces/volumes, you might consider using a script to loop through the list of surfaces to automate it:
===Example selxavg3-sess shell script===
#!/bin/bash
PARADIGM=$1
shift
SMOOTHING=$1
shift
AROOT=${PARADIGM}.${SMOOTHING}
HEMIS=( lh rh mni )
for hemi in "${HEMIS[@]}"
do
  selxavg3-sess -sf subjects -analysis ${AROOT}.${hemi}
done
I saved the script to a file called '''do_selx''', and after I made it executable, I was able to automate running the script on all the subjects in the '''subjects''' session file:
do_selx FAM sm4
This process takes a little while to run, especially when running a batch of several participants. When running it remotely (i.e., connected over the internet via ssh), I didn't want to have to worry about the process stopping part-way through because of a network disconnection. My solution was to run it as a background process using the <code>nohup</code> command, which ensures that the process continues, even if the parent shell process terminates:
<span style="color:red;">nohup</span> do_selx FAM sm4 <span style="color:red;">&</span>
All the screen output will get redirected to nohup.out if you wish to track the progress of the background process:
cat nohup.out
With the process safely running in the background, I am free to shut down my laptop, confident that when I check back later on, the first level GLM analysis of all my subjects will have been completed.
== What Next? ==
If all goes well, you can move on to the [[Mri_glmfit | group level analysis]]


== Troubleshooting ==
== Troubleshooting ==
=== SVD input matrix contains NaN ===
I've encountered this problem twice, only documenting the solution the second time. When running selxavg3-sess, you may encounter the following error:
Creating Design Matrix
  ... creation time =  0.019 sec
DoMCFit = 1
ntptot = 2592, nX = 81, DOF = 2511
Saving X matrix to /home/chris/SEM/FS_0231/bold/FAM.self.sm4.down.rh/Xtmp.mat
Error using svd
Input to SVD must not contain NaN or Inf.
Error in cond (line 35)
    s = svd(A);
Error in fast_selxavg3 (line 279)
  XCond = cond(XtX);


=== ERROR: fast_selxavg3() ===
The problem is that the Xn matrix generated from the .par files contains several columns of NaN. The root cause of this is when your par files are missing one or more of the conditions that are listed in the analysis. In the above error, the familiarity analysis includes 3 familiarity conditions: Familiar (1), Unfamiliar_Time1 (2) and Unfamiliar_Time2 (3). The ''FAM.par'' files that were produced by a Matlab script only differentiated between familiar and unfamiliar items, and consequently only contained conditions 1 and 2.
====Solution====
The solution was to find the ''FAM.par'' files associated with Time 2 (runs 007 to 012, when both time points are analyzed as a single 12-run session) and recode the 2s into 3s. After doing this, all was well with the world (aside from the geopolitical strife and environmental disasters).
This can be done fairly easily using <code>sed</code> with the -i "in place" switch:
sed -i 's/\t2/\t3/g' FAM.par
 
More generally, this problem can occur if your condition codes don't start at 1. For example, you have 6 conditions and only want to compare 2 of them (e.g. 5 and 6), they need to be recoded as 1 and 2.
 
=== ERROR: Subscript indices must either be real positive integers or logicals===


<code>ERROR: fast_selxavg3() failed\n</code>
<code>ERROR: fast_selxavg3() failed\n</code>


Leading up to the error message, we see a message <code>Subscript indices must either be real positive integers or logicals.</code>
Leading up to the error message, we see a message <code>Subscript indices must either be real positive integers or logicals.</code>
There are two plausible causes here, both related to your PAR files.


When we see messages that refer to indices in the context of an error message, this is often symptomatic of a mismatch between the amount of data available, and the data access request. For example, if only 252 volumes of data are present (numbered 1, 2, ..., 252) but a script or function is asked to access volume 253 or volume 0, then this will likely generate an error.
When we see messages that refer to indices in the context of an error message, this is often symptomatic of a mismatch between the amount of data available, and the data access request. For example, if only 252 volumes of data are present (numbered 1, 2, ..., 252) but a script or function is asked to access volume 253 or volume 0, then this will likely generate an error.


If the same processing steps work fine with some data, but generate the above error for a particular dataset, check to see that there isn't something funny with the data. One potential culprit is the early termination of the BOLD sequence. For the LDT, which is self-paced, there were a few runs where the MRI acquisition stopped before the task ended. In this case, the timestamps for that data will make reference to points in time outside of the scope of the MRI data.
Another cause concerns the presence of NAN values in the event duration column in the par files when these files are generated from our MATLAB runtime files -- a NAN may be recorded for the participant response time if the participant failed to respond.
 
If the same processing steps work fine with some data, but generate the above error for a particular dataset, check to see that there isn't something funny with the data. One potential culprit is the early termination of the BOLD sequence. For the LDT, which is self-paced, there were a few runs where the MRI acquisition stopped before the task ended. In this case, the timestamps for the runtime behavioral data will make reference to points in time outside of the scope of the MRI data. The scripts we use to generate the .par files from these runtime data have no way of knowing that these timestamps go past the end of the BOLD sequence.
 
==== Solution ====
==== Solution ====
Try manually editing the .par file by removing timestamps outside of the MRI acquisition. You will have to do some math (''TR'' &times; ''Number of Volumes'') to figure out what the last valid timestamp would be.
I've written a script, '''sane_par.sh''', which you should find in your $PATH:
editfslhd f.nii
  sane_par.sh FS_0000
For FS_0092, we see
The script will go into all the run directories listed in the subject's bold/runs file, identify all the .par files, and truncate the .par file entries so that none of the events fall past the end of the scan, calculated my multiplying the TR by the number of volumes in f.nii.gz. It will also remove any NaN entries in the event duration column.
nt = '252'
  ...
dt = '2.047'
So the latest possible timepoint is 515.8. Any .par file events past this point are not in the data.


Also, while editing the .par file, ensure that any NaN values are set to a real number, which, now that I think of it, is another potential culprit (since NaN is neither a ''real positive integer or logical''). The FSTSExtractor.m script uses RT as the event duration, but this value is set to NaN if the participant didn't respond. Set the NaN value to 0, but also set the weight (the last column) to 0 so that these null events aren't modeled.  
===ERROR: analysis.cfg does not exist ===


An update to the FSTSExtractor function will need to be made so that this problem doesn't pop up, but in the meantime, ensuring that the .par file timestamps are sensible and contain no NaN values fixes this problem. A fast way to find any NaN values:
 
grep "NaN" `find ./ -name "LDT.par"`
This error occurs if you run the selxavg-sess function associated with older FS versions (e.g. check your command: selxavg-sess vs. selxavg3-sess). Freesurfer 5.x no longer creates an analysis.cfg file with the newer selxavg3-sess.


== Check your results ==
== Check your results ==
Line 57: Line 111:
#*When mapping voxels onto fsaverage, you can select the '''Compute Identity Matrix''' registration option.
#*When mapping voxels onto fsaverage, you can select the '''Compute Identity Matrix''' registration option.
#Click the '''OK''' button. In a moment you should see hot- and cold- colored blobs overlaid on the surface model. Depending on the nature of the contrast and your experience, you might be able to use this overlay to assess whether the data look reasonable. For example a contrast between button-press trials and rest should probably show a robust activation in left motor cortex (assuming right-handed participants).
#Click the '''OK''' button. In a moment you should see hot- and cold- colored blobs overlaid on the surface model. Depending on the nature of the contrast and your experience, you might be able to use this overlay to assess whether the data look reasonable. For example a contrast between button-press trials and rest should probably show a robust activation in left motor cortex (assuming right-handed participants).
[[Category: FreeSurfer]]
[[Category: First Level Analysis]]

Latest revision as of 14:16, 7 December 2020

Before you begin

Ensure that your BOLD data has the correct timing info. See Freesurfer BOLD files for information on changing the TR for your data. If your BOLD data files are not named 'f.nii', you likely have missed a step. Also ensure that your subject folders contain a subjectname file.

Running selxavg3-sess

This command runs a GLM regression on the BOLD time series for each run separately. This creates a 3D file that stores a statistic for each voxel for each experimental run that indicates how well the activity for that voxel conformed with a predicted time series associated with each of your experimental conditions. Assuming you have run all the previous steps in the FS-FAST overview, there will be one, two or three analysis directories in your $SUBJECTS_DIR, generated by mkanalysis-sess. These directories will contain all the information to perform surface-space analyses in your left/right or mni305 (voxel-space, for subcortical regions which are not included in the surface models) models. If you have configured contrasts using mkcontrast-sess, then selxavg3-sess will perform the statistical condition contrasts you had specified in that step.

On workstation 3, selxavg3-sess takes 40 minutes to perform a GLM for 12 runs on a single surface.

Batch mode

If you want to run the same analysis on multiple participants in batch mode, you must first create a plain text file containing all the participant IDs (i.e., the folder names) for those you wish to analyze. If you ran preproc-sess in batch mode, then you can simply reuse the sessid file you created for that purpose. See the preproc-sess instructions for creating that file. Batch mode uses the -sf switch, which is the same as the switch used for running preproc-sess in batch mode.

selxavg3-sess -sf <sessid_file> -analysis <analysis_dir>

Individual mode

Alternatively, you may run the analysis on one participant at the time using the -s switch, and providing a single participant ID name.

selxavg3-sess -s <sessid> -analysis <analysis_dir>

Example

Here, we are going to run selxavg3-sess on a single participant (FS_0001) for the analysis configuration found in the folder $SUBJECTS_DIR/LDT.sm6.lh

selxavg3-sess -s FS_0001 -analysis LDT.sm6.lh

Note that the analysis folder indicated in this example is configured only for the left hemisphere. You would most likely want to run this command a second time, and specify the configuration for the right hemisphere. In this way, you will have run the analysis for both cortical hemispheres. A third execution of the command would be required to run analyses for subcortical regions found in MNI305.

Scripting

Because you will be running selxavg3-sess once for each of the surfaces/volumes, you might consider using a script to loop through the list of surfaces to automate it:

Example selxavg3-sess shell script

#!/bin/bash
PARADIGM=$1
shift
SMOOTHING=$1
shift
AROOT=${PARADIGM}.${SMOOTHING}
HEMIS=( lh rh mni )
for hemi in "${HEMIS[@]}"
do
 selxavg3-sess -sf subjects -analysis ${AROOT}.${hemi} 
done

I saved the script to a file called do_selx, and after I made it executable, I was able to automate running the script on all the subjects in the subjects session file:

do_selx FAM sm4

This process takes a little while to run, especially when running a batch of several participants. When running it remotely (i.e., connected over the internet via ssh), I didn't want to have to worry about the process stopping part-way through because of a network disconnection. My solution was to run it as a background process using the nohup command, which ensures that the process continues, even if the parent shell process terminates:

nohup do_selx FAM sm4 &

All the screen output will get redirected to nohup.out if you wish to track the progress of the background process:

cat nohup.out

With the process safely running in the background, I am free to shut down my laptop, confident that when I check back later on, the first level GLM analysis of all my subjects will have been completed.

What Next?

If all goes well, you can move on to the group level analysis

Troubleshooting

SVD input matrix contains NaN

I've encountered this problem twice, only documenting the solution the second time. When running selxavg3-sess, you may encounter the following error:

Creating Design Matrix
 ... creation time =  0.019 sec
DoMCFit = 1
ntptot = 2592, nX = 81, DOF = 2511
Saving X matrix to /home/chris/SEM/FS_0231/bold/FAM.self.sm4.down.rh/Xtmp.mat
Error using svd
Input to SVD must not contain NaN or Inf.
Error in cond (line 35)
   s = svd(A);
Error in fast_selxavg3 (line 279)
 XCond = cond(XtX);

The problem is that the Xn matrix generated from the .par files contains several columns of NaN. The root cause of this is when your par files are missing one or more of the conditions that are listed in the analysis. In the above error, the familiarity analysis includes 3 familiarity conditions: Familiar (1), Unfamiliar_Time1 (2) and Unfamiliar_Time2 (3). The FAM.par files that were produced by a Matlab script only differentiated between familiar and unfamiliar items, and consequently only contained conditions 1 and 2.

Solution

The solution was to find the FAM.par files associated with Time 2 (runs 007 to 012, when both time points are analyzed as a single 12-run session) and recode the 2s into 3s. After doing this, all was well with the world (aside from the geopolitical strife and environmental disasters). This can be done fairly easily using sed with the -i "in place" switch:

sed -i 's/\t2/\t3/g' FAM.par

More generally, this problem can occur if your condition codes don't start at 1. For example, you have 6 conditions and only want to compare 2 of them (e.g. 5 and 6), they need to be recoded as 1 and 2.

ERROR: Subscript indices must either be real positive integers or logicals

ERROR: fast_selxavg3() failed\n

Leading up to the error message, we see a message Subscript indices must either be real positive integers or logicals.

There are two plausible causes here, both related to your PAR files.

When we see messages that refer to indices in the context of an error message, this is often symptomatic of a mismatch between the amount of data available, and the data access request. For example, if only 252 volumes of data are present (numbered 1, 2, ..., 252) but a script or function is asked to access volume 253 or volume 0, then this will likely generate an error.

Another cause concerns the presence of NAN values in the event duration column in the par files when these files are generated from our MATLAB runtime files -- a NAN may be recorded for the participant response time if the participant failed to respond.

If the same processing steps work fine with some data, but generate the above error for a particular dataset, check to see that there isn't something funny with the data. One potential culprit is the early termination of the BOLD sequence. For the LDT, which is self-paced, there were a few runs where the MRI acquisition stopped before the task ended. In this case, the timestamps for the runtime behavioral data will make reference to points in time outside of the scope of the MRI data. The scripts we use to generate the .par files from these runtime data have no way of knowing that these timestamps go past the end of the BOLD sequence.

Solution

I've written a script, sane_par.sh, which you should find in your $PATH:

sane_par.sh FS_0000

The script will go into all the run directories listed in the subject's bold/runs file, identify all the .par files, and truncate the .par file entries so that none of the events fall past the end of the scan, calculated my multiplying the TR by the number of volumes in f.nii.gz. It will also remove any NaN entries in the event duration column.

ERROR: analysis.cfg does not exist

This error occurs if you run the selxavg-sess function associated with older FS versions (e.g. check your command: selxavg-sess vs. selxavg3-sess). Freesurfer 5.x no longer creates an analysis.cfg file with the newer selxavg3-sess.

Check your results

You can look at the contrast results in the surface model in tksurfer. First, load the surface you want to overlay your results on. All of our examples have been done using the self surface, but it is possible to instead map data to the fsaverage template, which facilitates comparisons between individuals.

tksurfer FS_T1_501 lh pial

Next, you will load the relevant statistical overlay. This file will be found in the subject folder in the bold subdirectory, which has an analysis folder for each analysis that you ran with selxavg3-sess. In this example, there are two folders:

  • $SUBJECTS_DIR
    • FS_T1_501
      • bold
        • booth500.sm6.lh
        • booth500.sm6.rh

In each of these analysis folders are several compressed .nii.gz files. The statistical overlay you will want to overlay will depend on your interests, but most likely you will want to open up a diagnostic (i.e., easily interpretable) contrast map, which you will find in subdirectories within each of the analysis directories. In my case, I had run a word_vs_fix contrast which generated a t-statistic and f-statistic map. I chose to load the t-map for my left-hemisphere surface. To do so, I did the following:

  1. File>Load Overlay...
  2. Browse to $SUBJECTS_DIR/FS_T1_501/bold/booth500.sm6.lh/word_vs_fix/t.nii.gz
  3. Additional Registration information may be required to map each of the voxels on to your surface space.
    • When mapping voxels onto a self surface, it's safe to select the Find registration in data directory
    • When mapping voxels onto fsaverage, you can select the Compute Identity Matrix registration option.
  4. Click the OK button. In a moment you should see hot- and cold- colored blobs overlaid on the surface model. Depending on the nature of the contrast and your experience, you might be able to use this overlay to assess whether the data look reasonable. For example a contrast between button-press trials and rest should probably show a robust activation in left motor cortex (assuming right-handed participants).