Functional Connectivity (Cross-Correlation Method)

If you consider two vectors containing ''t'' values representing neural activity over ''t'' time points, the degree to which the values in these vectors move together, as indexed by the correlation between these vectors, can be taken as a measure of functional connectivity. The underlying assumption is that the activity in two regions should be related if there is some sort of underlying connection between them.
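As a toy illustration (hypothetical data, not part of the pipeline below), two noisy time series driven by a shared signal will show a high Pearson correlation, while two independent noise series will not:
 t = (1:200)'; %200 hypothetical time points
 shared = sin(2*pi*t/40); %a common driving signal
 regionA = shared + 0.5*randn(200,1);
 regionB = shared + 0.5*randn(200,1);
 r = corrcoef(regionA, regionB); %r(1,2) will be substantially greater than 0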


This page assumes you have already obtained time series matrices using the approaches [[Time_Series_Analysis | described here]]. If these data were obtained in FreeSurfer surface space, there will be 2 matrices (lh, rh) times 'r' runs. If subcortical activations have been obtained in MNI305 space, a third matrix will also be available. In either case, some data cleaning may be required.


These steps will be carried out in MATLAB. The first step then would be to start up MATLAB in your terminal (the ampersand lets the program run in the background so that you can continue to use your terminal window to do other things if you wish):
 cd $SUBJECTS_DIR
 matlab &
Now that we're in MATLAB, any code snippets can be interpreted as MATLAB code unless otherwise noted.


== Load Your Time Series Data ==
The next step is to load your data matrices. Assuming the time series are plaintext files containing only structured numerical data (e.g., as produced by the gettimecourses.sh script), you can use the <code>load</code> function to import these data. The general instructions for importing your detrended time series data into MATLAB can be found [[Importing_Time_Series_Into_MATLAB | here]].
=== Manual loading ===
cd(['FS_T1_501' filesep 'bold']);
ls('*.wav.txt');
The first of the above two commands will get you into the bold/ directory for subject FS_T1_501, which is where you would expect to find the time series data. The second command will display the names of all files ending in ''.wav.txt'', which is the convention used by the gettimecourses.sh script. Now that you have identified the relevant files, you can import them individually, assigning them meaningful variable names:
run05_left=load('FS_T1_501_lh_lausanne_005.fmcpr.sm6.self.wav.txt');
run05_right=load('FS_T1_501_rh_lausanne_005.fmcpr.sm6.self.wav.txt');
run06_left=load('FS_T1_501_lh_lausanne_006.fmcpr.sm6.self.wav.txt');
run06_right=load('FS_T1_501_rh_lausanne_006.fmcpr.sm6.self.wav.txt');
If you are using the Lausanne 2008 100-region parcellation, you should find that the lh matrices have 500 columns (regions) and the rh matrices have 502.
 
At this point, it makes sense to merge the left and right hemisphere data into a single matrix, but continue to keep the data for each run separate. If each row represents a time point and each column represents a region, the matrices for a given run will be time x nregions in size. We want to concatenate the matrices together along the column dimension so that the resulting matrix is time x (nregions + nregions) in size (and '''not''' (time+time) x nregions!). Horizontal concatenation (i.e., along the columnar dimension, which is what you want to do here) is easily done in MATLAB:
run_05 = [run05_left run05_right];
run_06 = [run06_left run06_right];
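A quick sanity check after concatenating (using the Lausanne column counts given above) is to confirm that the merged matrix kept all rows and gained the combined set of columns:
 size(run_05) %expected: [t 1002], i.e., 500 lh + 502 rh columns
 assert(size(run_05,2) == size(run05_left,2) + size(run05_right,2));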
 
=== Functionally-assisted loading ===
The above steps have since been made simpler with the inevitable development of a MATLAB function, <code>loadFSTS.m</code>, that can be found in the ubfs Scripts/Matlab folder. If you have copied it to a folder in your MATLAB path, it can be invoked thus:
M=loadFSTS(); %A dialog box will prompt you to select the files you want to open
Of course, you can and should read the help documentation to find out the full capabilities of the function, which I assure you is very clever. But the above functionality will probably get you through 99.99% of the time.
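Assuming <code>loadFSTS</code> returns one cell per selected file (consistent with the cell-array behavior described below, but check the function's help text to be sure), individual run matrices are accessed by cell indexing:
 M=loadFSTS(); %select, e.g., the lh and rh files for each run
 run1=M{1};    %the time series matrix from the first selected file
 size(run1)    %rows = time points, columns = regions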


== Data Cleaning ==
Steps undertaken during earlier fMRI data conversion and/or preprocessing will influence which of the following steps you need to perform in order to clean up your data:
=== Drop Initial Volumes ===
When fMRI data are acquired, there is a short period at the beginning of each run, perhaps 6-12 seconds, during which the magnetic field fluctuates as the gradients are turned on. Measurements taken during this time are likely to contain artifacts of this instability. You can see this by plotting the values from a single region over time:
plot(run_05(:,1));
The above command plots the values from all rows (the ':' entry means 'everything') in the first column (the '1' entry for the column), and would show the activation of region 1 across the entire run.
If we know the TR, in seconds, we can figure out the number of volumes to drop from our matrices in order to eliminate these bad measurements. For example, if our TR is 2 seconds and we want to drop the first 12 seconds of data, we are dropping the first 6 rows of each matrix:
run_05 = run_05(7:end,:);
run_06 = run_06(7:end,:);
The above commands are replacing the run_05 (and _06) matrices with what's left over when you retain rows 7-''whatever the last row is''. Note that the ':' in the columns selector is saying that you want to retain all the columns. After running these commands, you should find your matrices are 6 elements smaller in the rows dimension but contain the same number of columns.
'''Note: You should ensure that all team members are using the same data cleaning parameters. For the Booth data, it is sensible to drop the first 6 volumes, as the first trial begins at t=12 seconds (with TR=2 seconds)'''
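The arithmetic can also be written out explicitly, which avoids off-by-one errors if the TR or discard window ever changes (equivalent to the commands above, using the Booth parameters):
 TR=2;                   %seconds per volume
 discard_secs=12;        %initial unstable period to discard
 n_drop=discard_secs/TR; %6 volumes
 run_05=run_05(n_drop+1:end,:);
 run_06=run_06(n_drop+1:end,:);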
=== Drop Regions ===
The same principle can be applied in the columns dimension if there are regions you wish to omit. For example, if you used the sets of scripts described elsewhere in this wiki, the first region in the 1000-region Lausanne annotation is the corpus callosum (a white matter tract), which would be reasonable to exclude from a connectivity analysis. Note that if you drop regions from your time series, you will need to keep that in mind when labeling the regions in your connectivity matrices.
If you plan on dropping regions, it is easiest to do so prior to merging hemispheres. The code snippet below drops the first column (by retaining only columns 2-''whatever the last column is'') from each of the matrices for run 5. The command for run 6 would be very similar:
run05_left=run05_left(:,2:end);
run05_right=run05_right(:,2:end);
Another example, this time dropping regions 1 and 5 (corresponding to the ''unknown'' and ''corpuscallosum'' regions in ?h.aparc_60.annot):
run06_left=run06_left(:,[2 3 4 6:end]);
run06_right=run06_right(:,[2 3 4 6:end]);
If you are unsure of the identity of a region, check the corresponding .sum files that were generated by the gettimecourses.sh script.
Note also that if you use the loadFSTS() function, you can specify the regions to be dropped at the time of loading. If you do so, the specified columns will have already been dropped from the time series data, and you will not need to do this as a separate step.
'''Note: If any regions are being dropped, you should ensure that this is consistent across team members for the same analysis. Reasonable regions to drop include activation values from the 'unknown' region, and from the 'corpuscallosum' region.'''
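For longer drop lists, building the keep-index with <code>setdiff</code> is less error-prone than hand-writing ranges like <code>[2 3 4 6:end]</code> (a sketch, dropping the same regions 1 and 5 as above):
 dropregions=[1 5]; %'unknown' and 'corpuscallosum'
 keep=setdiff(1:size(run06_left,2), dropregions);
 run06_left=run06_left(:,keep);
 run06_right=run06_right(:,keep);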


=== Normalize Data ===
The values in the activity matrices are arbitrary units: the absolute values are not important. As a result, these values are instead interpreted in terms of % of signal change over time. For example, if a region had a value of 50 at t=0 and a value of 75 at t=1, that would represent a 50% change in signal (which, incidentally, is an unrealistic change, and would probably represent some sort of artifact). In contrast, changing from a value of 1000 to 1025 represents a 2.5% signal change for the same difference in absolute signal units. To make it easier to identify outlier values and ensure all data are on the same scale, we normalize our data using the <code>zscore</code> function:
 zrun05=zscore(run_05);
 zrun06=zscore(run_06);
The normalized matrix will have z-scores for each time point, calculated using the mean and standard deviation for each column (region) separately. Thus, a normalized value of 1 in Region A will mean that the activation value was 1 standard deviation above the average for that region.
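You can verify the normalization directly: every column of the result should have (near-)zero mean and unit standard deviation:
 max(abs(mean(zrun05)))  %should be ~0 across all regions
 max(abs(std(zrun05)-1)) %should be ~0 across all regions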
You may be surprised to learn that a MATLAB function, <code>normalizeMatrix.m</code> has been created to facilitate the normalization of our time series data. If it has been copied to your path, you can easily normalize your multidimensional matrix or cell array '''''M''''' (recall that '''''M''''' is a cell array if you used the <code>loadFSTS.m</code> function). The built-in MATLAB function, <code>zscore()</code> unfortunately doesn't work on cell arrays. The <code>normalizeMatrix.m</code> function was written to permit seamless normalization of the time series matrices, whether they are 3D matrices or cell arrays.
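If <code>normalizeMatrix.m</code> is not on your path, a rough stand-in for the cell-array case (assuming '''''M''''' holds one time &times; regions matrix per cell; this does not replicate the row-dropping feature described next) is to apply <code>zscore</code> cell by cell:
 Z=cellfun(@zscore, M, 'UniformOutput', false); %z-score each run's matrix column-wise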
 
Additional functionality was added to drop rows (i.e., time points) specified in an array, thus eliminating the need to drop specific time points in a separate step as described above. '''You should only do this if volumes were not already dropped from the .nii files prior to importing the time series!''' If you drop 4 rows from the .nii files, and then drop the first four rows again, the first 8 volumes will have been dropped from your time series! However, even if you have already dropped volumes from the start of the .nii file, you may wish to drop volumes from the end of the time series (e.g., if the experiment ended before the run was finished).
droprows=[-2 -1 0 1 2 3]; %omit rows 1, 2, 3 and end, end-1 and end-2
Z=normalizeMatrix(M, droprows);
Note, however, that when rows are dropped in this way, the same time points will be dropped from all time series matrices. In most cases, this is sensible (e.g., to drop the first few volumes with unstable magnetic fields). However, if you wish to drop different sets of time points for different sessions, then it is advisable to load and normalize each session individually, specifying a different ''droprows'' array for each. The individual cell arrays can be concatenated afterwards.
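For clarity, the ''droprows'' convention implied by the comment above appears to be: positive values index rows from the start, and values &le; 0 count back from the final row. A hypothetical translation of that convention (verify against the <code>normalizeMatrix.m</code> help text):
 T=size(M{1},1); %number of time points in a run
 idx=droprows;
 idx(idx<=0)=T+idx(idx<=0); %0 -> T (last row), -1 -> T-1, -2 -> T-2
 %rows listed in idx are the ones removed from every matrix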
 
==== Dropping Columns ====
I'll document this here because I've not yet mastered <code>cellfun</code>, but managed to use it to prune a cell array of matrices. Scenario: '''''dtM''''' is a 6 &times; 1 cell array of n &times; 1000 detrended values. I wanted to keep only the 16 columns, indexed in the vector '''''goodsnr''''', with the best signal-to-noise ratio (which had already been calculated):
N=cellfun(@(x) x(:, goodsnr), dtM, 'UniformOutput', false);
Voila!
 
=== Censor Data (Optional Step) ===
It may be desirable to exclude extreme positive and negative spikes from the time series for each region, as they may reflect some sort of artifact. Additionally, there may be some other source of information that might be used to identify potentially problematic measurements (e.g., motion correction data, or a list of volumes identified in SPM when using the ArtRepair toolbox).


A common practice when looking at behavioral data (e.g., reaction times) is to use the deviation from the mean as a threshold for identifying extreme values. These values can be either replaced with the cutoff value or else scrubbed from the data entirely. The following code will do either, depending on what is designated as the replacement value:
 replacement=NaN; %could instead have used replacement=thresh to clip values at the cutoff
 thresh=2.5; %could use a more-or-less stringent threshold
 Zprime=Z;
 for i=1:length(Zprime)
   temp=Zprime{i};
   temp(temp>thresh)=replacement;
   temp(temp<-thresh)=-1*replacement;
   Zprime{i}=temp;
 end
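When scrubbing with NaN, it is worth checking how much data the threshold removed; if a large fraction of time points is censored, the threshold is probably too aggressive:
 ncensored=cellfun(@(x) nnz(isnan(x)), Zprime); %NaN count per run (includes any pre-existing NaNs)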
If you had an outside indicator of data quality, you could similarly use that to identify data to be removed. For example, a list of volumes with excessive motion (as determined during functional data realignment, or the ArtRepair toolbox) could be read in:
 bad_vols=load('art_censored.txt')
 bad_vols =
      1
      2
      6
     48
     49
Note that the above hypothetical example would assume that all the volumes were present in the data. If the initial ''n'' volumes have already been dropped from the data, then you would need to remove from consideration the first ''n'' volumes and then subtract ''n'' from the remaining bad_vols. This is because after dropping the first 4 volumes, what was originally the 5th time point is now actually the first time point in the remaining time series:
 n_dropped=4;
 bad_vols=bad_vols(bad_vols>n_dropped);
 bad_vols=bad_vols-n_dropped;
 zrun05(bad_vols,:)=NaN; %sets the rows for time points indicated in bad_vols to NaN for all columns (':')


== Know What's What ==
This is probably a good time to make sure that you know the identities of each column of activations represented in your matrices. If you have not already done so, use <code>parseFSSegments.m</code> on the .sum.txt files corresponding to your time series data:
 lh_region_info=parseFSSegments('FS_T1_501_lh_lausanne_005.fmcpr.sm6.self.sum.txt');
 rh_region_info=parseFSSegments('FS_T1_501_rh_lausanne_005.fmcpr.sm6.self.sum.txt');
 lh_names=lh_region_info.segname;
 rh_names=rh_region_info.segname;
If you have dropped regions from your time series data, ensure that you have also dropped the corresponding regions from the segment information generated by <code>parseFSSegments.m</code>:
 %In my examples, I have dropped regions 1 and 5 in both hemispheres
 %These regions correspond to 'unknown' and 'corpuscallosum' in the myaparc_nnn .annot files
 lh_names=lh_names([2 3 4 6:end]);
 rh_names=rh_names([2 3 4 6:end]);
Likewise, if you have merged left and right time series matrices into a single matrix, apply the same merge to your region information.


Now, if you are particularly astute, or have already tried this, you will know that there is a problem if you just merge the region names together. This is because the same region names are used in both hemispheres (e.g., you might find there is a 'precuneus_1' in both the ''lh'' and ''rh'' region lists). There's a good chance that you'll want or need to differentiate between them at some point. Fortunately, MATLAB has a clever function called <code>strcat</code> that operates over cell arrays, which is conveniently how we have stored our region names:
 lh_names=strcat('lh_', lh_names);
 rh_names=strcat('rh_', rh_names);
 %now all the names listed in '''''lh_names''''' are prefixed with 'lh_', and likewise for the '''''rh_names''''' contents
Now we are all set to use the <code>vertcat</code> MATLAB function to '''vert'''ically con'''cat'''enate the two lists of region names (this is a vertical concatenation because the list of region names are in a '''n'''&times;1 cell array -- i.e., like a vertical stack of names. If they were in a 1&times;'''n''' cell array, it would be a horizontal concatenation using <code>horzcat</code>):
 all_names=vertcat(lh_names, rh_names); %lh regions are listed before rh regions in this example

== Partition Time Series ==
Time series matrices from separate runs are necessarily partitioned, by virtue of existing in separate matrices (because the data were loaded separately). One option is to concatenate the matrices from all runs together, to make a sort of super-run (though consider normalizing the data first, because there is no guarantee that the signal units will be consistent across runs within the same session, let alone across multiple sessions!). Conversely, windows of time points within a single run may be analyzed separately. For example, in a blocked experiment, the time series corresponding to each block might be analyzed separately. This would allow for the detection of changes in network structure as a function of experimental condition.

As an example of how to do this, suppose our experiment had alternating blocks for conditions A and B, each 36 seconds in duration, with 16 seconds of rest between them, and with a TR of 2 seconds. In this case, the first 18 time points correspond to A, followed by 8 time points of rest, followed by 18 time points of B, followed by 8 time points of rest, and so on. We can recombine these measurements into separate matrices for A and B events (and even rest events):
 Aseries=zrun05([1:18,53:70], :);
 Bseries=zrun05([27:44,79:96],:);
You are going to want to be really sure about what rows correspond to which points in time when you do this. You will find it is really easy to mess this up and accidentally get one fewer or one more row than you had intended. It might be a good idea to use another program like Excel to help you work out the correct row indices for the conditions you're interested in grouping together.
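One simple safeguard is to assert the row counts implied by the design; in the example above, each condition contributes 2 blocks of 18 time points:
 assert(size(Aseries,1)==36 && size(Bseries,1)==36); %2 blocks x 18 TRs per condition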
A MATLAB function has been written, '''findBlockBoundaries.m''', that can be found in the ubfs Scripts/Matlab folder. If you have some saved experiment run-time data generated by PsychToolbox, you will probably find this function to be helpful. This function identifies the volumes that form the boundaries of each block based on the timestamps of each of the trials, and also returns an array of conditions corresponding to the condition labels associated with those trials:
 [bookends, conditions]=findBlockBoundaries([], 2.047);
 %the first parameter is a filename or a cell array of filenames.
 %If empty as in this example, a dialog box helps you find one or more experiment run-time data .mat files.
 %If multiple files are selected, you will get a set of bookends for each of the files.
 %The second parameter is the sampling frequency (i.e., the TR for fMRI). This cannot be left blank.
 
 %Assume that your normalized detrended time series data for 1 run is found in the matrix '''''Z'''''
 %You can use the bookends values to index timepoints of interest, as in this case where we wish to select
 %and concatenate the time points from every other block:
 oddblockdata=Z([ ...
    bookends(1,1):bookends(1,2), ...
    bookends(3,1):bookends(3,2), ...
    bookends(5,1):bookends(5,2)], :);
  %You can similarly use the returned '''conditions''' matrix to index the block boundaries for specific block conditions
<em>Note:</em> as of today (November 14, 2017), the findBlockBoundaries function doesn't adjust for dropped volumes. We now typically drop the first 4 volumes from our time series before any analyses are done. This means that the time points recorded in the source .mat files will be off by the number of dropped volumes. Until the findBlockBoundaries function is modified to take the number of volumes dropped as a parameter, the fix is to subtract that number from the bookends arrays using <code>cellfun()</code>:
ndropped=4;
new_bookends=cellfun(@(x) x-ndropped, bookends, 'UniformOutput', false);


== Save Your Data ==
Whatever steps you have chosen to clean up or partition your data, it's probably wise to use MATLAB's <code>save()</code> function to save your scrubbed data (and possibly data along different stages of scrubbing) to a .mat file for future use.
 %For more information on how to use the save() function, type: help save
 save('T1_501_timeseries.mat', 'zrun05', 'zrun06');
The default format is MATLAB's native .mat file format.

== Calculate Correlation Matrix ==
The MATLAB <code>corrcoef</code> function will calculate an N&times;N correlation matrix, where N is the number of brain regions in the parcellation. As mentioned previously, the Pearson correlation is equivalent to the zero-lag cross-correlation between the time series.
 [R05,P05]=corrcoef(zrun05, 'rows', 'pairwise');
If you have NaN values in your data, be sure to use the 'pairwise' directive. After you have calculated your correlation matrix (especially if n-lag cross-correlations were iteratively pairwise-calculated), you may wish to append these data to your previously saved data. This will save redoing time-consuming steps should you need to quit and restart at a later time. Appending variables to an existing .mat file is accomplished by using the <code>save()</code> function with the <code>'-append'</code> parameter:
 save('T1_501_timeseries.mat', '-append', 'P05', 'R05', 'P06', 'R06');
=== Using cellfun to Calculate corrcoef() on a Set of Matrices in a Cell Array ===
Many of the in-house developed MATLAB scripts operate on datasets (e.g., a series of runs), and organize the time series into cell arrays (e.g., 1 run's time series matrix stored in each cell). The above command can be nested in a call to cellfun() to compute the correlation matrices for each cell, and return a cell array of results:
[RLO,PLO]=cellfun(@(x) corrcoef(x,'rows', 'pairwise'), lo_freq, 'UniformOutput', false);
After calling this function, you will have two cell arrays, ''RLO'' and ''PLO'', containing the correlation matrices and corresponding p-values for each time series stored in the ''lo_freq'' cell array.
=== PROTIP: Quickly Replace Diagonal Values in MATLAB using eye() ===
The main diagonal of a correlation matrix, '''R''', will always contain 1s, because a vector is always perfectly correlated with itself. This nonsense correlation can be quickly replaced with some other value (e.g., NaN) by the following code:
replacement=NaN;
R(logical(eye(size(R))))=replacement;
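With the diagonal NaN'd out, the strongest connection can be looked up and labeled using the '''''all_names''''' list built earlier (this assumes '''R''' is one of the correlation matrices computed above, e.g., ''R05''; <code>max</code> ignores NaN values by default):
 [rmax,idx]=max(R(:));       %largest (most positive) correlation; use abs(R) for magnitude
 [i,j]=ind2sub(size(R),idx); %convert linear index to row/column
 fprintf('%s <-> %s: r = %.2f\n', all_names{i}, all_names{j}, rmax);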


== [[Network Analyses | Network Analysis]] ==
The correlation matrices for each time series serve as a measure of functional connectivity. The correlation entry (i,j) indexes the functional connectivity between regions i and j. The entire matrix, if binarized, is an adjacency matrix among the nodes in the brain network. At this point, various network analyses may be performed on the correlation matrix to generate useful network metrics. Additionally, clustering algorithms may be applied to detect communities within these networks.
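As a sketch of the binarization step (the cutoff of 0.3 is purely illustrative, not a recommended value), thresholding a correlation matrix yields a logical adjacency matrix:
 cutoff=0.3;   %arbitrary illustrative threshold
 A=R > cutoff; %logical adjacency matrix; NaN diagonal entries compare as false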
