Functional Connectivity (Cross-Correlation Method)

From CCN Wiki
Jump to navigation Jump to search

If you consider two vectors containing t values representing neural activity over t time points, the degree to which the values in these vectors move together, as indexed by the correlation between these vectors, can be taken as a measure of functional connectivity. The underlying assumption is that the activity in two regions should be related if there is some sort of underlying connection between them.

This page assumes you have already obtained time series matrices using the approaches described here. If these data were obtained in FreeSurfer surface space, there will be 2 matrices (lh, rh) times 'r' runs. If subcortical activations have been obtained in MNI305 space, a third matrix will also be available. In either case, some data cleaning may be required.

These steps will be carried out in MATLAB. The first step then would be to start up MATLAB in your terminal (the ampersand lets the program run in the background so that you can continue to use your terminal window to do other things if you wish):

cd $SUBJECTS_DIR
matlab &

Now that we're in MATLAB, any code snippets can be interpreted as MATLAB code unless otherwise noted.

Load Your Time Series Data

The general instructions for importing your detrended time series data into MATLAB can be found here

Data Cleaning

Steps undertaken during earlier fMRI data conversion and/or preprocessing may influence the sorts of steps you wish to perform in order to clean up your data:

Normalize Data

The values in the activity matrices are aribtrary units: the absolute values are not important. As a result, these values are instead interpreted in terms of % of signal change over time. For example, if a region had a value of 50 at t=0 and a value of 75 at t=1, that would represent a 50% change in signal (which, incidentally, is an unrealistic change, and would probably represent some sort of artifact). In contrast, changing from a value of 1000 to 1025 represents a 2.5% signal change for the same difference in absolute signal units. To make it easier to identify outlier values and ensure all data are on the same scale, we normalize our data using the zscore function.

zrun05=zscore(run05);
zrun06=zscore(run06);

The normalized matrix will have z-scores for each time point, calculated using the mean and standard deviation for each column (region) separately. Thus, a normalized value of 1 in Region A will mean that the activation value was 1 standard deviation above the average for that region.

You may be surprised to learn that a MATLAB function, normalizeMatrix.m has been created to facilitate the normalization of our time series data. If it has been copied to your path, you can easily normalize your multidimensional matrix or cell array M (recall that M is a cell array if you used the loadFSTS.m function). The built-in MATLAB function, zscore() unfortunately doesn't work on cell arrays. The normalizeMatrix.m function was written to permit seamless normalization of the time series matrices, whether they are 3D matrices or cell arrays.

Additional functionality was added to drop rows (i.e., time points) specified in an array, thus eliminating the need to drop specific time points in a separate step as described above. You should only do this if volumes were not already dropped from the .nii files prior to importing the timeseries!. If you drop 4 rows from the .nii files, and then drop the first four rows again, the first 8 volumes will have been dropped from your time series! However even if you have already dropped volumes from the start of the .nii file, you may wish to drop volumes from the end of the time series (e.g., if the experiment ended before the run was finished).

droprows=[-2 -1 0 1 2 3]; %omit rows 1, 2, 3 and end, end-1 and end-2
Z=normalizeMatrix(M, droprows);

Note however that when rows are dropped in this way, the same time points will be dropped from all time series matrices. In most cases, this will be sensible (e.g., to drop the first few volumes with unstable magnetic fields). However if you wish to drop different sets of time points for different sessions then it is advisable to load and normalize each session individually, specifying a different droprows matrix for each. The individual cell arrays can be concatenated afterwards.

Dropping Columns

I'll document this here because I've not yet mastered cellfun, but managed to use it to prune a cell array of matrices. Scenario: dtM is a 6 × 1 cell array of n × 1000 detrended values. I wanted to keep only the 16 columns, indexed in the vector goodsnr with the best signal-to-noise ratio (which had already been calculated):

N=cellfun(@(x) x(:, goodsnr), dtM, 'UniformOutput', false);

Voila!

Censor Data (Optional Step)

It may be desirable to exclude extreme positive and negative spikes from the time series for each region, as they may reflect some sort of artifact. Additionally, there may be some other source of information that might be used to identify potentially problematic measurements (e.g., motion correction data, or a list of volumes identified in SPM when using the ArtRepair toolbox).

A common practice when looking at behavioral data (e.g., reaction times) is to use the deviation from the mean as a threshold for identifying extreme values. These values can be either replaced with the cutoff value or else scrubbed from the data entirely. The following code will do either, depending on what is designated as the replacement value:

replacement=NaN; %could instead have used replacement==thresh
thresh=2.5; %could use a more-or-less stringent threshold
Zprime=Z
for i=1:length(Zprime)
  temp=Zprime{i}
  temp(temp>threshold)=replacement;
  temp(temp<-threshold)=-1*replacement;
  Zprime{i}=temp;
end

If you had an outside indicator of data quality, you could similarly use that to identify data to be removed. For example, a list of volumes with excessive motion (as determined during functional data realignment, or the ArtRepair toolbox) could be read in:

bad_vols=load('art_censored.txt')
bad_vols = 
     1
     2
     6
     48
     49

Note that the above hypothetical example would assume that all the volumes were present in the data. If the initial n volumes have already been dropped from the data, then you would need to remove from consideration the first n volumes and then subtract n from the remaining bad_vols. This is because after dropping the first 4 volumes, what was originally the 5th time point is now actually the first time point in the remaining time series.

Know What's What

This is probably a good time to make sure that you know the identities of each column of activations represented in your matrices. If you have not already done so, use ParseFSSegments.m on the .sum.txt files corresponding to your time series data.

lh_region_info=parseFSSegments('FS_T1_501_lh_lausanne_005.fmcpr.sm6.self.sum.txt');
rh_region_info=parseFSSegments('FS_T1_501_rh_lausanne_005.fmcpr.sm6.self.sum.txt');

If you have dropped regions from your time series data, ensure that you have also dropped the corresponding region from the segment information generated by parseFSSegments.m:

%In my examples, I have dropped regions 1 and 5 in both hemispheres
%These regions correspond to 'unknown' and 'corpuscallosum' in the myaparc_nnn .annot files 
lh_names=lh_region_info([2 3 4 6:end]);
rh_names=lh_region_info([2 3 4 6:end]);

Likewise, if you have merged left and right time series matrices together into a single matrix, do likewise with your region information.

Now, if you are particularly astute, or have already tried this, you will know that there is a problem if you just merge the region names together. This is because the same region names are used in both hemispheres (e.g., you might find there is a 'precuneus_1' in both the lh and rh region lists). There's a good chance that you'll want or need to differentiate between them at some point. Fortunately, MATLAB has a clever function called strcat that operates over cell arrays, which is conveniently how we have stored our region names:

lh_names=strcat('lh_', 'lh_names');
rh_names=strcat('rh_', 'rh_names');
%now all the names listed in lh_names are prefixed with 'lh_', and likewise for the rh_names contents

Now we are all set to use the vertcat MATLAB function to vertically concatenate the two lists of region names (this is a vertical concatenation because the list of region names are in a n×1 cell array -- i.e., like a vertical stack of names. If they were in a 1×n cell array, it would be a horizontal concatenation using horzcat):

all_names=vertcat(lh_names, rh_names); %lh regions are listed before rh regions in this example

Partition Time Series

Time series matrices from separate runs are necessarily partitioned, by virtue of existing in separate matrices (because the data were loaded separately). One option is to concatenate the matrices from all runs together, to make a sort of super-run (though consider Normalizing the data first, because there is no guarantee that the signal units will be consistent across runs within the same session, let alone across multiple sessions!). Conversely, windows of time points within a single run may be analyzed separately. For example, in a blocked experiment, the time series corresponding to each block might be analyzed separately. This would allow for the detection of changes in network structure as a function of experimental condition.

As an example of how to do this, suppose our experiment had alternating blocks for conditions A and B, each 36 seconds in duration, and with 16 seconds of rest between them, and with a TR of 2 seconds. In this case, The first 18 time points correspond to A, followed by 8 time points of rest, followed by 18 time points of B, followed by 8 time points of rest, and so on. We can recombine these measurements into separate matrices for A and B events (and even rest events):

Aseries=zrun05([1:18,53:70], :);
Bseries=zrun05([27:44,79:96],:);

You are going to want to be really sure about what rows correspond to which points in time when you do this. You will find it is really easy to mess this up and accidentally get one fewer or one more row than you had intended. It might be a good idea to use another program like Excel to help you work out the correct row indices for the conditions you're interested in grouping together. A MATLAB function has been written, findBlockBoundaries.m, that can be found in the ubfs Scripts/Matlab folder. If you have some saved experiment run-time data generated by PsychToolbox, you will probably find this function to be helpful. This function identifies the volumes that form the boundaries of each block based on the timestamps of each of the trials, and also returns an array of conditions corresponding to the condition labels associated with those trials:

[bookends, conditions]=findBlockBoundaries([], 2.047);
%the first parameter is a filename or a cell array of filenames.
%If empty as in this example, a dialog box helps you find one or more experiment run-time data .mat files. 
%If multiple files are selected, you will get a set of bookends for each of the files.
%The second parameter is the sampling frequency (i.e., the TR for fMRI). This cannot be left blank

%Assume that your normalized detrended time series data for 1 run is found in the matrix Z
%You can use the bookends values to index timepoints of interest, as in this case where we wish to select
%and concatenate the time points from every other block:
oddblockdata=Z([ ...
    bookends(1,1):bookends(1,2), ...
    bookends(3,1):bookends(3,2), ...
    bookends(5,1):bookends(5,2)], :);
 %You can similarly use the returned conditions matrix to index the block boundaries for specific block conditions

Note: as of today (November 14, 2017), the findBlockBoundaries function doesn't adjust for dropped volumes. We now typically drop the first 4 volumes from our time series before any analyses are done. This means that the time points recorded in the source .mat files will be off by the number of dropped volumes. Until the findBlockBoundaries function is modified to take the number of volumes dropped as a parameter, the fix will be to subtract d from the bookends arrays using cellfun():

ndropped=4;
new_bookends=cellfun(@(x) x-ndropped, bookends, 'UniformOutput', false);

Save Your Data

Whatever steps you have chosen to clean up or partition your data, it's probably wise to use MATLAB's save() function to save your scrubbed data (and possibly data along different stages of scrubbing) to a .mat file for future use.

%For more information on how to use the save() function, type: help save
save('T1_501_timeseries.mat', 'zrun05', 'zrun06');

The default format is MATLAB's native .mat file format.

Calculate Correlation Matrix

The MATLAB corrcoef function will calculate an N×N correlation matrix, where N=number of brain regions in the parcellation. As mentioned previously, the Pearson correlation is equivalent to the zero-lag cross-correlation between the time series.

[R05,P05]=corrcoef(zrun05, 'rows', 'pairwise');

If you have NaN values in your data, be sure to use the 'pairwise' directive. After you have calculated your correlation matrix (especially if n-lag cross-correlations were iteratively pairwise-calculated), you may wish to append these data to your previously saved data. This will save redoing time-consuming steps should you need to quit and restart at a later time. Appending variables to an existing .mat file is accomplished by using the save() function with the '-append' parameter:

save('T1_501_timeseries.mat', '-append', 'P05', 'R05', 'P06', 'R06');

Using cellfun to Calculate corrcoef() on a Set of Matrices in a Cell Array

Many of the in-house developed MATLAB scripts operate on datasets (e.g., a series of runs), and organize the time series into cell arrays (e.g., 1 run's time series matrix stored in each cell). The above command can be nested in a call to cellfun() to compute the correlation matrices for each cell, and return a cell array of results:

[RLO,PLO]=cellfun(@(x) corrcoef(x,'rows', 'pairwise'), lo_freq, 'UniformOutput', false);

After calling this function, you will have 2 cell arrays, RLO and PLO, which would contain the correlation matrices for all time series stored in the lo_freq cell array.

PROTIP: Quickly Replace Diagonal Values in MATLAB using eye()

The main diagonal of a correlation matrix, R will always contain 1s, because a vector is always perfectly correlated with itself. This nonsense correlation can be quickly replaced with some other value (e.g., NaN) by the following code:

replacement=NaN;
R(logical(eye(size(R))))=replacement;

Network Analysis

The correlation matrices for each time series serve as a measure of functional connectivity. The correlation entry(i,j) indexes the functional connectivity between regions i and j. The entire matrix, if binarized, is an adjacency matrix among the nodes in the brain network. At this point, various network analyses may be performed on the correlation matrix to generate useful network metrics. Additionally, clustering algorithms may be applied to detect communities within these networks.