Functional Connectivity (Cross-Correlation Method): Difference between revisions

From CCN Wiki

Revision as of 15:19, 22 June 2016

If you consider two vectors containing t values representing neural activity over t time points, the degree to which the values in these vectors move together, as indexed by the correlation between them, can be taken as a measure of functional connectivity. The underlying assumption is that the activity in two regions should be related if there is some sort of underlying connection between them.
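As a toy illustration of this idea (synthetic data, not from any experiment): two noisy measurements of the same underlying signal will be strongly correlated, while two unrelated noise series will not.

t=(1:100)'; %100 time points
common=sin(t/8); %an underlying fluctuation shared by two regions
regionA=common+0.3*randn(100,1); %noisy measurement of the shared signal
regionB=common+0.3*randn(100,1); %a second, independent noisy measurement
R=corrcoef(regionA,regionB); %R(1,2) is the correlation between the two 'regions', and will be high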

This page assumes you have already obtained time series matrices using the approaches described here. If these data were obtained in FreeSurfer surface space, there will be 2 matrices (lh and rh) for each of your r runs. As I have not yet worked out the process for voxel-space (i.e., using SPM), details are sketchy, but it seems probable that there would only be a single matrix of values per run. In either case, some data cleaning may be required.

These steps will be carried out in MATLAB. The first step then would be to start up MATLAB in your terminal (the ampersand lets the program run in the background so that you can continue to use your terminal window to do other things if you wish):

cd $SUBJECTS_DIR
matlab &

Now that we're in MATLAB, any code snippets can be interpreted as MATLAB code unless otherwise noted.

Load Your Time Series Data

The next step is to load your data matrices. Assuming the time series are plaintext files containing only structured numerical data (e.g., as produced by the gettimecourses.sh script), you can use the load function to import these data.

Manual loading

cd(['FS_T1_501' filesep 'bold']);
ls('*.wav.txt');

The first of the above two commands will get you into the bold/ directory for subject FS_T1_501, which is where you would expect to find the time series data. The second command will display the names of all files ending in .wav.txt, which is the convention used by the gettimecourses.sh script. Now that you have identified the relevant files, you can import them individually, assigning them meaningful variable names:

run05_left=load('FS_T1_501_lh_lausanne_005.fmcpr.sm6.self.wav.txt');
run05_right=load('FS_T1_501_rh_lausanne_005.fmcpr.sm6.self.wav.txt');
run06_left=load('FS_T1_501_lh_lausanne_006.fmcpr.sm6.self.wav.txt');
run06_right=load('FS_T1_501_rh_lausanne_006.fmcpr.sm6.self.wav.txt');

If you are using the Lausanne 2008 1000-region parcellation, you should find that the lh matrices have 500 columns (regions) and the rh matrices have 502.
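A quick way to confirm these dimensions is MATLAB's size function:

size(run05_left) %returns [timepoints ncolumns]; the second value is the number of regions
size(run05_right)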

At this point, it makes sense to merge the left and right hemisphere data into a single matrix, but continue to keep the data for each run separate. If each row represents a time point and each column represents a region, the matrices for a given run will be time x nregions in size. We want to concatenate the matrices together along the column dimension so that the resulting matrix is time x (nregions + nregions) in size (and not (time+time) x nregions!). Horizontal concatenation (i.e., along the columnar dimension, which is what you want to do here) is easily done in MATLAB:

run_05 = [run05_left run05_right];
run_06 = [run06_left run06_right];

Functionally-assisted loading

The above steps have since been made simpler with the inevitable development of a MATLAB function, loadFSTS.m, which can be found in the ubfs Scripts/Matlab folder. If you have copied it to a folder in your MATLAB path, it can be invoked thus:

M=loadFSTS(); %A dialog box will prompt you to select the files you want to open

Of course, you can and should read the help documentation to find out the full capabilities of the function, which I assure you is very clever. But the above functionality will probably get you through 99.99% of the time. Rather than load each time series for each hemisphere into a separate array, loadFSTS will return a single cell array, where each cell contains the merged lh & rh data for each session. Thus, if you open up the lh/rh time series values for 6 fMRI runs, M will be a 6×1 cell array, where M{i} is a timepoints × nregions matrix (nregions is the sum of the regions retained for the lh and rh data sets).

If you load in r runs, M will be a cell array containing r cells, and each cell will contain a t timepoints × s segments matrix of values. If each matrix is the same size, you can convert the cell array into a 3D matrix, which you might find facilitates a number of tasks:

new_M=cat(3,M{:}); %equivalent to cat(3, M{1}, M{2}, ..., M{r}); concatenates each 2D matrix stored in M along a 3rd dimension
%new_M is now a 3D matrix of size t × nregions × r

Data Cleaning

Steps undertaken during earlier fMRI data conversion and/or preprocessing may influence the sorts of steps you wish to perform in order to clean up your data:

Drop Initial Volumes

When fMRI data are acquired, there is a short period at the beginning of each run, perhaps 6-12 seconds, during which the magnetic field fluctuates as the gradients are turned on. Measurements taken during this time are likely to contain artifacts of this instability. You can see this by plotting the values from a single region over time:

plot(run_05(:,1));

The above command plots the value from all rows (the ':' entry means 'everything') in the first column (the '1' entry for the column), and would show the activation of region 1 across the entire run.

If we know the TR, in seconds, we can figure out the number of volumes to drop from our matrices in order to eliminate these bad measurements. For example, if our TR is 2 seconds and we want to drop the first 12 seconds of data, we are dropping the first 6 rows of each matrix:

run_05 = run_05(7:end,:);
run_06 = run_06(7:end,:);

The above commands replace the run_05 (and run_06) matrices with what is left over when you retain rows 7 through the last row. Note that the ':' in the column selector says that you want to retain all the columns. After running these commands, you should find your matrices are 6 rows shorter but contain the same number of columns.

Note: You should ensure that all team members are using the same data cleaning parameters. For the Booth data, it is sensible to drop the first 6 volumes, as the first trial begins at t=12 seconds (with TR=2 seconds).

Drop Regions

The same principle can be applied in the columns dimension if there are regions you wish to omit. For example, if you used the sets of scripts described elsewhere in this wiki, the first region in the 1000-region Lausanne annotation is the corpus callosum (a white matter tract), which it would be reasonable to exclude from a connectivity analysis. Note that if you drop regions from your time series, you will need to keep that in mind when labeling the regions in your connectivity matrices. If you plan on dropping regions, it is easiest to do so prior to merging hemispheres. The code snippet below drops the first column (by retaining only columns 2 through the end) from each of the matrices for run 5. The commands for run 6 would be very similar:

run05_left=run05_left(:,2:end);
run05_right=run05_right(:,2:end);

Another example, this time dropping regions 1 and 5 (corresponding to the unknown and corpuscallosum regions in ?h.aparc_60.annot):

run06_left=run06_left(:,[2 3 4 6:end]);
run06_right=run06_right(:,[2 3 4 6:end]);

If you are unsure of the identity of a region, check the corresponding .sum files that were generated by the gettimecourses.sh script.

Note also that if you use the loadFSTS() function, you can specify the regions to be dropped at the time of loading. If you do so, the specified columns will have already been dropped from the time series data, and you will not need to do this as a separate step.

Note: If any regions are being dropped, you should ensure that this is consistent across team members for the same analysis. Reasonable regions to drop include activation values from the 'unknown' region, and from the 'corpuscallosum' region.

Normalize Data

The values in the activity matrices are in arbitrary units: the absolute values are not important. As a result, these values are instead interpreted in terms of % signal change over time. For example, if a region had a value of 50 at t=0 and a value of 75 at t=1, that would represent a 50% change in signal (which, incidentally, is an unrealistically large change, and would probably represent some sort of artifact). In contrast, changing from a value of 1000 to 1025 represents a 2.5% signal change for the same difference in absolute signal units. To make it easier to identify outlier values and to ensure all data are on the same scale, we normalize our data using the zscore function.

zrun05=zscore(run_05);
zrun06=zscore(run_06);

The normalized matrix will have z-scores for each time point, calculated using the mean and standard deviation for each column (region) separately. Thus, a normalized value of 1 in Region A will mean that the activation value was 1 standard deviation above the average for that region.
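You can confirm this column-wise behavior directly:

mean(zrun05) %each column mean is (approximately) 0 after normalization
std(zrun05) %each column standard deviation is 1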

You may be surprised to learn that a MATLAB function, normalizeMatrix.m, has been created to facilitate the normalization of our time series data. If it has been copied to your path, you can easily normalize your multidimensional matrix or cell array M (recall that M is a cell array if you used the loadFSTS.m function). The built-in MATLAB function zscore() unfortunately doesn't work on cell arrays, so normalizeMatrix.m was written to permit seamless normalization of the time series matrices, whether they are 3D matrices or cell arrays. Additional functionality was added to drop rows (i.e., time points) specified in an array, eliminating the need to drop those time points in a separate step as described above.

droprows=[-2 -1 0 1 2 3]; %omit rows 1, 2, 3 and end, end-1 and end-2
Z=normalizeMatrix(M, droprows);

Note however that when rows are dropped in this way, the same time points will be dropped from all time series matrices. In most cases, this will be sensible (e.g., to drop the first few volumes with unstable magnetic fields). However if you wish to drop different sets of time points for different sessions then it is advisable to load and normalize each session individually, specifying a different droprows matrix for each. The individual cell arrays can be concatenated afterwards.

Censor Data

It may be desirable to exclude extreme positive and negative spikes from the time series for each region, as they may reflect some sort of artifact. Additionally, there may be some other source of information that might be used to identify potentially problematic measurements (e.g., motion correction data, or a list of volumes identified in SPM when using the ArtRepair toolbox).

A common practice when looking at behavioral data (e.g., reaction times) is to use the deviation from the mean as a threshold for identifying extreme values. These values can be either replaced with the cutoff value or else scrubbed from the data entirely. The following code will do either, depending on what is designated as the replacement value:

replacement=NaN; %could instead have used replacement=thresh
thresh=2.5; %could use a more-or-less stringent threshold
zrun05(zrun05>thresh)=replacement;
zrun05(zrun05<-thresh)=-replacement;

If you had an outside indicator of data quality, you could similarly use that to identify data to be removed. For example, a list of volumes with excessive motion (as determined during functional data realignment, or the ArtRepair toolbox) could be read in:

bad_vols=load('art_censored.txt')
bad_vols = 
     1
     2
     6
     48
     49

Note that the above hypothetical example would assume that all the volumes were present in the data. If the initial n volumes have already been dropped from the data, then you would need to remove from consideration the first n volumes and then subtract n from the remaining bad_vols. This is because after dropping the first 6 volumes, what was originally the 7th time point is now actually the first time point in the remaining time series:

n_dropped=6;
bad_vols=bad_vols(bad_vols>n_dropped);
bad_vols=bad_vols-n_dropped; 
zrun05(bad_vols,:)=NaN; %sets the rows for time points indicated in bad_vols to NaN for all columns (':')

Know What's What

This is probably a good time to make sure that you know the identity of each column of activations represented in your matrices. If you have not already done so, use parseFSSegments.m on the .sum.txt files corresponding to your time series data.

lh_region_info=parseFSSegments('FS_T1_501_lh_lausanne_005.fmcpr.sm6.self.sum.txt');
rh_region_info=parseFSSegments('FS_T1_501_rh_lausanne_005.fmcpr.sm6.self.sum.txt');
lh_names=lh_region_info.segname;
rh_names=rh_region_info.segname;

If you have dropped regions from your time series data, ensure that you have also dropped the corresponding region from the segment information generated by parseFSSegments.m:

%In my examples, I have dropped regions 1 and 5 in both hemispheres
%These regions correspond to 'unknown' and 'corpuscallosum' in the myaparc_nnn .annot files 
lh_names=lh_names([2 3 4 6:end]);
rh_names=rh_names([2 3 4 6:end]);

Likewise, if you have merged left and right time series matrices together into a single matrix, do likewise with your region information.

Now, if you are particularly astute, or have already tried this, you will know that there is a problem if you just merge the region names together. This is because the same region names are used in both hemispheres (e.g., you might find there is a 'precuneus_1' in both the lh and rh region lists). There's a good chance that you'll want or need to differentiate between them at some point. Fortunately, MATLAB has a clever function called strcat that operates over cell arrays, which is conveniently how we have stored our region names:

lh_names=strcat('lh_', lh_names);
rh_names=strcat('rh_', rh_names);
%now all the names listed in lh_names are prefixed with 'lh_', and likewise for the rh_names contents

Now we are all set to use the vertcat MATLAB function to vertically concatenate the two lists of region names (vertical because the region names are stored in an n×1 cell array -- i.e., a vertical stack of names; if they were in a 1×n cell array, it would be a horizontal concatenation using horzcat):

all_names=vertcat(lh_names, rh_names); %lh regions are listed before rh regions in this example

Partition Time Series

Time series matrices from separate runs are necessarily partitioned, by virtue of existing in separate matrices (because the data were loaded separately). One option is to concatenate the matrices from all runs together, to make a sort of super-run (though consider Normalizing the data first, because there is no guarantee that the signal units will be consistent across runs within the same session, let alone across multiple sessions!). Conversely, windows of time points within a single run may be analyzed separately. For example, in a blocked experiment, the time series corresponding to each block might be analyzed separately. This would allow for the detection of changes in network structure as a function of experimental condition.
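For example, a super-run can be built by stacking normalized runs in the time dimension (a sketch assuming zrun05 and zrun06 are timepoints × nregions matrices, from the Normalize Data step, with the same number of columns):

super_run=vertcat(zrun05, zrun06); %rows: t05+t06 time points in sequence; columns: unchanged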

As an example of how to do this, suppose our experiment had alternating blocks for conditions A and B, each 36 seconds in duration, with 16 seconds of rest between them, and with a TR of 2 seconds. In this case, the first 18 time points correspond to A, followed by 8 time points of rest, followed by 18 time points of B, followed by 8 time points of rest, and so on. We can recombine these measurements into separate matrices for A and B events (and even rest events):

Aseries=zrun05([1:18,53:70], :);
Bseries=zrun05([27:44,79:96],:);

Be really sure about which rows correspond to which points in time when you do this: it is easy to mess this up and end up with one row more or fewer than you intended. It might be a good idea to use another program like Excel to help you work out the correct row indices for the conditions you are grouping together. A MATLAB function, findBlockBoundaries.m, can be found in the ubfs Scripts/Matlab folder; if you have saved experiment run-time data generated by PsychToolbox, you will probably find it helpful. This function identifies the volumes that form the boundaries of each block based on the timestamps of the trials:

bookends=findBlockBoundaries([], 2.047);
%the first parameter is a filename. If empty as in this example, a dialog box helps you find a file
%the second parameter is the sampling frequency (i.e., the TR for fMRI). This cannot be left blank
oddblockdata=zrun05([ ...
    bookends(1,1):bookends(1,2), ...
    bookends(3,1):bookends(3,2), ...
    bookends(5,1):bookends(5,2)], :);

Save Your Data

Whatever steps you have chosen to clean up or partition your data, it's probably wise to use MATLAB's save() function to save your scrubbed data (and possibly data along different stages of scrubbing) to a .mat file for future use.

%For more information on how to use the save() function, type: help save
save('T1_501_timeseries.mat', 'zrun05', 'zrun06');

The default format is MATLAB's native .mat file format.

Calculate Correlation Matrix

The MATLAB corrcoef function will calculate an N×N correlation matrix, where N=number of brain regions in the parcellation. As mentioned previously, the Pearson correlation is equivalent to the zero-lag cross-correlation between the time series.

[R05,P05]=corrcoef(zrun05, 'rows', 'pairwise');
[R06,P06]=corrcoef(zrun06, 'rows', 'pairwise');

If you have NaN values in your data, be sure to use the 'pairwise' directive; otherwise any region containing a censored time point would yield NaN correlations. After you have calculated your correlation matrices (a potentially time-consuming step, especially if n-lag cross-correlations are calculated pairwise), you may wish to append these data to your previously saved data. This will save redoing time-consuming steps should you need to quit and restart at a later time. Appending variables to an existing .mat file is accomplished by using the save() function with the '-append' parameter:

save('T1_501_timeseries.mat', '-append', 'P05', 'R05', 'P06', 'R06');

Network Analysis

The correlation matrices for each time series serve as a measure of functional connectivity: the correlation in entry (i,j) indexes the functional connectivity between regions i and j. The entire matrix, if binarized, is an adjacency matrix among the nodes in the brain network. At this point, various network analyses may be performed on the correlation matrix to generate useful network metrics, and clustering algorithms may be applied to detect communities within these networks.
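As a minimal sketch of the binarization step (the 0.3 threshold here is an arbitrary illustration, not a recommended value):

A=R05>0.3; %binarize: 1 where a pair of regions is 'connected', 0 elsewhere
A(logical(eye(size(A))))=0; %zero the diagonal to remove trivial self-connections
degree=sum(A,2); %a simple network metric: number of connections per region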