Network Analyses: Difference between revisions

From CCN Wiki
Jump to navigation Jump to search
No edit summary
 
(22 intermediate revisions by 4 users not shown)
Line 1: Line 1:
Network analyses use graph-theoretic approaches to identifying/quantifying/describing/classifying networks of any sort. For example, a network analysis could be used to identify potential bottlenecks in a city design when large numbers of people have to evacuate, or the robustness of a distributed computer network to disruption in the event that one or more redundant servers goes offline.
Network analyses use graph-theoretic approaches to identifying/quantifying/describing/classifying networks of any sort. For example, a network analysis could be used to identify potential bottlenecks in a city design when large numbers of people have to evacuate, or the robustness of a distributed computer network to disruption in the event that one or more redundant servers goes offline.


In our lab, we apply these analytic approaches to networks that describe functional (primarily) or anatomical (potentially) connectivity. Under certain assumptions about what functional role various network nodes play, we can test hypotheses about how the connections between these nodes relates to cognitive processing. Most of our work here will be done in MATLAB and using networks that are represented as adjacency matrices. An adjacency matrix is an NxN square matrix that has entries that indicate whether there is a connection (and potentially how strong it is) between any two of the N nodes in the network. Using the Louvain 2008 parcellation scheme, for example, we break the cortical surface up into 1000 regions. Each region is a node in the network, and so the adjacency matrix for this network is a 1000 x 1000 square matrix. What follows assumes that you have obtained an adjacency matrix, one way or another, and wish to carry out different types of network analyses on it.
In our lab, we apply these analytic approaches to networks that describe functional (primarily) or anatomical (potentially) connectivity. Under certain assumptions about what functional role various network nodes play, we can test hypotheses about how the connections between these nodes relates to cognitive processing. Most of our work here will be done in MATLAB and using networks that are represented as adjacency matrices. An adjacency matrix is an NxN square matrix that has entries that indicate whether there is a connection (and potentially how strong it is) between any two of the N nodes in the network. Using the Louvain 2008 parcellation scheme, for example, we break the cortical surface up into up to 1000 regions. Each region is a node in the network, and so the adjacency matrix for this network is a 1000 x 1000 square matrix. What follows assumes that you have obtained an adjacency matrix, one way or another, and wish to carry out different types of network analyses on it. We have used [[ Functional_Connectivity_(Neural_Network_Method) | neural networks]] and [[Functional_Connectivity_(Cross-Correlation_Method) | correlation-based]] methods to obtain these adjacency matrices.
==Visualize Connectivity Distributions ==
=== Histograms ===
We can plot a histogram of weights for two different connectivity matrices. For example, if we have weights from two different conditions,  '''lo''' and '''hi''':
bins=51;
figure();
hist(lo(:), bins);
h=findobj(gca, 'Type', 'patch');
set(h, 'FaceColor', 'r', 'EdgeColor', 'w', 'facealpha', 0.75);
hold on;
hist(hi(:), bins);
h1=findobj(gca, 'Type', 'patch');
set(h1, 'facealpha', 0.75);


== Hierarchical Clustering ==
=== Graph Diagram ===
Sarah Muldoon recommended starting off with a hierarchical clustering of the adjacency matrix in order to visualize the structure of the matrix and get a sense of the sort of structure that exists within the data. The [http://www.mathworks.com/help/stats/hierarchical-clustering.html MATLAB help page for hierarchical clustering] has some background information, but here are the highlights.
MATLAB's graphing tools are helpful here. If we have a NxN adjacency matrix, '''''M''''', we can turn it into a graph  (if symmetrical) or a digraph (if asymmetrical) , '''''G''''', and plot the nodes and edges using the plot function.
 
  M=rand(8)-0.5; %adjacency matrix in matrix form, includes positive and negative values
The first thing to note is that these steps are too computationally intensive for the adjacency matrices generated for the larger networks produced using the Lausanne 2008 parcellation (Lausanne 500 and 250 are too large). The myaparc_60 network seems to work smoothly; I'll try the 125 next.
  M(logical(eye(size(M))))=0; %0 disconnects two nodes; remove self-connections if desired
 
  G1=digraph(M);
=== Obtain Linkage ===
figure(1);
The first step of a hierarchical cluster is to obtain some kind of distance metric between each node in the graph. This information already exists in the form of the weighted adjacency matrices (either correlations or trained network weights). We pick up here at the linkage step. MATLAB has a <code>linkage</code> function that expects a vector of pairwise distances between all nodes, as would be generated by the <code>pdist</code> function. If you read the documentation for ''pdist'', you will find that it outputs a vector for all pairs ({2,1}, {3,1}, ... , {m, 1}, {3, 2}, ..., {m, 2}, ... , {m, m-1}), where ''m'' is the number of nodes in the network. In relation to the adjacency matrices, these values would represent the lower (or upper) triangle of the matrix.
  plot(G1); %all edges in blue
==== Filter Weights ====
The lower triangle of the adjacency matrix can be obtained in MATLAB using the <code>tril</code> function. An ''m''x''m'' (i.e., square) matrix has k diagonals, numbered 0 to (-1)*m-1. The main diagonal, k=0, will be the diagonal containing all ones (in a correlation matrix), which are meaningless in this context. We will want to get all the diagonals where k<0 (if you are getting the upper triangle using the <code>triu</code> function, use k>0).
  mask=tril(ones(size(M)),-1); %identify the elements in the lower triangle, excluding diagonal k=0
lowert=M(find(mask)); %use mask to obtain the values.
The above code returns the values from the lower triangle of the adjacency matrix '''''M''''' as a vector in the same order that pdist generates.
 
==== Run <code>linkage</code> ====
There are several methods for running the linkage function, but common choices are ''single'', ''complete'', and ''average''.
  Z=linkage(lowert,'average');
Note that the solutions provided by each of these methods can yield drastically different results. For example, while I was working through this workflow for the first time, I used the ''single'' linkage distance calculation method. The sorted correlation matrix displayed at the end didn't seem to have any well-defined clusters of any size (leading me to assume that I had done something wrong). Since there doesn't seem to be any "correct" method, and the purpose of this exercise is to identify some sort of underlying structure to the weights, you might try different methods until you find something that makes that structure apparent.
 
=== Generate the Dendrogram ===
Using the dendrogram function, you'll need to force it to provide the entire leaf structure, specifying ''p'', the maximum number of leaf nodes (if not specified, the default behavior is to display up to 30 leaf nodes):
  [H, T, outperm] = dendrogram(Z, size(M,1));
Looking at the dendrogram can be used to inform the number of clusters you might look for in a data set (e.g., if something like k-means clustering was subsequently employed).
 
=== Sort and Visualize Adjacency Matrices ===
Use the ''outperm'' vector generated by the dendrogram as a sorting index for the adjacency matrix to reveal the network clustering structure.
  sorted_M=M(outperm,outperm);  
  figure(2);
  figure(2);
  imagesc(sorted_M);
  G2=rmedge(G1,find(abs(G1.Edges.Weight)<0.1)); %drop weights below a certain threshold
 
  handle=plot(G2, 'EdgeColor', 'g'); %plot pruned graph in green
==== Visualize Multiple Adjacency Matrices ====
  isneg=G2.Edges.EndNodes(G2.Edges.Weight<0,:); %get endpoints of edges that are negative
Suppose you have several sorted adjacency matrices that you would like to display side-by-side. Use the <code>subplot</code> function to set up a grid-like organization of your figure:
  highlight(handle, isneg(:,1), isneg(:,2), 'EdgeColor', 'r'); %recolor negative edges to red
subplot(2,2,1); %indexing within a 2x2 grid, the next thing you will be displaying will go in slot 1, which is the 1st row, 1st column
  imagesc(sorted_run1);
subplot(2,2,2); %still in the 2x2 grid, the next displayed item goes into slot 2; slots increment by column within a row, until end of row is reached
  imagesc(sorted_run2);
subplot(2,2,3); %3rd slot is the first column of the second row
  imagesc(sorted_run3);
subplot(2,2,4); %4th row is the 2nd column of the 2nd row
imagesc(sorted_run4);
 
== Stochastic Clustering ==
A powerful and widely-used clustering algorithm is the Louvain greedy clustering algorithm. It's stochastic (i.e., contains a random element to it) because it's computationally intractable to completely search the entire space of possible clusterings for anything other than small "toy" networks.


The Louvain algorithm is described [https://perso.uclouvain.be/vincent.blondel/research/louvain.html here], and has been incorporated in a collection of MATLAB network analysis scripts archived [http://commdetect.weebly.com/ here] and [http://netwiki.amath.unc.edu/GenLouvain/GenLouvain here] but also stored in our ubfs Scripts/Matlab folder.
== Identify Clusters/Subnetworks ==
One of the first things we might wish to do is [[Community_Detection | identify the network community structure]] of our brain networks. When visualized as a sorted connectivity matrix or rendered on a brain map, this allows us to make inferences of the nature of processing taking place during the period captured in the time series data.
== Obtain Network Metrics ==
Though visualization of network communities is instrumental to interpreting the data, we are going to rely on the hard numbers provided by various [[Connectivity Metrics | metrics of network properties]] over which we can carry out any number of statistical analyses we have at our disposal.


=== Create a Modularity Matrix ===
==Stochastic (Non-Parametric) Analyses==
The first thing to note is that the genlouvain function doesn't work directly on the adjacency matrix, but instead on a modularity matrix, which is ''derived from'' your adjacency matrix. The number of partitions that are detected is influenced in part by a parameter, gamma, that is often omitted when this modularity matrix is calculated. When omitted, it defaults to 1, however larger gammas permit a greater number of final partitions. The following code sample is pulled from the genlouvain wiki site. The MATLAB function, <code>full</code>, transforms a sparse matrix into a full matrix. Sparse matrices are a more compact way of storing large data sets when there are a large number of empty values. I'm not clear whether the code works properly if the adjacency matrix, '''''A''''', starts off as a full matrix, so instead I will begin by transforming the full matrix '''''M''''' (e.g., a 1000 x 1000 network) into a sparse matrix data type and proceed from there.
MATLAB functions exist to create synthetic networks or randomize existing networks to test against. For example, we can ask whether a particular network measure (e.g., modularity) is greater in our set of functional networks than you would expect by chance in a randomly connected set of networks with the same overall connectivity. This sort of Monte Carlo (a.k.a. Boostrap or jacknife) analysis can be set up using a simple procedure:
A=sparse(M);
#Obtain the set of network metrics for the intact networks
gamma = 1.1; %gamma=1 tends to produce 3-community solutions. Higher values allow more communities.
#Loop for a large number of times:
k = full(sum(A));
##Create a new set of randomized or synthetic networks
twom = sum(k); %twom ("2m") is a normalization parameter
##Compute these network metrics for the new networks
B = full(A - gamma*k'*k/twom); %B is the modularity matrix
##Record whether the network metrics for the intact network is greater than (or less than, if that's your hypothesis) the new networks
#Compute the fraction of times the intact networks were better (or worse) than the simulated networks (e.g., modularity for intact networks was greater than shuffled networks 980 simulations out of 1000)


=== Detect Communities Using <code>genlouvain</code> ===
Useful functions include:
At this point, you now have '''''B''''', the modularity matrix for the adjacency matrix '''''M'''''. Running the genlouvain function on B will stochastically generate a partitioning of your network that maximizes modularity (within its search space).
<code>randomizer_bin_und</code> and <code>randomize_graph_partial_und</code> from the Brain Connectivity Toolbox, and <code>randomize_graph_full</code> which was written in-house and can be found on UBFS/Scripts/Matlab.
[S,Q] = genlouvain(B);
Q = Q/twom
Because this algorithm is stochastic, repeated applications of the genlouvain function are likely to generate somewhat different partitionings. Fortunately, the code runs rather quickly on relatively small networks, and so we can run the function repeatedly and store each partitioning.
maxiterations=10000; %how many partitioning attempts do you want?
S=repmat(NaN, maxiterations, length(k)); %preallocate storage for each partitioning solution
Q=repmat(NaN, maxiterations, 1); %preallocate storage for the modularity of each partitioning solution
for i=1:maxiterations
  [s, q]=genlouvain(B);
  q=q/twom;
  S(i,:)=s(:);
  Q(i)=q;
end


=== Determine the Consensus Best Partitioning ===
With ''maxiterations'' different partitionings of your adjacency matrix into different communities (stored in matrix '''''S''''' in the previous step), we can ascertain the consensus community assignment (i.e., the modal partitioning) for each node in the network. The commdetect toolbox has a function, <code>consensus_iterative</code> for that task.
[S2 Q2 X_new3 qpc] = consensus_iterative(S);
'''''S2''''' is a matrix of <code>size(S)</code>. <code>mode(S2)</code> can be used to list the representative partition for each node in the adjacency matrix, '''''M'''''. Each row of S2 should be the same, though I'm not 100% positive that this is an absolute guarantee, which is why I suggested using <code>mode</code> rather than <code>mean</code> or just selecting one of the ''maxiteration'' rows of '''''S2'''''. The value ''gpc'' is the quality of the consensus, where smaller values are better (it might be the case that gpc>0 is associated with S2 matrices with rows that differ).
[[Category: Networks]]
[[Category: Networks]]
[[Category: Network Analyses]]
[[Category: Time Series]]
[[Category: Time Series]]

Latest revision as of 13:26, 30 October 2018

Network analyses use graph-theoretic approaches to identifying/quantifying/describing/classifying networks of any sort. For example, a network analysis could be used to identify potential bottlenecks in a city design when large numbers of people have to evacuate, or the robustness of a distributed computer network to disruption in the event that one or more redundant servers goes offline.

In our lab, we apply these analytic approaches to networks that describe functional (primarily) or anatomical (potentially) connectivity. Under certain assumptions about what functional role various network nodes play, we can test hypotheses about how the connections between these nodes relates to cognitive processing. Most of our work here will be done in MATLAB and using networks that are represented as adjacency matrices. An adjacency matrix is an NxN square matrix that has entries that indicate whether there is a connection (and potentially how strong it is) between any two of the N nodes in the network. Using the Louvain 2008 parcellation scheme, for example, we break the cortical surface up into up to 1000 regions. Each region is a node in the network, and so the adjacency matrix for this network is a 1000 x 1000 square matrix. What follows assumes that you have obtained an adjacency matrix, one way or another, and wish to carry out different types of network analyses on it. We have used neural networks and correlation-based methods to obtain these adjacency matrices.

Visualize Connectivity Distributions

Histograms

We can plot a histogram of weights for two different connectivity matrices. For example, if we have weights from two different conditions, lo and hi:

bins=51;
figure();
hist(lo(:), bins);
h=findobj(gca, 'Type', 'patch');
set(h, 'FaceColor', 'r', 'EdgeColor', 'w', 'facealpha', 0.75);
hold on;
hist(hi(:), bins);
h1=findobj(gca, 'Type', 'patch');
set(h1, 'facealpha', 0.75);

Graph Diagram

MATLAB's graphing tools are helpful here. If we have a NxN adjacency matrix, M, we can turn it into a graph (if symmetrical) or a digraph (if asymmetrical) , G, and plot the nodes and edges using the plot function.

M=rand(8)-0.5; %adjacency matrix in matrix form, includes positive and negative values
M(logical(eye(size(M))))=0; %0 disconnects two nodes; remove self-connections if desired
G1=digraph(M);
figure(1);
plot(G1); %all edges in blue
figure(2);
G2=rmedge(G1,find(abs(G1.Edges.Weight)<0.1)); %drop weights below a certain threshold
handle=plot(G2, 'EdgeColor', 'g'); %plot pruned graph in green
isneg=G2.Edges.EndNodes(G2.Edges.Weight<0,:); %get endpoints of edges that are negative
highlight(handle, isneg(:,1), isneg(:,2), 'EdgeColor', 'r'); %recolor negative edges to red

Identify Clusters/Subnetworks

One of the first things we might wish to do is identify the network community structure of our brain networks. When visualized as a sorted connectivity matrix or rendered on a brain map, this allows us to make inferences of the nature of processing taking place during the period captured in the time series data.

Obtain Network Metrics

Though visualization of network communities is instrumental to interpreting the data, we are going to rely on the hard numbers provided by various metrics of network properties over which we can carry out any number of statistical analyses we have at our disposal.

Stochastic (Non-Parametric) Analyses

MATLAB functions exist to create synthetic networks or randomize existing networks to test against. For example, we can ask whether a particular network measure (e.g., modularity) is greater in our set of functional networks than you would expect by chance in a randomly connected set of networks with the same overall connectivity. This sort of Monte Carlo (a.k.a. Boostrap or jacknife) analysis can be set up using a simple procedure:

  1. Obtain the set of network metrics for the intact networks
  2. Loop for a large number of times:
    1. Create a new set of randomized or synthetic networks
    2. Compute these network metrics for the new networks
    3. Record whether the network metrics for the intact network is greater than (or less than, if that's your hypothesis) the new networks
  3. Compute the fraction of times the intact networks were better (or worse) than the simulated networks (e.g., modularity for intact networks was greater than shuffled networks 980 simulations out of 1000)

Useful functions include: randomizer_bin_und and randomize_graph_partial_und from the Brain Connectivity Toolbox, and randomize_graph_full which was written in-house and can be found on UBFS/Scripts/Matlab.