Reading Development project
The Reading Development project uses an open dataset collected between 2008 and 2013 in James Booth's Developmental Child Neuroscience Lab at Northwestern. The data are available at openneuro.org.
General Notes
To be added later.
Constrained Classifier Project
Participants
28 children (14 TD, 14 RD) with either strong or weak reading skills (the latter including dyslexia diagnoses), no other diagnosis, and data from two time points. Children were scored by summing their Z scores on 3 reading measures (WATT, pseudoword decoding, something else), with an additional -1 added for a clinical dyslexia diagnosis or -0.5 for a reading difficulty diagnosis. I selected the children with the 14 highest and 14 lowest summed scores.
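The scoring and selection step can be sketched as below. The column names (watt_z, pseudoword_z, third_measure_z, and the diagnosis flags) are hypothetical stand-ins for the actual spreadsheet fields, not the real variable names:

```python
import pandas as pd

def select_extremes(df, n=14):
    """Sum z-scores on the three reading measures, subtract the diagnosis
    adjustments (-1 dyslexia, -0.5 reading difficulty), and keep the
    n lowest and n highest scorers. Column names are hypothetical."""
    z = df[["watt_z", "pseudoword_z", "third_measure_z"]].sum(axis=1)
    z = z - df["dyslexia_dx"] * 1.0 - df["reading_difficulty_dx"] * 0.5
    ranked = df.assign(summed=z).sort_values("summed")
    # bottom n (weakest readers) + top n (strongest readers)
    return pd.concat([ranked.head(n), ranked.tail(n)])
```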
Preprocessing
Anatomical surfaces were reconstructed from the T1w images at both the early and late time points. Functional data from the VV runs (word and nonword) were preprocessed with both 6mm and 2mm blurring, using Siemens slice-time correction.
Functional Masking Using GLMA
A GLMA was carried out, contrasting all lexical conditions vs. baseline, to create group-level functional masks for TD and RD (voxelwise p < .001, cluster p < .05). The union of the two masks was calculated. The union mask was subdivided along template anatomical boundaries using custom scripts that intersect clusters with template region boundaries. These regions were further subdivided using mris_divide_whatever.
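The intersection step can be sketched with numpy label arrays. This is an illustrative stand-in for the custom scripts, assuming a binary functional mask, an anatomical label volume, and the convention that label 0 means unlabeled:

```python
import numpy as np

def subdivide_mask(func_mask, anat_labels):
    """Intersect a binary functional mask with an anatomical label image:
    each output region is the set of mask voxels falling in one anatomical
    label. Returns a dict mapping label -> boolean region mask."""
    regions = {}
    for lab in np.unique(anat_labels[func_mask > 0]):
        if lab == 0:
            continue  # 0 = unlabeled
        regions[int(lab)] = (func_mask > 0) & (anat_labels == lab)
    return regions
```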
Time Series Extraction
The 2mm-smoothed data were detrended using detrend_wmcsf.sh, which removes the linear trend and regresses out motion, WM, and CSF signals as nuisance regressors. The surfacetimecourses.sh shell script was then used to extract time series in fsaverage space to plaintext files.
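The nuisance regression performed by detrend_wmcsf.sh amounts to an ordinary least-squares projection; a minimal sketch (this is a stand-in for the actual script, and the function name and argument shapes are assumptions):

```python
import numpy as np

def detrend_nuisance(ts, motion, wm, csf):
    """Regress an intercept, a linear trend, 6 motion parameters, and the
    WM/CSF mean signals out of each vertex time series via least squares.
    ts: (T, V) vertex time series; motion: (T, 6); wm, csf: (T,).
    Returns the (T, V) residual time series."""
    T = ts.shape[0]
    X = np.column_stack([
        np.ones(T),              # intercept
        np.linspace(-1, 1, T),   # linear trend
        motion,                  # 6 motion parameters
        wm, csf,                 # white-matter and CSF mean signals
    ])
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta
```

The residuals are orthogonal to every nuisance regressor, which is what "removing" them means here.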
Functional Connectivity Estimation
Cross-mutual information (XMI) was used to calculate a functional connectivity matrix for each functional run (8 runs/participant) using a custom MATLAB script.
MATLAB FC Pseudocode
% xmi() and xmiBins() stand for the custom helper functions
for s = 1:numel(subjects)
    allA = {};
    for f = 1:numel(tsFiles{s})
        % load lh and rh and merge into a single time-by-region matrix
        lh = load(lhFiles{s}{f});
        rh = load(rhFiles{s}{f});
        ts = [lh rh];
        % eliminate timepoints at spikes (> 3 SD from the mean)
        spikes = any(abs(ts - mean(ts)) > 3 * std(ts), 2);
        ts(spikes, :) = [];
        % fill the upper triangle of the adjacency matrix with XMI values
        nReg = size(ts, 2);
        A = zeros(nReg);
        for i = 1:nReg - 1
            for j = i + 1:nReg
                nBins = xmiBins(ts(:, i), ts(:, j));  % bins required for the XMI estimate
                A(i, j) = xmi(ts(:, i), ts(:, j), nBins);
            end
        end
        allA{end + 1} = A; %#ok<AGROW>
    end
    % save all adjacency matrices for this subject to a .mat file
    save(sprintf('%s_xmi.mat', subjects{s}), 'allA');
end
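A histogram-based estimate of the cross-mutual information (the quantity computed for each region pair above) might look like the following in Python. The bin-count heuristic is an assumption, since the rule used by the original script isn't recorded here:

```python
import numpy as np

def cross_mutual_information(x, y, n_bins=None):
    """Histogram-based mutual information (in bits) between two time series.
    The default bin count is a heuristic, not the original script's rule."""
    if n_bins is None:
        n_bins = int(np.ceil(np.sqrt(len(x) / 5)))
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy = joint / joint.sum()                    # joint distribution
    px = pxy.sum(axis=1, keepdims=True)          # marginal of x
    py = pxy.sum(axis=0, keepdims=True)          # marginal of y
    nz = pxy > 0                                 # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())
```

A series shares more information with itself than with independent noise, so the estimate should be much larger in the first case.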
FC Pattern Generation
The upper triangle of each XMI matrix was extracted as a vector, normalized (Z score), and rescaled to fall between 0 and 1. The resulting distribution was strongly positively skewed, so the square root of these values was calculated to make the distribution more normal. The scaled FC vector for each of the 8 runs per participant was tagged with lexicality (word = 1, pseudoword = 0) at element n-2, reading skill (RD = 0, TD = 1) at element n-1, and the decimal equivalent of the combined lexicality + reading skill binary code (e.g., 10 = 2) at element n; the combined code was necessary to allow the k-folds cross-validation to balance the category distributions. This generated a 224 x 6789 matrix (28 subjects * 8 runs, by 6786 connections + lexicality code + group code + combined code). The matrix was written to a single .csv file.
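The pattern-generation step can be sketched as follows, assuming 117 regions (which is what yields 117 * 116 / 2 = 6786 upper-triangle connections); the function and argument names are mine:

```python
import numpy as np

def xmi_to_pattern(A, word, td):
    """Flatten the upper triangle of an XMI matrix into a feature vector,
    z-score it, rescale to [0, 1], take the square root to reduce positive
    skew, then append the lexicality, group, and combined codes."""
    v = A[np.triu_indices_from(A, k=1)]       # 117 regions -> 6786 values
    v = (v - v.mean()) / v.std()              # z-score
    v = (v - v.min()) / (v.max() - v.min())   # rescale to [0, 1]
    v = np.sqrt(v)                            # reduce positive skew
    combined = 2 * word + td                  # binary code: '10' -> 2, etc.
    return np.concatenate([v, [word, td, combined]])
```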
Alternative Approaches
I had two options for generating the normalized XMI matrices. My first intuition was to normalize the XMI values within each matrix individually, so that the connectivity values were scaled with reference to the other connectivity values for that participant and run. It also occurred to me that I might instead normalize the XMI values globally, with respect to all XMI values occurring in all runs for all participants. I ended up writing a script that generated the FC patterns both ways, saving the individually and globally normalized values as two separate pattern files. This turned out to be a lucky happenstance, because using both sets of patterns together led to much better performance than either of them individually. I'm not sure whether that is because I doubled the amount of training data or because the two methods of pattern generation complemented each other, giving the models a better picture of the feature relations.
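The two normalization schemes can be sketched side by side; these helpers are illustrative, not the actual script:

```python
import numpy as np

def normalize_per_run(vectors):
    """Z-score each run's FC vector against its own mean and SD."""
    return [(v - v.mean()) / v.std() for v in vectors]

def normalize_global(vectors):
    """Z-score every FC vector against the pooled mean and SD
    of all runs from all participants."""
    pooled = np.concatenate(vectors)
    mu, sd = pooled.mean(), pooled.std()
    return [(v - mu) / sd for v in vectors]
```

Per-run normalization discards overall level differences between runs (every run is centered at zero), while global normalization preserves them, which is one way the two pattern sets could carry complementary information.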