Deep Learning Semantic Category Classification

From CCN Wiki
Revision as of 14:28, 6 June 2018 by Chris (talk | contribs)

In this project, cortical activity patterns associated with imagery of familiar category members are used to train a 4-layer feedforward network implemented in TensorFlow. The trained network is then used to classify activity patterns for unfamiliar items recorded pre- and post-exposure.

Time Series Extraction and Normalization

Import Time Series

ldropregions=[1 5];
rdropregions=[1 5];
M=loadFSTS('ldropregions', ldropregions, 'rdropregions', rdropregions);

Normalize and Rescale Matrices

thresh=1.96;
ZM=normalizeMatrix(M);
[BIN, SCALED]=binarizeMatrix(ZM, 'thresh', thresh);
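normalizeMatrix() and binarizeMatrix() are part of the lab's MATLAB toolchain. As a rough illustration of the presumed behavior (an assumption, not the actual implementation), column-wise z-scoring followed by thresholding at |z| ≥ 1.96 and logistic rescaling might look like this in Python:

```python
import numpy as np

def normalize_matrix(M):
    """Hypothetical stand-in for normalizeMatrix: column-wise z-score."""
    return (M - M.mean(axis=0)) / M.std(axis=0)

def binarize_matrix(ZM, thresh=1.96):
    """Hypothetical stand-in for binarizeMatrix: BIN flags supra-threshold
    values; SCALED squashes z-scores into (0, 1) (assumed behavior)."""
    BIN = (np.abs(ZM) >= thresh).astype(int)
    SCALED = 1.0 / (1.0 + np.exp(-ZM))  # logistic rescaling (assumption)
    return BIN, SCALED

M = np.random.default_rng(0).normal(size=(100, 5))
ZM = normalize_matrix(M)
BIN, SCALED = binarize_matrix(ZM)
```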

After completing the remaining steps of this process and appending the tagged time series values to an existing training/testing set, I ran into a problem: Python failed to parse some of the scaled activation values, complaining that some lines had 998 columns instead of the expected 999. My first attempt at a fix seemed to work: rather than rely on MATLAB's default output precision, use the round() function to round the scaled activation values to a reasonable level of precision:

SCALED=round(SCALED, 4); %round the values to 4 decimal places, which seems plenty precise for our purposes.
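A quick way to catch this kind of malformed row before it derails training is to validate the column count of every line in the merged file. A minimal sketch (the filename and expected width of 999 are illustrative):

```python
import csv

def check_columns(path, expected=999):
    """Return (line_number, column_count) for every row whose
    column count differs from the expected width."""
    bad = []
    with open(path, newline='') as f:
        for i, row in enumerate(csv.reader(f), start=1):
            if len(row) != expected:
                bad.append((i, len(row)))
    return bad

# Usage: check_columns('0202_hi.csv') should return an empty list
# for a well-formed training file.
```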

Event Onset Extraction

PTBParser() is used to aggregate the associated onset files. Because the data come from both T1 and T2, it will be necessary to supply a new run-numbering vector when calling PTBParser(), unless the run numbering has already been changed:

eio=PTBParser('run', [1 7 2 8 3 9 4 10 5 11 6 12]);
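This vector interleaves runs 1-6 with runs 7-12 (presumably because the T1 and T2 runs alternate on disk). Rather than typing it out, it can be generated programmatically, e.g. in Python:

```python
# Interleave the first block of runs (1-6) with the second block (7-12)
run_order = [r for pair in zip(range(1, 7), range(7, 13)) for r in pair]
print(run_order)  # [1, 7, 2, 8, 3, 9, 4, 10, 5, 11, 6, 12]
```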

Event Tagging

The TSTagger() function is used to assign conditions to event windows. It is run separately for the category conditions for high familiarity and low familiarity items:

hiconds=[11,21,31];
loconds=[12,22,32];
TSTagger('tr', 2.047, 'condition', hiconds, 'volumes_dropped', 4, 'mat', SCALED, 'expinfo', eio);
%Before running the next line, move all the generated .csv files to a new subdirectory (e.g., /hi_fam) or else the next command will overwrite them!
TSTagger('tr', 2.047, 'condition', loconds, 'volumes_dropped', 4, 'mat', SCALED, 'expinfo', eio);
%Move these to a new subdirectory (e.g., /lo_fam)

Merge and Relabel Training Files

TSTagger produces a set of .csv files, which you should now have moved into the two separate subdirectories. At this point, the last column in the .csv files contains the original condition codes: [11, 21, 12, 22, 31, 32]. These need to be changed so that they represent one of three condition values: 0, 1, or 2. The easiest way to do this is in a shell terminal, where you first concatenate all the files in a directory and then use sed to globally replace the codes:

cd hi_fam
SUB=202 #replace with the appropriate subject number and you can probably copy-paste the commands below!
cat *.csv > 0${SUB}_hi.csv
sed -i .bak 's/,11/,0/g' 0${SUB}_hi.csv
sed -i .bak 's/,21/,1/g' 0${SUB}_hi.csv
sed -i .bak 's/,31/,2/g' 0${SUB}_hi.csv

For the low-familiarity items, we must distinguish between the pre-exposure (runs 01-06) and post-exposure (runs 07-12) files:

cd ../lo_fam
cat *_00[1-6].csv > 0${SUB}_lo_pre.csv
cat *_00[7-9].csv > 0${SUB}_lo_post.csv
cat *_01*.csv >> 0${SUB}_lo_post.csv
sed -i .bak 's/,12/,0/g' 0${SUB}_lo_post.csv 
sed -i .bak 's/,22/,1/g' 0${SUB}_lo_post.csv 
sed -i .bak 's/,32/,2/g' 0${SUB}_lo_post.csv 
sed -i .bak 's/,12/,0/g' 0${SUB}_lo_pre.csv 
sed -i .bak 's/,22/,1/g' 0${SUB}_lo_pre.csv 
sed -i .bak 's/,32/,2/g' 0${SUB}_lo_pre.csv
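Note that the `sed -i .bak` syntax above is the BSD/macOS form; GNU sed expects `-i.bak` with no space. If you prefer a portable alternative, the same merge-and-relabel step can be sketched in Python (the code map and file patterns come from the steps above; this is an illustrative stand-in, not the lab's script):

```python
import csv
import glob

# Condition codes -> class labels, per the sed commands above
CODE_MAP = {'11': '0', '21': '1', '31': '2',
            '12': '0', '22': '1', '32': '2'}

def merge_and_relabel(pattern, out_path):
    """Concatenate all per-run .csv files matching `pattern` into
    `out_path`, remapping the condition code in the last column."""
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern)):
            with open(path, newline='') as f:
                for row in csv.reader(f):
                    row[-1] = CODE_MAP.get(row[-1], row[-1])
                    writer.writerow(row)

# Usage (hypothetical paths): merge_and_relabel('hi_fam/*.csv', '0202_hi.csv')
```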

Training

The last training attempt used 5 subjects with both T1 and T2 data, with the network size reduced to 998 -> 32 -> 12 -> 3.
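The actual training script is not reproduced on this page, but the 998 -> 32 -> 12 -> 3 architecture can be sketched layer by layer. The NumPy forward pass below is an illustration of the shapes and (assumed) ReLU/softmax activations only; it is not the TensorFlow code used for training:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [998, 32, 12, 3]  # input -> hidden1 -> hidden2 -> output (3 classes)

# Random weights just to illustrate shapes; the real weights are learned
# in TensorFlow.
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """ReLU hidden layers, softmax output (assumed activation choices)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)
    logits = x @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = forward(rng.normal(size=(4, 998)))  # batch of 4 dummy patterns
```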

Save the trained networks by running python readout.py.

Get region cluster activations using dummy.csv, generated using eye(998) with a dummy column 999 tacked on at the end. Node activation values were set to the average per-trial summed activation value in the training set (~490.0). Run python printact.py to get the hidden-layer activity associated with each region.
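The dummy.csv construction described above can be sketched as follows; the ~490.0 target value and the one-hot-per-region layout come from the text, while the output filename and formatting are assumptions:

```python
import numpy as np

N = 998
TARGET = 490.0  # average per-trial summed activation in the training set

# One scaled one-hot row per input region, so each row sums to ~490.0,
# plus a dummy label column (column 999) of zeros.
eye = np.eye(N) * TARGET
dummy = np.hstack([eye, np.zeros((N, 1))])
np.savetxt('dummy.csv', dummy, delimiter=',', fmt='%.1f')
```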