Deep Learning Semantic Category Classification
In this project, cortical activity patterns associated with imagery of familiar category members are used to train a 4-layer feedforward network implemented in TensorFlow. The trained network is then used to classify activity patterns for unfamiliar items recorded pre- and post-exposure.
Time Series Extraction and Normalization
Import Time Series
% Regions to drop on the left (l) and right (r) sides before import
ldropregions=[1 5];
rdropregions=[1 5];
M=loadFSTS('ldropregions', ldropregions, 'rdropregions', rdropregions);
Normalize and Rescale Matrices
thresh=1.96;  % z-score threshold (two-tailed p < .05)
ZM=normalizeMatrix(M);
[BIN, SCALED]=binarizeMatrix(ZM, 'thresh', thresh);
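normalizeMatrix and binarizeMatrix are lab-specific functions, so the following is only an illustrative numpy sketch of the presumed operation (column-wise z-scoring, then a two-sided threshold at |z| > 1.96); the exact definition of SCALED in particular is an assumption:

# Illustrative sketch only; the lab's normalizeMatrix/binarizeMatrix may differ.
import numpy as np

def normalize_matrix(M):
    # z-score each region (column) across time points
    return (M - M.mean(axis=0)) / M.std(axis=0)

def binarize_matrix(ZM, thresh=1.96):
    BIN = (np.abs(ZM) > thresh).astype(float)  # 1 where |z| exceeds threshold
    SCALED = ZM * BIN  # assumed: keep supra-threshold values, zero out the rest
    return BIN, SCALED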
I ran into a problem after going through the remaining steps of this process and appending the tagged time series values to an existing training/testing set: Python failed to parse some of the scaled activation values, complaining that some lines had 998 columns instead of the expected 999. My first attempt at a fix seemed to work: rather than use MATLAB's default precision, use the round() function to round the scaled activation values to a reasonable level of precision:
SCALED=round(SCALED, 4); %round the values to 4 decimal places, which seems plenty precise for our purposes.
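If the problem recurs, a quick column count can locate the offending rows before training. A minimal sketch in Python (the file name here is illustrative, not part of the pipeline):

# Hypothetical sanity check: report any line of a tagged .csv whose
# column count differs from the expected 999.
expected = 999
with open('tagged_timeseries.csv') as f:
    for i, line in enumerate(f, 1):
        ncols = len(line.rstrip('\n').split(','))
        if ncols != expected:
            print('line %d has %d columns' % (i, ncols))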
Event Onset Extraction
PTBParser() is used to aggregate the associated onset files. Because the data come from both T1 and T2, it will be necessary to supply a new run-numbering vector when running PTBParser (unless the run numbering has already been changed):
% New run numbering: T1 runs (1-6) alternate with T2 runs (7-12)
eio=PTBParser('run', [1 7 2 8 3 9 4 10 5 11 6 12]);
Event Tagging
The TSTagger() function is used to assign conditions to event windows. It is run separately for the high-familiarity and low-familiarity category conditions:
hiconds=[11,21,31];
loconds=[12,22,32];
TSTagger('tr', 2.047, 'condition', hiconds, 'volumes_dropped', 4, 'mat', SCALED, 'expinfo', eio);
% Before running the next line, move all the generated .csv files to a new
% subdirectory (e.g., /hi_fam) or else the next command will overwrite them!
TSTagger('tr', 2.047, 'condition', loconds, 'volumes_dropped', 4, 'mat', SCALED, 'expinfo', eio);
% Move these to a new subdirectory (e.g., /lo_fam)
Merge and Relabel Training Files
TSTagger produces a set of .csv files, which you will have moved to two separate directories. At this point, the last column of each .csv file contains the original condition codes: [11, 21, 12, 22, 31, 32]. These need to be remapped to one of three condition values: 0, 1, or 2. The easiest way to do this is in a shell terminal: first concatenate all the files in a directory, then use sed to replace the codes:
cd hi_fam
SUB=202  # replace with the appropriate subject number; then you can copy-paste the commands below
cat *.csv > 0${SUB}_hi.csv
# Anchoring to the line end ($) ensures only the final (condition) column is changed.
# Note: '-i .bak' is BSD/macOS sed syntax; on GNU/Linux, use -i.bak (no space).
sed -i .bak 's/,11$/,0/' 0${SUB}_hi.csv
sed -i .bak 's/,21$/,1/' 0${SUB}_hi.csv
sed -i .bak 's/,31$/,2/' 0${SUB}_hi.csv
For the low-familiarity items, we must distinguish the pre-exposure runs (01-06) from the post-exposure runs (07-12):
cd ../lo_fam  # we are still in hi_fam from the previous step
cat *_00[1-6].csv > 0${SUB}_lo_pre.csv
cat *_00[7-9].csv > 0${SUB}_lo_post.csv
cat *_01*.csv >> 0${SUB}_lo_post.csv
sed -i .bak 's/,12$/,0/' 0${SUB}_lo_post.csv
sed -i .bak 's/,22$/,1/' 0${SUB}_lo_post.csv
sed -i .bak 's/,32$/,2/' 0${SUB}_lo_post.csv
sed -i .bak 's/,12$/,0/' 0${SUB}_lo_pre.csv
sed -i .bak 's/,22$/,1/' 0${SUB}_lo_pre.csv
sed -i .bak 's/,32$/,2/' 0${SUB}_lo_pre.csv
Training
The last attempt used 5 subjects, with both T1 and T2 data. The network size was reduced to 998 -> 32 -> 12 -> 3.
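As a rough illustration of this architecture (not the actual training script; the input file name, optimizer, epochs, and activations below are assumptions), a minimal tf.keras version of a 998 -> 32 -> 12 -> 3 classifier might look like:

# Minimal sketch of a 998 -> 32 -> 12 -> 3 feedforward classifier in tf.keras.
import numpy as np
import tensorflow as tf

# Each row of a merged .csv: 998 region activations + 1 condition code (0-2)
data = np.loadtxt('0202_hi.csv', delimiter=',')
X, y = data[:, :998], data[:, 998].astype(int)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(998,)),
    tf.keras.layers.Dense(12, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),  # one unit per category
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2)
model.save('trained_net.h5')  # generic save; readout.py is the project's own script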
Save trained networks using python readout.py
Get region cluster activations using dummy.csv (generated using eye(998) with a dummy column 999 tacked on at the end). Run python printact.py dummy to get the hidden-layer activity associated with each region.
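Feeding each one-hot row of the identity matrix through the network isolates the hidden-layer activity attributable to a single region. A minimal numpy sketch for generating dummy.csv (eye(998) in the original is MATLAB; the value written to the placeholder column is an assumption):

# Sketch: build dummy.csv as a 998x998 identity matrix (one row per region)
# with a placeholder condition code appended as column 999.
import numpy as np

eye = np.eye(998)
dummy_col = np.zeros((998, 1))  # placeholder code; the actual value used is not recorded
out = np.hstack([eye, dummy_col])
np.savetxt('dummy.csv', out, fmt='%g', delimiter=',')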