Deep Learning Semantic Category Classification

In this project, cortical activity patterns associated with imagery of familiar category members are used to train a 4-layer feedforward network implemented in TensorFlow. The trained network is then used to classify activity patterns for unfamiliar items, recorded both pre- and post-exposure.

== Time Series Extraction and Normalization ==

=== Import Time Series ===

 ldropregions=[1 5]; %regions to drop from the left-hemisphere surface
 rdropregions=[1 5]; %regions to drop from the right-hemisphere surface
 M=loadFSTS('ldropregions', ldropregions, 'rdropregions', rdropregions);

=== Normalize and Rescale Matrices ===

 thresh=1.96; %z threshold (two-tailed p < .05)
 ZM=normalizeMatrix(M);
 [BIN, SCALED]=binarizeMatrix(ZM, 'thresh', thresh);
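
The MATLAB implementations of <code>normalizeMatrix()</code> and <code>binarizeMatrix()</code> are lab-internal. As a rough illustration only, here is a NumPy sketch of what this step presumably does, assuming z-scoring within each region, a |z| >= 1.96 cutoff for BIN, and min-max rescaling of the thresholded values to [0, 1] for SCALED; these are assumptions, not the actual code:

 import numpy as np
 
 def normalize_matrix(m):
     """Z-score each column (region) of a timepoints-by-regions matrix."""
     return (m - m.mean(axis=0)) / m.std(axis=0)
 
 def binarize_matrix(zm, thresh=1.96):
     """Return (BIN, SCALED): a suprathreshold mask and [0, 1]-rescaled values."""
     supra = np.abs(zm) >= thresh        # |z| beyond the cutoff
     masked = np.where(supra, zm, 0.0)   # zero out subthreshold values
     lo, hi = masked.min(), masked.max()
     scaled = (masked - lo) / (hi - lo)  # min-max rescale to [0, 1]
     return supra.astype(int), scaled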

I ran into a problem after going through the remaining steps of this process and appending the tagged time series values to an existing training/testing set: Python had trouble parsing some of the scaled activation values, complaining that some lines had 998 columns instead of the expected 999. My first attempt at a fix seemed to work: rather than use MATLAB's default precision, use the <code>round()</code> function to round off the scaled activation values to a reasonable level of precision:

 %round the values to 4 decimal places, which seems plenty precise for our purposes
 SCALED=cellfun(@(x) round(x,4), SCALED, 'UniformOutput', false);
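
If the column-count error crops up again, a quick check script (hypothetical, not part of the pipeline) can locate malformed rows before they reach the training code, e.g. <code>python check_columns.py 0202_hi.csv</code>:

 #check_columns.py: report lines whose field count differs from the expected 999
 import csv
 import sys
 
 EXPECTED = 999  #998 region activations plus 1 condition-code column
 with open(sys.argv[1], newline='') as f:
     for i, row in enumerate(csv.reader(f), start=1):
         if len(row) != EXPECTED:
             print(f"line {i}: {len(row)} columns")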

== Event Onset Extraction ==

<code>PTBParser()</code> is used to aggregate the associated onset files. Because the data span T1 and T2, it will be necessary to supply a new run-numbering vector when running <code>PTBParser()</code>, unless the run numbering has already been changed:

 eio=PTBParser('run', [1 7 2 8 3 9 4 10 5 11 6 12]); %supply the new run numbering for the combined T1/T2 data

== Event Tagging ==

The <code>TSTagger()</code> function is used to assign conditions to event windows. It is run separately for the high-familiarity and low-familiarity category conditions:

 hiconds=[11,21,31];
 loconds=[12,22,32];
 TSTagger('tr', 2.047, 'condition', hiconds, 'volumes_dropped', 4, 'mat', SCALED, 'expinfo', eio);
 %Before running the next line, move all the generated .csv files to a new subdirectory (e.g., /hi_fam), or else the next command will overwrite them!
 TSTagger('tr', 2.047, 'condition', loconds, 'volumes_dropped', 4, 'mat', SCALED, 'expinfo', eio);
 %Move these to a new subdirectory (e.g., /lo_fam)

== Merge and Relabel Training Files ==

<code>TSTagger()</code> produces a set of .csv files, which you should have moved into the two separate directories. At this point, the last column of each .csv file contains the original condition codes (11, 21, 31 for high familiarity; 12, 22, 32 for low familiarity). These need to be remapped to one of three condition values: 0, 1 or 2. The easiest way to do this is in a shell terminal, where you first concatenate all the files in a directory and then use <code>sed</code> to replace the codes:

 cd hi_fam
 SUB=202 #replace with the appropriate subject number and you can probably copy-paste the commands below!
 cat *.csv > 0${SUB}_hi.csv
 #anchoring the patterns with $ ensures only the trailing condition-code column is rewritten
 sed -i .bak 's/,11$/,0/' 0${SUB}_hi.csv
 sed -i .bak 's/,21$/,1/' 0${SUB}_hi.csv
 sed -i .bak 's/,31$/,2/' 0${SUB}_hi.csv

For the low-familiarity items, we must distinguish between the pre-exposure (runs 1-6) and post-exposure (runs 7-12) files:

 cd ../lo_fam #assuming hi_fam and lo_fam are sibling directories
 cat *_00[1-6].csv > 0${SUB}_lo_pre.csv
 cat *_00[7-9].csv > 0${SUB}_lo_post.csv
 cat *_01*.csv >> 0${SUB}_lo_post.csv
 sed -i .bak 's/,12$/,0/' 0${SUB}_lo_post.csv
 sed -i .bak 's/,22$/,1/' 0${SUB}_lo_post.csv
 sed -i .bak 's/,32$/,2/' 0${SUB}_lo_post.csv
 sed -i .bak 's/,12$/,0/' 0${SUB}_lo_pre.csv
 sed -i .bak 's/,22$/,1/' 0${SUB}_lo_pre.csv
 sed -i .bak 's/,32$/,2/' 0${SUB}_lo_pre.csv
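
Before training, it is worth confirming that each merged file has the expected 999 columns and that the last column now contains only 0, 1 and 2. A hypothetical Python check, assuming the layout described above (run as, e.g., <code>python verify_labels.py 0202_hi.csv 0202_lo_pre.csv 0202_lo_post.csv</code>):

 #verify_labels.py: sanity-check column counts and relabeled condition codes
 import csv
 import sys
 
 for path in sys.argv[1:]:
     with open(path, newline='') as f:
         rows = list(csv.reader(f))
     widths = {len(r) for r in rows}
     labels = {r[-1] for r in rows}
     print(f"{path}: {len(rows)} rows, widths={widths}, labels={labels}")
     assert widths == {999}, f"bad column count in {path}"
     assert labels <= {'0', '1', '2'}, f"unmapped condition code in {path}"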

== Training ==

The last attempt used 5 subjects with T1 and T2 data, with the network size reduced to 998 -> 32 -> 12 -> 3.
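
The training scripts themselves are not reproduced here; the following Keras sketch only illustrates the 998 -> 32 -> 12 -> 3 architecture described above, with the activations, optimizer, loss, epoch count and input filename chosen as plausible stand-ins rather than taken from the actual code:

 import numpy as np
 import tensorflow as tf
 
 #4-layer feedforward classifier: 998 inputs -> 32 -> 12 -> 3 condition classes
 model = tf.keras.Sequential([
     tf.keras.Input(shape=(998,)),
     tf.keras.layers.Dense(32, activation='relu'),
     tf.keras.layers.Dense(12, activation='relu'),
     tf.keras.layers.Dense(3, activation='softmax'),
 ])
 model.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
 
 data = np.loadtxt('0202_hi.csv', delimiter=',')  #999 columns: 998 features + label
 x, y = data[:, :998], data[:, 998].astype(int)
 model.fit(x, y, epochs=50, validation_split=0.2)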

Save trained networks using <code>python readout.py</code>.

Get region cluster activations using dummy.csv (generated using <code>eye(998)</code> with a dummy column 999 tacked on at the end). Run <code>python printact.py dummy</code> to get the hidden-layer activity associated with each region.
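
The code that builds dummy.csv is not shown on this wiki; an equivalent NumPy sketch (the placeholder label value is an assumption) would be:

 #make_dummy.py: identity-matrix input with a placeholder label column tacked on
 import numpy as np
 
 eye = np.eye(998)           #row i activates region i alone
 label = np.zeros((998, 1))  #dummy condition code for column 999
 np.savetxt('dummy.csv', np.hstack([eye, label]), delimiter=',', fmt='%g')

Feeding row i of this input through the trained network isolates the hidden-layer response attributable to region i.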

[[Category: Machine Learning]]
[[Category: Neural Networks]]