Time Series Simulations

From CCN Wiki

Wernickesarea has some compiled MikeNet code that generates simulated time series data for a small 3-layer network that produces activation in response to inputs 0 and 1 (A and B). Activation of these input units drives activation in 3 banks of 4 units in layer 1 (1A, 1NULL, 1B). Up to 4 1A units are active when input[0] is active, and up to 4 1B units are active when input[1] is active. The 1NULL units are active only when both inputs are inactive. In addition, one unit in 1A and one in 1B are inhibited .8 of the time. 1A and 1B connect fully to 4 units in 2A and 2B respectively, and all units in 2A and 2B connect to layer 3, which is maximally active only when both inputs are active (i.e., superadditivity).
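The real generator is compiled MikeNet code, but the architecture described above can be sketched in numpy for orientation. Everything here (function name, activation rules, a 4-unit layer 3 to give 24 units total) is an illustrative assumption, not the MikeNet implementation:

```python
import numpy as np

# Illustrative sketch of one trial of the toy generator described above.
# All names and activation rules are assumptions; the real generator is
# compiled MikeNet code.
rng = np.random.default_rng(0)

def simulate_trial(inputs, p_inhibit=0.8):
    """inputs: length-2 array for input[0] (A) and input[1] (B)."""
    a, b = inputs
    # Layer 1: three banks of 4 units (1A, 1NULL, 1B)
    bank_1a = np.full(4, float(a))
    bank_1b = np.full(4, float(b))
    bank_1null = np.full(4, 1.0 if (a == 0 and b == 0) else 0.0)
    # One unit in 1A and one in 1B is inhibited .8 of the time
    if rng.random() < p_inhibit:
        bank_1a[0] = 0.0
    if rng.random() < p_inhibit:
        bank_1b[0] = 0.0
    # Layer 2: 1A drives 2A, 1B drives 2B (full connections within pathway)
    bank_2a = np.full(4, bank_1a.mean())
    bank_2b = np.full(4, bank_1b.mean())
    # Layer 3 (assumed 4 units): driven by all of 2A and 2B, so it is
    # maximally active only when both inputs are on (superadditivity)
    layer_3 = np.full(4, (bank_2a.mean() + bank_2b.mean()) / 2)
    return np.concatenate([bank_1a, bank_1null, bank_1b,
                           bank_2a, bank_2b, layer_3])

trial = simulate_trial(np.array([1, 1]))   # 24-element activation vector
```

Note that with this layout, the first 1A unit and the first 1B unit (units 1 and 9 in MATLAB's 1-based indexing) are the intermittently inhibited ones.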

Procedure

Generate Time Series

A shell script generates a specified (hard-coded) number of simulations:

#!/bin/bash
for i in $(seq -f "%04g" 0 999)
do
 echo $i
 ./xor >> simdat.txt
done

Remove Delimiters

Each bank of units is delimited by a ^ (caret) symbol for readability, but this character interferes with MATLAB import. Strip it out with sed:

sed -i 's/\^//g' simdat.txt #this syntax works on linux, but not on OSX
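Because BSD sed on OSX needs a different invocation (`sed -i '' 's/\^//g' simdat.txt`), a small Python script is a portable alternative. The `demo.txt` scratch file below is just for illustration; point `strip_carets` at simdat.txt in practice:

```python
from pathlib import Path

def strip_carets(path):
    """Remove every '^' delimiter from the file, in place."""
    p = Path(path)
    p.write_text(p.read_text().replace('^', ''))

# tiny demonstration on a scratch file
Path('demo.txt').write_text('0.12 0.34 ^ 0.56 0.78\n')
strip_carets('demo.txt')
```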

Conventional Correlation-Based Connectivity

The full correlation-based work-up is implemented in the script simdata_connectivity.m, which applies the connectivity analyses posted elsewhere on this wiki. The interesting thing to look for is how units 1 and 9 cluster: these units are in banks 1A and 1B, respectively, but are inhibited .80 of the time. In a large data set, these units fail to cluster with their layer-mates because they are more weakly correlated.
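The weak-correlation effect is easy to demonstrate in isolation. This numpy sketch (not simdata_connectivity.m, which does the real analysis in MATLAB) shows that a unit silenced on a random 80% of its driver's active trials correlates only moderately with the other units in its bank:

```python
import numpy as np

# A unit inhibited on 80% of trials correlates weakly with its bank-mates.
rng = np.random.default_rng(1)
n = 10000
driver = rng.integers(0, 2, n).astype(float)   # input A on/off per trial
bank = np.tile(driver, (4, 1))                 # 4 units of bank 1A
bank[0] = bank[0] * (rng.random(n) >= 0.8)     # unit 1: active only 20% of the time

r = np.corrcoef(bank)                          # 4x4 unit-by-unit correlations
# r[0, 1] (inhibited vs. normal unit) is well below r[1, 2] (two normal units)
```

Hierarchical clustering on a correlation matrix like this would split the inhibited unit away from its bank, which is the failure to cluster noted above.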

Multivariate Connectivity Analysis

Convert the artificial time series into patterns, keeping only the maximal activation for the trial window (which ends up being the last time step because of how the simulator examples run), and only trials where both inputs are either 0 or 1 (omit the ambiguous .5 .5 trials). This is implemented in the script gen_simcon_patterns.m. 10K simulations produced 5K patterns in the file simcon.csv. Copy the file to a workstation with Keras/TensorFlow.
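The extraction step is simple enough to sketch in numpy (the real work is done by gen_simcon_patterns.m; array shapes and names here are assumptions):

```python
import numpy as np

def extract_patterns(acts, inputs):
    """acts: (trials, timesteps, units); inputs: (trials, 2).
    Keep the final time step as each trial's pattern and drop
    trials with the ambiguous .5 .5 input."""
    final = acts[:, -1, :]                 # max activation is at the last step
    keep = ~np.all(inputs == 0.5, axis=1)  # omit .5 .5 trials
    return final[keep]

# toy demonstration: 3 trials, 5 time steps, 24 units
acts = np.arange(3 * 5 * 24, dtype=float).reshape(3, 5, 24)
inputs = np.array([[0, 1], [0.5, 0.5], [1, 1]])
pats = extract_patterns(acts, inputs)      # the .5 .5 trial is dropped
```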

Implement Autoencoder

Connectivity was estimated with a straight autoencoder with a single hidden layer:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (None, 24)                0         
_________________________________________________________________
input_noise (GaussianNoise)  (None, 24)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 150       
_________________________________________________________________
ae_out (Dense)               (None, 24)                168       
=================================================================
Total params: 318
Trainable params: 318
Non-trainable params: 0
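The parameter counts in the summary follow from the layer shapes (dense_1: 24 × 6 weights + 6 biases = 150; ae_out: 6 × 24 weights + 24 biases = 168). In Keras terms the stack is InputLayer → GaussianNoise → Dense(6) → Dense(24); below is a minimal numpy sketch of that forward pass with random stand-in weights, since the trained parameters and the activation functions are not shown in the summary (tanh is an assumption):

```python
import numpy as np

# Parameter counts implied by the summary above
assert 24 * 6 + 6 == 150 and 6 * 24 + 24 == 168

# Numpy stand-in for the forward pass; weights are random, not trained.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(24, 6)), np.zeros(6)    # dense_1
W2, b2 = rng.normal(size=(6, 24)), np.zeros(24)   # ae_out

def forward(x, noise_sd=0.1):
    x = x + rng.normal(0, noise_sd, size=x.shape)  # GaussianNoise (training only)
    h = np.tanh(x @ W1 + b1)                       # 6-unit bottleneck (assumed tanh)
    return h @ W2 + b2                             # 24-unit reconstruction

out = forward(np.ones(24))
```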

A quick analysis of the correlations between input/output patterns shows that the outputs closely matched the input patterns:

% correlate input and output patterns for each of 4 runs
Rs=nan(1,4);
Ps=nan(1,4);
for r=1:4
  inp=sprintf('act_auto_in_%02d.txt',r);
  outp=sprintf('act_auto_out_%02d.txt',r);
  inpat=load(inp);
  outpat=load(outp);
  [rval,pval]=corrcoef(inpat(:),outpat(:));
  Rs(r)=rval(2); % off-diagonal element is the input/output correlation
  Ps(r)=pval(2);
end

Implement Autoencoder with Classifier

A complementary autoencoder can be created that also includes 2 classifier units to assess the benefits and costs that the additional constraints bring. 40 simulations were completed using the AE network and 40 were completed using the AE network with embedded classifier. It appears that adding the classifier negatively impacts the AE reconstruction accuracy in the synthetic data (t(78)=-6.10, p<2E-8). Classifier accuracy was uncorrelated with the AE error in the CAT/AE networks (r(39)=-.15, p>.35).
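The t(78) value is what a pooled-variance independent-samples t-test gives for the 40 + 40 reconstruction errors (df = 40 + 40 − 2 = 78). A numpy sketch of that test, on synthetic stand-in data (the real error values are not reproduced here):

```python
import numpy as np

def pooled_t(x, y):
    """Independent-samples t with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) \
          / (nx + ny - 2)
    t = (np.mean(x) - np.mean(y)) / np.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# synthetic stand-ins for the 40 AE and 40 CAT/AE reconstruction errors
rng = np.random.default_rng(2)
ae_err = rng.normal(0.10, 0.02, 40)
cat_err = rng.normal(0.13, 0.02, 40)
t, df = pooled_t(ae_err, cat_err)   # df is 78, matching t(78) above
```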

Having done this, we can extract the autoencoder weights for the CAT/AE networks and see whether they are structured differently than the —/AE networks.

Extract Autoencoder Weights

The Keras save_weights() function saves the network parameters in HDF5 format (.h5), which is a pain in the butt to parse. After some rummaging around, I cobbled together the following python script that will pull the weights for each layer it encounters (surely there's some name information in there for more informative output filenames, but this is quick 'n dirty land).

import numpy as np
import sys
from keras.models import load_model

nmodels = int(sys.argv[1])

# eval each of the models
for midx in range(nmodels):
    m = midx + 1
    modelname = 'model_' + format(m, '02') + '.h5'
    print(modelname)
    model = load_model(modelname)
    l = 0
    for layer in model.layers:
        l = l + 1
        # get_weights() returns a (possibly ragged) list of arrays
        dat = np.asarray(layer.get_weights(), dtype=object)
        if dat.size > 0:
            fname = modelname + format(l, '02') + '.csv'
            wts = dat[0]  # weight matrix (index 1 would be the biases)
            np.savetxt(fname, wts, fmt='%.4f', delimiter=",")

The resultant .csv files can be loaded into MATLAB using csvread(). In the simple autoencoder example, two .csv files were produced: one was a 24 × 6 array and the other was a 6 × 24 array. Multiply the two arrays together to get a 24 × 24 matrix.

% combine encoder (24x6) and decoder (6x24) weights into a 24x24 path matrix
for m=1:4
   modelname=sprintf('model_%02d',m);
   l1=csvread([modelname '.h503.csv']);
   l2=csvread([modelname '.h504.csv']);
   path=l1*l2;
   paths(m,:,:)=path;
   figure(m);
   imagesc(path);
end

I knew I had solved this problem before! Here's a script called h5tocsv.py that will export connections to the named layers to separate CSV files:

import numpy as np
import sys
from keras.models import load_model

#pass the name of modelfile.h5 as a parameter
modelfile=sys.argv[1]

#no good reason for an if-true statement; this was just there for 
#debugging/testing
if True:
  model=load_model(modelfile)
  print(model.summary())
  dense1=model.get_layer('dense_1')
  dense1_weights=dense1.get_weights()
  savename=modelfile + ".hidden1.csv"
  np.savetxt(savename, dense1_weights[0], delimiter=",", fmt="%10.5f")
  output=model.get_layer('ae_output')
  output_weights=output.get_weights()
  savename=modelfile+".ae_output.csv"
  np.savetxt(savename, output_weights[0], delimiter=",", fmt="%10.5f")

With 40 models for each type of network to process, I used a shell script:

#!/bin/bash
for i in $(seq -f "%02g" 1 40)
do
 echo $i
 fname=catAE_model_${i}.h5
 python h5tocsv.py ${fname}
done

Note that there was a bug when trying to apply the h5tocsv.py script to the —/AE models because I was inconsistent with my layer names. The autoencoder output layer in these models was called ae_out rather than ae_output. The python code had to be modified to reflect this difference.
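One way to harden h5tocsv.py against that naming inconsistency is a small helper that tries each candidate name in turn (a hypothetical helper, not part of the script above; it works with any object exposing Keras's get_layer, which raises ValueError for unknown names):

```python
def get_layer_by_names(model, names):
    """Return the first layer whose name matches, trying each name in order."""
    for name in names:
        try:
            return model.get_layer(name)
        except ValueError:   # Keras raises ValueError for an unknown layer name
            continue
    raise ValueError('none of %r found' % (names,))

# usage in h5tocsv.py would then be:
#   output = get_layer_by_names(model, ['ae_output', 'ae_out'])
```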