Time Series Simulations
Wernickesarea has some compiled MikeNet code that generates simulated time series data for a small 3-layer network driven by inputs 0 and 1 (A and B). Activation of these input units drives activation in 3 banks of 4 units in layer 1 (1A, 1NULL, 1B). Up to 4 1A units are active when input[0] is active, and up to 4 1B units are active when input[1] is active. The 1NULL units are active only when both inputs are inactive. Also, one unit in each of 1A and 1B is inhibited .8 of the time. 1A and 1B connect fully to 4 units in 2A and 2B, respectively, and all units in 2A and 2B connect to layer 3, which is maximally active only when both inputs are active (i.e., superadditivity).
Procedure
Generate Time Series
A shell script generates a specified (hard-coded) number of simulations:
#!/bin/bash
for i in $(seq -f "%04g" 0 999)
do
    echo $i
    ./xor >> simdat.txt
done
Remove Delimiters
Each bank of units is delimited by a ^ (caret) symbol for readability, but this character interferes with MATLAB import. Strip that character out with sed:
sed -i 's/\^//g' simdat.txt      # GNU sed (Linux)
sed -i '' 's/\^//g' simdat.txt   # BSD sed (OSX) needs an explicit, possibly empty, backup suffix after -i
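Because the sed in-place flag differs between Linux and OSX, a portable alternative may be handy. The following is a minimal sketch in Python that does the same clean-up; the function names strip_carets and strip_carets_in_file are hypothetical, and only the file name simdat.txt comes from the step above.

```python
from pathlib import Path


def strip_carets(text: str) -> str:
    """Remove every ^ delimiter and leave all other characters untouched."""
    return text.replace("^", "")


def strip_carets_in_file(path: str) -> None:
    """Rewrite the file in place without carets (similar in effect to sed -i)."""
    p = Path(path)
    p.write_text(strip_carets(p.read_text()))


# Example call (assumes simdat.txt is in the current directory):
# strip_carets_in_file("simdat.txt")
```

The whole file is read into memory, which should be fine for text files of this size but is an assumption about how large simdat.txt gets.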
Conventional Correlation-Based Connectivity
The full work-up based on correlations is implemented in the script simdata_connectivity.m, which implements connectivity analyses posted elsewhere on this wiki. The interesting thing to look for is how units 1 and 9 cluster, because these units are in banks 1A and 1B, respectively, but are inhibited .80 of the time. In a large data set, it looks like these units fail to cluster with their layer buddies because they are more weakly correlated.
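To make the "weakly correlated with its layer buddies" idea concrete, here is a rough Python sketch that computes, for each unit in a bank, its mean correlation with the other units of that bank. It does not reproduce simdata_connectivity.m (whose contents are not shown here); the function name, the (timepoints × units) array layout, and the 0-based column indices are all assumptions.

```python
import numpy as np


def mean_within_bank_corr(ts, bank):
    """Mean correlation of each unit with the other units of its bank.

    ts   : array of shape (timepoints, units), one column per unit (assumed layout)
    bank : list of 0-based column indices belonging to one bank
    Returns a dict {unit_index: mean correlation with the rest of the bank}.
    """
    r = np.corrcoef(ts, rowvar=False)  # units x units correlation matrix
    result = {}
    for u in bank:
        others = [v for v in bank if v != u]
        result[u] = float(np.mean(r[u, others]))
    return result


# Example call, assuming bank 1A occupies the first four columns:
# mean_within_bank_corr(ts, [0, 1, 2, 3])
```

A unit that is frequently inhibited would be expected to show a lower value than the other members of its bank.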
Multivariate Connectivity Analysis
Convert the artificial time series into patterns, keeping only the maximal activation for the trial window (this ends up being the last time step because of how the simulator examples run), and keeping only trials where both inputs are either 0 or 1 (omitting the ambiguous .5 .5 trials). This is implemented in the script gen_simcon_patterns.m. 10K simulations produced 5K patterns in the file simcon.csv. Copy the file to a workstation with Keras/TensorFlow.
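The two filtering rules described above can be sketched in a few lines of Python. This is not the contents of gen_simcon_patterns.m; the function names and the data layout (each trial as a pair of input values plus a list of time steps) are assumptions made for illustration.

```python
def trial_to_pattern(trial):
    """Collapse one trial to a pattern: the per-unit maximum over the trial window.

    trial: list of time steps, each a list of unit activations (assumed layout).
    """
    return [max(column) for column in zip(*trial)]


def is_unambiguous(inputs):
    """True when every input is exactly 0 or 1 (drops the .5 .5 trials)."""
    return all(x in (0.0, 1.0) for x in inputs)


def make_patterns(trials):
    """trials: iterable of (inputs, trial) pairs; returns the kept patterns."""
    return [trial_to_pattern(t) for inputs, t in trials if is_unambiguous(inputs)]
```

Taking the per-unit maximum rather than simply the last time step keeps the sketch faithful to the description even if the maximum were to occur earlier in the window.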
Implement Autoencoder
Connectivity was estimated with a straight autoencoder with a single hidden layer:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 24)                0
_________________________________________________________________
input_noise (GaussianNoise)  (None, 24)                0
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 150
_________________________________________________________________
ae_out (Dense)               (None, 24)                168
=================================================================
Total params: 318
Trainable params: 318
Non-trainable params: 0
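As a sanity check on the summary, the parameter counts follow directly from the layer sizes: a dense layer has one weight per connection plus one bias per output unit. The short Python sketch below reproduces the arithmetic; the helper name dense_params is hypothetical, and the sizes 24 and 6 are read off the summary.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Parameters in a fully connected layer: n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out


n_units, n_hidden = 24, 6                   # sizes taken from the model summary
hidden = dense_params(n_units, n_hidden)    # dense_1
output = dense_params(n_hidden, n_units)    # ae_out
total = hidden + output
print(hidden, output, total)                # 150 168 318
```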
A quick analysis of the correlations between input and output patterns suggests that the outputs closely matched the input patterns:
Rs=nan(1,4);
Ps=nan(1,4);
for r=1:4
    inp=sprintf('act_auto_in_%02d.txt',r);
    outp=sprintf('act_auto_out_%02d.txt',r);
    inpat=load(inp);
    outpat=load(outp);
    % correlate the flattened input and output patterns
    [rval,pval]=corrcoef(inpat(:),outpat(:));
    Rs(r)=rval(2);   % off-diagonal element of the 2x2 matrix
    Ps(r)=pval(2);
end
Implement Autoencoder with Classifier
A complementary autoencoder can be created that also includes 2 classifier units to assess the benefits and costs that the additional constraints bring. 40 simulations were completed using the AE network and 40 were completed using the AE network with embedded classifier. It appears that adding the classifier negatively impacts the AE reconstruction accuracy in the synthetic data (t(78)=-6.10, p<2E-8). Classifier accuracy was uncorrelated with the AE error in the CAT/AE networks (r(39)=-.15, p>.35).
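The page does not say exactly which test produced the t value, but 78 degrees of freedom with 40 + 40 networks is what a pooled-variance two-sample t-test would give (40 + 40 - 2). Under that assumption, the statistic can be sketched in plain Python; the function name pooled_t is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df
```

Here a and b would be the per-network reconstruction errors for the two network types; the sign of t depends on which group is passed first.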
Having done this, we can extract the autoencoder weights for the CAT/AE networks and see whether they are structured differently from those of the –/AE networks.
Extract Autoencoder Weights
The Keras save_weights() function saves the network parameters in HDF5 format (.h5), which is a pain in the butt to parse. After some rummaging around, I cobbled together the following Python script that pulls the weights for each layer it encounters (surely there's some name information in there for more informative output filenames, but this is quick 'n dirty land).
import sys
import numpy as np
from keras.models import load_model

# number of models to process (model_01.h5, model_02.h5, ...)
nmodels = int(sys.argv[1])

# eval each of the models
for m in range(1, nmodels + 1):
    modelname = 'model_' + format(m, '02') + '.h5'
    print(modelname)
    model = load_model(modelname)
    # layers are numbered from 1 in the output filenames
    for l, layer in enumerate(model.layers, start=1):
        weights = layer.get_weights()  # empty list for layers without parameters
        if len(weights) > 0:
            fname = modelname + format(l, '02') + '.csv'
            # weights[0] is the connection matrix; the bias vector is not saved
            np.savetxt(fname, weights[0], fmt='%.4f', delimiter=",")
The resultant .csv files can be loaded into MATLAB using csvread(). In the simple autoencoder example, two .csv files were produced: one was a 24 × 6 array and the other was a 6 × 24 array. Multiply the two arrays together to get a 24 × 24 matrix.
for m=1:4
    modelname=sprintf('model_%02d',m);
    l1=csvread([modelname '.h503.csv']);   % 24 x 6 input-to-hidden weights
    l2=csvread([modelname '.h504.csv']);   % 6 x 24 hidden-to-output weights
    path=l1*l2;                            % 24 x 24 matrix
    paths(m,:,:)=path;
    figure(m);
    imagesc(path);
end
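For anyone who would rather stay on the Keras workstation, the same product can be sketched with NumPy. The function name effective_connectivity is hypothetical; the file names in the comment follow the pattern used in the MATLAB loop above. Note that this product uses only the connection weights, so biases and any nonlinearity in the hidden layer are ignored.

```python
import numpy as np


def effective_connectivity(w_in, w_out):
    """Multiply input-to-hidden weights (units x hidden) by hidden-to-output
    weights (hidden x units) to get a units x units matrix."""
    w_in = np.asarray(w_in, dtype=float)
    w_out = np.asarray(w_out, dtype=float)
    if w_in.shape[1] != w_out.shape[0]:
        raise ValueError("hidden dimensions do not match")
    return w_in @ w_out


# Example call, assuming the .csv files from the export step exist:
# l1 = np.loadtxt("model_01.h503.csv", delimiter=",")   # 24 x 6
# l2 = np.loadtxt("model_01.h504.csv", delimiter=",")   # 6 x 24
# paths = effective_connectivity(l1, l2)                 # 24 x 24
```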
I knew I had solved this problem before! Here's a script called h5tocsv.py that will export connections to the named layers to separate CSV files:
import sys
import numpy as np
from keras.models import load_model

# pass the name of modelfile.h5 as a parameter
modelfile = sys.argv[1]

# no good reason for an if-true statement; this was just there for
# debugging/testing
if True:
    model = load_model(modelfile)
    model.summary()  # summary() prints the table itself

    # weights into the hidden layer
    dense1 = model.get_layer('dense_1')
    dense1_weights = dense1.get_weights()
    savename = modelfile + ".hidden1.csv"
    np.savetxt(savename, dense1_weights[0], delimiter=",", fmt="%10.5f")

    # weights into the autoencoder output layer
    output = model.get_layer('ae_output')
    output_weights = output.get_weights()
    savename = modelfile + ".ae_output.csv"
    np.savetxt(savename, output_weights[0], delimiter=",", fmt="%10.5f")
With 40 models for each type of network to process, I used a shell script:
#!/bin/bash
for i in $(seq -f "%02g" 1 40)
do
    echo $i
    fname=catAE_model_${i}.h5
    python h5tocsv.py ${fname}
done
Note that there was a bug when trying to apply the h5tocsv.py script to the –/AE models because I was inconsistent with my layer names. The autoencoder output layer in these models was called ae_out rather than ae_output. The Python code had to be modified to reflect this difference.
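One way to avoid editing the script for each network type would be to look up whichever output-layer name the model actually has. The helper below is a hypothetical sketch, not part of h5tocsv.py; it is plain Python so it can be tried without Keras, and the comments show how it might be wired in.

```python
def pick_layer_name(available, candidates=("ae_output", "ae_out")):
    """Return the first candidate layer name that appears in available."""
    for name in candidates:
        if name in available:
            return name
    raise KeyError(f"none of {candidates} found in {list(available)}")


# Possible use inside h5tocsv.py (requires Keras, so shown as comments):
#   names = [layer.name for layer in model.layers]
#   output = model.get_layer(pick_layer_name(names))
```

If neither name is present the helper raises an error rather than guessing, which seems safer when batch-processing 40 models at a time.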