BASH Tricks
Revision as of 12:41, 3 April 2020
How many lines in my text file?
Totally useful when you have some kind of training file with many rows and columns:
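FILENAME=myfile.csv
#nl numbers every line (-ba includes blank ones); the number on the last line is the line count
nl -ba ${FILENAME} | tail -n 1 | awk '{ print $1 }'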
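If all you want is the total, wc -l is a more direct route. A minimal sketch (the function name count_lines is made up for illustration). Note that wc -l counts newline characters, so a final line with no trailing newline is not counted:

```shell
# count_lines FILE: print the number of newline-terminated lines in FILE.
# Reading via stdin keeps wc from echoing the filename; the arithmetic
# expansion strips the padding that some versions of wc put before the number.
count_lines() {
    echo $(( $(wc -l < "$1") ))
}

# Example: count_lines myfile.csv
```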
I want to drop the first line of my text file
tail echoes the last n lines (default: 10) of a text file to stdout. Giving the -n flag a number with a leading plus sign flips it around so that output starts at that line instead: -n +2 echoes the file from the 2nd line onward (i.e., dropping the first line). We can't redirect straight back into the same file (the shell would empty it before tail gets to read it), so we redirect to a temp file and then rename:
tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"
PROTIP: The same trick works with the head command to drop the last n lines from a file, at least with GNU head: a negative count such as head -n -1 prints everything except the last line.
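Not every version of head accepts a negative line count, so here is a minimal sketch that works out the number of lines to keep explicitly. The function name drop_last_lines is hypothetical, and the file is assumed to end with a newline:

```shell
# drop_last_lines FILE N: remove the last N lines of FILE in place.
# Runs in a subshell (parentheses) so its variables don't leak out.
drop_last_lines() (
    total=$(( $(wc -l < "$1") ))
    keep=$(( total - $2 ))
    if [ "$keep" -le 0 ]; then
        # Nothing left to keep: just truncate the file.
        : > "$1"
    else
        head -n "$keep" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
)

# Example: drop_last_lines myfile.csv 1
```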
Make a list of directory names
We often organize subject data so that each subject gets their own directory. Freesurfer uses a subjects file when batch processing. Rather than manually type out each folder name into a text file, it can be generated in one line of code:
ls -1 -d */ | sed "s,/$,," > subjects
This lists in 1 column all the directories (-1 -d) and uses sed to snip off the trailing forward slashes in the directory names
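The same list can be built without parsing the output of ls. A sketch using a plain glob loop (list_dirs is a made-up name); it also copes with a directory that has no subdirectories, where ls would complain:

```shell
# list_dirs [DIR]: print the names of the subdirectories of DIR (default: .),
# one per line, without trailing slashes.
list_dirs() (
    cd "${1:-.}" || exit 1
    for d in */; do
        # With no subdirectories the glob stays as the literal "*/"; skip it.
        [ -d "$d" ] || continue
        printf '%s\n' "${d%/}"
    done
)

# Example: list_dirs "$SUBJECTS_DIR" > subjects
```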
What directories are in one directory but not the other?
Scenario: We had a directory, let's call it ALLSUBJECTS that had a bunch of subject directories named NDARINVxxxxxxxx. Some of them had a full dataset, but many of them did not. Sophia made a directory called gooddata that contained only the subset of folders that had full datasets. What's the fastest way to figure out who has incomplete data? Look for folders appearing in ALLSUBJECTS that don't appear in gooddata.
First, we need sorted directory lists:
cd ALLSUBJECTS
#next command lists the directories themselves rather than their contents (-d), one per line (-1),
#keeps only folder names starting with "ND" ( grep "^ND"), uses sed to strip the trailing
#forward slash, and then sorts the list (sort)
#note the single > so that re-running this overwrites the list instead of appending to it
ls -d -1 */ | grep "^ND" | sed 's/\///g' | sort > ../allsubs.txt
cd ../gooddata
ls -d -1 */ | grep "^ND" | sed 's/\///g' | sort > ../goodsubs.txt
#both lists were written to the parent directory, so go there
cd ..
#next line finds lines appearing in allsubs that do not appear in goodsubs:
comm -23 allsubs.txt goodsubs.txt
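If you need to do this comparison more than once, the steps can be wrapped in a function. This is only a sketch: missing_dirs is a hypothetical name, and it assumes (as in the scenario above) that the folders of interest start with "ND":

```shell
# missing_dirs ALL_DIR GOOD_DIR: print the names of "ND*" subdirectories that
# exist in ALL_DIR but not in GOOD_DIR.
missing_dirs() (
    all_list=$(mktemp) && good_list=$(mktemp) || exit 1
    # List matching directories, strip the trailing slash, then sort
    # (comm needs both inputs sorted the same way).
    ( cd "$1" && ls -d ND*/ 2>/dev/null ) | sed 's,/$,,' | sort > "$all_list"
    ( cd "$2" && ls -d ND*/ 2>/dev/null ) | sed 's,/$,,' | sort > "$good_list"
    # -23 hides lines unique to the second list and lines common to both.
    comm -23 "$all_list" "$good_list"
    rm -f "$all_list" "$good_list"
)

# Example: missing_dirs ALLSUBJECTS gooddata > incomplete.txt
```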
Making a tar archive containing only the minimal set of structural files for FreeSurfer
FreeSurfer makes a zillion files during the recon-all step. I have no idea what most of them are there for. Which do we need? The absolute minimal list is a work in progress, but I have made a file called fsaverage.required (copied to the ubfs scripts directory) based on the contents of the fsaverage template subject directory. I dropped the obvious directories (e.g., mri 2mm), so what's left should hopefully be close to the minimal required set for getting things done with a subject. The idea is to reduce the number of superfluous files that you store or copy over the network, so that we don't waste time and disk space on useless nonsense.
So here's what you do (note the text in red will vary - don't just blindly copy the code snippets below and expect them to work; that's how Chernobyl happens! You need to understand what you're doing):
- Copy fsaverage.required to $SUBJECTS_DIR
- Inspect fsaverage.required to make sure that it has any idiosyncratic files that you might wish to include
- e.g., the original version only includes f.nii.gz files. If you want to also grab all your preprocessed .mgz files, then you'll want to include *.mgz up at the top. Save any changes.
- Navigate to a subject directory:
cd FS_sub-001
- The following command will use the files listed in $SUBJECTS_DIR/fsaverage.required to find and archive the desired files for this subject:
tar -czvf sub-001.minimal.tgz `find . | grep -G -f ${SUBJECTS_DIR}/fsaverage.required`
- When you're done, you'll have the bare-bones minimum files to permit FS-FAST analyses of your BOLD data for your subject.
- You can copy the .tgz files to an external drive or over the network. Be sure to unpack the .tgz archive in an empty subject directory.
- e.g.:
mkdir ~/new_project/ #starting a new project directory - in this case on the same computer, but it could be anywhere
cd ~/new_project #enter the new project directory
mkdir FS_sub-001 #making an empty subject directory for the files we're about to unpack
cd FS_sub-001 #navigate into the new empty subject directory
#next line copies the minimal file archive from the source directory into the new empty subject directory
cp ${SUBJECTS_DIR}/FS_sub-001/sub-001.minimal.tgz ./
#next line unzips the file archive into the empty directory
tar -xzvf sub-001.minimal.tgz
With the minimal set of structural files, you should be able to unzip the surface and T1 anatomical files and inspect for reconstruction accuracy, or add BOLD files from elsewhere (the BOLD files are what really does you in, and I've developed a similar procedure to grab your blob analysis files)
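The backtick form of the archive command above can trip over filenames with spaces or a very long file list. A hedged alternative is to pipe the list into tar instead. In this sketch archive_from_patterns is a made-up name, it restricts find to regular files (the original command does not), and it assumes the pattern file holds one basic regular expression per line, which is how grep -G -f reads it:

```shell
# archive_from_patterns PATTERN_FILE ARCHIVE: archive the files under the
# current directory whose paths match any pattern in PATTERN_FILE.
# Assumes no newlines in filenames, and that ARCHIVE itself does not match
# any of the patterns.
archive_from_patterns() {
    # -T - tells tar to read the list of files to archive from stdin.
    find . -type f | grep -G -f "$1" | tar -czf "$2" -T -
}

# Preview what would be archived, then archive:
#   find . -type f | grep -G -f "${SUBJECTS_DIR}/fsaverage.required"
#   archive_from_patterns "${SUBJECTS_DIR}/fsaverage.required" sub-001.minimal.tgz
```

Running the find | grep part on its own first is a cheap way to check the list before committing to an archive.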
Making a tar archive containing only the minimal set of GLM Analysis Files for FreeSurfer
After running a first level GLM analysis (a "blob" analysis) using selxavg3-sess, each of your subject/bold directories will contain an analysis directory for each of the surfaces you included in your analysis (typically for lh and rh, and possibly also for mni305). Assuming the analyses were done in fsaverage template space (and there's no good reason anymore why they wouldn't be), you can grab the bare minimum set of files required to inspect the subject-level analyses with the following script:
#!/bin/bash
#usage: ./zip1stla.sh SUBJECT_ID ANALYSIS_DIR_1 [ANALYSIS_DIR_2 ... etc]
#this first step enforces that this only works when SUBJECTS_DIR is set
: "${SUBJECTS_DIR:?SUBJECTS_DIR must be set}"
cd "${SUBJECTS_DIR}" || exit 1
#first param is subject id
SUB=${1:?missing SUBJECT_ID}
shift
#remaining params are analysis directories (kept as an array so the loop sees one name at a time)
DIRS=("$@")
#we're going to clone the analysis directory structure inside the subject's bold directory
cd "${SUB}/bold" || exit 1
mkdir --parents "${SUB}/bold"
#iterate through analysis directories
for DIR in "${DIRS[@]}"; do
    cp -r "${DIR}" "${SUB}/bold/"
done
#zip up our cloned directory structure
tar -czvf "${SUBJECTS_DIR}/${SUB}.1stla.tgz" "${SUB}"
#delete the clone
rm -rf "${SUB}"
#go back to where we started
cd "${SUBJECTS_DIR}"
If you were to copy/paste the above script to a file named zip1stla.sh and make it executable (chmod ug+x zip1stla.sh) then you would run it this way:
#suppose my analysis directories are called FAM.sm6.lh, FAM.sm6.rh and FAM.sm6.mni
./zip1stla.sh FS_SUB01 FAM.sm6.lh FAM.sm6.rh FAM.sm6.mni
This will create a file called FS_SUB01.1stla.tgz. When you unzip the file, it will create a subject folder with the following structure:
- FS_SUB01
  - bold
    - FAM.sm6.lh
      - {some files}
    - FAM.sm6.rh
      - {some files}
    - FAM.sm6.mni
      - {some files}
No other files will be included in the archive, which keeps the archive size to a minimum. If FS_SUB01 already exists, then the contents of this archive will be added to the existing directory. This can be useful if you previously used the method described above to archive a minimal set of FreeSurfer structural files. Note that the FreeSurfer structural files are not needed to view the first level GLM data if you ran the analysis in fsaverage space, because these data are mapped to the fsaverage template, which you will already have on your local machine if you have FreeSurfer installed.
Make a series of numbered directories
FreeSurfer BOLD data goes in a series of directories, numbered 001, 002, ... , 0nn. A one-liner to create these directories from the command line:
for i in $(seq -f "%03g" 1 6); do mkdir ${i}; done #this will create directories 001 to 006. Obviously, if you need more directories, change the second value from 6 to something else
Protip: If you want to also make the runs file that some of our scripts use at the same time, the above snippet can be modified:
for i in $(seq -f "%03g" 1 6); do mkdir ${i}; echo ${i} >> runs; done
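One thing to watch: because of the >>, running that snippet twice appends the run numbers to the runs file a second time (and mkdir complains about the existing directories). A rerunnable sketch, with the hypothetical name make_run_dirs, might look like this:

```shell
# make_run_dirs N: create directories 001..N (zero-padded to 3 digits) in the
# current directory and write their names to a file called "runs".
# N is assumed to be a plain positive integer.
make_run_dirs() (
    : > runs                      # start from an empty runs file every time
    i=1
    while [ "$i" -le "$1" ]; do
        run=$(printf '%03d' "$i")
        mkdir -p "$run"           # -p: no error if the directory already exists
        echo "$run" >> runs
        i=$(( i + 1 ))
    done
)

# Example: make_run_dirs 6
```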
Restart Window Manager
This has happened a couple of times before: you step away from the computer for a while (maybe even overnight) and when you come back, you find it is locked up and completely unresponsive. The nuclear option is to reboot the whole machine:
sudo shutdown -r now #Sad for anyone running autorecon or a neural network
Unfortunately, that will stop anything that might be running in the background. A less severe solution might be to just restart the window manager. To do this you will need to ssh into the locked-up computer from a different computer, and then restart the lightdm process. This will require superuser privileges.
ssh hostname
Then after you have connected to the frozen computer:
sudo restart lightdm
(sudo restart is the Upstart way of doing it; on a systemd-based system the equivalent would be sudo systemctl restart lightdm.) Any processes that were dependent on the window manager will be terminated (e.g., if you had been in the middle of editing labels in tksurfer, you will find that tksurfer has been shut down and you will need to start over); however, anything that was running in the background (e.g., autorecon) should be unaffected.
Renaming Multiple Files
Rename Using rename
A perl command called rename might be available on your *nix system:
rename [OPTIONS] perlexpr files
Among the useful options is the -n flag, which just reports what the file renames would be, but doesn't actually execute them.
A handy application of rename is to hide files and/or directories. Files with names beginning with a dot are hidden by default and don't show up in directory listings. This can be a handy way of excluding chunks of data from your scripts.
Use-Case: Hiding Session 2 Data
In our Multisensory Imagery experiment, we collect 6 runs at time points 1 and 2. If we wish to be able to analyze all the data, these would be stored together as runs 001 to 012. Suppose we wish to temporarily hide the second time point data:
rename -n 's/01/\._01/' `find ./ -type d -name "01*"`
This would find all the directories (-type d) named 01*, then show you how it would rename them. If everything looks right, execute the same command again, but omit the -n flag so that the renaming actually takes place. Note that this example only gets the 010, 011 and 012 directories. You would do something similar for directories 00[7-9] to catch the rest of the second time point (runs 007 to 009).
Use-Case: Unhiding Directories
This one is easier, since all the hidden directories start with "._" using the approach described above:
rename 's/\._//' `find ./ -type d -name "._0*"`
In case you're curious about the syntax of the perl expression, you might want to read up a bit about regular expressions, but in this case, 's/\._//' indicates we are doing a substitution that will replace the first instance of ._ in each name with an empty string (the nothing between the last two slashes). The backslash in front of the period is an escape character, which is needed because otherwise the dot (period) will be interpreted as a special character that matches any character.
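If rename isn't installed, the hide/unhide idea can be sketched with plain mv. Everything here is illustrative: hide_runs and unhide_runs are made-up names, they only act on run directories in the current directory (e.g., one subject's bold directory), and the run numbers are assumed to be plain integers without leading zeros:

```shell
# hide_runs [-n] FIRST LAST: rename run directories FIRST..LAST (e.g. 7 12 for
# 007..012) to ._007 .. ._012. With -n, only print what would be renamed.
hide_runs() (
    dry=0
    if [ "$1" = "-n" ]; then dry=1; shift; fi
    i=$1
    while [ "$i" -le "$2" ]; do
        run=$(printf '%03d' "$i")
        if [ -d "$run" ]; then
            if [ "$dry" = 1 ]; then
                echo "would rename $run -> ._$run"
            else
                mv -- "$run" "._$run"
            fi
        fi
        i=$(( i + 1 ))
    done
)

# unhide_runs: strip the "._" prefix from every hidden run directory here.
unhide_runs() (
    for d in ._[0-9][0-9][0-9]; do
        [ -d "$d" ] || continue   # glob matched nothing
        mv -- "$d" "${d#._}"
    done
)

# Example: hide_runs -n 7 12   (dry run), then hide_runs 7 12, later unhide_runs
```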
Rename Using mv
If you don't have access to the rename command (Mac OSX), you can fake it:
PREFIX=LO
for file in `find . -name "*.txt"`; do mv ${file##*/} ${PREFIX}_${file##*/}; done
Source: [1]
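The loop above strips the directory part of each path (${file##*/}), so it really only works for files sitting in the current directory, and the backticks will split names containing spaces. A sketch that renames files in place wherever find locates them (prefix_files is a hypothetical name; the prefix is assumed to contain no glob characters):

```shell
# prefix_files PREFIX DIR PATTERN: rename every file under DIR whose name
# matches the glob PATTERN to PREFIX_<name>, leaving it in its own directory.
# Files that already carry the prefix are skipped, so it is safe to rerun.
prefix_files() (
    prefix=$1; dir=$2; pattern=$3
    find "$dir" -type f -name "$pattern" ! -name "${prefix}_*" -exec sh -c '
        prefix=$1; shift
        for path in "$@"; do
            mv -- "$path" "$(dirname "$path")/${prefix}_$(basename "$path")"
        done
    ' sh "$prefix" {} +
)

# Example: prefix_files LO . "*.txt"
```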
Related Trick: Collecting and Renaming Multiple Files in Subdirectories
Use case: I ran a bunch of model simulations. Each batch of simulations produced a series of 8 Keras files named model_0x.h5, and stored in directories named batch_##/. 10 batches of simulations produced 80 model files, except that they all had the same names. I wanted to run some tests on the complete set, so I needed to aggregate all the files in a single directory, but rename them from 01 to 80:
for run in $(seq 1 10)
do
    r=`printf "%02d" $run`
    echo "Gathering run $r files"
    for m in {1..8}
    do
        basemodel=`printf "%02d" $m`
        blockstart=$(( ($run-1)*8 ))
        #zero-pad the new number so the files come out as model_01.h5 to model_80.h5
        newmodel=`printf "%02d" $(( $blockstart+$m ))`
        cp batch_$r/model_$basemodel.h5 ./model_$newmodel.h5
    done
done
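The renumbering is just (run - 1) * 8 + m. To sanity-check the mapping before copying 80 files around, the arithmetic can be pulled out on its own. A sketch, where model_index is a made-up helper and the inputs are assumed to be plain integers without leading zeros (the shell would read a leading 0 as octal):

```shell
# model_index RUN MODEL [PER_BATCH]: map model MODEL (1-based) of batch RUN
# (1-based) to its zero-padded position in the combined set.
# PER_BATCH defaults to 8 models per batch.
model_index() {
    printf '%02d\n' $(( ($1 - 1) * ${3:-8} + $2 ))
}

# Example: model_index 2 1 gives the new number for batch_02/model_01.h5
```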
sed Tricks
Replacing Text in Multiple Files
sed -i 's/oldtext/newtext/g' *.ext
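Since -i edits the files in place with no undo, it can be worth keeping backups. Attaching a suffix directly to -i (as in -i.bak) works with both GNU and BSD sed. A minimal sketch, where replace_in_files is a hypothetical name and the old and new text are assumed to contain no slashes, ampersands or backslashes:

```shell
# replace_in_files OLD NEW FILE...: replace every OLD with NEW in each FILE,
# keeping the untouched original next to it as FILE.bak.
# OLD is treated as a basic regular expression.
replace_in_files() (
    old=$1; new=$2; shift 2
    sed -i.bak "s/$old/$new/g" "$@"
)

# Example: replace_in_files oldtext newtext *.ext
```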
Remove punctuation and convert to lowercase
FILENAME=file.txt
#strip punctuation, then strip tabs, then convert to lowercase
sed 's/[[:punct:]]//g' $FILENAME | sed $'s/\t//g' | tr '[:upper:]' '[:lower:]' > lowercase.$FILENAME
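The same clean-up can be done with tr alone, which avoids the $'...' quoting. A sketch (normalize_text is a made-up name):

```shell
# normalize_text FILE: print FILE with punctuation and tabs deleted and all
# letters converted to lowercase.
normalize_text() {
    tr -d '[:punct:]\t' < "$1" | tr '[:upper:]' '[:lower:]'
}

# Example: normalize_text file.txt > lowercase.file.txt
```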
Archiving Specific Files in a Directory Tree
Some versions of the tar command (e.g., bsdtar) have an --include switch which will archive only matching file patterns, however it appears that this filtering breaks when trying to archive files in subdirectories. Fortunately, the person who posed the question on StackExchange already had a workaround that works fine (it's just ugly):
find ./ -name "*.wav.txt" -print0 | tar -cvzf ~/adhd.tgz --null -T -
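For the record: -T tells tar to read the list of files to archive from a file, and the trailing - means that "file" is stdin, i.e., the output of find. The -print0 and --null pair make find and tar separate the names with NUL characters, so names containing spaces survive the trip. Just replace the file pattern with whatever it is you're filtering for, and of course specify an appropriate tgz archive name.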
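It's worth listing the archive afterwards (tar -tzf) to confirm that it picked up what you expected. A sketch that wraps the find-into-tar pipeline; archive_by_name is a hypothetical name:

```shell
# archive_by_name DIR PATTERN ARCHIVE: archive every file under DIR whose name
# matches the glob PATTERN into the gzipped tar file ARCHIVE.
archive_by_name() {
    # -print0 and --null pass the names NUL-separated, so spaces are safe;
    # -T - makes tar read the list of names from stdin.
    find "$1" -type f -name "$2" -print0 | tar -czf "$3" --null -T -
}

# Example:
#   archive_by_name ./ "*.wav.txt" ~/adhd.tgz
#   tar -tzf ~/adhd.tgz        # list the contents without extracting
```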
mysql on the terminal
So I learned tonight how to export query results to a text file from the shell interface. Note that MySQL server is running with the --secure-file-priv option enabled, so you can't just willy-nilly write files wherever you want. However /var/lib/mysql-files/ is fair game, so for example:
select * from conceptstats inner join concepts on conceptstats.concid=concepts.concid where pid=183 and norm=1 into outfile '/var/lib/mysql-files/0183.txt'