BASH Tricks: Difference between revisions

From CCN Wiki

Revision as of 21:12, 25 March 2020

How many lines in my text file?

Totally useful when you have some kind of training file with many rows and columns:

FILENAME=myfile.csv
wc -l < "${FILENAME}"   #prints only the line count; piping nl through awk '{ print $1 }' prints every line number instead

I want to drop the first line of my text file

tail echoes the last n lines (default: 10) of a text file to stdout. Putting a + in front of the number given to the -n flag flips it around, so that tail starts its output at that line number instead. So -n +2 will echo back the file from the 2nd line onward (i.e., dropping the first line). We redirect this to a temp file (redirecting straight back onto the original would empty it before tail gets to read it), and then rename:

tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"

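PROTIP: The same trick works with the head command to drop the last n lines from a file: with GNU head, head -n -3 echoes everything except the last 3 lines. The head that ships with macOS may not accept a negative count.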
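If you want to convince yourself of both behaviours before pointing them at real data, try them on a throwaway file. This is only a minimal sketch: the helper names drop_first_line and drop_last_lines are made up for illustration, and the head -n -N form assumes GNU head.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (the names are illustrative, not standard commands).

# Drop the first line of a file in place.
drop_first_line() {
    local file="$1"
    # tail -n +2 starts output at line 2; write to a temp file, then rename.
    tail -n +2 "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Drop the last N lines of a file in place (assumes GNU head).
drop_last_lines() {
    local file="$1" n="$2"
    # head -n -N prints everything except the last N lines.
    head -n "-$n" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a throwaway file
demo=$(mktemp)
printf 'header\nrow1\nrow2\nrow3\n' > "$demo"
drop_first_line "$demo"      # removes the header line
drop_last_lines "$demo" 1    # removes row3
cat "$demo"
rm -f "$demo"
```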

Make a list of directory names

We often organize subject data so that each subject gets their own directory. Freesurfer uses a subjects file when batch processing. Rather than manually type out each folder name into a text file, it can be generated in one line of code:

ls -1 -d */ | sed "s,/$,," >  subjects

This lists all the directories in 1 column (-1 prints one entry per line; -d lists the directories themselves rather than their contents) and uses sed to snip off the trailing forward slashes in the directory names.
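If any of your folder names contain spaces or other oddities, parsing the output of ls can get flaky. A possible alternative, sketched below, lets the shell glob do the listing. The function name list_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# list_dirs is a hypothetical helper name, not a standard command.
list_dirs() {
    local d
    for d in */; do
        # If there are no directories the glob stays literal, so skip it.
        [ -d "$d" ] || continue
        # ${d%/} strips the trailing slash, like the sed step does.
        printf '%s\n' "${d%/}"
    done
}

# Usage: run from the directory that holds the subject folders
# list_dirs > subjects
```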

Making a tar archive containing only the minimal set of structural files for FreeSurfer

FreeSurfer makes a zillion files during the recon-all step. I have no idea what most of them are there for. Which do we need? The absolute minimal list is a work in progress, but I have made a file called fsaverage.required (copied to the ubfs scripts directory) based on the contents of the fsaverage template subject directory. I dropped the obviously unneeded directories (e.g., mri 2mm), so what's left should hopefully be close to the minimal required set for getting things done with a subject. The idea is to reduce the number of superfluous files that you store or copy over the network, so that we don't waste time and disk space on files nobody needs.

So here's what you do (note that the subject-specific names, e.g., sub-001.minimal.tgz, will vary - don't just blindly copy the code snippets below and expect them to work; that's how Chernobyl happens! You need to understand what you're doing):

  1. Copy fsaverage.required to $SUBJECTS_DIR
  2. Inspect fsaverage.required to make sure that it has any idiosyncratic files that you might wish to include
    • e.g., the original version only includes f.nii.gz files. If you want to also grab all your preprocessed .mgz files, then you'll want to include *.mgz up at the top. Save any changes.
  3. Navigate to a subject directory: cd FS_sub-001
  4. The following command will use the files listed in $SUBJECTS_DIR/fsaverage.required to find and archive the desired files for this subject:
    • tar -czvf sub-001.minimal.tgz `find . | grep -G -f ${SUBJECTS_DIR}/fsaverage.required`
  5. When you're done, you'll have the bare-bones minimum files to permit FS-FAST analyses of your BOLD data for your subject.
  6. You can copy the .tgz files to an external drive or over the network. Be sure to unpack the .tgz archive in an empty subject directory
    • e.g.:
mkdir ~/new_project/   #starting a new project directory - in this case on the same computer, but it could be anywhere
cd ~/new_project       #enter the new project directory
mkdir FS_sub-001       #making an empty subject directory for the files we're about to unpack
cd FS_sub-001          #navigate into the new empty subject directory
#next line copies the minimal file archive from the source directory into the new empty subject directory
cp ${SUBJECTS_DIR}/FS_sub-001/sub-001.minimal.tgz ./
#next line unzips the file archive into the empty directory
tar -xzvf sub-001.minimal.tgz

With the minimal set of structural files, you should be able to unzip the surface and T1 anatomical files and inspect for reconstruction accuracy, or add BOLD files from elsewhere (the BOLD files are what really do you in, and I've developed a similar procedure to grab your blob analysis files).
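The archiving step (step 4) can also be wrapped in a small function so that you can loop it over subjects. What follows is a sketch under a few assumptions of mine, not the canonical procedure: the name make_minimal_archive is hypothetical, the pattern file stands in for fsaverage.required (one basic regular expression per line), only regular files are matched, and the file list is handed to tar on stdin rather than through backticks so that names with spaces survive. Give it absolute paths for the pattern file and the archive, because the function changes directory before it runs.

```shell
#!/usr/bin/env bash
# make_minimal_archive is a hypothetical helper; the pattern file plays the
# role of fsaverage.required (one basic regular expression per line).
# Pass absolute paths for the pattern file and the archive.
make_minimal_archive() {
    local subject_dir="$1" patterns="$2" archive="$3"
    (
        cd "$subject_dir" || exit 1
        # List regular files, keep those matching any pattern, and hand the
        # list to tar on stdin (-T -), one file name per line.
        find . -type f | grep -G -f "$patterns" | tar -czf "$archive" -T -
    )
}

# Usage (paths are examples only):
# make_minimal_archive "$SUBJECTS_DIR/FS_sub-001" \
#     "$SUBJECTS_DIR/fsaverage.required" "$HOME/sub-001.minimal.tgz"
```

Writing the archive somewhere outside the subject directory, as in the usage line, keeps find from picking up the archive itself.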

Make a series of numbered directories

FreeSurfer BOLD data goes in a series of directories, numbered 001, 002, ... , 0nn. A one-liner of code to create these directories in the command line:

for i in $(seq -f "%03g" 1 6); do mkdir ${i}; done
#this will create directories 001 to 006. Obviously, if you need more directories, change the second value from 6 to something else

Protip: If you want to also make the runs file that some of our scripts use at the same time, the above snippet can be modified:

 for i in $(seq -f "%03g" 1 6); do mkdir ${i}; echo ${i} >> runs; done
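If you do this a lot, the one-liner can be turned into a function that takes the number of runs as an argument. This is a sketch only: make_run_dirs is a hypothetical name, and unlike the one-liner above it empties the runs file first, so running it twice does not leave duplicate entries.

```shell
#!/usr/bin/env bash
# make_run_dirs is a hypothetical helper name.
make_run_dirs() {
    local n="$1" i
    : > runs                      # start with an empty runs file
    for i in $(seq -f "%03g" 1 "$n"); do
        mkdir -p "$i"             # -p: no error if the directory exists
        echo "$i" >> runs
    done
}

# Usage, from inside the subject's bold directory:
# make_run_dirs 6
```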

Restart Window Manager

This has happened a couple of times before: you step away from the computer for a while (maybe even overnight) and when you come back, you find it is locked up and completely unresponsive. The nuclear option is to reboot the whole machine:

sudo shutdown -r now #Sad for anyone running autorecon or a neural network

Unfortunately, that will stop anything that might be running in the background. A less severe solution might be to just restart the window manager. To do this you will need to ssh into the locked-up computer from a different computer, and then restart the lightdm process. This will require superuser privileges.

ssh hostname

Then after you have connected to the frozen computer:

sudo restart lightdm   #Upstart syntax; on systemd-based systems use: sudo systemctl restart lightdm

Any processes that were dependent on the window manager will be terminated (e.g., if you had been in the middle of editing labels in tksurfer, you will find that tksurfer has been shut down and you will need to start over). However, anything that was running in the background (e.g., autorecon) should be unaffected.

Renaming Multiple Files

Rename Using rename

A perl command called rename might be available on your *nix system:

rename [OPTIONS] perlexpr files

Among useful options are the -n flag, which just reports what all the file renames would be, but doesn't actually execute them. A handy application of rename is to hide files and/or directories. Files with names beginning with a dot are hidden by default and don't show up in directory listings. This can be a handy way of excluding chunks of data from your scripts.

Use-Case: Hiding Session 2 Data

In our Multisensory Imagery experiment, we collect 6 runs at time points 1 and 2. If we wish to be able to analyze all the data, these would be stored together as runs 001 to 012. Suppose we wish to temporarily hide the second time point data:

rename -n 's/01/\._01/' `find ./ -type d -name "01*"`

This would find all the directories ("-type d") named 01*, then it would show you how it would rename them. If everything looked right, you would execute the same command again, but omit the -n flag so that the renaming actually takes place. Note that this example only gets the 010, 011 and 012 directories. You would do something similar for directories 00[7-9], since time point 2 covers runs 007 to 012.

Use-Case: Unhiding Directories

This one is easier, since all the hidden directories start with "._" using the approach described above:

rename 's/\._//' `find ./ -type d -name "._0*"`

In case you're curious about the syntax of the perl expression, you might want to read up a bit about regular expressions, but in this case, 's/\._//' indicates we are doing a substitution that replaces the first instance of ._ in each name with an empty string (the nothing between the last two slashes). The backslash in front of the period is an escape character, which is needed because an unescaped dot is a special character that matches any single character.
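The same hide/unhide idea can be done without rename at all. Below is a rough sketch in plain bash, with a dry-run switch that mimics rename -n. The function names hide_runs and unhide_runs and the DRY_RUN variable are my own inventions for illustration.

```shell
#!/usr/bin/env bash
# hide_runs and unhide_runs are hypothetical helper names.
# Set DRY_RUN=1 to print the planned renames without doing them
# (the same idea as rename -n).

hide_runs() {
    local d
    for d in "$@"; do
        [ -d "$d" ] || continue          # skip anything that is not a directory
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would rename $d -> ._$d"
        else
            mv "$d" "._$d"
        fi
    done
}

unhide_runs() {
    local d
    for d in ._0*/; do
        [ -d "$d" ] || continue          # glob stays literal if nothing matches
        d="${d%/}"                       # strip the trailing slash
        mv "$d" "${d#._}"                # strip the leading ._
    done
}

# Usage, from the directory that holds the run folders:
# DRY_RUN=1 hide_runs 007 008 009 010 011 012   # preview
# hide_runs 007 008 009 010 011 012             # rename for real
# unhide_runs                                   # restore the original names
```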

Rename Using mv

If you don't have access to the rename command (Mac OSX), you can fake it:

PREFIX=LO
for file in `find . -name "*.txt"`; do mv "${file}" "${file%/*}/${PREFIX}_${file##*/}"; done   #${file%/*} keeps each file in its own directory

Source: [1]
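The backtick loop above splits on whitespace, so it will trip over file names that contain spaces. One possible workaround, sketched here with a hypothetical function name add_prefix, passes the names from find to the loop separated by null characters. It also skips files that already carry the prefix, so running it twice does not produce LO_LO_ names.

```shell
#!/usr/bin/env bash
# add_prefix is a hypothetical helper name.
add_prefix() {
    local prefix="$1" root="${2:-.}" file
    # -print0 / read -d '' pass names NUL-delimited, so spaces are safe.
    # Files that already start with the prefix are left alone.
    find "$root" -type f -name "*.txt" ! -name "${prefix}_*" -print0 |
    while IFS= read -r -d '' file; do
        # ${file%/*} is the directory part, ${file##*/} the bare file name.
        mv "$file" "${file%/*}/${prefix}_${file##*/}"
    done
}

# Usage:
# add_prefix LO .
```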

Related Trick: Collecting and Renaming Multiple Files in Subdirectories

Use case: I ran a bunch of model simulations. Each batch of simulations produced a series of 8 Keras files named model_0x.h5, and stored in directories named batch_##/. 10 batches of simulations produced 80 model files, except that they all had the same names. I wanted to run some tests on the complete set, so I needed to aggregate all the files in a single directory, but rename them from 01 to 80:

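for run in $(seq 1 10)
do
       r=`printf "%02d" $run`                 #zero-padded batch number, e.g. 03
       echo "Gathering run $r files"
       for m in {1..8}
       do
               basemodel=`printf "%02d" $m`   #zero-padded source model number
               blockstart=$(( (run-1)*8 ))    #number of models gathered from earlier batches
               newmodel=`printf "%02d" $(( blockstart+m ))`   #zero-padded so the new names run 01 to 80
               cp batch_$r/model_$basemodel.h5 ./model_$newmodel.h5
       done
done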

sed Tricks

Replacing Text in Multiple Files

sed -i 's/oldtext/newtext/g' *.ext
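Since -i overwrites your files, it can be worth keeping backups until you've checked the result. A minimal sketch, assuming a hypothetical wrapper called replace_in_files: attaching a suffix to -i (here .bak) makes sed save the original of each file under that suffix.

```shell
#!/usr/bin/env bash
# replace_in_files is a hypothetical helper name.
replace_in_files() {
    local old="$1" new="$2"
    shift 2
    # -i.bak edits in place and keeps the original as <file>.bak.
    # old and new are inserted into the sed expression unescaped, so this
    # sketch assumes they contain no "/" or other characters special to sed.
    sed -i.bak "s/${old}/${new}/g" "$@"
}

# Usage:
# replace_in_files oldtext newtext *.ext
```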

Remove punctuation and convert to lowercase

FILENAME=file.txt
sed 's/[[:punct:]]//g' "$FILENAME" | sed $'s/\t//g' | tr '[:upper:]' '[:lower:]' > "lowercase.$FILENAME"
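To see what the pipeline does before running it on a whole file, you can wrap it in a filter and feed it a test string. This is a sketch with a made-up function name, normalize_text; it uses tr -d to delete the tabs, which should behave the same as the second sed above.

```shell
#!/usr/bin/env bash
# normalize_text is a hypothetical helper name; it reads stdin, writes stdout.
normalize_text() {
    # 1) delete punctuation, 2) delete tabs, 3) lowercase everything
    sed 's/[[:punct:]]//g' | tr -d '\t' | tr '[:upper:]' '[:lower:]'
}

# Try it on a sample line before pointing it at a real file
printf 'Hello, World!\n' | normalize_text
```

Note that deleting tabs joins the words on either side of them. If the tabs are column separators, you may prefer to turn them into spaces with tr '\t' ' ' instead.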

Archiving Specific Files in a Directory Tree

The tar command has an --include switch which will archive only files matching a pattern; however, it appears that this filtering breaks when trying to archive files in subdirectories. Fortunately, the person who posed the question on StackExchange already had a workaround that works fine (it's just ugly):

 find ./ -name "*.wav.txt" -print0 | tar -cvzf ~/adhd.tgz --null -T -

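The -T switch tells tar to read the list of files to archive from a file, and the trailing - means that this "file" is standard input, i.e., the output of find. The --null switch tells tar that the names are separated by null characters, which is what find's -print0 produces. Just replace the file pattern with whatever it is you're filtering for, and of course specify an appropriate tgz archive name.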
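Here is the same pipeline wrapped in a function, in case you want to reuse it with different patterns. It's a sketch rather than anything official: archive_matching is a hypothetical name, and I've added -type f so that only regular files are picked up.

```shell
#!/usr/bin/env bash
# archive_matching is a hypothetical helper name.
archive_matching() {
    local root="$1" pattern="$2" archive="$3"
    # find emits NUL-terminated names (-print0); tar reads the list from
    # stdin (-T -) and --null tells it the names are NUL-terminated.
    find "$root" -type f -name "$pattern" -print0 |
        tar -czf "$archive" --null -T -
}

# Usage:
# archive_matching ./ "*.wav.txt" ~/adhd.tgz
# tar -tzf ~/adhd.tgz    # list the archive contents to check what went in
```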

mysql on the terminal

So I learned tonight how to export query results to a text file from the shell interface. Note that MySQL server is running with the --secure-file-priv option enabled, so you can't just willy-nilly write files wherever you want. However /var/lib/mysql-files/ is fair game, so for example:

select * from conceptstats inner join concepts on conceptstats.concid=concepts.concid where pid=183 and norm=1 into outfile '/var/lib/mysql-files/0183.txt';