We will talk to the computer using the text-based terminal, also known as “the commandline”. On Unix-like operating systems (typically used for high-performance computing) this is a very powerful way to interact with the computer.
On a typical Linux desktop system you open a “xterm” or “kterm” (or similar application) to get access to the commandline. On Mac OS X you open “Terminal.app” (in the Utilities folder in Applications).
Command syntax: options and arguments:
command [-argument] [-argument OPTVAL] ... [--long-argument] [--long-argument=OPTVAL] [file ...]
Not all help functions described below are always available. Simply try them all. You need to learn to find out about commands by yourself. This introduction can only point you in the right direction and give you hints as to what could be useful to you.
built in help function:
command -h
command --help
command -?
command
manual page (‘man page’):
man command
man -k search_phrase
This is where to look when someone tells you to RTFM.
help function of the shell:
help command
info page (can be somewhat similar to man but can contain hyperlinks):
info command
And of course, you can always search on the web. Just make sure you understand which version of the command is described. There can be huge differences. If in doubt, check whether a command complies with the “POSIX standard”, a lowest common denominator for various flavors of Unix (yes, there are many different Unix-like operating systems out there!).
download vim_basics.txt from PHY598/01/
copy to your docs directory:
cp vim_basics.txt docs
move file to editor directory:
mv vim_basics.txt p01/editor
make a temporary copy:
cp docs/vim_basics.txt tmp
cd tmp
cp vim_basics.txt foo.txt
mkdir tmp2
cd tmp2
mv ../foo.txt . # note '.' for current dir
remove the copy:
cd ..
pwd
rm vim_basics.txt
remove the tmp dir:
rmdir tmp2 # only empty dirs!
## FAILS
rm -r tmp2 # deletes full dirs recursively (dangerous)
Warning
rm -r and rm -rf (“force”: never prompt, not even for write-protected files) are very dangerous! They delete everything recursively. They do not ask and they do not keep a backup (no “Trash”).
In general, Unix commands do what you ask them to. If they succeed they typically will not output anything (unless it’s part of their job, such as ls). Only if there’s a problem will you get a (terse) message.
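The difference between rmdir and rm -r, and the “silent on success” behavior, can be tried safely in a throwaway directory. The following is only a minimal sketch: the scratch directory (created with mktemp -d) and the file names in it are made up for the illustration.

```shell
# Work in a throwaway directory so that nothing important can be deleted.
scratch=$(mktemp -d)
mkdir "$scratch/tmp2"
touch "$scratch/tmp2/foo.txt"

# rmdir only removes empty directories: it fails and prints a terse message.
rmdir "$scratch/tmp2" || echo "rmdir failed: tmp2 is not empty"

# rm -r removes the directory and its contents; on success it prints nothing.
rm -r "$scratch/tmp2"

# Only the (now empty) scratch directory is left.
ls -la "$scratch"
```

Note that the successful rm -r prints nothing at all; only the failing rmdir reports anything.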
Different ways to display a text file:
cat FILE
less FILE # h for help, q for quit
head FILE
tail FILE
Try head -5 vim_basics.txt. For looking at log files of a running simulation, try
tail -f output.log
which will continuously update.
“glob” patterns
* means any characters (even zero):
ls *
ls *.txt
ls /usr/bin/*grep
? means any single one character:
ls /usr/bin/?grep
[x-y] means a range:
ls /usr/bin/[a-y]grep
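Because the contents of /usr/bin differ between systems, it can help to try the patterns on a small set of known files first. This is a sketch with made-up file names in a scratch directory; echo is used so that you see exactly what the shell expands each pattern to before any command runs.

```shell
# Scratch directory with a few made-up file names.
globdir=$(mktemp -d)
cd "$globdir"
touch a.txt b.txt ab.txt notes.md

echo *.txt        # every name ending in .txt
echo ?.txt        # exactly one character before .txt: a.txt b.txt
echo [a-b]b.txt   # a or b, followed by b.txt: ab.txt

single=$(echo ?.txt)
```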
Advanced: brace expansion
head_{a,b,42,xx}_tail --> head_a_tail
head_b_tail
head_42_tail
head_xx_tail
Can be useful when making a complicated directory layout, e.g. simulations for three systems S1,S2,S3 at four temperatures 273, 300, 310, 373 and 2 pressures (1 atm and 1000 atm) (3*4*2 = 24 directories):
mkdir -p {S1,S2,S3}/T={273,300,310,373}K/P={1,1000}atm
Will create
S1/T=273K/P=1atm
S1/T=273K/P=1000atm
S1/T=300K/P=1atm
S1/T=300K/P=1000atm
...
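To check the count of 24 without cluttering your work directory, the layout can be built in a scratch directory and counted with find. This sketch assumes bash (brace expansion is a feature of the shell, not of mkdir); the scratch directory and the variable names are made up.

```shell
# Brace expansion is done by bash itself, so echo shows the result directly.
echo head_{a,b,42,xx}_tail

# Build the layout in a scratch directory and count the innermost directories.
layout=$(mktemp -d)
cd "$layout"
mkdir -p {S1,S2,S3}/T={273,300,310,373}K/P={1,1000}atm
count=$(find . -type d -name 'P=*' | wc -l)
echo "created $count pressure directories"
```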
commands read from standard input (stdin; by default, the terminal, i.e. the keyboard) and write to standard output (stdout):
stdin --> command ---> stdout
By default, standard output is printed to the screen.
(There’s also a second output channel, called standard error (stderr), which is used for error messages. By default it is also sent to the screen.)
redirection operators:
command > file # create/overwrite output file
command >> file # append
command < file # read contents of file into stdin
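A short sketch of the three operators in action, using made-up scratch files. The last part goes slightly beyond the list above: in bash (and other Bourne-type shells) the operator 2> redirects only stderr, which is handy for keeping error messages out of the way.

```shell
log=$(mktemp)                      # made-up scratch file
echo "first line"  > "$log"        # create/overwrite
echo "second line" >> "$log"       # append
lines=$(wc -l < "$log")            # wc reads the file via stdin
echo "the file has $lines lines"

# stderr is a separate channel: 2> redirects only the error messages.
errors=$(mktemp)
ls /no/such/directory 2> "$errors" || echo "ls failed; its message is in $errors"
cat "$errors"
```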
Note on Unix philosophy: “Everything is a file”: a file is a file (of course) but whole disks are also files, the terminal is a file, memory can be treated as a file, a random number generator can appear as a file, ... Not overly important for the course but to keep at the back of your mind because it means that anything you learn about “real” files can be applied in a wider context!
Exercise
Run
cat
and type something, using return to finish lines... what happens?
To end input, type CONTROL-D (press control and ‘D’ at the same time; often written “^D”). (Knowing this is often useful.)
Note
To terminate a running command, use CONTROL-C (^C).
cat reads your input from stdin (keyboard) and writes it to stdout (screen).
example: cat as simple “editor”:
cat > TODO
- learn Unix
- learn vi
^D
less TODO
example:
mkdir p01
cd p01
ls -R ~/Documents
ls -R ~/Documents > Documents.lsR
less Documents.lsR
wc Documents.lsR
cat Documents.lsR > double1.lsR # create/overwrite
cat Documents.lsR >> double1.lsR # append
wc double1.lsR
or
cat Documents.lsR Documents.lsR > double2.lsR
wc double2.lsR
wc can also read from stdin:
wc < double1.lsR
What’s the difference?
Exercise:
or
(wc -w Documents.lsR; cat Documents.lsR) > Documents.NlsR
( command; command; ... ) runs a sequence of commands in a sub-shell whose output can again be redirected.
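As a small illustration of why the parentheses matter, the sketch below writes a header line and a directory listing into one file. The scratch file and the header text are made up.

```shell
report=$(mktemp)                       # made-up scratch file
(echo "# contents of /"; ls /) > "$report"

# Without the parentheses only the output of the last command is redirected:
# echo "# contents of /"; ls / > "$report"

head -3 "$report"
first=$(head -1 "$report")
```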
“|” is the “pipe” character. It connects stdout from one command with stdin from another one:
command1 | command2
output of 1 is input of 2 (filter):
ls ~/Documents | wc
Much of the power of Unix comes from the fact that a Unix system contains many small programs that each do one job particularly well and that can be strung together as filters in a pipeline.
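A minimal sketch of such a pipeline, with a made-up word list as input: sort groups identical lines, uniq -c counts each group, sort -rn orders the counts numerically in descending order, and head -1 keeps the most frequent entry.

```shell
# Made-up word list: which word occurs most often?
top=$(printf '%s\n' apple pear apple fig pear apple \
      | sort | uniq -c | sort -rn | head -1)
echo "$top"
```

Each stage only knows about its own stdin and stdout, so stages can be added or removed without touching the others.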
Useful filter commands:
grep (“global regular expression print”)
Shows lines that match the expression REGEX:
command | grep 'REGEX'
Note
It is generally a good idea to enclose REGEX in “hard quotes” (single quote character “’”) so that the shell does not interpret special characters such as $ or ~.
Simple REGEX (“basic regular expressions”):
word          matches "word" literally anywhere
^word         matches "word" at beginning of line
word$         matches "word" at end of line
a *b          matches ab, a b, a  b, i.e. ' *' is zero or more
              spaces
a \+b         matches a b, a  b, ..., i.e. ' \+' is one or more
              spaces (NOTE: in "extended regular expressions"
              as used in 'egrep' this is just '+', i.e. 'a +b')
a[A-Z]b       matches aAb, aBb, ..., aZb (range expression)
a[0-9][0-9]b  a00b, a01b, a02b, ..., a99b
a[A-Z]*b      ab, aAb, ..., aZb, aABb, ...
a[A-Za-z]b    aAb,..., aZb, aab, ..., azb
a[^A-Z]b      aab, axb, a+b, ... ([^...] is a negation)
a.b           matches aXb a3b a_b a b but not ab: '.' stands for
              a single character
a...b         a123b aXYZb etc: ... are three characters
a.*b          ab a1b a12b a123b etc: .* is zero or more characters
              (this is used very often)
a.\+b         a1b a12b but not ab: .\+ is one or more characters
(Regular expressions are amazingly useful but it takes some time to learn them. See ‘man re_format’ for the bare bones and various tutorials on the internet. The above barely scratches the surface.)
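One way to get a feeling for these patterns is to run them against a handful of known strings and count the matching lines with grep -c. The test strings below are made up; the “one or more” case is written in the extended syntax (grep -E) because that form is the most portable.

```shell
# Made-up test lines, one candidate string per line.
sample=$(mktemp)
printf '%s\n' 'ab' 'a b' 'a  b' 'aXb' 'a1b' 'a12b' > "$sample"

n_star=$(grep -c 'a *b' "$sample")      # ab, a b, a  b
n_plus=$(grep -E -c 'a +b' "$sample")   # a b, a  b (extended syntax)
n_range=$(grep -c 'a[A-Z]b' "$sample")  # aXb
n_dot=$(grep -c 'a.b' "$sample")        # a b, aXb, a1b
n_any=$(grep -c 'a.*b' "$sample")       # all six lines
echo "$n_star $n_plus $n_range $n_dot $n_any"
```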
Examples:
ls /usr/bin | grep lp
ls /usr/bin | grep ^lp
Exercise
How many lp commands?
ls /usr/bin | grep ^lp | wc -l
cut -c N-M,X-Y FILE --> data from cols N-M and X-Y
cut -f 2,3 -d ' ' --> separate fields by space and print fields 2 and 3
(But for field splitting, awk works better (see below).)
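Both modes of cut can be tried on a single line of made-up input: -f with -d selects delimiter-separated fields, -c selects character columns.

```shell
fields=$(echo "one two three four" | cut -f 2,3 -d ' ')
chars=$(echo "abcdefghij" | cut -c 1-3,8-10)
echo "$fields"   # two three
echo "$chars"    # abchij
```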
sed, the “stream editor”, reads a file line by line and applies a sed-program to each line in turn. It is rather complicated; a typical use is to search and replace in a file:
cat FILE | sed 's/SEARCH/REPLACE/g'
where SEARCH is a “basic regular expression” as for grep.
Warning
sed sed-program FILE > FILE will destroy FILE. You must redirect to a temporary file, e.g. sed sed-program FILE > FILE.temp && mv FILE.temp FILE. Modern versions of sed have the -i (inplace) option to take care of that.
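The safe pattern from the warning, sketched on a made-up scratch file: first preview the substitution on stdout, then write to a temporary file and move it over the original only if sed succeeded.

```shell
notes=$(mktemp)                               # made-up scratch file
echo "vi is hard; vi is powerful" > "$notes"

# Preview on stdout; the file itself is unchanged.
sed 's/vi/vim/g' "$notes"

# Replace via a temporary file, so the original is never truncated early.
sed 's/vi/vim/g' "$notes" > "$notes.temp" && mv "$notes.temp" "$notes"
cat "$notes"
```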
awk also scans a file line by line and applies an awk program to each line. It is also fairly complicated (actually, awk is a full-blown scripting language) but typical use is straightforward: awk splits the line into fields (separated by white space, i.e. spaces and tabs) and then allows you to access fields by the special variables $1 (first field), $2 (second field), etc. For most data files you can think of fields as columns.
cat FILE | awk '/REGEX/ {awk-command; awk-command; ...}'
e.g.:
ls -lR /usr/bin | awk '/grep$/ {print $9, $5/1024}'
prints the file name and the size in kB instead of bytes but only of those commands that end in grep.
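Since the column layout of ls -l can vary between systems, the same idea is sketched here on a tiny made-up table with two columns (size in bytes, command name). The pattern selects lines ending in grep, and the action prints the name and the size in kB.

```shell
# Made-up table: column 1 is a size in bytes, column 2 a command name.
kb=$(printf '%s\n' '2048 grep' '4096 ls' '1024 egrep' \
     | awk '/grep$/ {print $2, $1/1024}')
echo "$kb"
```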
Download 1AKE and 4AKE from the PDB (Protein Databank). Look at the files with less.
search with the PDB code (e.g. “1ake”)
download file (Files -> Download Files -> PDB File (gz))
Or from the command line:
curl http://www.rcsb.org/pdb/files/1ake.pdb.gz -o 1ake.pdb.gz
curl http://www.rcsb.org/pdb/files/4ake.pdb.gz -o 4ake.pdb.gz
gunzip *.gz
(curl means “cat URL”, i.e. by default it writes the file pointed to by URL to stdout — ready to be used in a pipeline.)
Or (if wget is installed):
wget http://www.rcsb.org/pdb/files/{1ake,4ake}.pdb.gz
gunzip 1ake.pdb.gz 4ake.pdb.gz
Put files into your PDB dir.
Look at files with less and recognize the PDB file format.
We are particularly interested in the Coordinates Section where individual atoms are listed together with their coordinates. Move to the ATOM and HETATM sections (use / (e.g. /ATOM) to search; press n repeatedly to move forward through the matches.).
count the number of residues [1]
Hint: each residue has exactly one CA atom; protein residues are stored with ATOM records. Other molecules are in HETATM.
Bonus: How many residues in each chain?
Solution: manual inspection of the file showed that there are only two chains, A and B, so we simply grep for those separately:
grep '^ATOM.*CA.* A ' 1ake.pdb | wc -l
grep '^ATOM.*CA.* B ' 1ake.pdb | wc -l
214 (same for all)
Total:
grep '^ATOM.*CA' 1ake.pdb | wc -l
428
histogram of residue names: how often does each amino acid occur in the protein? Are some rarer than others?
Solution:
cat 1ake.pdb | grep '^ATOM.*CA' | cut -c 18-20 | sort | uniq -c
Question: How to sum up the totals?
cat 1ake.pdb | grep '^ATOM.*CA' | cut -c 18-20 | sort | uniq -c \
| awk '{sum+=$1}; END {print "total: ", sum}'
Footnotes
[1] A protein is a polypeptide that is made up from a linear sequence of amino acids; each amino acid is called a residue. There are 20 naturally (and frequently) occurring amino acids. Each has a three-letter residue name. For instance, glycine is Gly, arginine is Arg, and glutamine is Gln.
Take a detailed look at the output of a long file listing:
ls -la ....
drwxr-xr-x   2 oliver  oliver      68 Jan 11 02:34 tmp
-rw-r--r--   1 oliver  oliver  495559 Jan 11 13:56 Documents.lsR
 uuugggooo     owner   group     size date         name
d = directory (see man ls for the full list)
r = read
w = write
x = execute
Fields:
u = user/owner = oliver
g = group = oliver
o = other
Additionally, after the permissions there can also be a single character that shows if alternative access controls (such as Access Control Lists) are applicable. This is typically signified through a + sign. The ls -l command on Mac OS X also shows information about extended attributes (@), i.e. there exists meta data stored in the file system. This is only of concern if you copy the file to a non-Mac OS X formatted disk or USB flash drive because the “foreign” filesystem will not be able to store these extended attributes.
change permissions:
chmod go-rwx FILE # make it fairly private
chmod go+r FILE # let others read it
chmod a+r FILE # let everyone read it
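The effect of these commands can be watched in the first column of ls -l. A minimal sketch on a made-up scratch file (mktemp creates it readable and writable for the owner only):

```shell
secret=$(mktemp)                  # made-up scratch file
chmod a+r "$secret"               # everyone may read it
ls -l "$secret"
chmod go-rwx "$secret"            # group and others lose all access
ls -l "$secret"
perms=$(ls -l "$secret" | cut -c 1-10)
echo "$perms"
```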
Exercise:
compress (shrink file size without losing information):
zip
gzip
bzip2 (may not be installed)
uncompress:
unzip
gunzip, zcat
bunzip2
(These commands can typically be asked to take either a FILE as input or read from stdin. They can also write to stdout so that one can put a compress or uncompress step into a pipeline.)
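A sketch of both usages with gzip and gunzip on a made-up scratch file: the -c option writes the result to stdout and leaves the input file untouched, which is what makes these commands usable inside a pipeline.

```shell
data=$(mktemp)                              # made-up scratch file
for i in 1 2 3; do echo "record $i"; done > "$data"

gzip -c "$data" > "$data.gz"                # -c: write to stdout, keep the original
gunzip -c "$data.gz" | grep 'record 2'      # uncompress straight into a pipeline

roundtrip=$(gzip -c "$data" | gunzip -c)    # compress and uncompress via stdin/stdout
```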
file FILE
diff FILE1 FILE2
The diff command (together with its cousin patch) is very powerful when it comes to big software development projects. For us it is mostly useful to quickly compare two files. diff -U2, a so-called “unified diff” is generally more readable than the standard diff output. Also look at sdiff, which shows the differences in a side-by-side view (pipe the output through less!).
Complicated syntax but can be extremely useful:
find . -name '*.txt'
Find files over 1M in size:
find . -size +1M -ls
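To see what find reports, it can be run on a small made-up directory tree. The sketch also uses the test -type d (select directories only), which is not shown above but is a standard find test.

```shell
tree=$(mktemp -d)                  # made-up scratch tree
mkdir -p "$tree/docs/old"
touch "$tree/docs/a.txt" "$tree/docs/old/b.txt" "$tree/docs/c.md"
cd "$tree"

found=$(find . -name '*.txt' | sort)
echo "$found"
n_dirs=$(find . -type d | wc -l)   # counts ., ./docs and ./docs/old
```

Quoting the pattern ('*.txt') matters: it stops the shell from expanding the glob so that find receives it unchanged.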
The shell remembers all the commands that you typed in the “history” (typically a hidden file ~/.history or similarly named). This history allows you to
and see the commands with
history
The history is actually truncated at HISTSIZE (you can set the environment variable HISTSIZE yourself: e.g. HISTSIZE=500; export HISTSIZE)
curl (“cat url”) treats a URL as a file. It is a great tool and well worth learning:
curl URL | command
curl URL -o FILENAME
wget is straightforward for downloading files (but not installed on Mac OS X by default):
wget URL
or
wget URL -O NEWNAME
Exercise: Download PHY598/01/vimqrc.pdf and move it into your docs directory, using the commandline:
cd ~/NAME/p01/docs
curl http://becksteinlab.physics.asu.edu/pages/courses/2012/PHY598/01/vimqrc.pdf -o vimqrc.pdf
If you want to see what the shell thinks of an expression or if you want to have a script output a message you can use the echo command:
echo "Hello world!"
echo "nothing happens *"
echo "all the files " *
(Note that in the last line the shell expands * to all the files in the directory.)
There’s also the printf command, which is more versatile but less often used (see man page).
Variables are containers to store content in. Bash knows simple variables and arrays. It does not distinguish between text and numbers (essentially, everything is treated as text and if needed interpreted as a number).
Assign value to the variable NAME:
NAME=value
E.g:
WORK=$HOME/NAME/p01
ls $WORK
TMP_DIR=$WORK/tmp
ls ${TMP_DIR}/* # braces for variables unless only letters
By convention we use uppercase letters for NAME but it can be any mixture between upper and lower case characters and numbers (though it can’t start with a number). Stick to letters, numbers, underscores.
The value is accessed (“expanded”) by prefixing NAME with the $ (dollar symbol).
Variables behave differently in quotes:
echo "My work directory is $WORK"
This will print something like
My work directory is /Users/USERNAME/NAME/p01
The value is expanded inside strings with double quotes (“soft quotes”). Within single quotes the variable is not expanded (or “interpolated into the string”) hence we call them “hard quotes”:
echo 'The WORK variable can be accessed as $WORK'
will print
The WORK variable can be accessed as $WORK
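The two kinds of quotes, and the role of braces in delimiting a variable name, can be compared side by side. This is a sketch under the assumption that no variable called WORK_backup is set in your shell; the name is made up for the illustration.

```shell
WORK=$HOME/NAME/p01
soft="My work directory is $WORK"
hard='The WORK variable can be accessed as $WORK'
echo "$soft"
echo "$hard"

# Braces mark where the variable name ends.
backup="${WORK}_backup"    # value of WORK followed by _backup
oops="$WORK_backup"        # looks up a variable named WORK_backup (unset here)
echo "$backup"
echo "[$oops]"
```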
In order to keep special characters such as $ in a string you can
A number of variables are already set to certain values. These “environment variables” have special meaning. Examples
echo $HOME
echo $USER
Only modify if you know what you are doing!
Show the whole environment:
env
env | less
New environment variables are generated with
export NAME=value
or:
NAME=value
export NAME
Note
In other shells (not bash), one has to use different commands, e.g. in csh and tcsh it is setenv NAME value.
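What export changes in practice is whether programs started from the shell can see the variable. A minimal sketch with two made-up variable names: a child shell is started with sh -c and asked to print both.

```shell
LOCAL_ONLY=hidden          # plain shell variable (made-up name)
export SHARED=visible      # environment variable (made-up name)

# The single quotes keep the current shell from expanding the variables,
# so it is the child shell that looks them up.
child=$(sh -c 'echo "LOCAL_ONLY=[$LOCAL_ONLY] SHARED=[$SHARED]"')
echo "$child"
env | grep '^SHARED='
```

Only the exported variable shows up in the child and in the output of env.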
PATH is a very special environment variable. It lists the directories where commands are searched for.
echo $PATH
which ls
The second command shows the full path of the ls command (which is simply a file in a bin directory). Its directory is also listed in PATH. This is why one can simply type
ls
although one could alternatively use the full path to the command:
/bin/ls
Note
If an executable (e.g. a code that you compiled yourself) is not on the PATH then you will always have to provide the path name in order to execute it.
Changes to PATH are typically done in the shell startup file ~/.bashrc. E.g. adding your own bin directory:
export PATH=$PATH:$HOME/NAME/bin
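The effect of PATH can be tried without touching ~/.bashrc. In this sketch a scratch directory stands in for your own bin directory, and hello_p01 is a made-up two-line script; the change to PATH only lasts for the current shell session.

```shell
mybin=$(mktemp -d)                                  # stands in for $HOME/NAME/bin
printf '#!/bin/sh\necho "hello from my own command"\n' > "$mybin/hello_p01"
chmod u+x "$mybin/hello_p01"

"$mybin/hello_p01"            # works: the path is given explicitly
export PATH=$PATH:$mybin      # append the directory to the search path
hello_p01                     # now the bare name is found
```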
Unix is generally case-sensitive (however, Mac OS X is typically not!)
directories are separated by the forward slash /
filenames: may contain any characters, however:
special characters need to be quoted, using
- backslash: \
- “soft” quotes: " "
- “hard” quotes: ' '
Avoid spaces, slashes /, backslashes \, dollar sign $, ampersand &, question mark ?, parentheses (), square [] and curly brackets {}, pipe |, binary relations >, <, back tick `. They will make your life difficult.
Also avoid non-English (i.e. non-ASCII) letters such as German umlauts (äöü ...) or accented characters (éîè ...) or special symbols (©≠–† ...).
Good: standard letters, numbers, underscore _, dot ., dash -. The equality sign = can be used but it can lead to confusion.
We will use the vi editor (actually, the editor is really called vim but it will appear as “the vi editor”; there is a longish Unix history behind this, which does not need to concern us). vi is very powerful and available on any Unix system. Its learning curve is fairly steep, but “vi” is a very useful skill to have.
Go through vi/vim essentials (vim_basics.txt).
If this is not sufficient consider doing vimtutor (takes about 20-30 mins): On the commandline type
vimtutor
and follow the instructions.
nano is a light-weight editor available on most modern Unix-like systems. If you absolutely hate vi (some people do) then you can try this one. It has fewer features than vi and requires a few more customizations in order to provide a comparable experience.
To learn more about nano, launch it and read the help (^G, i.e. CTRL-G) and have a look at nano’s homepage.
customize ~/.nanorc (see also nanorc (5)). Example:
## Backup files to filename~.
set backup
## Enable ~/.nano_history for saving and reading search/replace strings.
set historylog
## The opening and closing brackets that can be found by bracket
## searches. They cannot contain blank characters. The former set must
## come before the latter set, and both must be in the same order.
##
set matchbrackets "(<[{)>]}"
## Enable mouse support, if available for your system. When enabled,
## mouse clicks can be used to place the cursor, set the mark (with a
## double click), and execute shortcuts. The mouse will work in the X
## Window System, and on the console when gpm is running.
##
set mouse
## Use smooth scrolling as the default.
# set smooth
## Constantly display the cursor position in the statusbar. Note that
## this overrides "quickblank".
set const
## For python
## Use this tab size instead of the default; it must be greater than 0.
set tabsize 4
## Convert typed tabs to spaces.
set tabstospaces
## Use auto-indentation.
set autoindent
add syntax highlighting: Mac OS X’s nano misses the files that describe syntax highlighting. You can download them from http://becksteinlab.physics.asu.edu/pages/courses/2012/PHY598/nanorc.tar.gz . Unpack into a new directory ~/.nano. Or do it in one go:
mkdir ~/.nano
cd ~/.nano
curl http://becksteinlab.physics.asu.edu/pages/courses/2012/PHY598/nanorc.tar.gz | tar zxvf -
(Note how the tar command can read from stdin (f -) and curl provides the archive to stdin.)
Now you have to enable the syntax highlighting in your ~/.nanorc file. Instead of doing this manually we use a simple for loop:
cd ~/.nano
(for f in *.nanorc; do echo "include \"~/.nano/$f\""; done) >> ~/.nanorc
That appends the correct commands such as
include "~/.nano/asm.nanorc"
include "~/.nano/awk.nanorc"
include "~/.nano/python.nanorc"
...
to the configuration file.
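If you would like to see what the loop produces before appending anything to your real ~/.nanorc, it can be rehearsed in a scratch directory. In this sketch the two .nanorc files are empty stand-ins and nanorc.test is a made-up output file.

```shell
nanodir=$(mktemp -d)                           # stands in for ~/.nano
touch "$nanodir/awk.nanorc" "$nanodir/python.nanorc"
rcfile="$nanodir/nanorc.test"                  # stands in for ~/.nanorc

# Same loop as above, run in a sub-shell; all of its output is appended.
(cd "$nanodir" && for f in *.nanorc; do echo "include \"~/.nano/$f\""; done) >> "$rcfile"
cat "$rcfile"
```

Because >> appends, running the loop twice would add every include line twice; check the file with less afterwards.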