.. -*- encoding: utf-8 -*-

=============
 Unix Basics
=============

We will talk to the computer using the *text-based terminal*, also
known as "the commandline". On `Unix`_-like operating systems
(typically used for high-performance computing) this is a very
powerful way to interact with the computer.

On a typical *Linux* desktop system you open an "xterm" or "kterm" (or
similar application) to get access to the commandline. On *Mac OS X*
you open "Terminal.app" (in the Utilities folder in Applications).

- We use the "bash" shell ``bash`` ("Bourne again shell", which
  replaces the Bourne shell ``sh``). Bash is a very good shell to
  write scripts in and to use on an everyday basis. Pretty much
  everything written about ``sh`` also applies to ``bash`` (including
  the original and very readable `Introduction to the Unix Shell`_
  written by Steve Bourne in 1978).

- Bash is available on most modern Unix-like operating systems; it is
  the default shell on most Linux distributions and was for many years
  the default on Mac OS X (recent macOS releases default to ``zsh``).

- There are other shells out there (like ``csh`` and ``tcsh``
  ("C-shells") or the Korn shell ``ksh``). We will not deal with
  them. In particular, the `C-shells should not be used for writing
  scripts`_, for `many good reasons`_.

.. Links
.. _Unix: http://en.wikipedia.org/wiki/Unix
.. _`Introduction to the Unix Shell`:
   http://darthtater.asurite.ad.asu.edu/unix/shell.html
.. _`C-shells should not be used for writing scripts`:
   http://www.faqs.org/faqs/unix-faq/shell/csh-whynot/
.. _`many good reasons`: http://www.grymoire.com/Unix/CshTop10.txt

Command syntax: options and arguments::

   command [-option] [-option OPTVAL] ... [--long-option] [--long-option=OPTVAL] [file ...]

- single-letter options, can be combined
- long options
- arguments: if not supplied, input is often read from "standard
  input" and output is written to "standard output"

Help
====

Not all help functions described below are always available. Simply
try them all. You need to learn to find out about commands by yourself.
This introduction can only point you in the right direction and give
you hints as to what could be useful to you.

- built in help function::

     command -h
     command --help
     command -?
     command

- manual page ('man page')::

     man command
     man -k search_phrase

  This is where to look when someone tells you to RTFM_.

  .. _RTFM: http://foldoc.org/RTFM

- help function of the shell::

     help command

- info page (can be somewhat similar to man but can contain
  hyperlinks)::

     info command

And of course, you can always search on the web. Just make sure you
understand which version of the command is described. There can be
huge differences. If in doubt, check whether a command complies with
the "POSIX standard", a lowest common denominator for various flavors
of Unix (yes, there are many different `Unix-like operating systems`_
out there!).

.. _`Unix-like operating systems`:
   http://en.wikipedia.org/wiki/File:Unix_history-simple.svg

Navigating file system
======================

See where you are and move around::

   pwd
   cd              # cd to home
   cd ~            # cd to home: ~ stands for home
   cd directory
   cd .            # cd to the current dir (mostly useful with -P/-L options)
   cd ..           # cd to parent dir
   cd -            # cd to previous dir (try it out)

List files::

   ls
   ls directory
   ls -l           # long format
   ls -a           # hidden files -- try 'ls -la'
   ls -ld          # directory information -- 'ls -l ~' vs 'ls -ld ~'
   ls -R           # recursive

**Exercise**

1) use ``ls``::

      ls ~/Documents
      ls ~/..
      ls -lR /        # takes forever, cancel with CTRL+C (= ^C)

2) create directory space::

      ~/NAME/p01
      p01/tmp
      p01/editor
      pdb
      docs

3) look at these special directories::

      /               # "root" of the file system
      /bin            # commands ("binaries") are stored here (needed at system boot)
      /usr/bin        # standard commands stored here
      /Volumes        # Mac OS X specific: disks (and your USB flash drive) appear here

Copy, renaming, deleting
========================

- download ``vim_basics.txt`` from
  http://darthtater.asurite.ad.asu.edu/PHY598/01/

- copy to your ``docs`` directory::

     cp vim_basics.txt docs

- move file to ``editor`` directory::

     mv vim_basics.txt p01/editor

- make a temporary copy::

     cp docs/vim_basics.txt tmp
     cd tmp
     cp vim_basics.txt foo.txt
     mkdir tmp2
     cd tmp2
     mv ../foo.txt .     # note '.' for current dir

- remove the copy::

     cd ..
     pwd
     rm vim_basics.txt

- remove the tmp dir::

     rmdir tmp2          # only empty dirs! ## FAILS
     rm -r tmp2          # deletes full dirs recursively (dangerous)

.. Warning:: ``rm -r`` and ``rm -rf`` ("force", even override
   permissions) are very dangerous!! They delete everything
   recursively. They do not ask and they do not keep a backup (no
   "Trash").

In general, Unix commands do what you ask them to. If they succeed
they typically will not output anything (unless it's part of their
job, such as ``ls``). Only if there's a problem will you get a (terse)
message.

Looking at files
================

Different ways to display a text file::

   cat FILE
   less FILE       # h for help, q for quit
   head FILE
   tail FILE

Try ``head -5 vim_basics.txt``.

For looking at log files of a running simulation, try ::

   tail -f output.log

which will continuously update.

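
The viewing commands above can be tried safely on a throwaway file.
The following is only a minimal sketch and not part of the course
files: the names ``demo_dir`` and ``demo.log`` are made up for
illustration, and the ``>`` operator that creates the file is
explained in the section on input/output redirection.

```shell
# Create a scratch directory and a ten-line sample file.
# (demo_dir and demo.log are hypothetical names for this illustration.)
demo_dir=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > "$demo_dir/demo.log"

# Show the first three and the last two lines.
# "-n N" is the portable spelling of the older "-N" form (as in "head -5").
first_lines=$(head -n 3 "$demo_dir/demo.log")
last_lines=$(tail -n 2 "$demo_dir/demo.log")
echo "$first_lines"
echo "$last_lines"

# Clean up the scratch directory again.
rm -r "$demo_dir"
```

Here ``head`` prints ``line 1`` to ``line 3`` and ``tail`` prints
``line 9`` and ``line 10``. Working in a directory created by
``mktemp -d`` keeps the experiment away from your real files.
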

Shell name generation
=====================

"glob" patterns

- ``*`` means any characters (even zero)::

     ls *
     ls *.txt
     ls /usr/bin/*grep

- ``?`` means any single character::

     ls /usr/bin/?grep

- ``[x-y]`` means a range::

     ls /usr/bin/[a-y]grep

- Advanced: brace expansion ::

     head_{a,b,42,xx}_tail   --> head_a_tail head_b_tail head_42_tail head_xx_tail

  Can be useful when making a complicated directory layout, e.g.
  simulations for three systems S1, S2, S3 at four temperatures 273,
  300, 310, 373 and 2 pressures (1 atm and 1000 atm) (3*4*2 = 24
  directories)::

     mkdir -p {S1,S2,S3}/T={273,300,310,373}K/P={1,1000}atm

  Will create ::

     S1/T=273K/P=1atm
     S1/T=273K/P=1000atm
     S1/T=300K/P=1atm
     S1/T=300K/P=1000atm
     ...

Input/Output Redirection
========================

- commands read from standard input (*stdin*; by default, the
  terminal, i.e. the keyboard) and write to standard output
  (*stdout*)::

     stdin --> command ---> stdout

  By default, standard output is printed to the screen. (There's also
  a second output channel, called standard error (*stderr*), which is
  used for error messages. By default it is also sent to the screen.)

- redirection operators::

     command > file      # create/overwrite output file
     command >> file     # append
     command < file      # read contents of file into stdin

- Note on Unix philosophy: "Everything is a file": a file is a file
  (of course) but whole disks are also files, the terminal is a file,
  memory can be treated as a file, a random number generator can
  appear as a file, ... Not overly important for the course but worth
  keeping at the back of your mind because it means that anything you
  learn about "real" files can be applied in a wider context!

**Exercise** Run ::

   cat

and type something, using return to finish lines... what happens?

To end input, type :kbd:`CONTROL-D` (press control and 'D' at the same
time; often written ":kbd:`^D`"). (Knowing this is often useful.)

.. Note:: To terminate a running command, use :kbd:`CONTROL-C`
   (:kbd:`^C`).


``cat`` reads your input from stdin (keyboard) and writes it to stdout
(screen).

example: cat as simple "editor"::

   cat > TODO
   - learn Unix
   - learn vi
   ^D
   less TODO

example::

   mkdir p01
   cd p01
   ls -R ~/Documents
   ls -R ~/Documents > Documents.lsR
   less Documents.lsR
   wc Documents.lsR

   cat Documents.lsR > double1.lsR      # create/overwrite
   cat Documents.lsR >> double1.lsR     # append
   wc double1.lsR

or ::

   cat Documents.lsR Documents.lsR > double2.lsR
   wc double2.lsR

wc can also read from stdin::

   wc < double1.lsR

What's the difference?

**Exercise:**

1) find out how to only print the number of words with ``wc``
2) create a new file with this number at the top and the remaining
   Documents.lsR following

solution::

   wc -w < Documents.lsR > Documents.NlsR
   cat Documents.lsR >> Documents.NlsR

or ::

   (wc -w Documents.lsR; cat Documents.lsR) > Documents.NlsR

``( command; command; ... )`` runs a sequence of commands in a
sub-shell whose output can again be redirected.

Pipelines
=========

"``|``" is the "pipe" character. It connects stdout from one command
with stdin from another one::

   command1 | command2

output of 1 is input of 2 (filter)::

   ls ~/Documents | wc

Much of the power of Unix comes from the fact that a Unix system
contains many small programs that each do one job particularly well
and that can be strung together as filters in a pipeline.

Useful filter commands:

grep
----

("global regular expression print") Shows lines that match the
expression REGEX::

   command | grep 'REGEX'

.. note:: It is generally a good idea to enclose REGEX in "hard
   quotes" (single quote character "'") so that the shell does not
   interpret special characters such as $ or ~.

Simple REGEX ("basic regular expressions")::

   word          matches "word" literally anywhere
   ^word         matches "word" at beginning of line
   word$         matches "word" at end of line
   a *b          matches ab, a b, a  b, i.e. ' *' is zero or more spaces
   a \+b         matches a b, a  b, ..., i.e.
                 ' \+' is one or more spaces
                 (NOTE: in "extended regular expressions" as used in
                 'egrep' this is just '+', i.e. 'a +b')
   a[A-Z]b       matches aAb, aBb, ..., aZb (range expression)
   a[0-9][0-9]b  a00b, a01b, a02b, ..., a99b
   a[A-Z]*b      ab, aAb, ..., aZb, aABb, ... (zero or more capital letters)
   a[A-Za-z]b    aAb,..., aZb, aab, ..., azb
   a[^A-Z]b      aab, axb, a1b, a+b, ... ([^...] is a negation)
   a.b           matches aXb a3b a_b a b but not ab:
                 '.' stands for a single character
   a...b         a123b aXYZb etc: ... are three characters
   a.*b          ab a1b a12b a123b etc: .* is zero or more characters
                 (this is used very often)
   a.\+b         a1b a12b but not ab: .\+ is one or more characters

(Regular expressions are amazingly useful but it takes some time to
learn them. See 'man re_format' (or 'man 7 regex' on Linux) for the
bare bones and various tutorials on the internet. The above barely
scratches the surface.)

Examples::

   ls /usr/bin | grep lp
   ls /usr/bin | grep '^lp'

**Exercise** How many lp commands? ::

   ls /usr/bin | grep '^lp' | wc -l

sort
----

alphabetical or numerical sort, e.g. ::

   who | sort

uniq
----

::

   cat FILE | sort | uniq

(note: ``uniq -c``: histogram)

cut
---

::

   cut -c N-M,X-Y FILE   --> data from cols N-M and X-Y
   cut -f 2,3 -d ' '     --> separate fields by space and print fields 2 and 3

(But for field splitting, ``awk`` works better (see below).)

sed
---

"stream editor": reads a file line by line and applies a sed-program
to each line in turn. It is `rather complicated`_ and a typical use is
to search and replace in a file::

   cat FILE | sed 's/SEARCH/REPLACE/g'

where SEARCH is a "basic regular expression" as for grep.

.. Warning:: ``sed sed-program FILE > FILE`` will destroy ``FILE``.
   You must redirect to a temporary file, e.g.
   ``sed sed-program FILE > FILE.temp && mv FILE.temp FILE``.
   Modern versions of ``sed`` have the ``-i`` (in-place) option to
   take care of that.

.. _rather complicated: http://sed.sourceforge.net/sedfaq.html

awk
---

``awk`` also scans a file line-by-line and applies an awk program to
each line.
It is also fairly complicated (actually, ``awk`` is a full-blown
scripting language) but typical use is straightforward: ``awk``
splits the line into *fields* (separated by white space, i.e. spaces
and tabs) and then allows you to access fields by the special
variables ``$1`` (first field), ``$2`` (second field), etc. For most
data files you can think of fields as columns. ::

   cat FILE | awk '/REGEX/ {awk-command; awk-command; ...}'

e.g.::

   ls -lR /usr/bin | awk '/grep$/ {print $9, $5/1024}'

prints the file name and the size in kB instead of bytes but only of
those commands that end in ``grep``.

Filter exercises
================

Download 1AKE and 4AKE from the PDB_ (Protein Databank). Look at the
files with ``less``.

.. _PDB: http://www.pdb.org

- search with the PDB code (e.g. "1ake")
- download file (Files -> Download Files -> PDB File (gz))
- Or from the command line::

     curl http://www.rcsb.org/pdb/files/1ake.pdb.gz -o 1ake.pdb.gz
     curl http://www.rcsb.org/pdb/files/4ake.pdb.gz -o 4ake.pdb.gz
     gunzip *.gz

  (Think of ``curl`` as "cat URL", i.e. by default it writes the file
  pointed to by URL to stdout --- ready to be used in a pipeline.)

  Or (if ``wget`` is installed)::

     wget http://www.rcsb.org/pdb/files/{1ake,4ake}.pdb.gz
     gunzip 1ake.pdb.gz 4ake.pdb.gz

Put files into your PDB dir.

Look at files with ``less`` and recognize the `PDB file format`_. We
are particularly interested in the `Coordinates Section`_ where
individual atoms are listed together with their coordinates. Move to
the ATOM_ and HETATM_ sections (use :kbd:`/` (e.g. :kbd:`/ATOM`) to
search; press :kbd:`n` repeatedly to move forward through the
matches).

0) count the number of residues [#residues]_

   *Hint*: each residue has exactly one CA atom; protein residues are
   stored with ATOM records. Other molecules are in HETATM.

   Bonus: How many residues in each chain?


   *Solution*: manual inspection of the file showed that there are
   only two chains, A and B, so we simply grep for those separately::

      grep '^ATOM.*CA.* A ' 1ake.pdb | wc -l
      grep '^ATOM.*CA.* B ' 1ake.pdb | wc -l

      214   (same for both)

   Total::

      grep '^ATOM.*CA' 1ake.pdb | wc -l

      428

1) histogram of residue names: how often does each amino acid occur in
   the protein? Are some rarer than others?

   - find the CA
   - use ``cut -c N-M`` to extract the name from the fixed (not
     white-space separated!) columns (check PDB ATOM_ specs for N-M)
   - use ``sort`` and ``uniq -c``

   *Solution*::

      cat 1ake.pdb | grep '^ATOM.*CA' | cut -c 18-20 | sort | uniq -c

   *Question*: How to sum up the totals? ::

      cat 1ake.pdb | grep '^ATOM.*CA' | cut -c 18-20 | sort | uniq -c \
          | awk '{sum+=$1}; END {print "total: ", sum}'

.. rubric:: Footnotes

.. [#residues] A protein is a polypeptide that is made up from a
   *linear* sequence of amino acids; each amino acid is called a
   *residue*. There are 20 naturally (and frequently) occurring amino
   acids. Each has a `three-letter residue name`_. For instance,
   glycine is Gly, arginine is Arg, and glutamine is Gln.

.. _`three-letter residue name`:
   http://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/iupac_aa_abbreviations.html
.. _`PDB file format`:
   http://www.wwpdb.org/documentation/format33/v3.3.html
.. _`Coordinates section`:
   http://www.wwpdb.org/documentation/format33/sect9.html
.. _ATOM: http://www.wwpdb.org/documentation/format33/sect9.html#ATOM
.. _HETATM: http://www.wwpdb.org/documentation/format33/sect9.html#HETATM

Access rights (permissions)
===========================

Take a detailed look at the output of a long file listing::

   ls -la
   ....


::

   drwxr-xr-x   2 oliver  oliver      68 Jan 11 02:34 tmp
   -rw-r--r--   1 oliver  oliver  495559 Jan 11 13:56 Documents.lsR
    uuugggooo     owner   group     size date         name

   d = directory (see man ls for the full list)
   r = read
   w = write
   x = execute

   Fields:
   u = user/owner = oliver
   g = group      = oliver
   o = other

Additionally, after the permissions there can also be a single
character that shows if alternative access controls (such as Access
Control Lists) are applicable. This is typically signified through a
``+`` sign.

The ``ls -l`` command on Mac OS X also shows information about
extended attributes (``@``), i.e. there exists meta data stored in the
file system. This is only of concern if you copy the file to a non-Mac
OS X formatted disk or USB flash drive because the "foreign"
filesystem will not be able to store these extended attributes.

chmod
-----

change permissions::

   chmod go-rwx FILE    # make it fairly private
   chmod go+r FILE      # let others read it
   chmod a+r FILE       # let everyone read it

Exercise:

1) remove execute permission from tmp dir. Try 'cd tmp' and 'ls tmp'
2) Fix the permissions so that everything works again.

Other useful commands
=====================

``df`` --- file system information
----------------------------------

("disk free")::

   df

``du`` --- file use in directories
----------------------------------

("disk usage")::

   du -s DIR

compression
-----------

compress (shrink file size without losing information)::

   zip
   gzip
   bzip2 (may not be installed)

uncompress::

   unzip
   gunzip, zcat
   bunzip2

(These commands can typically be asked to take either a FILE as input
or read from stdin. They can also write to stdout so that one can put
a compress or uncompress step into a pipeline.)

``file`` --- guess file type
----------------------------

::

   file FILE

``diff`` --- compare two files
------------------------------

::

   diff FILE1 FILE2

The ``diff`` command (together with its cousin ``patch``) is very
powerful when it comes to big software development projects.
For us it is mostly useful to quickly compare two files.
``diff -U2``, a so-called "unified diff", is generally more readable
than the standard diff output. Also look at ``sdiff``, which shows the
differences in a side-by-side view (pipe the output through
``less``!).

``find`` --- find files
-----------------------

Complicated syntax but can be extremely useful::

   find . -name '*.txt'

Find files over 1M in size::

   find . -size +1M -ls

``history`` --- all the commands you typed
------------------------------------------

The shell remembers all the commands that you typed in the "history"
(typically a hidden file ``~/.bash_history`` or similarly named). This
history allows you to

- go back with the Cursor-up key to recall commands
- search backwards with :kbd:`^R`

and see the commands with ::

   history

The history is actually truncated at :envvar:`HISTSIZE` (you can set
the environment variable :envvar:`HISTSIZE` yourself: e.g.
``HISTSIZE=500; export HISTSIZE``).

Downloading files via the commandline
-------------------------------------

curl_ ("cat url") treats a URL as a file. It is a great tool and well
worth learning::

   curl URL | command
   curl URL -o FILENAME

wget_ is straightforward for downloading files (but not installed on
Mac OS X by default)::

   wget URL

or ::

   wget URL -O NEWNAME

.. _wget: http://www.gnu.org/software/wget/manual/html_node/index.html
.. _curl: http://curl.haxx.se/docs/

**Exercise:** Download
http://darthtater.asurite.ad.asu.edu/PHY598/01/vimqrc.pdf and move it
into your docs directory, using the commandline::

   cd ~/NAME/p01/docs
   curl http://darthtater.asurite.ad.asu.edu/PHY598/01/vimqrc.pdf -o vimqrc.pdf

``echo`` --- printing a string to stdout
----------------------------------------

If you want to see what the shell thinks of an expression or if you
want to have a script output a message you can use the ``echo``
command::

   echo "Hello world!"
   echo "nothing happens *"
   echo "all the files " *

(Note that in the last line the shell expands ``*`` to all the files
in the directory.)

There's also the ``printf`` command, which is more versatile but less
often used (see man page).

Unix variables
==============

Variables are containers to store content in. Bash knows simple
variables and arrays. It does not distinguish between text and numbers
(essentially, everything is treated as text and if needed interpreted
as a number).

Shell variables and variable expansion
--------------------------------------

Assign *value* to the variable *NAME*::

   NAME=value

E.g::

   WORK=$HOME/NAME/p01
   ls $WORK
   TMP_DIR=$WORK/tmp
   ls ${TMP_DIR}/*      # braces mark where the variable name ends

By convention we use uppercase letters for *NAME* but it can be any
mixture of upper and lower case characters and numbers (though it
can't start with a number). Stick to letters, numbers, underscores.

The value is accessed ("expanded") by prefixing *NAME* with the ``$``
(dollar symbol).

Variables behave differently in quotes::

   echo "My work directory is $WORK"

This will print something like ::

   My work directory is /Users/USERNAME/NAME/p01

The value is expanded inside strings with *double quotes* ("soft
quotes"). Within *single quotes* the variable is *not* expanded (or
"interpolated into the string"), hence we call them "hard quotes"::

   echo 'The WORK variable can be accessed as $WORK'

will print ::

   The WORK variable can be accessed as $WORK

In order to keep special characters such as ``$`` in a string you can

- in double quotes, prefix it with a single backslash ``\``
- put it in single quotes

Environment variables
---------------------

A number of variables are already set to certain values. These
"environment variables" have special meaning. Examples ::

   echo $HOME
   echo $USER

Only modify if you know what you are doing!

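
The quoting rules from the previous subsection apply to predefined
variables such as ``HOME`` in exactly the same way. The following is a
minimal sketch for trying this out; ``DEMO_DIR`` and the other
variable names are hypothetical and only used for this illustration
(the directory does not need to exist).

```shell
# DEMO_DIR is a hypothetical variable name used only for this illustration.
DEMO_DIR="$HOME/demo"

soft="Data lives in $DEMO_DIR"       # double quotes: $DEMO_DIR is expanded
hard='Data lives in $DEMO_DIR'       # single quotes: text is kept literally
escaped="Data lives in \$DEMO_DIR"   # backslash protects $ inside double quotes

echo "$soft"
echo "$hard"
echo "$escaped"
```

The first ``echo`` shows the expanded path below your home directory;
the other two print the text ``$DEMO_DIR`` unchanged.
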

Show the whole environment::

   env
   env | less

New environment variables are generated with ::

   export NAME=value

or::

   NAME=value
   export NAME

.. Note:: In other shells (not bash), one has to use different
   commands, e.g. in csh and tcsh it is ``setenv NAME value``.

PATH
----

:envvar:`PATH` is a very special environment variable. It lists the
directories where commands are searched for. ::

   echo $PATH
   which ls

The second command shows the full path of the ``ls`` command (which is
simply a file in a bin directory). Its directory is also listed in
:envvar:`PATH`. This is why one can simply type ::

   ls

although one could alternatively use the full path to the command::

   /bin/ls

.. Note:: If an executable (e.g. a code that you compiled yourself)
   is not on the PATH then you will always have to provide the path
   name in order to execute it.

* The shell searches directories for a command in the order listed in
  PATH.
* Directories are separated by ':'.
* If a command is not on PATH then one has to provide its path to the
  shell in order to run it.
* It can make sense to add '.' to the path to be able to run
  executables in the current directory.

Changes to PATH are typically done in the shell startup file
``~/.bashrc``. E.g. adding your own bin directory::

   export PATH=$PATH:$HOME/NAME/bin

General Unix gotchas
====================

- Unix is *generally* case-sensitive (however, the default Mac OS X
  file system is typically *not*!)
- directories are separated by the forward slash ``/``
- filenames: *may* contain any characters, however:

  - special characters need to be quoted, using

    - backslash: ``\``
    - "soft" quotes: " "
    - "hard" quotes: ' '

  - **Avoid** spaces, slashes ``/``, backslashes ``\``, dollar sign
    ``$``, ampersand ``&``, question mark ``?``, parentheses ``()``,
    square ``[]`` and curly brackets ``{}``, pipe ``|``, binary
    relations ``>``, ``<``, back tick ``\``` --- it will make your life
    difficult. Also avoid non-English (i.e. non-ASCII) letters such as
    German umlauts (äöü ...)
    or accented characters (éîè ...) or special symbols (©≠–† ...).

  - Good: standard letters, numbers, underscore ``_``, dot ``.``, dash
    ``-``. The equality sign ``=`` can be used but it can lead to
    confusion.

Setting up an editor
====================

We will use the ``vi`` editor (actually, the editor is really called
``vim`` but it will appear as "the vi editor"... there's a longish
Unix history behind this, which does not need to concern us). ``vi``
is very powerful and available on any Unix system. Its learning curve
is fairly steep, but "vi" is a very useful skill to have.

vi/vim
------

Go through :ref:`vim-label` (:doc:`vim_basics.txt`). If this is not
sufficient consider doing ``vimtutor`` (takes about 20-30 mins): On
the commandline type ::

   vimtutor

and follow the instructions.

nano
----

``nano`` is a light-weight editor available on most modern Unix-like
systems. If you absolutely hate ``vi`` (some people do) then you can
try this one. It has fewer features than ``vi`` and requires a few
more customizations in order to provide a comparable experience.

To learn more about ``nano``, launch it and read the help (:kbd:`^G`,
i.e. CTRL-G) and have a look at `nano's homepage`_.

.. _`nano's homepage`: http://www.nano-editor.org/
.. _`nanorc (5)`: http://www.nano-editor.org/dist/v2.2/nanorc.5.html

- customize ``~/.nanorc`` (see also `nanorc (5)`_). Example::

     ## Backup files to filename~.
     set backup

     ## Enable ~/.nano_history for saving and reading search/replace strings.
     set historylog

     ## The opening and closing brackets that can be found by bracket
     ## searches. They cannot contain blank characters. The former set must
     ## come before the latter set, and both must be in the same order.
     ##
     set matchbrackets "(<[{)>]}"

     ## Enable mouse support, if available for your system. When enabled,
     ## mouse clicks can be used to place the cursor, set the mark (with a
     ## double click), and execute shortcuts.
     ## The mouse will work in the X
     ## Window System, and on the console when gpm is running.
     ##
     set mouse

     ## Use smooth scrolling as the default.
     # set smooth

     ## Constantly display the cursor position in the statusbar. Note that
     ## this overrides "quickblank".
     set const

     ## For python
     ## Use this tab size instead of the default; it must be greater than 0.
     set tabsize 4

     ## Convert typed tabs to spaces.
     set tabstospaces

     ## Use auto-indentation.
     set autoindent

- add syntax highlighting: Mac OS X's ``nano`` misses the files that
  describe syntax highlighting. You can download them from
  http://darthtater.asurite.ad.asu.edu/PHY598/nanorc.tar.gz . Unpack
  into a new directory ``~/.nano``. Or do it in one go::

     mkdir ~/.nano
     cd ~/.nano
     curl http://darthtater.asurite.ad.asu.edu/PHY598/nanorc.tar.gz | tar zxvf -

  (Note how the ``tar`` command can read from stdin (``f -``) while
  ``curl`` writes the archive to stdout, which the pipe feeds into
  ``tar``.)

  Now you have to enable the syntax highlighting in your ``~/.nanorc``
  file. Instead of doing this manually we use a simple ``for`` loop::

     cd ~/.nano
     (for f in *.nanorc; do echo "include \"~/.nano/$f\""; done) >> ~/.nanorc

  That appends the correct commands such as ::

     include "~/.nano/asm.nanorc"
     include "~/.nano/awk.nanorc"
     include "~/.nano/python.nanorc"
     ...

  to the configuration file,