Wednesday, 21 March 2012

Getting the Best from bash

Many people coming to Linux (or the Mac, for that matter, which also has bash) have only ever used Windows - they don't remember good old DOS and the days of the command line. But Linux, like most UNIXes, favours those who work at the command line. There are lots of labour-saving techniques available for those willing to invest a little effort.
Linux systems typically have a number of shells available, but for most users the default shell is /bin/bash. The name bash is a dreadful pun on the earlier Bourne shell (sh): it stands for the Bourne Again SHell.
Basic Commands
First, some basic commands to help you navigate around the directory tree. A shell normally starts in your home directory, but if you put all your files in that one directory, you will wind up with an unholy mess, so we normally organise files into multiple subdirectories using some logical scheme or other. Then we use the cd command to navigate around the directory tree, and use pathnames to access files that are outside the current directory.


Task                                              Linux/Unix command       DOS equivalent
Create a directory                                mkdir dirname            md dirname
Change directory                                  cd dirname               cd dirname
Return to home directory                          cd                       -
Show current directory (Print Working Directory)  pwd                      cd
Directory listing                                 ls                       dir
Copy file                                         cp source destination    copy source destination
Delete file                                       rm filename              del filename
Rename file                                       mv oldname newname       ren oldname newname
View contents of text file                        cat filename             type filename
Switch to the root account                        su                       -
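To try the commands in the table without disturbing any real files, you can run them in a scratch directory. The sketch below assumes bash and a mktemp that supports -d (true of GNU and BSD systems); the directory and file names are invented for the example, and the comments give the DOS equivalent of each step.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir work                     # create a directory      (DOS: md work)
cd work                        # change into it          (DOS: cd work)
pwd                            # print working directory (DOS: cd)

echo "hello from bash" > notes.txt
cp notes.txt copy.txt          # copy a file             (DOS: copy)
mv copy.txt renamed.txt        # rename a file           (DOS: ren)
cat renamed.txt                # view a text file        (DOS: type)
rm notes.txt                   # delete a file           (DOS: del)
ls                             # list the directory      (DOS: dir)
listing=$(ls)                  # keep the listing so it can be checked
```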

Filenames
You can refer to files by their absolute or relative pathnames. An absolute pathname lists the directories you traverse in getting from the root directory (/) to the destination file or directory, such as:
/home/les/work/bash.txt

A relative pathname, by contrast, lists the directories you traverse from your current (or working) directory to the destination file or directory. So, if my working directory is my home directory, then the relative pathname to the bash.txt file is:
work/bash.txt
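The difference is easy to see in a throwaway directory tree. This is a minimal sketch: the home/les/work layout and the file contents are made up, and $scratch (an absolute path created by mktemp) stands in for the root of the tree so that nothing outside a temporary directory is touched.

```shell
#!/usr/bin/env bash
# Build a hypothetical home/les/work tree inside a scratch directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/home/les/work"
echo "bash notes" > "$scratch/home/les/work/bash.txt"

cd "$scratch/home/les"                              # the working directory
relative=$(cat work/bash.txt)                       # no leading /: relative to $PWD
absolute=$(cat "$scratch/home/les/work/bash.txt")   # leading /: absolute
echo "relative: $relative"
echo "absolute: $absolute"
```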

The basic rule is that an absolute pathname always starts with a /, while relative pathnames never do. And of course, you can use . (dot) to refer to the current directory, and .. (dot-dot) to refer to the one above. So far, nothing new: this is the same as dear old DOS and Windows. But bash has a number of other tricks up its sleeve. For example, you can refer to your home directory as ~ (tilde or twiddle). So, no matter what my current directory is, I can say
less ~/work/bash.txt

and see the contents of that same file. Or I can copy a bunch of files off floppy disk into my home directory:
cp -r /mnt/floppy ~

takes care of that chore. You can refer to other people's home directories in a similar way, as ~username. So, if you have permission to access my work, you can get to that file with the command
less ~les/work/bash.txt
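The dot, dot-dot and tilde shorthands can be checked the same way. In this sketch the directory names are again hypothetical, ~root assumes the system has a root account (its home is usually /root), and the quoted tilde shows that the expansion is done by the shell, not by the command.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
mkdir -p "$scratch/work/sub"
echo "bash notes" > "$scratch/work/bash.txt"

cd "$scratch/work/sub"
up=$(cat ../bash.txt)          # .. is the parent directory
cd ..
here=$(cat ./bash.txt)         # . is the current directory

echo ~                         # unquoted ~ expands to your home directory
echo ~root                     # ~username expands to that user's home directory
echo '~'                       # quoted: no expansion, prints a literal ~
tilde=~
```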

Globbing and Wildcards
One nice benefit of working at the command line is being able to use commands to work on collections of files. Of course, a similar facility has been available in DOS/Windows since the earliest times, and most people know how to use the * and ? symbols to perform basic wildcard filename matching. However, in Unix systems, wildcard expansion doesn't work in quite the same way, and of course, there are more options.
In the DOS world, every program includes code for expanding wildcards into a list of matching filenames. However, Unix programs generally do not: it is up to the shell to expand wildcards into a list of filenames, a process known as globbing. This might seem like a minor distinction, but it can lead to unexpected behaviour, especially when using remote shells: in a command such as ssh somehost ls *.txt, your local shell tries to expand *.txt against the files in your local directory before the command is ever sent to the remote machine.
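A quick way to see that the shell, not the program, does the expanding is to hand a pattern to printf, which simply prints whatever arguments it receives. A sketch, using made-up file names in a scratch directory:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch"
touch a.txt b.txt c.log

# printf prints one line per argument it is given.
printf 'arg: %s\n' *.txt       # expanded by the shell: two arguments, a.txt and b.txt
printf 'arg: %s\n' '*.txt'     # quoted: one literal argument, *.txt

expanded=$(printf '%s ' *.txt)
literal=$(printf '%s ' '*.txt')
```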
There are three basic wildcard characters:
Wildcard  Meaning
?         Matches any single character
*         Matches any string, including the null string (i.e. zero or more characters)
[...]     Matches any one of the enclosed characters. A pair of characters separated by a hyphen denotes a range of characters. For example, [ABC] matches any of the characters A, B or C, while [A-Z] matches any upper-case letter, [A-Za-z] matches any letter and [0-9] matches any digit.
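The three wildcards can be compared side by side. In this sketch the file names are invented, and LC_ALL=C is set first because both the sort order of the results and, in some locales, the meaning of a range such as [A-Z] depend on the locale; under the C locale the results are predictable.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch"
touch file1 file2 file10 fileA notes.txt

LC_ALL=C                    # predictable sort order and character ranges

one_char=$(echo file?)      # ?     : exactly one character after "file"
any=$(echo file*)           # *     : zero or more characters after "file"
digit=$(echo file[0-9])     # [0-9] : one digit
upper=$(echo file[A-Z])     # [A-Z] : one upper-case letter
printf '%s\n' "$one_char" "$any" "$digit" "$upper"
```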

Escaping
Many punctuation symbols have a special meaning to the shell; when it sees them it will try to act upon them. However, occasionally you will come across filenames that actually incorporate these characters; when you try to use these filenames on the command line, you will get the wrong behaviour from the shell.
For example, what if someone creates a file with an ampersand (&) in its name? Normally, an ampersand marks the end of a command which is to be detached and run in the background, so you're going to see this kind of effect:
[les@sleipnir test]$ cat t&c.doc
[1] 23782
cat: t: No such file or directory
-bash: c.doc: command not found
[1]+  Exit 1                  cat t
[les@sleipnir test]$

What's happening here is that the shell tries to run the command cat t in the background and then run the command c.doc in the foreground. Not surprisingly, both fail. However, you can 'turn off' or escape the special meaning of the & character in several ways:
Type a backslash character before the special character:
[les@sleipnir test]$ cat t\&c.doc
Contents of the file 't&c.doc'.

[les@sleipnir test]$

Place the entire filename in single or double quotes:
[les@sleipnir test]$ cat 't&c.doc'
Contents of the file 't&c.doc'.

[les@sleipnir test]$

There is a difference between single and double quotes: the single quotes are 'stronger' and turn off substitution of variables (to be covered later). Double quotes should be used when you want variables to be substituted inside the string.
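The escaping methods, and the difference between the two kinds of quote, fit in one short sketch. The file name comes from the example above; the variable name and value are made up for illustration.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch"
echo "Contents of the file 't&c.doc'." > 't&c.doc'   # the quotes protect the &

cat t\&c.doc              # backslash escapes just the next character
cat 't&c.doc'             # single quotes: everything inside is literal
cat "t&c.doc"             # double quotes also protect the &

name=les
single='Hello, $name'     # no variable substitution inside single quotes
double="Hello, $name"     # $name is substituted inside double quotes
echo "$single"
echo "$double"
```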
Command, Filename and Hostname Completion
Downloading and installing software often means dealing with long filenames. The reason the names are so long is that they include version number and patch level information; such filenames are very logical and very helpful by comparison with the Windows world, where every piece of software seems to come as a file called setup.exe, and it is hard to know what's what. I can just look at a file called webmin-1.110-1.noarch.rpm and know that it contains a newer version of Webmin than webmin-1.080-1.noarch.rpm. I also know I can install it on any processor architecture, using the rpm (Red Hat Package Manager) command.
But don't these long filenames mean a lot of typing? I'll let you into a secret: Linux users rarely type entire filenames. In fact, if I wanted to install Webmin 1.110, and my current directory contained only a few files, I would type:
rpm -ivh w

and then hit the Tab key. If webmin-1.110-1.noarch.rpm was the only file in that directory whose name began with w, the bash shell would automatically complete the filename for me, since that's the only file it can be. On the other hand, if my current directory also contained a file (or directory) called webcalendar, then bash would display
[les@sleipnir les]$ rpm -ivh web

and then beep, to signal that it needed me to distinguish between the two files by typing in the next letter, which would make it clear which I wanted. Now, if I already know that typing m will resolve the problem, then I would do that and then press the Tab key to complete the file name. But if I was unsure, I could press Tab again (another beep) and a third time, and now bash will produce a list of possible matches based on what I've typed so far:
[les@sleipnir les]$ rpm -ivh web<Tab><Tab><Tab>
webcalendar                webmin-1.110-1.noarch.rpm
[les@sleipnir les]$ rpm -ivh web

If there are several similarly-named files in a directory, you may have to use the Tab key several times to select the right alternatives. I always find this automatic filename completion useful when changing directories, as in this example:
cd /v<Tab>loca<Tab>n<Tab>domi<Tab>/h<Tab>sl<Tab>t<Tab>

gives me
[root@bifrost root]# cd /var/local/notesdata/domino/html/slides/t325

with a lot less typing, and no spelling errors, either. If I typed the entire path out manually, I would be bound to make a mistake.
But bash can auto-complete a lot more than just filenames. It can also auto-complete commands (useful if you're unsure whether you're misspelling a command, or whether the command exists on your system), although most commands are program files in the directories listed in your PATH, so this is mostly a special case of filename completion.
bash will also auto-complete usernames when it sees a ~ at the beginning of the word being completed. And it will auto-complete hostnames if it sees an @ symbol at the beginning of the word (useful for scp commands and the occasional email), although it only consults the /etc/hosts file (or the file named in the HOSTFILE variable) and cannot use DNS lookups.
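Tab completion itself only works at an interactive prompt, but bash's compgen builtin runs the same completion machinery and prints the candidates, which is a handy way to see what Tab would offer. A sketch, assuming bash with programmable completion built in (the usual case) and reusing the two file names from the example above in a scratch directory:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch"
touch webmin-1.110-1.noarch.rpm
mkdir webcalendar

compgen -f w                  # filenames starting with w: both candidates
compgen -f webm               # now unambiguous: only the rpm
compgen -c ls | head -n 5     # commands beginning with "ls"
compgen -u r                  # usernames beginning with r (what ~r<Tab> offers)
compgen -A hostname l         # hostnames from /etc/hosts beginning with l

both=$(compgen -f w | sort | tr '\n' ' ')
one=$(compgen -f webm)
```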
Command History
We often type a command, then realise we're not in the right directory, or that something else should have been done first. We need to fix the problem, then repeat the command, which will now work. bash has a command history feature which makes this easy. Most people just use the up and down arrow keys to cycle through the command history, but it's got a lot more capability than that.
The history command will list the command history currently in the shell's memory, with numbers against each line. You can use the ! symbol to refer back to previous commands by number or by name. For example:
  994  exit
  995  w
  996  uptime
  997  set -o
  998  history
  999  iptables -L -n
 1000  exit
 1001  ls
 1002  history
[root@bifrost root]# !999
iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  61.161.73.75         0.0.0.0/0
DROP       all  --  203.22.113.26        0.0.0.0/0
 . . .

If the command wasn't quite right, you can edit it. For most users, all the editing you ever want can be done with the left and right arrow keys and the Backspace key; anything you type is simply inserted at the cursor. By default, bash's line editor also understands emacs-style keystrokes such as Ctrl-A (start of line) and Ctrl-E (end of line), so if you are an emacs fan (in which case you probably didn't need to read this article) you will feel at home. If you already know your way around the vi editor, you can switch to vi-style editing with the command set -o vi, and back again with set -o emacs.
You can also execute previous commands by typing an exclamation mark followed by the beginning of the command. bash will work its way back up the history list, and when it finds the first (most recent) command that matches what you have typed, it will execute it. For example, with the history shown above, !ipt would run iptables -L -n again.
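History expansion is normally used at the prompt, but it can be previewed safely with history -p, which expands a ! reference and prints the result without running it. In this sketch the two history entries are planted with history -s rather than typed; scripts keep no history unless set -o history is switched on, so that comes first.

```shell
#!/usr/bin/env bash
# Scripts keep no command history unless it is switched on.
set -o history

# Plant two entries as if they had been typed at the prompt;
# history -s stores its argument in the history list without running it.
history -s 'iptables -L -n'
history -s 'ls'

# history -p expands a history reference and prints the result,
# but does not execute it.
last_ipt=$(history -p '!ipt')
echo "Would run: $last_ipt"
```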
The Directory Stack
Quite often, as you work, you will need to change directory, work a while, and then subsequently return to the original directory. Typing all those cd commands is a drag, surely - even with filename completion? But it's not as bad as it might seem.
Most of your work will be done in your home directory, and you can always return straight there just by using the cd command with no arguments. The pushd and popd commands allow you to "remember" directories on a stack; rather than changing directory with the cd command, use the pushd command and the popd command will take you back to where you were. You can see the current stack with the dirs command. Perhaps an example will make it clear:
[les@sleipnir linux]$ pwd
/home/les/nethome/download/linux
[les@sleipnir linux]$ pushd ~/feynman/vol1/ch09
~/feynman/vol1/ch09 ~/nethome/download/linux
[les@sleipnir ch09]$ pushd /home/les/tiki
~/tiki ~/feynman/vol1/ch09 ~/nethome/download/linux
[les@sleipnir tiki]$ pwd
/home/les/tiki
[les@sleipnir tiki]$ popd
~/feynman/vol1/ch09 ~/nethome/download/linux
[les@sleipnir ch09]$ pwd
/home/les/feynman/vol1/ch09
[les@sleipnir ch09]$ popd
~/nethome/download/linux
[les@sleipnir linux]$ pwd
/home/les/nethome/download/linux
[les@sleipnir linux]$

As you can see, the pushd command shows the directory stack, with the directory you are changing to at the left, and the original directory at the right. When you pop a directory off the stack, the popd command takes you to the directory that is now at the left-hand end, and shows the stack again, from left to right.
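The same sequence can be replayed in a script. This sketch builds stand-ins for the three directories in a scratch area (the paths are hypothetical) and sends the pushd and popd output to /dev/null, using dirs to inspect the stack instead; dirs -v numbers the entries, with entry 0 being the current directory.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
mkdir -p "$scratch/download/linux" "$scratch/feynman/vol1/ch09" "$scratch/tiki"
cd "$scratch/download/linux"
start=$PWD

pushd "$scratch/feynman/vol1/ch09" > /dev/null   # remember where we were, then cd
pushd "$scratch/tiki" > /dev/null
dirs -v                                          # numbered stack, one entry per line
depth=$(dirs -p | wc -l)                         # three entries: tiki, ch09, linux

popd > /dev/null                                 # back to ch09
popd > /dev/null                                 # back to where we started
pwd
```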
Multiple Commands on One Line
It's quite easy to put multiple commands on one line - especially useful if they will each take some time to complete. You can simply separate the commands with semicolon characters. For example, a command I quite often use in order to document how drives are partitioned on a system:
fdisk -l; mount; cat /etc/fstab

However, there are times when you want to make sure that the commands are completed correctly, for example when one command depends upon an earlier one. The classic example is the pair of commands used to compile and install kernel modules: make modules and make modules_install. If you type:
make modules ; make modules_install

you run the risk of the make modules step falling over, but then the make modules_install step apparently proceeding correctly and pushing the error messages off the top of the screen, so that when you return from your coffee break everything appears to have gone smoothly. Now you'll spend the next hour scratching your head and wondering why the changes you made seem to have had no effect. A better approach:
make modules && make modules_install

The && operator checks the result code (exit status) of the make modules step, and only if it is zero, indicating success, will it go on to run make modules_install. If the build fails, the install step is never attempted. You can also run one command or, failing that, another:
rm -f filename || rmdir filename

will attempt to delete the file filename and only if that fails will it attempt the rmdir command.
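The three separators can be compared using true and false, which do nothing except return success (zero) and failure (non-zero) respectively. In this sketch, mkdir and touch stand in for the two make steps, and the build and target names are invented:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch"

false; echo "after ';' this runs whatever the first command returned"
false && echo "never printed: false returned non-zero"
true && echo "printed: true returned zero"

# A stand-in for 'make modules && make modules_install':
mkdir build && touch build/installed

# rm -f fails on a directory, so rmdir gets its turn.
mkdir target
rm -f target 2> /dev/null || rmdir target
```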
You can also group commands, using parentheses. For example, if you try to get printed output from my three-command trick above, like this:
fdisk -l; mount; cat /etc/fstab | lpr

what you'll get is the output of the first two commands going to the screen, and only the last going to the printer. OK, let's bite the bullet and pipe the output of each command to the printer:
fdisk -l | lpr; mount | lpr; cat /etc/fstab | lpr

but now you get three pieces of paper, from the three commands. Dammit! Here's the answer: group the commands and feed their output into one lpr command:
(fdisk -l; mount; cat /etc/fstab) | lpr

Finally, a few embellishments:
(echo "Disk partitioning for $HOSTNAME"; fdisk -l; echo; mount; echo; cat /etc/fstab) | lpr

The first echo command adds a title for the page, including the host name from an environment variable (on which more later) and the other two simply space the page out a little better.
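Because lpr needs a printer, this sketch uses tr to upper-case whatever comes through the pipe, which makes it obvious which output went through it. It also shows bash's brace group, { ...; }, an alternative way of grouping commands; note that the closing brace must be preceded by a semicolon or a newline.

```shell
#!/usr/bin/env bash
# tr stands in for lpr, so the effect is visible on screen.
echo one; echo two | tr a-z A-Z        # only "two" goes through the pipe
(echo one; echo two) | tr a-z A-Z      # both lines go through one pipe
{ echo one; echo two; } | tr a-z A-Z   # same result, using a brace group

ungrouped=$(echo one; echo two | tr a-z A-Z)
grouped=$( (echo one; echo two) | tr a-z A-Z )
```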
Aliases
You can save yourself some more typing by setting up some aliases on your system. Most Linux distributions have a few aliases already set up for convenience, and you can see these with the alias command. For example, on the Red Hat 9 system currently beside my desk:
[les@sleipnir les]$ alias
alias l.='ls -d .* --color=tty'
alias ll='ls -l --color=tty'
alias ls='ls --color=tty'
alias vi='vim'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'

For example, to see all hidden files in your current directory, you could just type l. rather than typing a full command. You can also define your own aliases:
[les@sleipnir les]$ alias l='ls --color=no'
[les@sleipnir les]$ l
a52dec-0.7.4-fr3.i386.rpm                libsigc++-1.2.5-fr1.src.rpm
a52dec-0.7.4-fr3.src.rpm                 linux-2.4.21.tar.bz2
. . .
so now I have a quick way of getting an ls listing without the colours. You can also use aliases as a way of providing default options for commands, as the variants on the ls command above show. In Red Hat Linux, aliases are used to add the -i (interactive) option to "dangerous" commands like rm and mv; if this drives you nuts, you can delete them from ~root/.bashrc.
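Aliases can be tried in a script too, with one catch: bash expands aliases only in interactive shells unless the expand_aliases option is turned on. A sketch, reusing the l alias from above (--color=no is a GNU ls option) and an invented file name:

```shell
#!/usr/bin/env bash
# Scripts do not expand aliases unless this option is switched on.
shopt -s expand_aliases

alias l='ls --color=no'    # the alias from the example above
alias l                    # print the definition of one alias

scratch=$(mktemp -d)
touch "$scratch/a52dec-0.7.4-fr3.i386.rpm"
l "$scratch"               # runs: ls --color=no "$scratch"
definition=$(alias l)
```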
Command Substitution and Backquoting
A really neat trick, this one: You can include the output of one command inside another by using backticks (that funny backwards quote character up at the top left corner of your keyboard). Why might you want to use this? Here's a simple example. The kernel device driver modules on your Linux system are compiled for a specific kernel version, and the resultant files are placed in a subdirectory of /lib/modules that is named for the kernel version. So, for example, if your kernel version is 2.4.20-8, then the device driver module files will be under:
/lib/modules/2.4.20-8/kernel/drivers

So, to change directory to the right place, you have to figure out which kernel version you are running, with the uname -a or uname -r command, and then type the right cd command, inserting the kernel version at the right place. But a smarter way to do this is to have the shell insert the kernel version for you:
cd /lib/modules/`uname -r`/kernel/drivers

(Caution: the typesetting process sometimes messes with quote marks in articles; those should both be back-quotes around the uname -r command above). Try this for yourself - the command should take you directly to the right directory for your currently running kernel. There are lots of other uses for this technique; a good example is a way of getting sendmail to re-read its configuration file:
kill -HUP `head -1 /var/run/sendmail.pid`

This sends a HUP (hangup) signal to the process ID which is given by the first line of the file /var/run/sendmail.pid.
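A sketch of both examples, under stated assumptions: it only builds the module path rather than changing to it (the directory may not exist on every machine), and it replaces sendmail with a small background loop that records a SIGHUP in a status file. The pid-file and status-file names are hypothetical. It also shows $(...), the newer form of command substitution that bash accepts alongside backticks and that nests more easily.

```shell
#!/usr/bin/env bash
# Backticks, as in the article, and the equivalent $(...) form.
drivers_bt=/lib/modules/`uname -r`/kernel/drivers
drivers=/lib/modules/$(uname -r)/kernel/drivers
echo "$drivers"

# A stand-in "daemon": it writes to a status file when it gets SIGHUP,
# and gives up by itself after about 5 seconds.
scratch=$(mktemp -d)
(
  trap 'echo reloaded > "$scratch/status"; exit 0' HUP
  for _ in {1..50}; do sleep 0.1; done
) &
echo $! > "$scratch/daemon.pid"
sleep 1                                   # let the trap get installed first

# The article's technique: the pid comes from the first line of the pid file.
kill -HUP `head -n 1 "$scratch/daemon.pid"`
wait
```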
The nicest thing about this technique is that it will work inside scripts, with no intervention required. That's the ideal way to work on a Unix or Linux system - write scripts which do all the work for you, schedule them to run automatically using the cron facility, then sit back and play Tux Racer all day. . .
More on scripts in an upcoming article.
Sidebars
Making Configuration Changes Permanent
When an interactive bash shell starts up, it executes a file called .bashrc in your home directory (you might not have noticed this file before, since the initial period makes it a hidden file). You can put any shell configuration commands in there; typically, people set up aliases there, as well as shell variables such as EDITOR, PRINTER and so on.
If you want a change to be effective for all users on the system, then you should put it in the /etc/bashrc file. Usually, this script is called from within ~/.bashrc.
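These two scripts are executed by interactive shells that are not login shells, such as the terminal windows you open within a GUI environment. Login shells, such as those on the full-screen virtual consoles, execute two other scripts instead: /etc/profile (for all users) and ~/.bash_profile (per user). On Red Hat systems the default ~/.bash_profile runs ~/.bashrc in turn, so your aliases are available there too; anything you want set up for just login shells should be placed in /etc/profile or ~/.bash_profile. The non-interactive shells that run scripts read none of these files (unless the BASH_ENV variable names one), so scripts should not rely on your aliases or other interactive settings.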
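Which start-up files a given shell has read depends on whether it is interactive and whether it is a login shell, and bash can tell you both: $- contains i in an interactive shell, and the read-only login_shell option is set in a login shell. A minimal sketch; the messages summarise bash's documented start-up rules, simplified.

```shell
#!/usr/bin/env bash
# $- holds the current shell option letters; "i" means interactive.
case $- in
  *i*) interactive=yes ;;
  *)   interactive=no ;;
esac

# login_shell is a read-only shopt option that bash sets in login shells.
if shopt -q login_shell; then login=yes; else login=no; fi

echo "interactive: $interactive, login: $login"
if [ "$login" = yes ]; then
  echo "read /etc/profile, then the first of ~/.bash_profile, ~/.bash_login, ~/.profile"
elif [ "$interactive" = yes ]; then
  echo "read ~/.bashrc (which usually runs /etc/bashrc)"
else
  echo "read nothing, apart from any file named in \$BASH_ENV"
fi
```

Run as a script, it reports a non-interactive, non-login shell; typed at a terminal prompt, it will usually report an interactive one.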
Double-checking Deletions
Here's a trick that depends upon the way wildcard globbing is done. You might remember that DOS would always ask, in response to a 'DEL *.*' command, "Are you sure (Y/N)?". Of course, as a Unix user, you know that the bash shell doesn't do this; it simply does what it's asked, and if you asked for the wrong thing, then that was your fault. That suits you fine, but perhaps you want to provide some protection for scatter-brained users. If so, just create a file called '-i' in any directory where you want this protection. Creating this file is a little tricky, since even the touch command balks at it (it takes -i as an option), but this command will do it:
> -i

(Yes, you read that correctly!). Now, whenever you type the command
rm *

it will become
rm -i file1 file2 file3 . .

and of course, the -i option makes the file removal interactive.
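You can see the effect without deleting anything by putting echo in front of the command. A sketch in a scratch directory with invented file names; LC_ALL=C is set because the order in which * lists the names, and so the echoed line, depends on the locale, and in the C locale -i sorts ahead of the other names.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
cd "$scratch"
touch file1 file2 file3
> -i                    # creates an empty file called -i (touch ./-i also works)

LC_ALL=C                # C-locale sort order puts -i ahead of the letters
echo rm *               # show what 'rm *' would really run, without running it
expanded=$(echo rm *)
```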
References and Further Reading
Programmable Completion http://www.caliban.org/bash/index.shtml#completion
For full details on all the features of the bash shell, such as its built-in commands, you should read the bash man page. The 'info bash' command provides an overview.