Shell

“If you hold a UNIX shell to your ear, you might just hear the C.”

Introduction

This text assumes that you’ve written shell scripts before, or at least used the interactive shell, as part of NSWI177 Introduction to Linux. We don’t discuss elementary things such as syntax and instead focus on practical problems encountered when writing real-world shell scripts.

If you need a refresher of the basics, take a look at An Introduction to the Unix Shell.

Terminology

Fun fact: the name “shell” is a pun: in analogy with a nut, it’s what stands between you (the user) and the kernel of the operating system. In other words, to get to the kernel, you need to crack the shell!

Depending on the context, “shell” can have one of several closely-related meanings:

a shell interpreter such as dash, Zsh or Bash;
the language shell scripts are written in;
a shell process (e.g., PID 12345 executing /bin/zsh).

Shell, the interpreter

Just like Python scripts are interpreted with the Python interpreter, shell scripts are interpreted with a shell interpreter, usually called just “the shell”. There is, however, a key difference:

Python (3.x) is a single well-defined language, and there’s a canonical interpreter for it (the official one). But there are many popular shell interpreters, such as:

Bourne-again shell (Bash)
Z shell (Zsh)
Debian Almquist shell (Dash)

They differ from one another in the:

input language they accept (lexically and syntactically);
semantics of shell programs;
set of supported features.

Some of the differences are fundamental (one shell lacks a feature another shell supports), while others are very subtle (the features are nearly identical, save for one edge case). Consequently, shell scripts that work perfectly under one interpreter can misbehave under another.

The reason for this is largely historical; while Python is a modern language which appeared in 1991, the original UNIX shell, the Thompson shell, appeared in 1971. It was improved upon, extended and modified, giving rise to many slightly incompatible variations which inspired the modern shells such as Zsh and Bash which we use today.

Shell, the language

The various shell interpreters give rise to various shell dialects. You might say a script is “written in Bash” if you want to emphasize that it relies on features specific to Bash.

Thankfully, most of the shells used in practice today agree upon a set of core concepts. The POSIX family of standards includes a specification of the Shell Command Language. In terms of feature support, the situation can be pictured as a Venn diagram:

Figure: Illustration of the feature sets of Bash, Zsh, Dash and the POSIX standard.

Shell script portability

To make matters worse, shell scripts are rarely self-contained: to achieve even the simplest tasks, they often call upon external programs.

For example, to search for files you’d use find(1), a binary program usually found in /usr/bin/find. Without these programs, called commands, the shell wouldn’t be very useful.

For a given command, there are often multiple implementations available:

coreutils and busybox are two implementations of many of the foundational UNIX commands such as ls(1), cp(1) or rm(1). While coreutils aims to be complete and feature-rich, busybox aims for the smallest possible binary size and is pretty much bare-bones.
GNU netcat vs BSD netcat are two implementations of nc(1), a network diagnostic utility. Looking at their man pages, it’s easy to see that the options are different and incompatible. On most systems, either the GNU or the BSD version is available under the command name nc.
The commands evolve over time: features are added, and sometimes deprecated or removed. For example, git clone supports the --shallow-since option since v2.11.0. Whether your shell scripts can use this option depends on the version of git installed in the system.

This raises concerns about the portability of shell scripts. A script is said to be portable when it can be used, unmodified, on a wide range of systems. That can be quite challenging to achieve, given how tied shell scripts are to their environment.

While writing a portable shell script can be very difficult, it’s fairly easy to avoid many mistakes which make shell scripts non-portable for no good reason. In a nutshell:

Use /bin/sh as your shell interpreter. On most systems, this will either be a dedicated POSIX shell (e.g. Dash) or Bash in POSIX compatibility mode (see --posix in bash(1)). This will help you only rely on those shell features which are broadly available and fairly well-defined¹.

Use ShellCheck. When you specify /bin/sh as your interpreter, it will also warn you when you rely on non-POSIX features or behavior, in addition to spotting many common scripting errors.
Don’t rely on exotic, deprecated, undocumented and very recently added features of the commands you use. Or, if you really want to use them, check for their presence (for example, by looking at the output of git --version) and implement a fallback.

This way, you get reasonable portability with very little effort.
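
The last suggestion (check for a feature, then fall back) can be sketched as follows. This is only an illustration: the helper names version_ge and git_version are made up for this example, and it assumes that git --version prints a line of the form "git version X.Y.Z".

```shell
#!/bin/sh
set -eu

# version_ge A B: succeed if dotted numeric version A is at least B.
# (Hypothetical helper, not part of any standard tool.)
version_ge() {
        va=$1; vb=$2
        while [ -n "$va" ] || [ -n "$vb" ]; do
                # Leading components; a missing component counts as 0.
                ca=${va%%.*}; cb=${vb%%.*}
                [ "${ca:-0}" -le "${cb:-0}" ] || return 0
                [ "${ca:-0}" -ge "${cb:-0}" ] || return 1
                # Drop the leading component (and its dot, if any).
                case $va in *.*) va=${va#*.};; *) va=;; esac
                case $vb in *.*) vb=${vb#*.};; *) vb=;; esac
        done
        return 0
}

# Extract the numeric part of the "git version X.Y.Z ..." line.
git_version() {
        git --version | sed -n 's/^git version \([0-9][0-9.]*\).*/\1/p'
}

# Demonstration with a few sample version strings.
for v in 2.9.5 2.11.0 2.40.1; do
        if version_ge "$v" 2.11.0; then
                printf '%s: --shallow-since can be used\n' "$v"
        else
                printf '%s: a fallback is needed\n' "$v"
        fi
done
```

In a real script, the condition would read version_ge "$(git_version)" 2.11.0, with the fallback branch doing, for instance, a full clone.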

Does portability matter in practice?

In short: yes. You can easily run into portability issues with:

Containers, since they often run bare-bones Linux distros such as Alpine Linux, which ship BusyBox (including its ash shell, a close relative of Dash) instead of Bash and coreutils by default to conserve space and memory.
Other popular Unix-like systems, mainly Macs, which are stuck on Bash 3.2.57 (the 3.2 series dates back to 2006!) due to licensing constraints, and which ship the BSD variants of many utilities such as sed(1) or cut(1).

Following the portability suggestions above will save you a lot of trouble.

Choose your battles

The most important lesson when it comes to shell scripting is to choose your battles. Just don’t write shell scripts when C or Python are a better fit.

Shell is a great fit when your program can be expressed as a sequential combination of other programs. In other words, when the shell doesn’t do much besides serially executing other programs and plumbing them together.

Shell is not a good fit when:

the program needs to perform lots of parallel processing;
sophisticated error handling is required (shell scripts usually abort on any error);
the program needs to perform math or very complex text or data processing; or
performance matters.

While you can often work around the above limitations, if you need any of the above, it’s usually a good idea to write your program in a high-level programming language from the get-go instead. On the other hand: it’s not unheard of for a 20 line shell script to be replaced by an equivalent program spanning 500 lines of C.

Using the right tool for the job is arguably one of the most important skills not just when it comes to shell scripting, but in programming overall.

High-level design of shell scripts

Let’s look at the high-level structure of example shell programs first. Please note that this is just one way to structure your scripts, albeit a recommended one.

Example #1: gitdo

 1 #!/bin/sh
 2 set -eu
 3
 4 usage() {
 5         cat <<EOF
 6 Usage: gitdo [-gv] [--] COMMAND... [--]
 7        gitdo -h
 8
 9 Execute COMMAND on all tracked files in a Git repository.
10 The file names will be provided as command line arguments to COMMAND.
11
12 Options:
13   -g  Operate on the entire repository, not just the current subtree
14   -h  Print this message and exit
15   -v  Be verbose
16 EOF
17 }
18
19 opt_global=false
20
21 while getopts "ghv" opt; do
22         case $opt in
23         g) opt_global=true;;
24         d) echo="echo";;
25         h) usage; exit 0;;
26         v) set -x;;
27         *) usage >&2; exit 1;;
28         esac
29 done
30
31 shift $((OPTIND-1))
32
33 root=$(git rev-parse --show-toplevel 2>&1) || {
34         printf "gitdo: not within a Git repo: %s\n" "$PWD"
35         exit 1
36 } >&2
37
38 dir=.
39 $opt_global && dir=$root
40
41 git ls-files -z --exclude-standard -- "$dir" |
42         xargs -0r "$@"

Since this is the first example, let’s break it down and discuss it piece by piece.

Hashbang

 1 #!/bin/sh

This line is usually called a hashbang, a shebang or an interpreter directive. When we execute the script (e.g., ./gitdo), it’s not the script that gets executed; rather, the interpreter (/bin/sh) is executed and is given the name of the script to interpret (./gitdo) as the first argument.

Perhaps surprisingly, this logic is implemented in the kernel directly. When you execute the script, the name of the script is passed as an argument to execve(2). The kernel then needs to decide what type of executable the named file is (whether it’s an ELF binary, a script or some other type of supported executable file format).

When the file starts with #!, it’s treated as a script. If you’re interested in the details, here are some pointers: fs/exec.c, fs/binfmt_script.c.

By specifying /bin/sh as your interpreter, you clearly ask for a POSIX shell. If your script does in fact require Bash or Zsh to function properly, you should of course specify that shell instead.

From Python, you may be used to the following instead:

#!/usr/bin/env python3

There are two reasons to use /usr/bin/env in the hashbang:

The path to the interpreter (python3) must be absolute, but it’s not the same on all systems where the script is run. Since env performs $PATH lookup, it will resolve the command name at runtime, just like the shell does.

You want to allow the user to override the system-wide Python interpreter. For example, when
```
PATH=~/bin:/bin:/usr/bin
```
and there is an executable called python3 in ~/bin/python3, this interpreter will be used.

This form (chain-loading through env) isn’t normally needed with shell scripts, because the path is usually the same everywhere (/bin/sh) and users don’t really bring their own shell. That is, unless you use a Mac with an ancient Bash—then you probably want to override the system one with a newer version from Homebrew or MacPorts.

set -eu

This is arguably the most important line in the entire script:

 2 set -eu

By default, shell error handling is extremely benevolent. Notably:

When a command fails, evaluation continues with the next command. In other words, the interpreter doesn’t stop on errors by default.
Referencing an unassigned variable is not considered an error. Instead, the variable expands to the empty string.

Especially when combined with other peculiarities of the shell, this makes for a lethal cocktail. Consider the following trivial script:

#!/bin/bash

cd $dir
rm -rf *

This script has more problems than lines of code:

It targets Bash for no good reason. This is all too common. Shell ≠ Bash. Use #!/bin/sh where POSIX is good enough.
If dir is unset, $dir expands to the empty string. Because set -u isn’t in effect, this isn’t an error. Because the expansion isn’t quoted, cd will be executed without arguments. Without arguments, cd will change to $HOME.
Consequently, if dir is unset, you will delete the contents of your $HOME rather than the contents of $dir. Not great.
If dir is set to /some/path and /some/path does not exist, the cd will fail and the current working directory will remain unchanged. But because set -e is not in effect, execution will continue and the rm will delete the contents of the current working directory. Also not great.
The invocation of rm is missing the -- option terminator. If there’s a file whose name starts with -, it will be understood as an option to rm (and may or may not be deleted itself, depending on what the option does).

Here’s a better version:

#!/bin/sh
set -eu

cd "$dir"
rm -rf -- *

If there’s one thing to remember, it’s always to set -eu.
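
The effect of set -e is easy to observe in isolation. The following sketch runs the same two commands in a child shell, once with the default settings and once with set -eu. The directory is a made-up path assumed not to exist, and the function names lenient and strict are chosen for this example only.

```shell
#!/bin/sh
set -eu

# Hypothetical path, assumed not to exist on this system.
missing=/nonexistent/demo-directory

# Default behaviour: the failed cd is ignored and execution carries on.
lenient() {
        sh -c 'cd "$1" 2>/dev/null; printf "still running\n"' sh "$missing"
}

# With set -eu: the child shell exits as soon as cd fails.
strict() {
        sh -c 'set -eu; cd "$1" 2>/dev/null; printf "still running\n"' sh "$missing"
}

lenient
strict || printf 'strict shell stopped with status %d\n' "$?"
```

The lenient variant prints "still running" even though the cd failed, which is exactly the situation in which the rm from the example above would run in the wrong directory.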

usage()

The so-called usage message is a tiny bit of documentation embedded in the source code of a (UNIX) program. Think of it as a mini man page which just lists the valid ways to invoke the script and its options:

 4 usage() {
 5         cat <<EOF
 6 Usage: gitdo [-gv] [--] COMMAND... [--]
 7        gitdo -h
 8
 9 Execute COMMAND on all tracked files in a Git repository.
10 The file names will be provided as command line arguments to COMMAND.
11
12 Options:
13   -g  Operate on the entire repository, not just the current subtree
14   -h  Print this message and exit
15   -v  Be verbose
16 EOF
17 }

The <<EOF marks the beginning of a so-called here-document. The document starts on the next line (line 6) and stops just before a line containing only the end marker EOF (line 16). The end marker is an arbitrary shell word, but EOF (“end of file”) is often used. The content of the document is then supplied to cat on stdin. In other words, lines 5 through 16 just print the usage message to stdout.
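
One detail worth knowing about here-documents: when the end marker is unquoted, parameter expansions inside the document are performed; when it is quoted (for example <<'EOF'), the text is taken literally. A minimal sketch, with the variable and function names invented for this example:

```shell
#!/bin/sh
set -eu

profile=root

# Unquoted marker: $profile is expanded inside the document.
expanded() {
        cat <<EOF
Default profile: $profile
EOF
}

# Quoted marker: the document is passed through verbatim.
literal() {
        cat <<'EOF'
Default profile: $profile
EOF
}

expanded
literal
```

The first form is handy for showing default values in a usage message; the second is the safer choice when the document contains code in another language.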

A well-written usage message is very short yet informative.

getopts

The first real work the script does is option processing. By far the simplest way to process shell options is getopts, a POSIX shell utility:

19 opt_global=false
20
21 while getopts "ghv" opt; do
22         case $opt in
23         g) opt_global=true;;
24         d) echo="echo";;
25         h) usage; exit 0;;
26         v) set -x;;
27         *) usage >&2; exit 1;;
28         esac
29 done
30
31 shift $((OPTIND-1))

A well-behaved UNIX program prints usage to stdout with -h and then immediately exits with a zero (success) exit code. It also prints usage when invoked incorrectly, but then the usage message should go to stderr and the exit code should be non-zero (failure).

The *) case item is taken when an unknown option is specified, or when an option is missing a value; getopts will print the error to stderr and the usage will follow:

% gitdo -x
Illegal option -x
Usage: gitdo [-gv] [--] COMMAND... [--]
       gitdo -h

Execute COMMAND on all tracked files in a Git repository.
The file names will be provided as command line arguments to COMMAND.

Options:
  -g  Operate on the entire repository, not just the current subtree
  -h  Print this message and exit
  -v  Be verbose
% echo $?
1

It’s good practice to also support -v (verbose) which induces the script to log what it’s doing. Using set -x is a quick-and-dirty way to implement a basic verbose mode. It’s a good idea to support this option from the very beginning, since then it won’t be accidentally used for anything else.

This script’s command line interface is very simple: besides -h and -v, it takes only one other option (-g), which accepts no value, and no processing of positional arguments takes place. We’ll see some more complex examples later.

Once options are processed, we shift the arguments so that $1 becomes the first non-option (positional) argument.
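
To see how getopts and the final shift cooperate, here is a small self-contained sketch. It wraps the same loop in a function (called parse here, a name made up for this example) so that it can be called several times; OPTIND is reset to 1 on entry because getopts keeps its position in that variable. Instead of set -x, the -v option merely sets a flag so that the output stays readable.

```shell
#!/bin/sh
set -eu

parse() {
        OPTIND=1        # restart option parsing for this call
        global=false
        verbose=false
        while getopts "gv" opt; do
                case $opt in
                g) global=true;;
                v) verbose=true;;
                *) return 2;;
                esac
        done
        # Drop the options that getopts has consumed.
        shift $((OPTIND-1))
        printf 'global=%s verbose=%s rest=%s\n' "$global" "$verbose" "$*"
}

parse -g -- ls -l
parse -gv wc
```

Note that in the first call, the -l after -- is left alone: getopts stops at the -- terminator (or at the first non-option argument), so everything that remains belongs to the command.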

The actual program

Up until now, we have only dealt with the script’s interface. Now it’s time to implement whatever it is that the script does, and that’s a relatively small portion of the source code:

33 root=$(git rev-parse --show-toplevel 2>&1) || {
34         printf "gitdo: not within a Git repo: %s\n" "$PWD"
35         exit 1
36 } >&2
37
38 dir=.
39 $opt_global && dir=$root
40
41 git ls-files -z --exclude-standard -- "$dir" |
42         xargs -0r "$@"

This is very straightforward. The only non-POSIX parts (apart from git, which is non-negotiable) are the -0 and -r options we pass to xargs(1). They are, however, fairly widely supported and make the script more robust, so we consider it a fair trade.
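
Why bother with NUL-delimited names at all? The sketch below feeds two made-up file names to xargs, first separated by newlines and then by NUL bytes. It assumes an xargs that supports -0, like the one gitdo relies on.

```shell
#!/bin/sh
set -eu

# Newline-delimited input: xargs also splits on blanks, so the first
# name is torn into two arguments (three arguments in total).
printf '%s\n' 'my file.txt' 'other.txt' | xargs -n1 printf '[%s]\n'

# NUL-delimited input with -0: each name arrives as a single argument.
printf '%s\0' 'my file.txt' 'other.txt' | xargs -0 -n1 printf '[%s]\n'
```

The first pipeline prints three bracketed items, the second prints two. Since a NUL byte is the only character that cannot occur in a file name, this is the only delimiter that is safe for arbitrary names.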

The script also has just the right number of lines.

Example #2: snapback

The second example is a simple wrapper around Snap, a Btrfs snapshot manager. It allows you to quickly recover a particular version of a file.

 1 #!/bin/sh
 2 set -eu
 3
 4 opt_profile=root
 5 opt_recover=false
 6
 7 usage() {
 8         cat <<EOF
 9 Usage: snapback [-rv] [-p PROFILE] [FILE]
10        snapback -h
11
12 List all Snap backups of FILE for PROFILE.
13 If FILE is not given, list backups of all files in PROFILE.
14 If PROFILE is not set, it defaults to $opt_profile.
15
16 Options:
17
18   -h  Print this message and exit
19   -p  Snap profile to search backups in [$opt_profile]
20   -r  Recover the most recent backup of FILE
21   -v  set -x
22 EOF
23 }
24
25 while getopts "hp:rv" opt; do
26         case $opt in
27         h) usage; exit;;
28         p) opt_profile=$OPTARG;;
29         r) opt_recover=true;;
30         v) set -x;;
31         *) usage >&2; exit 1;;
32         esac
33 done
34 shift $((OPTIND-1))
35
36 file=$PWD
37 [ $# -gt 1 ] && {
38         printf >&2 "too many arguments\n"
39         usage >&2
40         exit 1
41 }
42 [ $# -eq 1 ] && {
43         file=$1
44         shift
45 }
46
47 files=$(snap -L "$file" -- "$opt_profile")
48 printf "%s\n" "$files"
49
50 if "$opt_recover"; then
51         latest=$(printf "%s\n" "$files" \
52                 | tail -n1 \
53                 | cut -f4)
54         cp -- "$latest" .
55 fi

Of note:

The high-level structure of the program is exactly the same.
The -p option accepts an argument via $OPTARG.
The script accepts positional arguments and prints the usage message on incorrect use.

Example #3: passman

The third example is a very simple password manager.

  1 #!/bin/sh
  2 set -eu
  3
  4 store="$HOME/passwords"
  5
  6 usage() {
  7         cat <<EOF
  8 Usage: passman [-v] [-s STORE]
  9        passman [-v] [-s STORE] -i [-cR] CREDENTIAL
 10        passman [-v] [-s STORE] -r [-cR] CREDENTIAL
 11        passman [-v] [-s STORE] -o [-c]  CREDENTIAL
 12        passman [-v] [-s STORE] -d       CREDENTIAL
 13        passman [-h]
 14
 15 Without arguments, list all passwords in STORE.
 16 With one or more arguments, modify the contents of the store.
 17
 18 Options:
 19   -i      Insert CREDENTIAL
 20   -r      Replace CREDENTIAL
 21   -o      Output CREDENTIAL
 22   -d      Delete CREDENTIAL
 23   -c      Use clipboard for input (-ic, -rc) and output (-oc)
 24   -R      Use a random value for input (-iR, -rR)
 25   -h      Print this message and exit
 26   -s DIR  Use password store DIR [$store]
 27   -v      set -x
 28 EOF
 29 }
 30
 31 opt_clipboard=false
 32 opt_delete=false
 33 opt_insert=false
 34 opt_output=false
 35 opt_random=false
 36 opt_replace=false
 37
 38 while getopts cdhiorRs:v opt; do
 39         case $opt in
 40         c) opt_clipboard=true;;
 41         d) opt_delete=true;;
 42         h) usage; exit;;
 43         i) opt_insert=true;;
 44         o) opt_output=true;;
 45         r) opt_replace=true;;
 46         R) opt_random=true;;
 47         s) store=$OPTARG;;
 48         v) set -x;;
 49         *) usage >&2; exit 1;;
 50         esac
 51 done
 52 shift $((OPTIND - 1))
 53
 54 err() {
 55         exitcode=$1; shift
 56         fmt="passman: $1"; shift
 57         printf >&2 -- "$fmt" "$@"
 58         exit "$exitcode"
 59 }
 60
 61 [ -d "$store" ] ||
 62         err 1 "Password store %s is not a directory.\n" "$store"
 63
 64 git -C "$store" rev-parse --show-toplevel >/dev/null 2>&1 ||
 65         err 1 "Password store %s is not a Git repository.\n" "$store"
 66
 67 "$opt_insert" || "$opt_replace" || "$opt_output" || "$opt_delete" || {
 68         [ $# -eq 0 ] || {
 69                 printf >&2 "Unexpected argument(s): %s...\n" "$1"
 70                 usage >&2;
 71                 exit 4
 72         }
 73         find "$store" -name '*.gpg' -printf "%P\n" | sed 's/\.gpg$//' | sort
 74         exit
 75 }
 76
 77 [ $# -eq 1 ] || {
 78         printf "Missing CREDENTIAL\n"
 79         usage
 80         exit 1
 81 } >&2
 82
 83 cred=$1; shift
 84 cred_file="$store/$cred.gpg"
 85
 86 cleanup() {
 87         rm -f -- "$cred_file.tmp"
 88 }
 89
 90 trap cleanup EXIT
 91
 92 [ -f "$cred_file" ] || "$opt_insert" ||
 93         err 2 "Credential %s does not exist in store %s.\n" "$cred" "$store"
 94
 95 [ -f "$cred_file" ] && "$opt_insert" &&
 96         err 2 "Credential %s already exists in store %s.\n" "$cred" "$store"
 97
 98 "$opt_insert" || "$opt_replace" && {
 99         cred_dir=$(dirname "$cred_file")
100         mkdir -p -- "$cred_dir"
101
102         recipients_file=$(git -C "$cred_dir" rev-parse --show-toplevel)/.recipients
103         [ -f "$recipients_file" ] ||
104                 err 2 "missing recipients file %s\n" "$recipients_file"
105         recipients=$(xargs -n1 printf "-r %s\n" <"$recipients_file")
106
107         if "$opt_random"; then
108                 head -c 24 /dev/random | base64
109         elif "$opt_clipboard"; then
110                 xclip -r -o -selection clipboard 2>/dev/null ||
111                         err 3 "Cannot copy from clipboard.\n"
112         else
113                 cat
114         fi | gpg -q $recipients --encrypt --armor >"$cred_file.tmp"
115         mv "$cred_file.tmp" "$cred_file"
116         git -C "$cred_dir" add "$cred_file"
117         "$opt_insert" &&
118                 msg="Add credential $cred" ||
119                 msg="Replace credential $cred"
120         git -C "$cred_dir" commit -q -m "$msg"
121 }
122
123 "$opt_output" && {
124         gpg -q --decrypt "$cred_file" 2>/dev/null |
125         if "$opt_clipboard"; then
126                 xclip -r -i -l 1 -selection clipboard
127         else
128                 cat
129         fi
130 }
131
132 "$opt_delete" && {
133         cred_dir=$(dirname "$cred_file")
134         rm -- "$cred_file"
135         git -C "$cred_dir" add "$cred_file"
136         git -C "$cred_dir" commit -q -m "Delete credential $cred"
137         find "$store" -type d -not -path "$store/.git/*" -empty -delete
138 }
139
140 exit 0

Of note:

The high-level structure of the program is exactly the same.
The usage message lists the various invocations and relevant options.
The options don’t conflict; you can insert and output, output and delete etc. a credential with a single invocation.

Example #4: deadlink

  1 #!/bin/sh
  2 set -eu
  3
  4 usage() {
  5         cat <<EOF
  6 Usage: deadlink [-v] [FILE...]
  7        deadlink -h
  8
  9 Parse each FILE as HTML and check all outgoing links.
 10 When no FILE is given or FILE is -, check standard input.
 11
 12 Options:
 13
 14   -h   Print this message and exit
 15   -v   set -x
 16 EOF
 17 }
 18
 19 while getopts "hv" opt; do
 20         case $opt in
 21         h) usage; exit;;
 22         v) set -x;;
 23         *) usage >&2; exit 1;;
 24         esac
 25 done
 26
 27 shift $((OPTIND-1))
 28
 29 trap 'rm -f links' EXIT INT QUIT TERM
 30
 31 py=$(cat <<'EOF'
 32 import bs4
 33 import sys
 34 soup = bs4.BeautifulSoup(sys.stdin.read(), features="html.parser")
 35 for a in soup.find_all("a"):
 36     print(a["href"])
 37 EOF
 38 )
 39
 40 cat "$@" | python -c "$py" | grep >links -E '^https?://'
 41
 42 exit=0
 43 while read -r link; do
 44         curl -sSf >/dev/null -- "$link" && {
 45                 printf "OK: %s\n" "$link"
 46         } || {
 47                 [ $? -ge 128 ] && {
 48                         printf "Check interrupted\n"
 49                         break
 50                 }
 51                 printf "BAD LINK: %s\n" "$link"
 52                 exit=1
 53         } >&2
 54 done <links
 55 exit "$exit"

Of note:

This script started as a one-time hack to see if there are any dead links on this very page. As it seemed useful, I have refactored it into a proper script following the best practice documented above.
The script uses BeautifulSoup (a Python HTML processing library) to extract links from the page. Using regular expressions for that purpose would be unnecessarily fragile.
To make it easy to stop the script, if the exit code of curl(1) is 128 or above, we break the loop: by shell convention, an exit code of 128 + n means that the process was killed by signal n (some signal, not necessarily SIGKILL).
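
The exit code convention for signals can be tried out without curl. In this sketch, a child shell sends SIGTERM to itself, standing in for an interrupted curl process; the parent then inspects the exit code.

```shell
#!/bin/sh
set -eu

status=0
# The child shell kills itself with SIGTERM.
sh -c 'kill -TERM $$' || status=$?

if [ "$status" -gt 128 ]; then
        # kill -l translates an exit status back to a signal name.
        printf 'killed by SIG%s\n' "$(kill -l "$status")"
else
        printf 'exited with status %d\n' "$status"
fi
```

The || status=$? idiom records the failure without tripping set -e, which is the same reason the deadlink loop handles curl inside an && … || … list.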

Shell idioms and best practice

We’ve seen some examples of the recommended high-level design of shell programs; now let’s take a look at useful primitives. We try to highlight some lesser-known features which make shell scripting a whole lot more enjoyable.

Everything in this section is standard POSIX behavior and tools, unless otherwise noted.

printf

You probably use echo(1) to print strings in the interactive shell, as in

% echo $some_var

And that’s perfectly fine.

However, echo has no place in shell scripts due to portability concerns. Take a look at this page describing various echo implementations. The TL;DR is that echo is only good for plain text containing no escape sequences.

Instead, use printf(1). This is modeled after printf(3):

% name=world
% printf "Hello %s!\n" "$name"
Hello world!

Besides being well-defined, it’s useful for alignment of variable-width strings. For example, to right-justify a string to a width of 10 characters:

% printf "%10s\n" "foo"
       foo

You can also left-justify by using a negative width. The width can also be variable, as in:

% width=-11
% name=world
% printf "Hello %*s!\n" "$width" "$name"
Hello world      !

Numbers can be converted and aligned, too:

% printf "0x%08x\n" 42
0x0000002a

And so on. Refer to printf(3) for details.

Compound command redirections

A little-known feature: redirections can be applied to compound commands, such as if-clauses, for-clauses or brace groups { ... }:

if [ $# -gt 0 ]; then
        printf "Unexpected arguments: %s...\n" "$1"
        usage
fi >&2

Both the output of printf and the usage message will be redirected to stderr. Redirections can also be applied to braces, even in function definitions:

err() {
        exit=$1; shift
        msg=$1; shift
        printf "foo.sh: error: %s\n" "$msg"
        exit "$exit"
} >&2

This makes the output of err go to stderr by default when invoked.

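Note: redirecting a brace group is not equivalent to redirecting the individual commands, because the redirection is performed only once: the file is opened a single time and all commands in the group share it, including the current read position. Thus the following, in which the first cat reads to the end of the file and leaves nothing for the second: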

 { cat; cat; } </etc/hostname

is not equivalent to:

 { cat </etc/hostname; cat </etc/hostname; }
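
The same effect can be observed with read, which consumes exactly one line per call. In this sketch (function names invented for the example), the input comes from a here-document instead of /etc/hostname so that the result does not depend on the system.

```shell
#!/bin/sh
set -eu

# One redirection for the whole group: both reads share the same open
# document, so the second read continues where the first one stopped.
shared() {
        { read -r first; read -r second; } <<EOF
one
two
EOF
        printf '%s %s\n' "$first" "$second"
}

# One redirection per command: the input is opened afresh each time,
# so both reads see the first line.
separate() {
        read -r first <<EOF
one
two
EOF
        read -r second <<EOF
one
two
EOF
        printf '%s %s\n' "$first" "$second"
}

shared
separate
```

The first function prints "one two", the second "one one".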

Short-circuit evaluation

Instead of writing

if "$cond"; then
    cmd
fi

you can use the short-circuit evaluation operator && (and):

"$cond" && cmd

For example, we could rewrite one of the prior examples as:

[ $# -eq 0 ] || {
        printf "Unexpected arguments: %s...\n" "$1"
        usage
} >&2

This saves typing and makes the code a bit easier to read. The || (or) operator works similarly to express !cond.

Parameter expansion

Parameter expansion is a shell word of the form $param, where param is a parameter name. For example, $foo or $1 are parameter expansions. The full syntax is ${param} and the braces are usually omitted.

Note: the braces are required in two cases:

to refer to argument $n where n > 9, e.g. ${10};
when the next character could be mistaken for part of the identifier, e.g. in foo${bar}baz.

The braces also permit additional processing of the parameter, as described below.

Conditional parameter expansion operators

The syntax ${param op [word]} allows you to expand param conditionally depending on op. In all cases below, if word is not provided, it defaults to null (the empty string):

${param-[word]}: if param is set, expand to param; otherwise expand to word;
${param=[word]}: if param is set, expand to param; otherwise assign param to word and expand to word;
${param?[word]}: if param is set, expand to param; otherwise print an error and exit (use word as the error message if provided);
${param+[word]}: if param is set, expand to word, otherwise expand to null.

These operators can each be prefixed with a colon (:- := :? :+) and the condition then changes from “if set” to “if set and not null”.

This makes it simple to expand shell variables with a fallback value:

nc -l -p "${port:-8000}"

Conventionally, you would set defaults for environment variables at the beginning of a shell script, e.g.:

#!/bin/sh
set -eu

: "${LC_ALL:=en_US.UTF-8}"

The colon (:) is a so-called null utility. It does nothing useful². But without it, the first field resulting from the parameter expansion would be taken as a command name, which is not what we want.
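
The difference between the plain and the colon-prefixed operators is easiest to see side by side. A short sketch, using a variable named port as in the nc example above (the single-letter result variables are just for the demonstration):

```shell
#!/bin/sh
set -eu

unset port
a=${port-8000}      # unset: falls back to 8000
port=
b=${port-8000}      # set but null: "-" keeps the empty value
c=${port:-8000}     # ":-" treats null like unset
d=${port+given}     # set (even though null): expands to "given"
: "${port:=9000}"   # null: ":=" assigns 9000 to port

printf 'a=%s b=%s c=%s d=%s port=%s\n' "$a" "$b" "$c" "$d" "$port"
```

This prints a=8000 b= c=8000 d=given port=9000. Note that these expansions are safe under set -u, since they handle the unset case explicitly.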

String operators

Yes, the shell supports string operators! They are few but very useful nonetheless:

The first operator is the string length operator. The syntax is ${#param}:

% name="Ken Thompson"
% printf "The length of \$name is %d\n" "${#name}"
The length of $name is 12

Next, there is a remove smallest suffix pattern operator with syntax ${param%suffix}:

% file=img.jpg
% basename=${file%.*}
% printf "%s\n" "$basename"
img

As you can see, the suffix can be a pattern; the pattern matching notation is the same as the one used for filename expansion.

There is also a remove smallest prefix pattern operator with syntax ${param#prefix}.

Both the prefix and suffix operators can be doubled (## %%); these are the remove largest prefix pattern and remove largest suffix pattern, respectively:

% pathname=/path/to/some/file
% basename=${pathname##*/}
% printf "%s\n" "$basename"
file
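
Combined, the four pattern operators are enough to take a path apart without calling basename(1), dirname(1) or sed(1). A sketch with a made-up path:

```shell
#!/bin/sh
set -eu

pathname=/path/to/archive.tar.gz

dir=${pathname%/*}      # smallest suffix "/*" removed:  /path/to
file=${pathname##*/}    # largest prefix "*/" removed:   archive.tar.gz
stem=${file%%.*}        # largest suffix ".*" removed:   archive
ext=${file#*.}          # smallest prefix "*." removed:  tar.gz
last=${file##*.}        # largest prefix "*." removed:   gz

printf '%s\n' "$dir" "$file" "$stem" "$ext" "$last"
```

Keep in mind that these are plain string operations: unlike dirname(1), ${pathname%/*} leaves a path without any slash unchanged.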

The argument array $@

The POSIX shell supports exactly one array, the argument array $@. The values of this array are accessed through $1, $2, …, $n, where n is the length of the array (n = $#). $0 is a special parameter and not part of $@.

Crucially, the expansion "$@" (mind the double quotes) is equivalent to "$1" "$2" … "$n". In other words, expansion of "$@" produces quoted fields, one field per item.

Initially, this array is set to the positional arguments from the invocation of the shell itself, or of the called shell function. This is best illustrated with the following script:

 #!/bin/sh
 set -eu

 for arg in "$@"; do
         printf "Arg = %s\n" "$arg"
 done

Running this script, we get

./foo.sh a b c
Arg = a
Arg = b
Arg = c

When iterating over "$@", the in clause (in "$@") can be omitted (for arg; do … done).
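
To see why the double quotes around $@ matter, compare how many arguments arrive at a command for each way of expanding the array. The helper functions count and compare are made up for this sketch, and the default IFS is assumed.

```shell
#!/bin/sh
set -eu

# Print the number of arguments received.
count() {
        printf '%d\n' "$#"
}

compare() {
        count "$@"      # one field per argument
        count $@        # unquoted: fields are split again on whitespace
        count "$*"      # all arguments joined into a single field
}

compare 'a b' c
```

Called with the two arguments 'a b' and c, this prints 2, 3 and 1. Only "$@" preserves the argument boundaries.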

Setting array elements

The set shell built-in can be used to set the argument array $@ and the corresponding parameters $1, $2 … $n. The arguments to set become the new $@.

To clear arguments:
```
set --
```
To append $var to $@:
```
set -- "$@" "$var"
```
To prepend $var to $@:
```
set -- "$var" "$@"
```

Here’s a simple example:

#!/bin/sh
set -eu

set --
set -- "$@" 3
set -- "$@" 4
set -- 2 "$@"
set -- 1 "$@"

for arg; do
        printf "Arg = %d\n" "$arg"
done

Running this, we get:

Arg = 1
Arg = 2
Arg = 3
Arg = 4

Shifting the array

It’s also possible to shift the array to the left, removing the first n elements, with shift. After shift [n], where n defaults to 1, the argument $i refers to what argument number i + n referred to before.

#!/bin/sh
set -eu

set 1 2 3 4
shift 2
for arg; do
        printf "Arg = %d\n" "$arg"
done

Running this, we get:

Arg = 3
Arg = 4

Transforming the array

Using set, shift and for, it’s possible to implement filter and map, too. For example, to only keep non-negative elements, one could write:

#!/bin/sh
set -eu

for arg; do
        shift
        [ "$arg" -ge 0 ] && set -- "$@" "$arg"
done

Recall that for arg is equivalent to for arg in "$@". It’s legal to modify the array in the loop with set and shift, because the expansion of "$@" (even if it’s implicit) happens before the body of the for loop is executed. The shift removes each argument from the front, and the set appends it to the end of the array only if it is greater than or equal to zero. Since the body of the for loop executes exactly once per element of the original array, we are left with the non-negative entries only, in their original order.
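
Here is the same filter wrapped in a function so that it can be run on sample data. The function name is made up for this sketch. Two details differ from a naive version: set is given -- so that a remaining argument such as -7 is never mistaken for an option, and the test is written with || so that the last command of the loop body always succeeds.

```shell
#!/bin/sh
set -eu

keep_non_negative() {
        for arg; do
                shift
                # Re-append the element unless it is negative.
                [ "$arg" -lt 0 ] || set -- "$@" "$arg"
        done
        printf '%s\n' "$*"
}

keep_non_negative 3 -1 0 -7 42
```

This prints 3 0 42.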

Similarly, one could map over the array; this is left as an exercise to the reader.

Argument pass-through

Sometimes, your script will accept a variable number of arguments, and some or all of them will be passed to another program, like in the gitdo example:

41 git ls-files -z --exclude-standard -- "$dir" |
42         xargs -0r "$@"

This is where "$@" truly shines: it allows you to pass the arguments correctly without having to worry about quoting.

Naming positional arguments

It’s good practice to give names to the positional arguments of your shell scripts. That is, rather than using $1, $2, … directly:

mkdir -p "$1"
some_cmd -o "$1" "$2"

It’s better to name your arguments and use the names:

output_dir=$1
source_file=$2

mkdir -p "$output_dir"
some_cmd -o "$output_dir" "$source_file"

It’s even better to shift out the positional arguments as you process them:

output_dir=$1; shift
source_file=$1; shift

mkdir -p "$output_dir"
some_cmd -o "$output_dir" "$source_file"

Advantages:

You can change the order of the script’s arguments simply by swapping the assignments, without having to edit the indices.
$@ contains exactly the unnamed options. Often, there is a variable number of positional arguments (such as filenames to act upon) passed on with "$@" as in the gitdo example above.

You can easily check that there are no extraneous or missing options:

[ $# -eq 0 ] || {
        printf "Unexpected arguments: %s...\n" "$1"
        exit 1
} >&2

This applies to functions, too.
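
As a sketch of the same pattern inside a function (the function describe_copy and its arguments are hypothetical, and local is the widely supported but non-POSIX extension):

```sh
#!/bin/sh
set -eu

# Hypothetical function taking exactly two arguments.
describe_copy() {
        local src dst
        src=$1; shift
        dst=$1; shift

        # Anything left in "$@" at this point is unexpected.
        [ $# -eq 0 ] || {
                printf "Unexpected arguments: %s...\n" "$1" >&2
                return 1
        }

        printf "copy %s -> %s\n" "$src" "$dst"
}

describe_copy a.txt b.txt
```

Calling describe_copy with a third argument prints the error message and returns 1. Calling it with fewer than two arguments trips over set -u instead.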

readonly

It is possible to mark variables read-only with the readonly built-in:

readonly answer=42

This prevents the variable from being changed and unset, and is a useful protection for constants.
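
A small sketch of what the protection looks like in practice. The assignment is attempted in a subshell, because some shells (Dash, for example) abort a non-interactive shell on such an error; the result variable is made up for illustration:

```sh
#!/bin/sh
set -eu

readonly answer=42

# Try to overwrite the constant in a subshell, so that the failure
# cannot take the whole script down.
if (answer=43) 2>/dev/null; then
        result=changed
else
        result=rejected
fi

printf "%s, answer is still %d\n" "$result" "$answer"
```

This prints rejected, answer is still 42.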

Omitting quotes

Sometimes, quotes can (and should, for the sake of professionalism) be omitted:

In assignments of the form var=word, where word is any shell word. The var=word is called an assignment word in the shell lexical grammar. Quotes can be omitted even when word expands to a string containing IFS characters:
```
var='has spaces'
var2=$var
```
However! The following situation is completely different:
```
export var2="$var"
```
Quotes are required here, because the var2="$var" is not an assignment word: assignment words are only recognized before the command name (here that’s the export word) but not after. Consequently, the usual field splitting rules apply.
Around case expressions:
```
case $var in
...
esac
```

Note: If in doubt, quote.
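
Both cases can be checked with a short sketch. The value contains several spaces and a glob character, and the variable names copy and matched are made up:

```sh
#!/bin/sh
set -eu

var='has   three spaces and a *'

copy=$var       # no field splitting or globbing in an assignment

case $var in    # nor in the word after case
        *'three spaces'*) matched=yes ;;
        *) matched=no ;;
esac

printf "[%s] %s\n" "$copy" "$matched"
```

This prints [has   three spaces and a *] yes, so the value survives both constructs unchanged.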

Suppressing errors with `||:`

With set -e in effect, any error will take your script down. Sometimes, that’s not what you want. For example:

 num=$(printf "%s\n" "$var" | grep -Eo '[0-9]+')

When var contains no digits, grep(1) will exit with a non-zero exit code. Consequently, the exit code of the assignment is non-zero, taking down the shell. In a situation like this, it’s often not an issue that the grep matched nothing. To suppress the error, use:

 num=$(printf "%s\n" "$var" | grep -Eo '[0-9]+' ||:)

The ||: is just || followed by the : built-in; leaving out the space between them is simply a stylistic choice. If grep fails, : runs instead, so the exit code of the command substitution is the exit code of :, which is always 0.
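
Put together as a complete script, this might look as follows. The sample value of var and the fallback text none are made up for illustration:

```sh
#!/bin/sh
set -eu

var='no digits here'

# Without the ||: the failing grep would abort the script here.
num=$(printf "%s\n" "$var" | grep -Eo '[0-9]+' ||:)

# num is simply empty when nothing matched.
printf "num=[%s]\n" "${num:-none}"
```

This prints num=[none] and the script exits with 0.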

Beware: of unexpected subshells

Consider the following example:

#!/bin/sh
set -eu

sum=0
seq 10 | while read -r num; do
        sum=$((sum+num))
done

printf "%d\n" "$sum"

When executed under:

Zsh, this script prints 55
Bash, this script prints 0
Dash, this script prints 0

Why? Well, this has to do with POSIX rules for pipe execution. The standard says:

Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment.

Zsh apparently executes the while loop in the current shell execution environment, so the variable sum in the loop is the same variable as in the rest of the script. Bash and Dash, on the other hand, execute the loop in a separate execution environment with its own copy of the variables, so the assignments are lost when the loop ends. You can verify this easily by printing the partial sums in the body of the loop: they are all correct, yet under Bash and Dash the final result is still 0.
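
One way to sidestep the problem is to avoid the pipe and feed the loop from a here-document instead. This is a sketch of one possible workaround, not the only one:

```sh
#!/bin/sh
set -eu

sum=0

# The loop reads from a here-document instead of a pipe, so it runs in
# the current shell and its assignments to sum survive.
while read -r num; do
        sum=$((sum+num))
done <<EOF
$(seq 10)
EOF

printf "%d\n" "$sum"
```

This prints 55. The price is that the whole output of seq is collected before the loop starts, which may matter for large inputs.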

Beware: of the exit code of a pipeline

The exit code of a pipeline is the exit code of the last command in the pipeline, thus:

grep '^root:' /etc/paswd | cut -d: -f7

Exits with 0 despite the typo, and that is often undesirable. Unfortunately, the following doesn’t work as expected:

{ grep '^root:' /etc/paswd || exit 1; } | cut -d: -f7

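(It should be clear why this doesn’t work: the brace group is part of a pipeline, so it runs in a subshell, and exit 1 terminates only that subshell. See the prior section.)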

The only portable way to avoid this behavior (where it matters) is to split up the pipeline into multiple stages:

filtered=$(grep '^root:' /etc/paswd)
printf "%s\n" "$filtered" | cut -d: -f7

A temporary file could also be used instead of the variable:

grep >filtered '^root:' /etc/paswd
cut <filtered -d: -f7

Don’t forget to remove the file on exit.
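
A sketch of the temporary-file variant with cleanup. It assumes mktemp(1), which is not specified by POSIX but is widely available, and it uses made-up sample data in a here-document so that it does not depend on the contents of /etc/passwd:

```sh
#!/bin/sh
set -eu

filtered=$(mktemp)
# Remove the temporary file when the script exits.
trap 'rm -f "$filtered"' EXIT

# Stage 1: if grep matches nothing, set -e aborts the script right here.
grep >"$filtered" '^root:' <<'EOF'
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
root:x:0:0:root:/root:/bin/sh
EOF

# Stage 2: runs only if stage 1 succeeded.
shell=$(cut <"$filtered" -d: -f7)
printf "%s\n" "$shell"
```

With the sample data above, this prints /bin/sh.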

There is a non-standard option, called pipefail, supported by both Bash and Zsh:

#!/bin/zsh
set -eu -o pipefail

false | true

The exit code of this program is 1 as desired.

Beware: of unexpected variable scope

The default scope of shell variables is global:

#!/bin/sh
set -eu

f() {
        x=42
}

x=0
f
printf "%d\n" "$x"

This script outputs 42 and often, that’s not what you want. There’s a non-POSIX extension called local which allows you to tie the scope of a variable to the run-time of a function:

#!/bin/sh
set -eu

f() {
        local x=42
}

x=0
f
printf "%d\n" "$x"

This script outputs 0.

Note: While non-standard, the local extension is widely supported (at least in Zsh, Bash and Dash).
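
If you want to stay within POSIX, one alternative is to use a subshell as the function body, that is, ( … ) instead of { … }. The sketch below shows the idea:

```sh
#!/bin/sh
set -eu

# The body is a subshell, so every assignment made inside is discarded
# when the function returns.
f() (
        x=42
        printf "inside: %d\n" "$x"
)

x=0
f
printf "outside: %d\n" "$x"
```

This prints inside: 42 followed by outside: 0. The trade-off is that such a function cannot change any variable of its caller, and an exit inside it only leaves the subshell.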

Beware: of assuming a particular working directory

Often, scripts are written with a particular working directory in mind. Typically, the programmer assumes that the current working directory is going to be the directory containing the script, and for a long time, the script is only used (and consequently, tested) as:

./script.sh

Everything works until somebody tries to execute it from somewhere else:

./bin/script.sh

This alters the meaning of relative paths in the script. Therefore, if you reference other files in your script, such as other (sourced) scripts, always anchor the path relative to your script:

dir=$(realpath "$(dirname "$0")")
. "$dir/common.sh.inc"
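
Note that realpath(1) may not be available on every system. A sketch of an alternative that relies only on cd, dirname and pwd:

```sh
#!/bin/sh
set -eu

# Resolve the directory containing this script without realpath(1).
# CDPATH is cleared so that cd neither searches it nor prints the
# directory name.
dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd -P)

printf "%s\n" "$dir"
```

Either way, the result is an absolute path that stays valid no matter where the script was started from.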

Sometimes, it’s advantageous to temporarily change the current working directory, for example:

(
cd "$repo_dir"
git ls-files
)

The change of current working directory is then local to the subshell ( … ).

Sometimes, tools provide options that allow you to achieve the same effect a change of working directory would have, as is the case with the git invocation above. We could thus write:

git -C "$repo_dir" ls-files

See git(1) for details.

Beware: of eval

The eval built-in makes it possible to evaluate a string. This is sometimes useful. For example, to refer to a variable whose name is the value of another variable:

#!/bin/sh
set -eu

check_variable_matches_re() {
        local var_name=$1; shift
        local re=$1; shift
        eval "local val=\${$var_name}"
        printf "%s\n" "$val" | grep -Eq "$re" || {
                printf "Variable %s does not match regular expression: %s\n" \
                        "$var_name" "$re"
                return 1
        }
}

foo=42
check_variable_matches_re foo '^[0-9]+$'

bar=answer42
check_variable_matches_re bar '^[a-z]+$'

Running this, we get:

Variable bar does not match regular expression: ^[a-z]+$

This is all fine as long as the name of the variable is a hard-coded constant. But once the name of the variable is user-supplied, very bad things can happen very quickly:

user_input='lsdjadfalhfda=$(touch /hahaha)'

# later...

check_variable_matches_re "$user_input" '^$'

It is thus advisable to avoid eval if at all possible!

There are non-standard extensions to obtain the value of a variable whose name is stored in another variable var:

Bash: ${!var}
Zsh: ${(P)var}

If you absolutely need this feature, targeting a non-POSIX shell is probably preferable to using eval.
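
If eval cannot be avoided in a POSIX script, one common mitigation is to check that the string really is a plain variable name before it gets anywhere near eval. The following is only a sketch; the helpers is_valid_name and print_var are hypothetical:

```sh
#!/bin/sh
set -eu

# Hypothetical helper: succeeds only if $1 looks like a plain variable name.
is_valid_name() {
        case $1 in
                '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
        esac
}

# Hypothetical helper: prints the value of the variable named by $1.
print_var() {
        is_valid_name "$1" || {
                printf "Invalid variable name: %s\n" "$1" >&2
                return 1
        }
        eval "printf '%s\n' \"\${$1}\""
}

foo=42
print_var foo
print_var 'x=$(touch /hahaha)' || printf "rejected\n"
```

The first call prints 42. The second one never reaches eval, because the name contains characters other than letters, digits and underscores.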

Beware: of limitations of set -e

Sometimes, not even set -e will prevent nasty things from happening. For example, the following command:

export "FOO=$(cmd)"

is fundamentally different from:

FOO=$(cmd)
export FOO

We already explained why the quotes are only required in the first form. But there is another difference which has to do with error handling:

In the first form, the exit code of the command is the exit code of export, not of the expansion $(cmd). Thus, if cmd fails, the export will still succeed and the execution will continue.
In the second form, the exit code of the assignment is the exit code of the expansion, thus if cmd fails, the script will abort due to set -e.

Therefore, the second form is preferable whenever the right-hand side of the assignment contains expansions which may fail.
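
The difference can be observed with a short sketch that uses false(1) as the failing command. The second form is run in a child shell, so that the demonstration itself survives; the variable second is made up:

```sh
#!/bin/sh
set -eu

# First form: export succeeds, so set -e never sees the failure.
export "FOO=$(false)"
printf "first form: still running, FOO=[%s]\n" "$FOO"

# Second form, run in a child shell with its own set -e.
if sh -c 'set -e; BAR=$(false); export BAR; echo "not reached"'; then
        second=continued
else
        second=aborted
fi
printf "second form: %s\n" "$second"
```

This prints first form: still running, FOO=[] followed by second form: aborted.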

Beware: of passing secrets on the command line

Sometimes, tools provide a -p option to provide a password, or some other option with the same intent (providing a secret string to the program).

This is fundamentally wrong. If you come across this, do not use it:

The program’s command line, including options and their values, is visible as /proc/pid/cmdline and displayed as the output of tools such as ps(1), top(1), htop(1), etc. Mounting procfs with hidepid=2 (see proc(5)) solves this, but it still won’t stop systemctl(1) from happily spitting out the full command line.

So, no. The only correct way to provide a secret to a program is via stdin.

Even if you trust your machine and don’t mind the secret being (temporarily?) visible in the process listing, you still need to take care when constructing command lines containing secrets, as the commands are recorded in your shell’s history file. Many shells can be configured not to record commands beginning with whitespace (for example, HISTCONTROL=ignorespace in Bash or the HIST_IGNORE_SPACE option in Zsh), so prefixing such a command with a single space might do the trick.
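
Passing a secret via stdin can be sketched with a here-document. The function fake_login is a hypothetical stand-in for a real program that reads a password from its standard input, and the hard-coded secret is just a placeholder:

```sh
#!/bin/sh
set -eu

# Hypothetical stand-in for a program that reads a password from stdin.
fake_login() {
        IFS= read -r password
        printf "received a secret of %d characters\n" "${#password}"
}

# Placeholder only; a real script would obtain the secret from a
# protected file or an interactive prompt.
secret='correct horse'

# The here-document delivers the secret on stdin, so it is never part
# of any argument list.
fake_login <<EOF
$secret
EOF
```

This prints received a secret of 13 characters.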

Powerful tools

There are several incredibly powerful command-line tools you should probably know:

curl(1) is a Swiss-army knife utility supporting several network protocols, most notably HTTP. It allows you to make arbitrary HTTP requests from the command line:
```
curl \
        -sSf \
        -X POST \
        -H "Content-Type: application/json" \
        -d @file \
        --url-query key=value \
        "https://nswi106.cz/some/path"
```
See curl(1) for a lot of additional options.
jq(1) is a command-line utility and a scripting language for JSON processing. JSON is of course served by most web APIs, but did you know that even command-line tools sometimes provide JSON output?
```
blkdev_size_gib=$(lsblk --bytes --json \
  | jq -r '[.blockdevices[].children[].size] | add / (1024*1024*1024)')
```
Also, jq can be used to construct JSON as input to other programs, such as curl, allowing you to perform complex API calls from the command line.

Take a look at the jq tutorial and jq(1).

parallel(1) is a utility for task parallelization. It is rather difficult to master, but the performance gains may well be worth it. For example, to convert many JPEG files in parallel:
```
mkdir -p small
ls *.jpg | parallel convert {} -resize 800 small/{}
```
Take a look at parallel(1) and parallel_tutorial(7).

Missing bits

Some bits are still missing and will be added in future revisions of this document. Let us know if you want to contribute any of these:

Interactive vs non-interactive shell use
(*) Shell internals

Acknowledgements

Tomáš Volf provided extensive feedback and several corrections

Thanks!

In POSIX compatibility mode, Bash still understands many so-called “bashisms”, such as [[ … ]]. For good measure, test your scripts under a proper POSIX shell, such as Dash.
Basically, it’s true(1); but unlike true, it’s a so-called special shell built-in. Try the following:
```
% unset var
% var=x :
% echo $var
```
vs.
```
% unset var
% var=x true
% echo $var
```
