Chapter 16

The Shell (II)


In Chapter 8 we introduced some of the shell special characters. By way of review, we learned that the shell interprets the octothorpe # as the beginning of a comment. By itself, the asterisk * is “expanded” by the shell to the names of all files in the current directory. When linked with other characters, such as A* or *B, the shell expands the expression to the names of all files beginning with A or ending with B. The greater-than sign > directs the output to a named file. The vertical bar or pipe | allows the output of one command to be directed to the input of the following command. When placed at the end of a command line, the ampersand & causes the shell to execute the command as a background process, and immediately returns a prompt so the user can execute other commands. The semicolon ; indicates the end of a command; this allows more than one command to be placed on a single line. The backslash \ escapes the special meaning of the immediately following character so it is treated literally. The single quote or apostrophe ' can escape the special meaning of all characters up to the appearance of a matching single quote.

In this chapter we will continue to describe shell special characters and identify their functions. In addition, we will learn about the shell alias function.

Shell Special Characters

The remaining special shell characters include the following: the dollars sign $, the greve ` (also known as the grave accent or backquote), the less-than sign <, the question mark ?, and the double quote ". We’ll consider the function of each of these characters in turn.

Shell Variables

Like any programming language, the shell allows information to be stored and retrieved through shell variables. Variables can be given all sorts of names, such as value, Meter, A34x and BARLINE3. In order to retrieve information from a variable, the variable name is preceded by a dollars sign. For example, the string $VARIABLE means “the current value of the variable named VARIABLE.” Suppose you had a file named $FILE in the current directory ($FILE is a legitimate filename on UNIX systems). If you type:

sort $FILE

The shell will assume that there is a variable named FILE, and retrieve its contents. Since the contents are likely to be empty, the above command is identical to typing:

sort

In order to sort the file named $FILE, the dollars sign would need to be escaped:

sort \$FILE

Depending on the type of shell, variables can be assigned numerical or string values in various ways. For most shells, variables can be assigned using the equals sign (with no intervening spaces). For example, the integer 7 can be assigned to the variable X as follows:

X=7

Or the string “hello” can be assigned to a variable by placing the string in quotation marks:

X="hello"

Single quotation marks can also be used:

X='hello'

If you had a file named hello in the current directory, and if the variable X had been assigned as above, then the following command would sort this file:

sort $X
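The assignment, retrieval, and escaping steps described above can be tried together in a scratch directory. The following is a minimal sketch: the directory path and the file contents are assumptions made up for illustration.

```shell
# Setup: a scratch directory (hypothetical path) holding two small files.
mkdir -p /tmp/shellvars_demo
cd /tmp/shellvars_demo
printf 'pear\napple\n' > hello
printf 'zebra\nant\n' > '$FILE'     # single quotes keep the name literal

X=hello             # assignment: no spaces around the equals sign
sort $X > out1      # the shell substitutes hello for $X, so the file hello is sorted
sort \$FILE > out2  # the backslash escapes $, so the file named $FILE is sorted
cat out1 out2
```

Without the backslash, the second sort would receive whatever the variable FILE currently holds rather than the filename $FILE.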

The Shell Greve

It is often useful to be able to save the results of some operation in a shell variable. Suppose, for example, that we want to sort a file containing the word zebra, but we’re not certain which file (or files) contains this word. Manually, we would need to carry out two operations. First we would search for any file(s) containing the word:

grep -l zebra *

We might find that the word “zebra” appears in the files animals and mammals. Having determined which files to sort, we would then carry out the appropriate sort command:

sort animals mammals

If we found that the word “zebra” occurred in 50 files, then entering the appropriate sort command would require a lot of typing. Alternatively, we could use a shell variable to store the results of the first command, and then retrieve the filenames in the second command. For this, we must use the greve character (`). UNIX shells will execute whatever command(s) appear between two greve characters; the result of the operation can then be treated as a string which may be assigned to a shell variable or used in some other way. An alternative is to use the equivalent form $(...) instead of `...`. This can improve readability since the greve character looks similar to an apostrophe; in addition, the dollar/parentheses forms can be nested directly, whereas nested greves must be escaped with backslashes.

In the following commands, the filenames produced by the grep command are assigned to a shell variable named FILES. In the subsequent command a dollars sign instructs the shell to retrieve the contents of this variable:

FILES=`grep -l zebra *`
sort $FILES

or, using the equivalent $(...) form for the assignment:

FILES=$(grep -l zebra *)

Alternatively, we can avoid the FILES variable altogether, and execute the following command:

sort `grep -l zebra *`

or

sort $(grep -l zebra *)

The shell interprets the above command as follows: First it recognizes the presence of the command delineated by greves. This command is executed before the sort command. The grep -l command will generate as output a string of filenames. This output will replace the material delineated by greves. Finally, the sort command will be executed — using the filenames generated by the grep command.
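The order of evaluation can be observed with a few throwaway files. This is a minimal sketch in which the file names and contents are invented for the demonstration.

```shell
# Setup: a scratch directory with three small files (contents are made up).
cd "$(mktemp -d)"
printf 'zebra\naardvark\n' > animals
printf 'zebra\nmole\n' > mammals
printf 'oak\nelm\n' > trees

# The embedded grep runs first and yields the names of files containing zebra.
FILES=$(grep -l zebra *)
echo $FILES

# The same list can be handed directly to sort.
sort $(grep -l zebra *) > sorted.txt
cat sorted.txt
```

Here only animals and mammals reach sort; the file trees is never read by it.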

This command structure is useful in a variety of circumstances. For example, suppose we wanted to identify any encoded works that are composed by Josquin and are also in triple meter:

grep -l '!!!COM: Josquin' `grep -l '!!!AMT:.*triple' *`

or

grep -l '!!!COM: Josquin' $(grep -l '!!!AMT:.*triple' *)

Here we have embedded one grep “inside” another. Remember that the command delineated by the greve is executed first. In this case, we begin by searching all of the files in the current directory for an AMT reference record containing the keyword triple. The -l option causes the output to consist of only filenames. Then the second grep is executed. It looks for files that contain a COM reference record containing the keyword Josquin. But this second grep only searches those filenames passed to it by the first grep. In other words, the composer search is restricted to only those files that have a triple meter designation.
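The two-stage search can be checked against a handful of small files. The sketch below assumes three invented files whose reference records are made up for illustration; only the grep commands come from the text.

```shell
# Setup: three tiny files with invented reference records.
cd "$(mktemp -d)"
printf '!!!COM: Josquin des Prez\n!!!AMT: simple triple\n' > work1.krn
printf '!!!COM: Josquin des Prez\n!!!AMT: simple duple\n' > work2.krn
printf '!!!COM: Ockeghem, Johannes\n!!!AMT: simple triple\n' > work3.krn

# Inner search alone: every file with a triple meter classification.
triple=$(grep -l '!!!AMT:.*triple' *)
echo $triple

# Composer search restricted to the files found by the inner search.
both=$(grep -l '!!!COM: Josquin' $(grep -l '!!!AMT:.*triple' *))
echo $both
```

One caution: if the inner grep finds no files at all, the outer grep is left without filenames and will wait for input from the keyboard.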

Consider another way of using the greve structure. Suppose we have a file named opus16. We would like to know what other works contain the same instrumentation as opus16, but we’ve forgotten what the precise instrumentation is. We can first search opus16 for the instrumentation data (encoded in the AIN: reference record), and then search for this information in all files in the current directory. This task can be carried out using a single command line:

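grep -l "`grep '!!!AIN:' opus16`" *

or

grep -l "$(grep '!!!AIN:' opus16)" *

In this example, the embedded command provides the regular expression rather than the files to be searched. The embedded command is enclosed in double quotes (discussed in the next section) because an AIN record normally contains spaces; without the quotes the shell would split the record into several arguments, and only the first would be treated as the regular expression.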
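Because an instrumentation record usually contains spaces, it matters that the output of the embedded command reaches grep as a single argument. The following sketch uses three invented files and stores the record in a variable (here called pattern, a hypothetical name) before passing it on inside double quotes.

```shell
# Setup: three tiny files with invented instrumentation records.
cd "$(mktemp -d)"
printf '!!!AIN: clars corno fagot\n' > opus16
printf '!!!AIN: clars corno fagot\n' > opus71
printf '!!!AIN: violn viola cello\n' > opus18

# Fetch the record from opus16, then search every file for that exact record.
pattern=$(grep '!!!AIN:' opus16)
matches=$(grep -l "$pattern" *)
echo $matches
```

The double quotes around $pattern keep the whole record together as the regular expression; unquoted, the words after the first would be taken as filenames.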

Single Quotes, Double Quotes

In Chapter 8 we learned that single quotation marks can be used to escape the special meanings of reserved shell characters (such as * and $). Double quotation marks " have a similar effect with one important exception: the dollars sign retains its special meaning inside double quotes (as do the greve and the backslash).

The UNIX echo command causes information to be printed or displayed. Consider the following three commands:

echo $A
echo "$A"
echo '$A'

In the first and second commands, the shell looks for a variable named A and attempts to echo the contents of this variable on the display. Unless A happens to be a defined shell variable, only an empty line will be displayed. In the third command, the string $A is treated literally, and is echoed back to the display. There are circumstances where the double quotes are more useful, but for most casual users, the single quotes provide the best means for disengaging the meanings of special characters.
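The difference between the three forms becomes visible once the variable holds a value, especially one containing spaces. This is a small sketch; the value assigned to A is arbitrary.

```shell
A="two  words"    # the value contains two consecutive spaces

echo $A           # unquoted: the shell splits the value into two arguments
echo "$A"         # double quotes: the value is substituted and passed intact
echo '$A'         # single quotes: no substitution, the text $A is printed
```

The first command prints the words separated by a single space, since echo joins its arguments with one space. This is one circumstance where double quotes are the more useful choice.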

Using Shell Variables

Let’s consider an example where shell variables prove to be useful in Humdrum processing. Suppose for some score that we want to change the stem-directions in measures 34 through 38 from up-stems to down-stems. First, we need to establish the line number corresponding to the beginning of measure 34 and the line number corresponding to the end of measure 38 (i.e. beginning of measure 39). In the following script, grep is used to find these lines in the input file, and the line numbers are assigned to the shell variables A and B. Since the -n option prefixes each matching line with its line number and a colon, cut is used to keep only the number.

A=`grep -n '^=34' inputfile | cut -d: -f1`
B=`grep -n '^=39' inputfile | cut -d: -f1`

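Now we can construct an appropriate humsed command. Recall that each substitute (s) command in humsed can be preceded by a range indication. In the following command, the A and B variables convey the appropriate range to each substitution, so the substitutions are limited to the lines numbered $A through $B. The end of the range is written ${B} rather than $B; otherwise the shell would read $Bs as a reference to a variable named Bs. The three substitutions exchange the two stem directions by way of a temporary marker: existing down-stems (\) are first changed to XXX, up-stems (/) are then changed to down-stems, and finally the XXX markers are changed to up-stems. Because the shell reduces \\ to \ inside double quotes, and the substitute command in turn reads \\ as one literal backslash, each literal backslash is typed as \\\\.

humsed "$A,${B}s/\\\\/XXX/g; $A,${B}s/\//\\\\/g; $A,${B}s/XXX/\//g" inputfile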

Notice that we have used double quotes " rather than single quotes. The quotation marks are necessary to pass all three substitutions as a single argument to humsed. Using single quotes, however, would have caused $A and $B to be treated as literal strings rather than shell variables.
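The range technique can be tried without Humdrum installed by letting the standard sed command stand in for humsed. The sketch below makes several assumptions for illustration: the input is an invented single-spine fragment, the range runs from measure 34 to measure 35 rather than 39, and only the up-stem to down-stem substitution is shown.

```shell
# Setup: an invented fragment; / marks an up-stem and \ a down-stem.
cd "$(mktemp -d)"
printf '%s\n' '**kern' '=33' '4c/' '=34' '4d/' '4e/' '=35' '4f/' '*-' > inputfile

# grep -n prints "number:line"; cut keeps only the number before the colon.
A=$(grep -n '^=34' inputfile | cut -d: -f1)
B=$(grep -n '^=35' inputfile | cut -d: -f1)
echo "$A $B"

# sed stands in for humsed here. ${B} marks where the variable name ends,
# and \\\\ reaches sed as \\, which sed reads as one literal backslash.
sed "$A,${B}s/\//\\\\/g" inputfile > outputfile
cat outputfile
```

Only the notes between the two barlines change direction; the notes in measure 33 and after the second barline keep their up-stems.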

Aliases

An alias is an alternative name for something. The shell provides a way of defining aliases, and these aliases can prove very convenient.

Consider, by way of example, the following common pipeline:

sort inputfile | uniq -c | sort -n

In Chapter 17 we will see that this is a useful way of generating inventories. Typically, this sequence occurs at the end of a pipeline where some preliminary processing has taken place, such as:

timebase -t 8 input | ditto | hint | rid -GLI \
     | sort | uniq -c | sort -n

Since the construction sort | uniq -c | sort -n is so common, we might want to define an alias for it. To do so, we simply execute the alias command. In this case, we’ve defined a new command called inventory:

alias inventory="sort | uniq -c | sort -n"

Having defined this alias, we can now make use of it. Any time we type the word inventory, the shell will expand it to sort | uniq -c | sort -n. The above command can be shortened as follows:

timebase -t 8 input | ditto | hint | rid -GLI | inventory
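Aliases are mainly a convenience for interactive use; bash, for instance, does not expand aliases in non-interactive scripts unless its expand_aliases option is set. A shell function offers a similar shorthand that also works inside scripts. The following is a minimal sketch of that alternative, not a replacement for the alias above; the function name tally and the sample input are assumptions chosen for illustration.

```shell
# A function wrapping the same three-stage pipeline as the inventory alias.
tally() {
    sort | uniq -c | sort -n
}

# Count how often each line occurs, least frequent first.
result=$(printf 'c\na\nc\nb\nc\na\n' | tally)
echo "$result"
```

Each output line holds a count followed by the item, so the most common item appears last.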

Another common task is eliminating barlines. Frequently, we need to use the construction:

grep -v ^=

Actually, this is not the most prudent construction. Depending on the spines present in a document, sometimes barlines will be mixed with null tokens in other spines that do not encode explicit barlines. E.g.

.	=23	=23	.	.	=23

A more careful way of eliminating barlines would use the following regular expression (the gap between \. and the closing parenthesis is a single tab character):

egrep -v '^(\.	)*='

That is, eliminate all lines that either begin with an equals-sign, or have one or more leading null tokens followed by a token with a leading equals-sign. Since this is somewhat complicated to remember, we might alias it. In the following command, we have created a new command called nobarlines. The alias definition is enclosed in double quotes so that the single quotes protecting the regular expression can appear inside it:

alias nobarlines="egrep -v '^(\.	)*='"

In Humdrum, a good use of aliases is to define commonly used regular expressions. Consider the regular expression used to define tandem interpretations that encode meter signatures. Here we are searching for an asterisk at the beginning of a line, followed by the upper-case letter M followed by a digit, followed by zero or more digits, followed by a slash, followed by a digit:

grep '^\*M[0-9][0-9]*/[0-9]' inputfile

Actually, this regular expression will fail to find any meter signature that is not in the first spine. A more circumspect regular expression drops the beginning-of-line anchor and instead allows for leading tabs (the first character inside the quotes is a tab):

grep '	*\*M[0-9][0-9]*/[0-9]' inputfile

Since this is a cumbersome regular expression, it can help to provide an alias. Because the shell expands an alias only when it appears as the command name, we alias the complete grep command, rather than the regular expression alone, to the name metersig:

alias metersig="grep '	*\*M[0-9][0-9]*/[0-9]'"

Now we can search for meter signatures as follows:

metersig inputfile
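When only the regular expression is to be reused, for instance with several different commands, a shell variable is another option. The sketch below is one way of doing this under stated assumptions: the variable names TAB and METERSIG and the two-spine input are invented for illustration, and the tab is generated with printf so that it survives copying.

```shell
# Setup: an invented two-spine fragment with a meter signature in spine 2.
cd "$(mktemp -d)"
printf '**kern\t**kern\n*clefF4\t*M3/4\n4c\t4e\n*-\t*-\n' > inputfile

# Build the regular expression once; $TAB holds a single tab character.
TAB=$(printf '\t')
METERSIG="$TAB*\*M[0-9][0-9]*/[0-9]"

# Double quotes let the shell substitute the variable as one argument.
found=$(grep "$METERSIG" inputfile)
echo "$found"
```

Single quotes around $METERSIG would not work here, since grep would then search for the literal text $METERSIG.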

Reprise

In this chapter we have discussed how the shell interprets the dollars sign $, the greve `, and the double quote ". When followed by printable characters, the dollars sign is interpreted as designating the value of a shell variable. Any command enclosed between two greve characters, or within the equivalent $(...) form, is executed by the shell first, and the returned output of the command is available as an input parameter to some other command. Like single quotes, double quotes can be used to escape special shell characters; however, an important difference is that the dollars sign retains its special meaning within the double quotes. This allows shell variables to be embedded into text strings.

We have also learned that the shell alias command provides a convenient shorthand, abbreviating a complex pipeline or regular expression into a single user-defined keyword.