SNOBOL4 is a very large and rich language, providing a diverse assortment of built-in features. It is also an extensible language; it allows you to define new data types, functions, and operators. You can, by creating your own entities, obtain another level of conciseness and power of expression.
We will begin with program-defined functions because they allow a program to be partitioned into smaller, more manageable segments. As functions tend to be just a few lines long, transfers of control within them are usually obvious and manageable. If your main program has complex, intertwined GOTOs, consider how the use of functions would clarify things.
Functions also allow us to postpone the complete development of an algorithm. We can design the overall program structure, using function names for components which will be developed later. Furthermore, if a particular function proves inefficient, it can be replaced later with an improved version.
The concept of a function should be clear from all the examples of SNOBOL4's built-in functions. A function accepts some number of arguments, performs a computation based on their values, and returns a result and a success signal. A function can also signal failure, and not return any value.
We can define a new function by specifying its name and arguments. The definition will be composed of "dummy arguments" -- place holders that show how the arguments are to be used in the function. Later, when the function is called, the actual arguments will replace the dummy arguments in the computation.
We define a new function in SNOBOL4 by using the built-in function DEFINE. We call it with a "prototype string" containing the new function's name and arguments. DEFINE makes the new function's name known to SNOBOL4, so it can be used subsequently.
Suppose we want to create a new function called SHIFT, which would circularly rotate a string through a specified number of character positions. We'll define all rotations as being to the left -- characters removed from the front of the string are placed back on the end. For example, SHIFT('ENGRAVING',3) would return the string 'RAVINGENG'.
We will begin by defining the function name and its dummy arguments, S and N. Any names of your choosing can by used for dummy arguments. In a program, it would look like this:
DEFINE('SHIFT(S,N)')It is important to realize that the DEFINE function must be executed for the definition to occur. Most other programming languages process function definitions when a program is compiled. SNOBOL4's system is more flexible; the prototype string can itself be the result of other run-time computations. In an extreme case, data input to a program could determine the names and kinds of functions to be defined.
Having declared the function name and dummy arguments, we need to provide the statements which will implement the function. A very simple convention applies:
When the function is used, SNOBOL4 transfers control to a statement label with the same name as the function.
In this case, the first statement of the function would be labeled SHIFT. There is no limit to the number of statements comprising the function body.
First, a function may return a value by assigning it to a variable with the same name as the function. If no assignment occurs, the result is the null string.
Second, the function must tell SNOBOL4 that it is finished, and that control should return back to the caller. It does this by transferring to the special label RETURN.
The label RETURN should not appear anywhere in your program. It is a special name, reserved by SNOBOL4 for just this purpose.
With this information, we can now write our SHIFT function. We will remove the first N characters from the beginning of the argument string, and place them on the end. The function body looks like this:
SHIFT S LEN(N) . FRONT REM . REST SHIFT = REST FRONT :(RETURN)Each time SHIFT is called, the particular arguments used are placed in S and N. The first statement splits S into two parts, assigning them to variables FRONT and REST. The second statement reassembles them in the shifted order, and assigns them to vari-
able SHIFT, to be returned as the function result. The GOTO then transfers to label RETURN to return back to the caller.
What happens if we try the function call SHIFT('PEAR',7)? As the function is defined above, the pattern match would fail, since LEN(7) is longer than the subject string. The assignment to FRONT and REST would not take place, and the function would return an erroneous result.
Now we could extend the definition of SHIFT to cycle the argument string multiple times. In general, though, we want to develop a convenient method that allows a function to signal an exceptional condition back to the caller. Function failure allows us to do just that. Another convention is provided:
Transferring to the special label FRETURN returns from a function signaling failure to the caller. No value is returned as the function result.
We can now rework the function body to signal failure when N is too large. In this case, the pattern match fails, and we detect the failure in the GOTO field:
SHIFT S LEN(N) . FRONT REM . REST :F(FRETURN) SHIFT = REST FRONT :(RETURN)In general, the transfer to FRETURN does not need to be the result of the failure of a particular statement. Any success or failure could be tested to produce a transfer to FRETURN. For example, if we decided to explicitly test the length of S, the function could begin with:
SHIFT GT(N, SIZE(S)) :S(FRETURN) . . .
FRONT and REST were used in this function as temporary variables to rearrange the argument string. If they had appeared elsewhere in your program, their old values would be destroyed. Such inadvertent conflicts become harder to avoid as your function library grows. The prototype string used with DEFINE can specify "local variables" to be protected when the function is called. For our SHIFT function, the call would look like this:
DEFINE('SHIFT(S,N)FRONT,REST')The local variables appear after the argument list. When SHIFT is called, any existing values for FRONT and REST will be saved on a pushdown stack. FRONT and REST are set to the null string,
and control is transferred to the first statement of the function body. When the function returns, FRONT and REST are restored to their previous values.
Since the same potential problem exists for dummy arguments S and N, SNOBOL4 automatically saves their values before assigning the actual arguments to them. And just like local variables, when the function returns, the dummy arguments are restored to their original values.
Once a function has been defined, it may be used in exactly the same manner as a built-in function. It may appear in a statement anywhere its value is needed -- in the subject, pattern, or replacement fields. If used with the indirect reference operation, functions may even be used in the GOTO field. Of course, a function may be used as the argument of another function.
The value returned by a function is not restricted to strings. Any SNOBOL4 data type, including patterns, may be returned. Earlier, in the pattern match chapter, we showed how simple patterns could be tailored to our needs by using them in more complicated clauses. The specific example was a variation of the BREAK pattern which would not match the null string. Let's use a programdefined function to create a new function, BREAK1, with this property. The definition statement might look like this:
DEFINE('BREAK1(S)')and the function body, like this:
BREAK1 BREAK1 = NOTANY(S) BREAK(S) :(RETURN)This function can now be used directly in a pattern match. For example, BREAK1('abc') constructs a pattern which matches a nonnull string, up to the occurrence of the letters 'a', 'b', or 'c'. Of course, the pattern returned by a function can be as complex as desired, giving us an elegant method to define our own pattern matching primitives.
SNOBOL4 does not know or care which statements belong to a particular function. There is no explicit END statement for individual functions. To keep programs readable, we'll have to impose some discipline of our own. Also, having to execute the DEFINE function is a mixed blessing. It offers tremendous flexibility, but requires us to place all our DEFINE's at the beginning of a program. Here is the system proposed by Gimpel, which I like to use to manage functions and their definitions:
We keep the function definition, any one-time initialization, and the function body together as a unit. A GOTO transfers control around the function body after the definition and initialization statements are executed. Also present are comments describing its use and any exceptional conditions. Rewriting the SHIFT function in this form, and taking this opportunity to avoid rebuilding the pattern each time the function is called, it looks like this:
* SHIFT(S,N) - Shift string S left N character positions. * As characters are removed from the left side of the * string, they are placed on the end. * * The function fails if N is larger than the size of S. DEFINE('SHIFT(S,N)FRONT,REST') SHIFT_PAT = LEN(*N) . FRONT REM . REST :(SHIFT_END) SHIFT S SHIFT_PAT :F(FRETURN) SHIFT = REST FRONT :(RETURN) SHIFT_ENDNow this group of lines can be incorporated as a unit into the beginning of any program that wants to use it. When execution begins, the first statement defines the SHIFT function. Next we define a pattern, called SHIFT_PAT, for use when the function is called. The pattern definition is only executed once, so we use the unevaluated expression operator (*N) to obtain the current value of N on each function call. After defining the pattern, we "jump around" the function body, to label SHIFT_END. (Remember, we are defining the function now, not executing it; falling into the function body would be an error.) The function is now defined, and ready to be used.
In general, functions should be prepared in this form:
* Fname - Description of use DEFINE('Fname(arg1,...,argn)local1,...,localn') . . . * Any one-time initialization for Fname . . . :(Fname_END) Fname Function body . . . Fname_ENDIf you place your functions in individual disk files, they can be included in new programs as necessary. By preparing functions in this form, they will all be defined and initialized when execution begins.
When discussing pattern matching, we used a pattern to convert a character to its ASCII decimal value. In BASIC, two functions are provided for similar operations: ASC and CHR$. We can create SNOBOL4 equivalents like this:
* ASC(S) - Return the ASCII code for the first character of * string S. * * The value returned is an integer between 0 and 255. * The function fails if S is null. DEFINE('ASC(S)C') ASC_ONE = LEN(1) . C ASC_PAT = BREAK(*C) @ASC :(ASC_END) ASC S ASC_ONE :F(FRETURN) &ALPHABET ASC_PAT :(RETURN) ASC_END * CHR(N) - Converts an integer ASCII code to a one character * string. * * The argument N is an integer between 0 and 255. * The function fails if N is greater than 255. DEFINE('CHR(N)') CHR_PAT = TAB(*N) LEN(1) . CHR :(CHR_END) CHR &ALPHABET CHR_PAT :S(RETURN) F(FRETURN) CHR_ENDNote that both functions were written to work correctly regardless of the anchoring mode in use by the calling program.
(The CHR function is shown here as an example only. Vanilla SNOBOL4 provides a built-in function, CHAR(N), for this purpose. See Chapter 8 of the Reference Manual, "Built-in Functions.")
Function calls in SNOBOL4 transmit the "value" of the argument to the function. Variables used in the function call cannot be harmed by the function. This type of function usage is referred to as "call by value." Occasionally, we might want the function to access the argument variables themselves. The name operator introduced in the previous chapter provides this ability. The function call still transmits a value, but the value used is the "name" of a variable.
Consider a function called SWAP, which will exchange the contents of two variables. If we wanted to exchange the contents of variables COUNT and OLDCOUNT, we would say SWAP(.COUNT, .OLDCOUNT). The function looks like this:
* SWAP(.V1, .V2) - Exchange the contents of two variables. * The variables must be prefixed with the name operator * when the function is called. DEFINE('SWAP(X,Y)TEMP') :(SWAP_END) SWAP TEMP = $X $X = $Y $Y = TEMP :(RETURN) SWAP_ENDThe name operator allows us to access the argument variables. If we had not used it, the function would be called with the variables' values, with no indication of where they came from. Calls to SWAP are not limited to simple variable arguments. Anything capable of receiving the name operator, such as array and table elements, could be used: SWAP(.A<4,3>, .T<'YOU'>).
There are certain situations where call by name occurs implicitly. If the argument is an array or table name, or a programdefined data type (discussed below), it points to the actual data object, which can then be modified by the function. For example, if FILL were a function which loads an array with values read from a file, the statements
A = ARRAY(25) FILL(A)would cause array A to be altered.
The CODE.SNO program was provided to allow interactive experiments with SNOBOL4 statements. If you create functions using the preceding format, they also can be tested using CODE.SNO.
Use your text editor to create a disk file containing the SHIFT function. (Be certain to include the GOTO that transfers around the function body.) Call the file SHIFT.SNO. Now, start the CODE.SNO program, and type the following:
? SLOAD('SHIFT.SNO') Success ? OUTPUT = SHIFT('COTTON',4) ONCOTT ? OUTPUT = SHIFT('OAK',4) Failure
The statements that comprise a function are free to call any functions they choose, including the function they are defining. Of course, for this to make sense, they must call themselves with a simplified version of the original problem, or an endless loop would result. Eventually, the function calls itself with an argument so simple that it can return an answer without any further recursive calls. It's like winding a clock spring up. The central, non-recursive answer to the innermost call provides an answer to the next turn out, with the recursive calls unwinding until the original problem can be solved.
There is no explicit declaration for recursion; any function can be used recursively if it is designed properly. However, all local variables should be declared in the DEFINE function so they will be saved and restored during recursive calls.
Sometimes, recursion can produce dramatically smaller programs. "Algorithms in SNOBOL4" provides a example with the recursive function, ROMAN. It convert's an integer in the range 0 to 3999 to its Roman numeral equivalent. Two premises are required:
The function uses these two rules to produce a recursive solution for some integer N. The algorithm looks like this:
The rightmost digit is removed from the argument and converted by premise 1. Removing the digit effectively divides the argument by 10, simplifying the problem.The previously converted unit's digit is appended to the result.
The reduced argument is then converted by calling ROMAN recursively and "multiplying" the result by 10 according to premise 2.
Here's the function (note that a "plus sign" in column one allows a statement to be continued over several lines):
* ROMAN(N) - Convert integer N to Roman numeral form. * * N must be positive and less than 4000. * * An asterisk appears in the result if N >= 4000. * * The function fails if N is not an integer. DEFINE('ROMAN(N)UNITS') :(ROMAN_END) * Get rightmost digit to UNITS and remove it from N. * Return null result if argument is null. ROMAN N RPOS(1) LEN(1) . UNITS = :F(RETURN) * Search for digit, replace with its Roman form. * Return failing if not a digit. '0,1I,2II,3III,4IV,5V,6VI,7VII,8VIII,9IX,' UNITS + BREAK(',') . UNITS :F(FRETURN) * Convert rest of N and multiply by 10. Propagate a * failure return from recursive call back to caller. ROMAN = REPLACE(ROMAN(N), 'IVXLCDM', 'XLCDM**') + UNITS :S(RETURN) F(FRETURN) ROMAN_ENDThe first call to ROMAN may have an integer argument. The statement labeled ROMAN causes N to be converted to a string, and subsequent recursive calls use a string argument. The recursive calls cease when reducing N finally produces a null string argument -- the match at statement ROMAN fails, and the function returns immediately with a null result.
With the exception of arrays and tables, a variable may have only one item of data in it at a time. In many applications, it is convenient if several data items can be associated with a variable. For example, if we wanted to work with complex numbers, a variable should contain two numbers -- the real and imaginary parts. In an inventory system, an individual product might require values such as name, price, quantity, and manufacturer.
Program-defined data types enlarge SNOBOL4's repertoire to include new objects such as COMPLEX or PRODUCT. SNOBOL4 only provides a system for managing these new types; defining a data type does not magically invest SNOBOL4 with a knowledge of complex arithmetic or inventory accounting. It is still up to you to provide the computational support for each new type.
A program-defined data type will consist of a number of fields,
each containing an individual data element. We begin by selecting names for the data type and fields. An inventory system might use the data type name PRODUCT, and field names NAME, PRICE, QUANTITY, and MFG.
A data type is defined by providing a prototype string to the built-in DATA function. The prototype assumes a form similar to a function call, with the data type taking the place of the function name, and the field names replacing the arguments. The form of the prototype string is:
'TYPENAME(FIELD1,FIELD2,...,FIELDn)'Blanks are not permitted within a prototype. Try creating a new data type using the CODE.SNO program:
? DATA('PRODUCT(NAME,PRICE,QUANTITY,MFG)') SuccessThe DATA function tells SNOBOL4 to define an object creation function with the new data type's name:
PRODUCT(arg1, arg2, arg3, arg4)This new function can be called whenever we wish to create a new object with the PRODUCT data type. Its arguments are the initial values to be given to the four fields which comprise a PRODUCT. The function returns a pointer to the new object, which can be stored in a variable, array, or table. Try creating two new objects as follows:
? ITEM1 = PRODUCT('CAPERS', 2, 48, 'BRINE BROTHERS') ? ITEM2 = PRODUCT('PICKLES', 1, 72, 'PETER PIPER INC.')
The defining call to the DATA function also created several field reference functions. In this case, their names would be:
NAME(arg) PRICE(arg) QUANTITY(arg) MFG(arg)The argument used with each function is an object created by the PRODUCT function. Try accessing ITEM1's fields:
? OUTPUT = MFG(ITEM1) BRINE BROTHERS ? OUTPUT = PRICE(ITEM1) * QUANTITY(ITEM1) 96We can alter the value of a field after an object is created. Field reference functions can also be used as the object of an assignment, so:
? QUANTITY(ITEM2) = QUANTITY(ITEM2) - 12changes the QUANTITY field of ITEM2 from 72 to 60.
It is important to recognize that variables like ITEM1 and ITEM2 contain "pointers" to the data. Assigning ITEM1 to another variable, say LASTITEM, merely copies the pointer; both variables still point to the same physical packet of data in memory. Altering the QUANTITY field of ITEM1 would alter the QUANTITY field of LASTITEM. This is the same behavior observed earlier for array and table names.
The built-in COPY function creates a unique copy of an object-- one which is independent of the original. Try using it with CODE.SNO:
? LASTITEM = COPY(ITEM1) ? QUANTITY(ITEM1) = 24 ? OUTPUT = QUANTITY(LASTITEM) 48
Our inventory example used string and integer values as the field contents. In fact, any SNOBOL4 data type may be stored in a field, including pointers to other program-defined types. Complex structures, such as queues, stacks, trees, and arbitrary graphs may be created.
For example, if we wanted to link together all products made by the same manufacturer, PRODUCT could be defined with an additional field. We won't go through the exercise with CODE.SNO, but will sketch out the changes:
DATA('PRODUCT(NAME,PRICE,QUANTITY,MFG,MFGLINK')As each product is defined, we will determine if we have another product from the same manufacturer. If so, MFGLINK is set to point to that other product. If not, it is set to the null string. A table M provides a convenient way to keep track of manufacturers. Assume variable COMPANY contains the manufacturer's name as each product is defined. Then all of the requisite searching and linking can be accomplished in one statement:
M<COMPANY> = PRODUCT(..., ..., ..., COMPANY, M<COMPANY>)If this is the company's first appearance, it is not in the table, and the last argument to the PRODUCT function sets MFGLINK to the null string. The assignment statement uses the company as the table subscript, and the entry points to the current product.
If another product definition uses the same company, MFGLINK will point to the previous product, and the table will be updated to point to the current product. In this manner, all products from a manufacturer will be threaded together. Each thread starts with a table entry, and goes through each product's MFGLINK field, ending with a null string in the last product's MFGLINK.
Now if we wanted to display all products supplied by a particular manufacturer, we select and follow the appropriate thread:
X = M<COMPANY> LOOP OUTPUT = DIFFER(X) NAME(X) :F(DONE) X = MFGLINK(X) :(LOOP) DONE
The DATATYPE function allows you to learn the type of data in a particular variable. It is useful when the kind of processing to be performed depends on the data type. The formal data type name is returned as an upper-case string:
? OUTPUT = DATATYPE(54) INTEGER ? OUTPUT = DATATYPE(ITEM1) PRODUCT
If you can define new functions and data types, why not new operators too? Indeed, SNOBOL4 provides this feature, although most programs can be written without it. For the sake of completeness, we'll provide a brief discussion.
Unary or binary operators can be thought of as functions of one or two arguments. For example, A + B can be written in functional form as PLUS(A,B), where PLUS is some function which implements addition. Operators can be redefined by specifying a function to replace them. We still write our program in terms of the operator's graphic symbol, but SNOBOL4 will use the specified function whenever the operator must be performed.
The built-in function OPSYN creates synonyms and new definitions for operators. Synonyms permit different names or symbols to be used in place of a function or operator. The general form of OPSYN is:
OPSYN(new name, old name, i)The new name is defined as a synonym of the old name. The third argument is 0, 1, or 2 if we are defining functions, unary operators, or binary operators respectively.
We can make the name LENGTH a synonym for the SIZE function:
? OPSYN('LENGTH', 'SIZE', 0) ? OUTPUT = LENGTH('RABBIT') 6The word synonym is not quite an accurate description of OPSYN. The name LENGTH becomes associated with the "code" that implements the SIZE function, not with the word SIZE per se. If SIZE was subsequently redefined -- perhaps as a program-defined function--LENGTH would continue to return the number of characters in a string.
Take a moment to examine the tables in Chapter 4 of the Reference Manual, "Operators," in the reference section. Note that in each table there are a number of operator symbols whose definition is <none>.
If you use an undefined binary operator, you'll get an error:
? OUTPUT = 1 # 1 Execution error #5, Undefined function or operationHowever, we could make this operator synonymous with the DIFFER function (which also uses two arguments) and use it instead:
? OPSYN('#', 'DIFFER', 2) ? OUTPUT = 1 # 2 FailureConversely, we can define a function in place of an operator:
? OPSYN('PLUS', '+', 2) ? OUTPUT = PLUS(4, 5) 9Unary operators can be similarly treated, using 1 as the third argument:
? OPSYN('!', 'ANY', 1) ? 'ABC321' !'3C' . OUTPUT COperators can be created to maintain a stack, or navigate around a tree. The full generality of functions and programdefined data types are available to your operators. Through this technique you can make SNOBOL4 speak the language of your particular problem.