





                      A User's Manual for Ibpag2
                (Icon-Based Parser Generation System 2)
                             Version 1.2

                                - or -

               How to Use an LR-based Parser Generator


                       Richard L. Goerwitz, III
                        University of Chicago





1.__What_is_Ibpag2?

        Ibpag2 is a so-called "parser generator," i.e. a tool for
automating the process of generating a recognizer and/or parser from
abstract structural descriptions of an input language. Put in more
practical terms, Ibpag2 is a piece of software that a) reads a source
file containing a grammar that defines an input language, and then b)
outputs an automaton that recognizes that language. The user may, at
his or her option, specify actions this automaton should take when it
sees various substructures within its input language. By default,
however, the parser simply recognizes a given sequence as belonging,
or not, to that language.

        Ibpag2 utilizes so-called "LR" table generation and parsing
algorithms. These algorithms facilitate construction of reasonably
fast deterministic pushdown automata that are powerful enough to
handle most commonly used programming language constructs. LR-based
systems come in three main flavors: SLR(1), LALR(1), and LR(1). The
LR(1) flavor is fairly easy to implement, but uses too many resources
to be practical. LALR(1) algorithms are harder to implement, but much
faster, and the parse tables they construct use considerably less
memory than do those of their LR(1) counterparts. SLR(1) algorithms
are the easiest to implement, compile the fastest, and use about as
much memory as LALR(1)s. SLR(1) is the least powerful of the three,
though, so there is a tradeoff. Ibpag2 is an "enhanced" SLR(1) parser
generator.
It is enhanced in the sense that it can operate both in
its native SLR(1) mode, and in a more powerful "quasi-GLR" mode (on
which, see section 5 below).

        As its full title ("Icon-Based Parser Generator 2") implies,
Ibpag2 is written in Icon [2,3], as are the automata it creates.
Ibpag2 has been tested with Icon version 8.10. So far I have only run
it on an i386 box running Xenix 2.3.3, and on a Sun 4 running some
version of SunOS. I have many reports, though, of it running under
other UNIX variants. It will probably also run under other operating
systems, though modifications will in some instances be required.
Using Ibpag2 under MS-DOS may not be possible, on account of the way
it manages memory.

        The Ibpag2 distribution adheres to de facto UNIX installation
standards: Just set the appropriate variables in the makefile, and
then "make install." For those who are using a non-UNIX system, or
who have not installed such a package before, there is a section at
the end entitled "Installing Ibpag2" that details the installation
procedure (section 6).

        Aside from the above-mentioned installation section (6), the
remainder of this document aims to provide the reader a) with a
simple, practical explanation of what LR-family parser generators are
and how they work (section 2), and b) with a set of directions
specifically on how to use Ibpag2 (section 3). There is also an
advanced section on debugging (4), and one on using Ibpag2 with non-LR
and/or ambiguous languages (5). The discussion is geared toward those
who have little or no experience in parsing or automaton theory. For
very advanced reading, consult the bibliography. For a brief summary
of Ibpag2's command-line options, see the main Ibpag2 source file,
ibpag2.icn, or invoke ibpag2 with the -h (help) option.

        In general, be warned that Ibpag2 works best with small or
medium-sized grammars.
Its parse tables have to be reconstructed at
run-time, and the code for doing this can become a bit cumbersome for
grammars with more than 100 rules and fifty or so terminal symbols. I
myself have processed grammars with as many as 300 terminals and 400
rules. Although the resulting automata run well enough, the output
files are over 300k, and Ibpag2 takes a long time to create them. If
you must use Ibpag2 with a very large grammar, try the -c
command-line option (which produces compressed parse tables). This
option is discussed below, in section 4. Compiling (rather than
interpreting) Ibpag2 may result in much faster processing, as will
resetting your BLOCKSIZE and STRSIZE environment variables. See the
installation section (6) below on using the Icon compiler to create
the Ibpag2 executable. Good starting values for BLOCKSIZE and STRSIZE
are triple their default values (i.e. 3 x 65000). These variables are
discussed in the Icon manual page.

        My ultimate aim in writing this document has been to make
accessible to the non-CS portion of the Icon community what for them
might seem an inaccessible branch of applied parsing and automaton
theory. I am a philologist myself, and feel that there is a great
deal that can and ought to be done to make advanced tools accessible
to people with other interests than twiddling bits or pondering the
true meaning of epsilon closures :-).

        Any comments on the Ibpag2 system itself or its documentation
will be gratefully received. Write to me at the address appended to
the final section (6).


2.__What_is_an_LR_Parser_Generator?

        Back in the late 50s and 60s, linguists, mathematicians, and
software engineers all became intensely interested in the formal
properties of languages: Can they be described as a series of logical
structures and relations? Can computers recognize and manipulate
these structures efficiently?
Linguists, in particular, quickly
realized that the amount of structural complexity, ambiguity, and pure
noise in natural language would render it computationally intractable,
especially given the limited memory/throughput of then available CPUs.
Mathematicians and engineers, however, found that many of the
formalized notations they dealt with could, in fact, be (re)designed
in such a way that efficient computer processing could - at least in
principle - be achieved.

        Principle, in this case, did not squarely meet reality until
viable parser generation tools came into being. Parser generation
tools map an abstract structural description of a formal notation or
"language" to working computer code. Ideally, the designer simply
makes assertions like:

    an expression is composed of either
        1) a term (e.g. 10), or
        2) an expression, a "+" or "-", and another expression

Parser generator systems translate these assertions (the "grammar")
into a machine, i.e. automaton, that can recognize and/or manipulate
input streams that conform to the "language" so described.

        Let me dwell, for a moment, on the toy expression grammar
offered above. Note that it describes a set of simple mathematical
constructs like:

    9
    9 + 3
    9 + 3 - 8

According to the specifications given above, the nine, three, and
eight alone constitute terms - which are also expressions (via rule
1). Because these terms are also expressions, "9 + 3" can be reduced
to a larger expression by rule 2. The same is true for "9 + 3 - 8,"
except that there rule 2 must apply twice - once for "9 + 3," and then
again for that and the remainder of the line - in effect grouping the
expressions as ( ( (9) + (3) ) - (8) ). It is also possible to group
the expression ( (9) + ( (3) - (8) ) ), although for the discussion
that immediately follows this second grouping will be ignored (see
below on the terms "precedence" and "associativity").
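        The repeated application of rule 2, and the left-to-right
grouping it produces, can be sketched in a few lines of code. The
sketch below is in Python rather than Icon, purely for illustration;
the function name group_left is my own invention, and nothing here is
part of Ibpag2 itself:

```python
# Group a flat term/operator list the way rule 2 does when applied
# left to right: 9 + 3 - 8 becomes ( (9 + 3) - 8 ).
def group_left(tokens):
    # tokens alternate: term, operator, term, operator, term, ...
    tree = tokens[0]              # rule 1: a lone term is an expression
    i = 1
    while i < len(tokens):
        op, right = tokens[i], tokens[i + 1]
        tree = (tree, op, right)  # rule 2: expression op expression
        i += 2
    return tree

print(group_left([9, '+', 3, '-', 8]))   # ((9, '+', 3), '-', 8)
```

Applied to the tokens of "9 + 3 - 8," it yields the nested grouping
( (9 + 3) - 8 ) described above.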

        If we add actions to the above grammar specification, we can
create a calculator-like automaton. Traditionally, LR-family automata
(like the ones Ibpag2 creates) contain a parser, one or more stacks,
and a set of action tables. The parser reads from an input stream
segmented into "tokens" (e.g. TERM, '+', '-'), and then manipulates
its stacks according to directives contained in so-called "action" and
"goto" tables. As it reads the input stream, the parser matches rules
with action code specified by the programmer, e.g. rule 2 above might
be matched with code that added/subtracted the expressions on either
side of the '+'/'-' operator, and produced (in calculator style) the
result. Alternatively, it might be matched with code that generated
an equivalent construct in another language.

        In the case of our toy expression grammar above, the
corresponding LR automaton operates as follows. Omitting and/or
simplifying some of the inner details, it first looks at the input
stream to see what the next token is. If the next token is an
operator or end-of-input, it checks the top of its stack. If the top
of the stack has a term on it, that term is popped off, and pushed
back on, this time renamed as an expression (rule 1 above). The input
token is then shifted from the input stream onto the stack, unless it
is the end-of-input token, in which case the parser returns with a
result. If the top of the stack has an expression on it (rather than
a term), the parser pops the top three elements off of the stack, and
then either subtracts the third element from the first or adds the two
together, depending on whether the second element down was the
addition or subtraction operator, and the result is pushed onto the
stack as yet another expression.

        Even in this much-simplified form, the automaton's structure
is complex. Let us look briefly, therefore, at a practical example of
its actual workings.
If we were to feed it "9 + 3 - 8," our
calculator would take the following actions:

    1)  read the 9, and push it onto the stack as a term
    2)  see a plus sign on the input stream
    3)  pop the term (9) off of the stack and push it back on again
        (this time calling it an expression)
    4)  push the plus sign onto the stack
    5)  read the 3, and push it onto the stack as a term
    6)  see a minus sign on the input stream
    7)  pop the 3 off of the stack and push it back on again (this
        time calling it an expression)
    8)  see a minus sign still waiting on the input stream
    9)  pop 9, +, and 3 off of the stack, apply the plus operator
        to 9 and 3, then push the result onto the stack again as a
        single expression (the stack now has 12 on top)
    10) read the minus sign, and push it onto the stack
    11) read the 8, and push it onto the stack as a term
    12) see the end of input coming up on the input stream
    13) pop the 8 off of the stack and push it back on again as an
        expression
    14) see the end-of-input token still sitting on the input
        stream
    15) pop 12, -, and 8 off of the stack, apply the minus operator
        to 12 and 8, then push the result onto the stack again (the
        stack now has 4 on top)
    16) return the "answer" (i.e. 4)

        This series of actions is hard to describe, and even more so
to model as part of a hand-written computer program. And, even if
such a program were written by hand, this program would have to be
modified, at times radically, every time the grammar it assumes was
augmented or changed. What I am leading up to is that, with a parser
generator, the hand compilation stage can be eliminated by allowing
the programmer simply to declare his/her tokens and language specs,
then have the appropriate automaton constructed with little, or no,
human intervention. This is why parser generation tools were critical
to the development of not just theoretically feasible, but truly
*practical*, LR-based computer language design systems.
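        The sixteen-step trace above can be mimicked by a small
hand-written program. The following sketch is in Python, purely for
illustration; the name calc and the token-list input format are my
own inventions, and the automata Ibpag2 actually builds are
table-driven rather than hand-coded like this:

```python
# Toy shift-reduce evaluator mimicking the trace above. The stack
# holds terms/expressions and operators; '$' marks end-of-input.
def calc(tokens):
    tokens = tokens + ['$']
    stack = []
    i = 0
    while True:
        nxt = tokens[i]
        if isinstance(nxt, int):    # shift a term onto the stack
            stack.append(nxt)
            i += 1
        elif len(stack) >= 3:       # reduce: expr op expr -> expr
            b = stack.pop(); op = stack.pop(); a = stack.pop()
            stack.append(a + b if op == '+' else a - b)
        elif nxt == '$':            # end of input: return the "answer"
            return stack.pop()
        else:                       # shift the operator
            stack.append(nxt)
            i += 1

print(calc([9, '+', 3, '-', 8]))    # -> 4
```

Note how the reductions happen while the operator or end-of-input
token is still waiting on the input stream, just as in steps 9 and 15
of the trace.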


3.__Using_Ibpag2

        To recode the above toy expression grammar in
Ibpag2-compatible format is relatively simple, especially if we omit
the actions initially, and concentrate on simple recognition. We need
only a set of token declarations and three rules. Certain
modifications will have to be made to the token declarations later on.
For general illustration's sake, however, the following will suffice:

    %token TERM, '+', '-'
    %%
    expression : TERM
    expression : expression, '+', expression
    expression : expression, '-', expression

TERM, and the addition and subtraction operators, are the tokens (i.e.
the terminal symbols out of which the grammar is constructed - the
things that the input stream is segmented into). Note the %token
keyword used to declare them. The colon means "is composed of." The
double percent sign separates token declarations from the grammar
proper.

        Adding in our actions - which above were keyed to a complex
set of decisions based on input tokens and stack conditions - requires
just a few extra lines of Ibpag2 action code, set off in curly braces:

    %token TERM, '+', '-'
    %%
    expression : TERM { return arg1 }
    expression : expression, '+', expression { return arg1 + arg3 }
    expression : expression, '-', expression { return arg1 - arg3 }

Using a "|" shorthand for repeated left-hand sides of rules, we may
reformat this as:

    %token TERM, '+', '-'
    %%
    expression : TERM { return arg1 }
        | expression, '+', expression { return arg1 + arg3 }
        | expression, '-', expression { return arg1 - arg3 }

        ArgX above refers to the Xth element of the right-hand side of
the preceding rule. So, for example, arg1 in "{ return arg1 }" above
refers to TERM - the only right-hand side element of the first rule.
The action "{ return arg1 }" means, "once you find a TERM and have
renamed it as an expression, use the value of TERM as the value for
that expression."
By way of contrast, the action "{ return arg1 +
arg3 }" means, in conjunction with the rule it follows: "When you find
an expression consisting of a sub-expression, a plus operator, and
another sub-expression, use the value of sub-expression 1 + the value
of sub-expression 2 as the value for the expression as a whole."
Technically, the action "{ return arg1 }" for expression : TERM is not
necessary, since the Ibpag2 parser, by default, pushes the value of
the last RHS arg onto the stack. For epsilon productions (to be
discussed below), it pushes &null.

        One serious problem with this set of specifications is that
nothing tells the parser that the operators '-' and '+' are left
associative. We humans take this for granted, because correct
algebraic grouping is something our high-school math teachers burned
into us. The computer, though, has to be told, pedantically, how to
group addition and subtraction expressions. It has to be explicitly
instructed, in other words, to group expressions like "9 + 32 - 4" as
(9 + 32) - 4. Without instructions of this kind, the parser does not
know, after it has read "9 + 32" and is looking at a minus sign,
whether to shift the minus sign onto the stack, and eventually try to
group as 9 + (32 - 4), or to reduce "9 + 32" to an expression and
group as (9 + 32) - 4. Although in this case the grouping may not
seem to matter, it sometimes does. Some operators group right to
left. The unary minus sign, for example, is one such operator (--4
groups as (- (- 4))). To include the unary minus sign in our grammar,
we might append yet another rule:

    %token TERM
    %left '+', '-'
    %right '-'
    %%
    expression : TERM { return arg1 }
        | expression, '+', expression { return arg1 + arg3 }
        | expression, '-', expression { return arg1 - arg3 }
        | '-', expression { return - arg2 }

The trouble with this arrangement is that the minus sign was already
declared as left associative.
To get around the conflict we use a
"dummy" token declaration, and a %prec declaration in the applicable
rule:

    %token TERM
    %left '+', '-'
    %right UMINUS
    %%
    expression : TERM { return arg1 }
        | expression, '+', expression { return arg1 + arg3 }
        | expression, '-', expression { return arg1 - arg3 }
        | '-', expression %prec UMINUS { return - arg2 }

The %prec declaration simply tells the parser that, even though the
rule contains a '-' operator, the rule should be handled as if the
operator were UMINUS. UMINUS is not actually used as a symbol in the
right-hand side of any rule (hence the designation "dummy"). It is
there simply to make the minus sign in the last rule behave as if it
were different from the one in the second-to-last rule.

        Let us now add multiplication and division operators to our
calculator specifications, and see what happens. Let me reiterate
here that the action "{ return arg1 }" for rule 1 (expression : TERM)
is not strictly necessary, since the default is to push the last RHS
arg onto the value stack:

    %token TERM
    %left '+', '-'
    %left '*', '/'
    %right UMINUS
    %%
    expression : TERM { return arg1 }
        | expression, '+', expression { return arg1 + arg3 }
        | expression, '-', expression { return arg1 - arg3 }
        | expression, '*', expression { return arg1 * arg3 }
        | expression, '/', expression { return arg1 / arg3 }
        | '-', expression %prec UMINUS { return - arg2 }

Note that the multiplication and division operators were defined
*after* the addition and subtraction operators. The reason for this
is that, technically speaking, the grammar itself is ambiguous. If we
treat all operators identically, the parser will not be able to tell
whether "9 + 1 * 3" should be parsed as (9 + 1) * 3 or as 9 + (1 * 3).
As we all know from our high-school algebra, multiplication has a
higher precedence than addition. You do the multiplications before
the additions, in other words, no matter where they occur.
To tell
the parser to behave in this same manner, we declare '*' after '+'.
Note that, despite their higher priority, the '*' and '/' operators
are still left associative. Hence, given "3 / 4 * 7," the parser will
group its input as (3 / 4) * 7. As a brain teaser, try to figure out
how the parser might group the input "9 + 3 / 4 * 7." Remember that
higher-precedence rules get done first, but that same-precedence rules
get done according to associativity.

        The only fundamental problem remaining with the above grammar
is that it assumes that the end of the input coincides with the end of
the line. Is it possible to redefine the language described as
consisting of arbitrarily many lines? The answer to this question is
"yes." One can simply add another set of productions to the grammar
that state, essentially, that the input language consists of lines
made up of an expression and a carriage return or of nothing. Nothing
is indicated by the keyword epsilon. Note that only the first rule
has an action field:

    lines : lines, expression, '\n' { write(arg2) }
        | lines, '\n'
        | epsilon

This rule-series may seem rather abstruse, but it becomes a bit
clearer when you think about what happens on actual input. If there
is no input (epsilon), nothing gets printed, because lines : epsilon
has no action field. If the parser sees an expression and a newline,
it takes this as an instance of epsilon, plus an expression, plus a
newline. This, then, becomes the first component of rule 1 if another
expression + newline follows, or of rule 2 if just a newline occurs.
Every time an instance of rule 1 occurs, the action "{ write(arg2) }"
is executed, i.e. the value of the expression gets printed. If this
still seems hard to fathom, try walking through it step by step.
Even experienced hands may find these sorts of rules difficult to
construct and debug.

        Note that "lines" is now the so-called "start symbol" of our
grammar.
It is, in other words, the goal of every parse. By default
the left-hand side symbol of the first rule is the start symbol. This
may be overridden with a %start declaration in the tokens section (on
which, see the sample Ibpag2 input file below).

        With our new, multi-line start symbol in place, the only piece
that needs to be added, in order to make our calculator specification
a full working input to Ibpag2, is a tokenizer. A tokenizer is a
routine that reads input from a file or from some other stream (e.g.
the user's console), and then segments this input into tokens that its
parser can understand. In some cases, the tokens must be accompanied
by a literal value. For example, if we encounter a TERM, we return
TERM, just as it is listed in the %token declaration. But what is the
literal value of a TERM token? It could be, for example, 9, or 5, or
700. The tokenizer returns the symbol TERM, in this case, but then
records that TERM's actual value by setting some global variable. In
Ibpag2's parser, this variable is assumed to be "iilval." In the
tokenizer, therefore, one might write

    iilval := (literal value)
    suspend TERM

For literal operators like '+' and '*', there is no need to set
iilval, since their literal value is irrelevant. One simply returns
these as integers (usually via "suspend ord(c)").

        The tokenizer routine is normally appended to the grammar
after another double percent sign. Everything after this second
double percent sign is copied literally to the output file.
Alternatively, the tokenizer can be $included via Icon's preprocessor.
Ibpag2 demands that the tokenizer be called iilex, that it take a
single file argument, that it be a generator, and that it fail when it
reaches end-of-input.
Combined with our "lines" productions, the
addition of an iilex routine to our calculator grammar yields the
following Ibpag2 input file:

    %token TERM
    %left '+', '-'
    %left '*', '/'
    %right UMINUS

    %start lines

    %%

    expression : TERM { return arg1 }
        | expression, '+', expression { return arg1 + arg3 }
        | expression, '-', expression { return arg1 - arg3 }
        | expression, '*', expression { return arg1 * arg3 }
        | expression, '/', expression { return arg1 / arg3 }
        | '-', expression %prec UMINUS { return - arg2 }

    lines : lines, expression, '\n' { write(arg2) }
        | lines, '\n'
        | epsilon

    %%

    procedure iilex(infile)

        local nextchar, c, num

        nextchar := create !(!infile || "\n" || "\n")
        c := @nextchar | fail

        repeat {
            if any(&digits, c) then {
                if not (\num ||:= c) then
                    num := c
            } else {
                if iilval := \num then {
                    suspend TERM
                    num := &null
                }
                if any('+-*/()\n', c) then {
                    iilval := c
                    suspend ord(c)
                } else {
                    if not any(' \t', c) then {
                        # deliberate error - will be handled later
                        suspend &null
                    }
                }
            }
            c := @nextchar | break
        }
        if iilval := \num then {
            return TERM
            num := &null
        }

    end

    procedure main()
        return iiparse(&input, 1)
    end

As noted above, the tokenizer (iilex) must be a generator. It must
suspend integers either directly (e.g. ord(c)), or else via symbolic
defines like TERM, created by Ibpag2 on the basis of %token, %right,
%left, and %nonassoc declarations. The tokenizer must fail on end of
input.

        If you like, cut the above code out, place it in a temporary
file, tmp.ibp, and then feed this file to Ibpag2 by typing "ibpag2 -f
tmp.ibp -o tmp.icn." If your system supports input and output
redirection, type: "ibpag2 < tmp.ibp > tmp.icn." Ibpag2 will turn
your grammar specifications and actions into a routine called iiparse.
If you look above, you will see that I appended a main procedure that,
in fact, calls iiparse().
Iiparse() takes two arguments: 1) an input
stream, and 2) a switch that, if nonnull, tells the parser to fail
rather than abort on unrecoverable errors. When Ibpag2 is finished
creating its output file (tmp.icn above), compile that file the way
you would compile any other Icon program (e.g. "icont tmp"). Finally,
run the executable. You should be able to type in various simple
arithmetic expressions and have the program spit back answers each
time you hit a return. The only problem you might encounter is that
the parser aborts on erroneous input.

        The issue of erroneous input brings up yet another point of
general Ibpag2 usage. Normally, if one is processing input, one does
not want to abort on errors, but rather just emit an error message,
and continue processing - if this is at all possible. To do this,
Ibpag2 provides a simple but fairly effective mechanism: a reserved
"error" token.

        When Ibpag2 encounters an error, it will remove symbols from
its stack until it has backtracked to a point where the error token is
legal. It then shifts the error token onto the stack, and tries to
re-start the token stream at the point where it left off, discarding
tokens if necessary in order to get itself resynchronized. The parser
considers itself resynchronized when it has successfully read and
shifted three tokens after shifting the error token. Until then it
remains in an error state, and will not output additional error
messages as it discards tokens.

        This explanation may sound a bit abstruse, but in practice it
To implement error handling for our +calculator, we really have to add only one production to the end of +the "lines" section: + + lines : lines, expression, '\n' { write(arg2) } + | lines, '\n' + | epsilon + | error, '\n' { + write("syntax error; try again:") + iierrok + } + +Given the above grammar, the parser will handle errors as follows: If +an error occurs (say it has an expression then an operator on its +stack and sees a newline on the input stream) the parser will throw +out the operator, then check if the error token would be OK in this +state (which it would not). Then it would throw out the expression. +At this point, the stack is in the ready-to-read-a-lines state - the +state it was in before it read the last expression. Since "lines" may +consist of error and '\n,' the error token is legal here, and so the +parser pushes error onto the stack, then looks back at the input +stream (where a newline is still waiting). Since the newline now +completes the rule lines : error, '\n', the parser pushes the newline +onto its stack, then executes the action associated with this +production, i.e. it writes "syntax error; try again:" to the console, +prompting the user for additional input. + + The keyword "iierrok" in the above error production's action +field is there for a subtle, but important, reason: It tells the +parser to consider itself resynchronized, even if three tokens have +not yet been shifted. If iierrok were not in the action code for this +rule, and the user were to supply more bad input after the prompt, +then the parser would simply discard those tokens, without emitting +another error message. Why? Because, as you will recall, the parser +discards tokens after an error, in efforts to resynchronize itself. +Until it reads and shifts three tokens successfully, it considers +itself in an error state, and will not emit additional error messages. 
The three-token resync rule is there to prevent a cascade of
irrelevant error messages touched off by a single error. In our
calculator's case above, though, we are smarter than the parser. We
know that it is resynchronized as soon as it reduces error, '\n' to
lines. So if a syntax error occurs on the next token, it should be
reported. Adding "iierrok" to the action ensures that the parser will
do just this.

        In addition to iierrok, there are several other directives
Ibpag2 accepts as part of the action code segments. These are as
follows:

    iiclearin   clear the current input token
    IIERROR     perform error recovery
    IIACCEPT    simulate an accept action

There are several other directives (all implemented as macros) that
Ibpag2 accepts in GLR mode. For a discussion of GLR mode, see below,
section 5. IIERROR in particular, and error recovery in general, work
a bit differently in that mode than they do in Ibpag2's normal (i.e.
LR) mode.

        There are admittedly many other topics that might be covered
here. This treatment, however, is intended as a general nontechnical
introduction, and not as a complete textbook on parser generation use.
If you want to learn more about this topic, consult the bibliography.
Also, check the UNIX manual pages on the YACC utility (Yet Another
Compiler Compiler). Ibpag2's input format is fairly close (too close,
perhaps) to YACC's. In fact, most of what is said about YACC in UNIX
documentation can be carried directly over to Ibpag2. Several salient
differences, though, should be kept in mind:

    1)  YACC's "$$ = x" constructs are replaced by "return x" (e.g.
        "$$ = $1 + $3" -> "return $1 + $3" [$1 is a synonym for
        "arg1", $3 for "arg3", etc.])

    2)  all variables within a given action are, by default, local
        to that action; i.e. they cannot be accessed by other
        actions unless you declare them global elsewhere (e.g. in
        the pass-through part of the declarations section %{ ...
+ %}) + + 3) the %union and %type declarations/tags are not needed by + Ibpag2 (both for better and for worse) + + 4) tokens and symbols are separated from each other by a comma + in Ibpag2 files (e.g. %token '+', '-' and S : NP, VP) + + 5) epsilon is indicated by the keyword "epsilon" (e.g. REL : + epsilon), and not by an empty RHS + + 6) both epsilon and error *may* be declared as %tokens for + reasons of precedence, although they retain hard-coded + internal values (-2 and -1, respectively) + + 7) all actions must follow the last RHS symbol of the rule + they apply to (preceded by an optional %prec directive); to + achieve S : NP { action1 }, VP { action2 }, insert a dummy + rule: S : NP, dummy, VP { action2 }; dummy : epsilon { + action1 } ; + + 8) YYERROR, YYACCEPT, yyclearin, and yyerrok are the same, + except they are written IIERROR, IIACCEPT, iiclearin, and + iierrok (i.e. "ii" replaces "yy") + + 9) Ibpag2's input files are tokenized as modified Icon files, + and, as a consequence, Icon's reserved words must not be + used as symbols (e.g. "if : if, then" is no go) + +I myself find YACC to be ugly. As a result, Ibpag2 is not an exact +YACC clone. I would like to underscore the fact that I have no +intention to move in this direction, either. It's as YACC-like as +it's going to get! + + Both YACC and non-YACC users should note number 9 in the above +list. Don't use things like "while," "every," "do," etc. as symbols +in your grammar! Just use the same rules for Ibpag2 nonterminals as +for Icon variables, and you'll be OK. + + For those that just can't bear using anything but a strictly +YACC-conformant system, I've included a preprocessor with the Ibpag2 +distribution called (at one user's recommendation) "iacc." Iacc reads +&input - assumed to be a YACCish grammar - and sends to &output an +Ibpag2-conformant file. 
I have not tested this program extensively, and
there are likely to be bugs in the way I've handled the necessary
two-token lookaheads and value stack references. Give it a whirl,
though, if you are feeling adventurous. The only reason I personally
use Iacc is that some YACCs (e.g. BSD YACC) have particularly nice
debugging messages and help. If my grammar is particularly complex, I
just run it through YACC without action code first, then use Iacc to
convert it to Ibpag2 format. Iacc's output, as I noted, is not meant
to be pretty, so I invariably end up doing a little editing - usually
just respacing a few rules, and re-inserting any comments that I might
have put in the original YACC file.

        In general, Ibpag2 (like YACC) handles epsilon moves and
indirect cycles. LR-mode shift-reduce conflicts are also handled in
the normal way (i.e. pick the rule with the highest priority, and, in
cases where the priority is the same, check the associativities). In
contrast to YACC, Ibpag2 flags reduce/reduce conflicts as errors
(since these often conceal deeper precedence problems and just plain
kludges). Reduce/reduce conflict errors are easily enough remedied,
if need be, via (dummy) precedences. One can convert these errors to
warnings by specifying -y on the command line. With the -y option,
reduce/reduce conflicts are resolved in favor of the rule that occurs
first in the grammar. The -y switch also prevents Ibpag2 from
aborting on shift/reduce conflicts, telling it instead to resolve in
favor of shift. Basically, -y is a partial YACC compatibility switch.
Normally (i.e. in SLR mode) Ibpag2 is much more finicky than YACC
about conflicts in its grammars.

        Also in contrast to YACC, Ibpag2 supports multiple
simultaneous parsers. Ibpag2 normally names its main parser routine
iiparse(). By using the -m command-line option, however, you can
override this default behavior, and force Ibpag2 to augment this name
in some uniquely identifiable fashion.
For example, "ibpag2 -m _1 <
+tmp.ibp > tmp.icn" will force Ibpag2 to write a parser called
+"iiparse_1" to tmp.icn.  Note that, instead of calling iilex, this
+iiparse_1() routine will now call iilex_1, and all necessary global
+variables will have _1 appended to them (e.g. errors will become
+errors_1).  I don't expect that many people will have occasion to use
+this feature.  It is there, though, for those who want it.
+
+
+4.__Debugging
+
+	Constructing and debugging LR(1)-family parsers can sometimes
+be hair-raising, even with a parser generator.  Several precautions
+can be taken, however, to minimize the agony.  The first is to declare
+all tokens initially as part of a single %token declaration, i.e. with
+no precedences, and with the same associativities.  Also, leave out
+action code until the grammar seems to be working.  At this stage, you
+can even run the grammar through (BSD)YACC or GNU Bison.  All you
+would need to do is remove the commas between tokens and symbols, and
+place a semicolon at the end of every rule.  During this and all
+debugging stages, supply Ibpag2 with the -v command-line switch.  This
+will cause Ibpag2 to write a summary of rules, tokens, and its two
+state tables to "ibpag2.output" (a bit like GNU Bison, but with a
+hard-coded name).  If you get messages about conflicts in your parse
+tables (e.g. "unresolvable reduce/reduce conflict, state 5, token
+257, rules 4,5"), this file will tell you which rules these are, and
+what token number 257 is.  Use precedences and associativities to
+clear these problems up as they arise.  If you are comfortable having
+reduce/reduce errors resolved by the order in which the conflicting
+rules occur, then use the -y command-line switch.  With -y on the
+command line, Ibpag2 will always resolve in favor of the earlier rule.
+This option will also cause it to resolve all shift/reduce conflicts
+in favor of shift.
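+
+	To make the first of these precautions concrete: for a toy
+expression grammar, one might begin debugging with every terminal in
+a single, associativity-free declaration, e.g.
+
+	%token NUM, '+', '-', '*', '/'
+
+and only later, once the rules themselves look right, split the
+declarations up and add the precedences and associativities actually
+desired, e.g.
+
+	%token NUM
+	%left '+', '-'
+	%left '*', '/'
+
+(NUM is just an invented token name here; any Icon-legal symbol that
+is not an Icon reserved word will do.)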
+
+	There are certain languages that, though unambiguous, will
+still not yield an unambiguous parse table when fed to an SLR(1)
+system like Ibpag2.  The classic example is
+
+	expr : lval, '=', rval | rval
+	lval : '*', rval | ID
+	rval : lval
+
+C programmers will recognize this as a toy expression grammar with
+code for identifiers, assignments, and pointers.  The problem is that
+if we feed this grammar to Ibpag2, it will claim that there is a
+conflict on lookahead '='.  In truth, there is no ambiguity.  The SLR
+parser simply doesn't remember the pathway it used to get to the
+state it is in when it sees '=' on the input stream.  Whether the
+parser gets into this state by seeing '*' plus an ID, or by seeing
+just an ID, it knows to turn the ID into an lval.  Then it knows to
+turn lval into rval.  At this point, though, it doesn't know whether
+to shift the = sign via rule 1, or to turn rval and the preceding '*'
+into an lval.  The parser has "forgotten" that the '*' is there,
+waiting one level down on the stack!
+
+	The solution to this problem is actually quite simple (at
+least in concept).  Just provide a unique pathway in the grammar for
+the conflicting rules.  In this case, they are rules 1 and 5 (the
+first and last):
+
+	expr : lval, '=', rval | rval
+	lval : '*', pval | ID
+	pval : lval
+	rval : lval
+
+Now when the parser sees '*', it can only have a pval after it.  Never
+mind that pval is composed of precisely the same things as rval.  The
+point is that the parser generator follows a different route after
+seeing '*' than if it starts with ID and no preceding '*'.  Hence it
+"remembers" that the '*' is back on the stack, waiting for the
+"lval : '*', pval" rule to apply.  There is no more conflict.
+
+	Go ahead and run these grammars through Ibpag2 if you aren't
+sure what is going on.  Remember to declare ID as a token, and to
+place "%%" in the appropriate spot!
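+
+	If you want to try the conflict-free version out, a complete,
+minimal Ibpag2 input file for it might look something like this (a
+sketch only - no action code, since the point is just to check the
+tables):
+
+	%token ID
+	%%
+	expr : lval, '=', rval | rval
+	lval : '*', pval | ID
+	pval : lval
+	rval : lval
+
+Run through Ibpag2, this version should come out quietly, while the
+original (rval-only) version should draw a complaint about a conflict
+on lookahead '='.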
+ + If you get your parser up and running, but find that it is not +functioning quite the way you expect, add the following line somewhere +near the start of Ibpag2's output file: + + $define IIDEBUG + +If you like, you can add it to the beginning of your Ibpag2 input +file. Place it in the declarations section (before the first double +percent sign), and surround it by %{ and %}, e.g.: + + %{ + $define IIDEBUG + %} + +This tells Ibpag2 to send $define IIDEBUG straight through to the +output file. + + What defining IIDEBUG does is tell iiparse, once compiled, to +emit profuse debugging messages about the parser's actions, and about +the state of its stacks. This display will not make a whole lot of +sense to anyone who doesn't understand LR-family parsers, so those who +want to access this feature should perhaps go through a standard +reference like Aho, Sethi, and Ullman [1]. + + If, after you are finished debugging your grammar, you find +that Ibpag2's output files are rather large, you may try saving space +by compressing the action and goto tables. This is accomplished by +invoking Ibpag2 with the -c (compress) option. Using this option +makes debugging difficult, and makes the parser run a bit more slowly. +It also only works for rather large grammars with long nonterminal +symbol names. Don't even consider it until the grammar is thoroughly +debugged and you have determined that the output file's size is just +too great for practical use. Even then, compression may or may not +help, depending on how long your nonterminal names are. In general, +Ibpag2 is best as a teaching tool, or as a production system for +medium or small grammars. + + +5.__Using_Ibpag2_with_Non-LR_Grammars + + There may be times when you *want* to parse languages that no +LR-based algorithm can handle. There may be times, that is, when the +grammar you want to use contains conflicts or ambiguities that are +there by design, and not by oversight. 
For example, you may want to
+parse a natural language.  Full-blown natural languages involve many
+highly ambiguous constructs, and are not LR-parsable.  If you invoke
+Ibpag2 with the -a option, however, it can parse or recognize certain
+natural languages, or, more practically speaking, certain NL subsets.
+The letter "a" in -a is supposed to stand for "ambiguous," although
+what this option really does is put Ibpag2 into a quasi-GLR mode -
+i.e. into a kind of "generalized" LR mode in which it can accept
+non-LR grammars [4,5].
+
+	User-visible changes to Ibpag2's operation in quasi-GLR mode
+(i.e. with the -a option) are as follows:
+
+	1) iiparse() is now a generator
+	2) action code can use suspend as well as return
+	3) IIERROR places the current thread in an error state (i.e.
+	   it doesn't *necessarily* trigger error recovery; see below)
+	4) there are two new action-code directives (iiprune and
+	   iiisolate) and a general define (AUTO_PRUNE)
+	5) conflicts due to ambiguities in the grammar no longer
+	   result in aborted processing (so, e.g., if you do not
+	   specify the -y option on a grammar with reduce/reduce
+	   conflicts, Ibpag2 will simply generate a parser capable of
+	   producing multiple parses for the same input)
+
+	In quasi-GLR mode, iiparse() should be invoked in a way that
+will render multiple results usable, if they are available (e.g.
+"every result := iiparse(&input) do...").  Action code is also allowed
+to produce more than one value (i.e. to use suspend).  When it does
+so, iiparse() creates separate parse threads for each value.  So, for
+instance, if your action code for some production suspends both of the
+following lists,
+
+	["noun", "will", "gloss: desire"]
+	["noun", "will", "gloss: legal document mandating how _
+	one's possessions are to be disposed _
+	of after one's death"],
+
+iiparse() would create two separate parse threads - one for each
+result.  Note that in this case, the syntactic structure of each
+thread is the same.
It is their semantics (i.e. the stuff on the
+value stack) that differs.
+
+	If you use the iierrok and iiclearin macros in your action
+code before suspending any result, their effect persists through all
+subsequent suspensions and resulting parse threads.  If you use these
+macros after suspending one or more times, however, they are valid
+only for the parse thread generated by the next suspension.  By way of
+contrast, the IIERROR macro *always* flags only the next parse thread
+as erroneous.  Likewise, IIACCEPT always simulates an accept action on
+the next suspension only.  IIERROR and IIACCEPT, in other words, never
+have any effect on subsequent suspensions and parse threads other than
+the one that immediately follows them.  This is true of iierrok and
+iiclearin only when used after the first suspension.
+
+	In quasi-GLR mode, IIERROR (number three in the difference
+list above) becomes a mechanism for placing the current parse thread
+in error mode.  This is similar to, but not quite identical to, how
+IIERROR functions in straight LR mode.  In quasi-GLR mode, if other
+threads can carry on the parse without error, the erroneous parse
+thread is quietly clobbered.  Full-blown error recovery occurs only if
+all of the other parsers halt as well.  This makes sense if you think
+about it.  Why keep erroneous threads around when there are threads
+still continuing a valid parse?  For some large interactive systems,
+it might be necessary to keep bogus threads around longer, and weed
+them out only after a lengthy grading process.  If you are
+constructing a system such as this, you'll have to modify Ibpag2's
+iiglrpar.lib file.  In particular, you'll need to change the segment
+in iiparse() that takes out the trash, so to speak, in such a way that
+it does so only if the error count in a given parser either rises
+above a specific threshold or else exceeds the number of errors in
+the "most correct" parser by a certain amount.  This is not that hard
+to do.
I just don't expect that most parsers people generate with +Ibpag2 will use IIERROR or error recovery in general in so involved a +fashion. + + Iiprune and iiisolate (number 4 above) are used to control the +growth of the parallel parser array. In order to give straightforward +(read "implementationally trivial") support for action code, Ibpag2 +cannot create a parse "forest" in the sense that a standard GLR parser +does. Instead, it simply duplicates the current parser environment +whenever it encounters a conflict in its action table. Even if the +conflict turns out to reflect only a local ambiguity, the parsers, by +default, remain separate. Put differently, Ibpag2's quasi-GLR parser, +by default, makes no direct effort to reduce the size of its parser +arrays or to alter the essentially linear structure of their value and +state stacks. Size reduction, where necessary and/or desirable, is up +to the programmer. What the iiprune macro is there to do is to give +the programmer a way of pruning a given thread out of the active +parser list. Iiisolate allows him or her to prune out every thread +*but* the current one. AUTO_PRUNE makes the parser behave more like a +standard GLR parser, instructing it to prune parse threads that are +essentially duplicating another parse thread's efforts. The parser, +though, does not build a parse tree per se, the way most GLR parsers +typically do, but rather manipulates its value stack like a +traditional LR-family parser. + + Iiprune is useful when, for example, the semantics (i.e. your +"action" code segments) determine that a given parse thread is no +longer viable, and you want to signal the syntactic analyzer not to +continue pursuing it. The difference between iiprune and IIERROR is +that iiprune clobbers the current parser immediately. IIERROR only +puts it into an error state. If all active parsers end up in an error +state, and none can shift additional input symbols, then the IIERROR +macro induces error recovery. 
Iiprune does not.  NB: iiprune, if used
+in action code that suspends multiple results, cancels the current and
+remaining results (i.e. it does not clobber parsers already spun off
+by previous suspensions of that same code; it merely cuts the result
+sequence short).  Iiprune essentially stands in for "fail"
+in this situation.  Fail itself can be used in the code, but be warned
+that iiparse() will still push *at least one* value onto its value
+stack, even if a given action code segment fails.  This keeps the
+value stack in sync with the syntax.  To avoid confusion, I recommend
+not using "fail" in any action code.
+
+	Iiisolate is useful if, during error recovery, you prompt the
+user interactively, or do something else that cannot be elegantly done
+in parallel for two or more distinct parse threads.  Iiisolate allows
+you to preserve only the current parse thread, and to clobber the
+rest.  Iiisolate can also be useful as a way of making sure that only
+one thread carries on the parse in non-error situations.  Suppose that
+we have a series of productions:
+
+	sentences : sentences, sentence, '\n'
+		    { print_parse(arg2) }
+		  | sentences, '\n'
+		  | error, '\n'
+		  | epsilon
+
+If we get a sentence with more than one parse, all of the underlying
+threads that produced these parses will be active for the next
+sentence as well.  In many situations this will not be what we want.
+If our desire is to have only one active parse thread at the start of
+each sentence, we simply tell our lexical analyzer to suspend two
+newlines every time it sees a newline on the input stream.  This
+ensures that the second rule will always apply right after the first.
+We then insert iiisolate directives for both it and the one error
+production:
+
+	sentences : sentences, sentence, '\n'
+		    { print_parse(arg2) }
+		  | sentences, '\n'
+		    { iiisolate }
+		  | error, '\n'
+		    { iiisolate; iierrok }
+		  | epsilon
+
+The effect here is to allow multiple parsers to be generated only
+while parsing "sentence".  The iiisolate directive, in other words,
+sees to it that no sentence parse will ever begin with multiple active
+parsers.  As in LR mode, iierrok clears the error flag for the
+(current) parser.
+
+	Note that if you use iiisolate in action code that suspends
+multiple results, iiisolate will clobber all parsers but the one
+generated by the next suspension.
+
+	If there is no need for close control over the details of the
+parser array, and you wish only to clobber parsers that end up doing
+the same thing as some other parser (and hence returning identical
+values), then just make sure you add "$define AUTO_PRUNE" to the
+pass-through code section at the top of the file.  Put differently,
+defining AUTO_PRUNE instructs the quasi-GLR parser to weed out parsers
+that are in the same state and have identical value stacks.
+AUTO_PRUNE can often be used in place of iiisolate in situations like
+the one discussed just above.  Its only drawback is that it slows the
+parser a bit.
+
+	Other than these deviations (action code and iiparse becoming
+generators, IIERROR's altered behavior, and the addition of iiprune,
+iiisolate, and AUTO_PRUNE), Ibpag2's quasi-GLR mode - at least on the
+surface - works pretty much like its straight LR mode.  In fact, if
+you take one of your SLR(1) grammars and run it through Ibpag2 using
+the -a option, you probably won't notice any difference in the
+resulting automaton unless you do some debugging or perform some
+timing tests (the GLR parser is slower, though for straight SLR(1)
+grammars not by much).
Even with non-SLR(1) grammars, the quasi-GLR
+parser will clip along merrily, using all the same sorts of rules,
+action code, and macros that you would typically use in LR mode!
+
+
+6.__Installing_Ibpag2
+
+	If you are a UNIX user, or have a generic "make" utility, you
+are in luck.  Just edit Makefile.dist according to the directions
+given in that file, rename it as "makefile," then execute "make."
+Ibpag2 should be created automatically.  If everything goes smoothly,
+then "make install" (su-ing root, if both possible and necessary for
+correct installation of the iiparse.icn file).  Check with your system
+administrator if you are on a public system, and aren't sure what to
+do.
+
+	Please be sure to read the directions in the makefile
+carefully, and set DESTDIR and LIBDIR to the directory where you want
+the executable and parser file to reside.  Also, make sure the paths
+you specify are correct for your Icon executables.  Although Ibpag2
+will apparently compile using iconc, I would recommend using the
+interpreter, icont, first, unless you are planning on working with a
+large grammar.
+
+	If you are using some other system - one that lacks "make" -
+then shame on your manufacturer :-).  You'll be a bit inconvenienced.
+Try typing:
+
+	icont -o ibpag2 follow.icn ibpag2.icn ibreader.icn \
+	      ibtokens.icn ibutil.icn ibwriter.icn iohno.icn \
+	      outbits.icn slritems.icn slrtbls.icn shrnktbl.icn \
+	      version.icn slshupto.icn
+
+The backslashes merely indicate that the next line is a continuation.
+The whole thing should, in other words, be on a single line.  As noted
+above, you may compile rather than interpret - if your OS supports the
+Icon compiler.  Just replace "icont" above with "iconc."  The
+resulting executable will run considerably faster than with "icont,"
+although the time required to compile it may be large, and the (still
+somewhat experimental) compiler may not work smoothly in all
+environments.
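+
+	To recap the UNIX route described at the start of this
+section, the whole build usually boils down to something like the
+following (the editor, and the variable settings inside the makefile,
+are of course up to you):
+
+	mv Makefile.dist makefile
+	vi makefile	(set DESTDIR, LIBDIR, and the Icon paths)
+	make
+	make install	(su root first, if need be)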
+
+	If your operating system supports environment variables, and
+you have set up your LPATH according to the specifications in the Icon
+distribution (see below), then you may copy iiparse.lib and
+iiglrpar.lib to some directory in your LPATH.  If you do not do this,
+or if your OS does not support environment variables, then you must be
+in the directory where you keep your Ibpag2 files when you use it, or
+else invoke Ibpag2 with the -p dirname option (where dirname is the
+directory that holds the iiparse.lib and iiglrpar.lib files that come
+with the Ibpag2 distribution).  The .lib files contain template
+parsers that are critical to Ibpag2's operation.  Ibpag2 will abort if
+it cannot find them.
+
+	If your operating system permits the creation of macros or
+batch files, it might be useful to create one that changes
+automatically to the Ibpag2 source directory, and runs the executable.
+This has the side benefit of making it easier for Ibpag2 to find the
+parser library files, iiparse.lib and iiglrpar.lib.  Under DOS, for
+instance, one might create a batch file that says:
+
+	c:
+	cd c:\ibpag2
+	iconx ibpag2 %1 %2 %3 %4 %5 %6 %7 %8 %9
+
+DOS, it turns out, has to execute Icon files indirectly through iconx,
+so this technique has yet another advantage in that it hides the
+second level of indirection - although it prevents you from using
+input and output redirection.  Naturally, the above example assumes
+that Ibpag2 is in c:\ibpag2.
+
+	Ibpag2 assumes the existence on your system, not only of an
+Icon interpreter or compiler, but also of an up-to-date Icon Program
+Library.  There are several routines included in the IPL that Ibpag2
+uses.  Make sure you (or the local system administrators) have put the
+IPL online, and have translated the appropriate object modules.  Set
+your IPATH environment variable to point to the place where the object
+modules reside.  Set LPATH to point to the modules' source files.
+Both IPATH and LPATH are documented in the doc directory of the Icon
+source tree (ipd224.doc).  If your system does not support environment
+variables, copy ximage.icn, options.icn, ebcdic.icn, and escape.icn
+from the IPL into the Ibpag2 source directory, and compile them in
+with the rest of the Ibpag2 source files, either by adding them to the
+SRC variable in the makefile, or by adding them manually to the "icont
+-o ..." command line given above.
+
+	If you have any problems installing or using Ibpag2, please
+feel free to drop me, Richard Goerwitz, an e-mail message at
+goer@midway.uchicago.edu, or (via the post) at:
+
+	5410 S. Ridgewood Ct., 2E
+	Chicago, IL 60615
+
+
+7.__Bibliography
+
+1. Aho, Alfred V., Sethi, Ravi, and Ullman, Jeffrey D.  Compilers.
+   Addison-Wesley: Reading, Massachusetts, second printing, 1988.
+
+2. Griswold, Ralph E. and Griswold, Madge T.  The Icon Programming
+   Language.  Prentice-Hall, Inc.: Englewood Cliffs, New Jersey, USA,
+   second edition, 1990.
+
+3. Griswold, Ralph E., Jeffery, Clinton L., and Townsend, Gregg M.
+   Version 8.10 of the Icon Programming Language.  Univ. of Arizona
+   Icon Project Document 212, 1993.  (obtain via anonymous FTP from
+   cs.arizona.edu ~ftp/icon/docs/ipd212.doc)
+
+4. Tomita, Masaru.  Efficient Parsing for Natural Language.  Boston:
+   Kluwer Academic Publishers, c. 1985.
+
+5. Tomita, Masaru, editor.  Generalized LR Parsing.  Boston: Kluwer
+   Academic Publishers, 1991.