summaryrefslogtreecommitdiff
path: root/ipl/packs/ibpag2/README
blob: 0accdddfead3cbedfff43dc6b859f07135b3e174 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084






		       A User's Manual for Ibpag2
	       (Icon-Based Parser Generation System 2)
			     Version 1.2

				- or -

	       How to Use an LR-based Parser Generator


		       Richard L. Goerwitz, III
			University of Chicago






1.__What_is_Ibpag2?

	Ibpag2 is a so-called "parser generator," i.e. a tool for
automating the process of generating a recognizer and/or parser from
abstract structural descriptions of an input language.  Put in more
practical terms, Ibpag2 is a piece of software that a) reads a source
file containing a grammar that defines an input language, and then b)
outputs an automaton that recognizes that language.  The user may, at
his or her option, specify actions this automaton should take when it
sees various substructures within its input language.  By default,
however, the parser simply recognizes a given sequence as belonging,
or not, to that language.

	Ibpag2 utilizes so-called "LR" table generation and parsing
algorithms.  These algorithms facilitate construction of reasonably
fast deterministic pushdown automata that are powerful enough to
handle most commonly used programming language constructs.  LR-based
systems come in three main flavors: SLR(1), LALR(1), and LR(1).  The
LR(1) flavor is fairly easy to implement, but uses too many resources
to be practical.  LALR(1) algorithms are harder to implement, but much
faster, and the parse tables they construct use considerably less
memory than do those of their LR(1) counterparts.  SLR(1) algorithms
are the easiest to implement, compile the fastest, and use about as
much memory as LALR(1)s.  SLR(1) is the least powerful of the three,
though, so there is a tradeoff.  Ibpag2 is an "enhanced" SLR(1) parser
generator.  It is enhanced in the sense that it can operate both in
its native SLR(1) mode, and in a more powerful "quasi-GLR" mode (on
which, see section 5 below).

	As its full title ("Icon-Based Parser Generator 2") implies,
Ibpag2 is written in Icon [2,3], as are the automata it creates.
Ibpag2 has been tested with Icon version 8.10.  So far I have only run
it on an i386 box running Xenix 2.3.3, and on a Sun 4 running some
version of SunOS.  I have many reports, though, of it running under
other UNIX variants.  It will probably also run under other operating
systems, though modifications will in some instances be required.
Using Ibpag2 under MS-DOS may not be possible, on account of the way
it manages memory.

	The Ibpag2 distribution adheres to de facto UNIX installation
standards: Just set the appropriate variables in the makefile, and
then "make install."  For those who are using a non-UNIX system, or
who have not installed such a package before, there is a section at
the end entitled "Installing Ibpag2" that details the installation
procedure (section 6).

	Aside from the above-mentioned installation section (6), the
remainder of this document aims to provide the reader a) with a
simple, practical explanation of what LR-family parser generators are
and how they work (section 2), and b) with a set of directions
specifically on how to use Ibpag2 (section 3).  There is also an
advanced section on debugging (4), and one on using Ibpag2 with non-LR
and/or ambiguous languages (5).  The discussion is geared for those
that have little or no experience in parsing or automaton theory.  For
very advanced reading, consult the bibliography.  For a brief summary
of Ibpag's command-line options, see the main Ibpag2 source file,
ibpag2.icn, or invoke ibpag2 with the -h (help) option.

	In general, be warned that Ibpag2 works best with small or
medium-sized grammars.  Its parse tables have to be reconstructed at
run-time, and the code for doing this can become a bit cumbersome for
grammars with more than 100 rules and fifty or so terminal symbols.  I
myself have processed grammars with as many as 300 terminals and 400
rules.  Although the resulting automata run well enough, the output
files are over 300k, and Ibpag2 takes a long time to create them.  If
you must use Ibpag2 with a very large grammar symbols, try the -c
command-line option (which produces compressed parse tables).  This
option is discussed below, in section 4.  Compiling (rather than
interpreting) Ibpag2 may result in much faster processing, as will
resetting your BLOCKSIZE and STRSIZE environment variables.  See the
installation section (6) below on using the Icon compiler to create
the Ibpag2 executable.  Good starting values for BLOCKSIZE and STRSIZE
are triple their default values (i.e. 3 x 65000).  These variables are
discussed in the Icon manual page.

	My ultimate aim in writing this document has been to make
accessible to the non-CS portion of the Icon community what for them
might seem an inaccessible branch of applied parsing and automaton
theory.  I am a philologist myself, and feel that there is a great
deal that can and ought to be done to make advanced tools accessible
to people with other interests than twiddling bits or pondering the
true meaning of epsilon closures :-).

	Any comments on the Ibpag2 system itself or its documentation
will be gratefully received.  Write to me at the address appended to
the final section (6).


2.__What_is_an_LR_Parser_Generator?

	Back in the late 50s and 60s, linguists, mathematicians, and
software engineers all became intensely interested in the formal
properties of languages: Can they be described as a series of logical
structures and relations?  Can computers recognize and manipulate
these structures efficiently?  Linguists, in particular, quickly
realized that the amount of structural complexity, ambiguity, and pure
noise in natural language would render it computationally intractable,
especially given the limited memory/throughput of then available CPUs.
Mathematicians and engineers, however, found that many of the
formalized notations they dealt with could, in fact, be (re)designed
in such a way that efficient computer processing could - at least in
principle - be achieved.

	Principle, in this case, did not squarely meet reality until
viable parser generation tools came into being.  Parser generation
tools map an abstract structural description of a formal notation or
"language" to working computer code.  Ideally, the designer simply
makes assertions like:

	an expression is composed of either
	    1) a term (e.g. 10), or
	    2) an expression, a "+" or "-", and another expression

Parser generator systems translate these assertions (the "grammar")
into a machine, i.e. automaton, that can recognize and/or manipulate
input streams that conform to the "language" so described.

	Let me dwell, for a moment, on the toy expression grammar
offered above.  Note that it describes a set of simple mathematical
constructs like:

	9
	9 + 3
	9 + 3 - 8

According to the specifications given above, the nine, three, and
eight alone constitute terms - which are also expressions (via rule
1).  Because these terms are also expressions, "9 + 3" can be reduced
to a larger expression by rule 2.  The same is true for "9 + 3 - 8,"
except that there rule 2 must apply twice - once for "9 + 3," and then
again for that and the remainder of the line - in effect grouping the
expressions as ( ( (9) + (3) ) - (8) ).  It is also possible to group
the expression ( (9) + ( (3) - (8) ) ), although for the discussion
that immediately follows this second grouping will be ignored (see
below on the terms "precedence" and "associativity").

	If we add actions to the above grammar specification, we can
create a calculator-like automaton.  Traditionally, LR-family automata
(like the ones Ibpag2 creates) contain a parser, one or more stacks,
and a set of action tables.  The parser reads from an input stream
segmented into "tokens" (e.g. TERM, '+', '-'), and then manipulates
its stacks according to directives contained in so-called "action" and
"goto" tables.  As it reads the input stream, the parser matches rules
with action code specified by the programmer, e.g. rule 2 above might
be matched with code that added/subtracted the expressions on either
side of the '+'/'-' operator, and produced (in calculator style) the
result.  Alternatively, it might be matched with code that generated
an equivalent construct in another language.

	In the case of our toy expression grammar above, the
corresponding LR automaton operates as follows.  Omitting and/or
simplifying some of the inner details, it first looks at the input
stream to see what the next token is.  If the next token is an
operator or end-of-input, it checks the top of its stack.  If the top
of the stack has a term on it, that term is popped off, and pushed
back on, this time renamed as an expression (rule 1 above).  The input
token is then shifted from the input stream onto the stack, unless it
is the end-of-input token, in which case the parser returns with a
result.  If the top of the stack has an expression on it (rather than
a term), the parser pops the top three elements off of the stack, and
then either subtracts the third element from the first or adds the two
together, depending on whether the second element down was the
addition or subtraction operator, and the result is pushed onto the
stack as yet another expression.

	Even in this much-simplified form, the automaton's structure
is complex.  Let us look briefly, therefore, at a practical example of
its actual workings.  If we were to feed it "9 + 3 + 8," our
calculator would take the following actions:

	1) read the 9, and push it onto the stack as a term
	2) see a plus sign on the input stream
	3) pop the term (9) off of the stack and push it back on again
	   (this time calling it an expression)
	4) push the plus sign onto the stack
	5) read the 3, and push it onto the stack as a term
	6) see a minus sign on the input stream
	7) pop the 3 off of the stack and push it back on again (this
	   time calling it an expression)
	8) see a minus sign still waiting on the input stream
	9) pop 9, +, and 3 off of the stack, apply the plus operator
	   to 9 and 3, then push the result onto the stack again a
	   single expression (the stack now has 12 on top)
	10) read the minus sign, and push it onto the stack
	11) read the 8, and push it onto the stack as a term
	12) see the end of input coming up on the input stream
	13) pop the 8 off of the stack and push it back on again as an
	   expression
	14) see the end-of-input token still sitting on the input
	   stream 
	15) pop 12, -, and 8 off of the stack, apply the minus operator
	   to 12 and 8, then push the result onto the stack again (the
	   stack now has 4 on top)
	16) return the "answer" (i.e. 4)

	This series of actions is hard to describe, and even more so
to model as part of a hand-written computer program.  And, even if
such a program were written by hand, this program would have to be
modified, at times radically, every time the grammar it assumes was
augmented or changed.  What I am leading up to is that, with a parser
generator, the hand compilation stage can be eliminated by allowing
the programmer simply to declare his/her tokens and language specs,
then have the appropriate automaton constructed with little, or no,
human intervention.  This is why parser generation tools were critical
to the development of not just theoretically feasible, but truly
*practical*, LR-based computer language design systems.


3.__Using_Ibpag2

	To recode the above toy expression grammar in
Ibpag2-compatible format is relatively simple, especially if we omit
the actions initially, and concentrate on simple recognition.  We need
only a set of token declarations and three rules.  Certain
modifications will have to be made to the token declarations later on.
For general illustration's sake, however, the following will suffice:

	%token TERM, '+', '-'
	%%
	expression : TERM
	expression : expression, '+', expression
	expression : expression, '-', expression

TERM, and the addition and subtraction operators, are the tokens (i.e.
the terminals symbols out of which the grammar is constructed - the
things that the input stream is segmented into).  Note the %token
keyword used to declare them.  The colon means "is composed of."  The
double percent sign separates token declarations from the grammar
proper.

	Adding in our actions - which above were keyed to a complex
set of decisions based on input tokens and stack conditions - requires
just a few extra lines of Ibpag2 action code, set off in curly braces:

	%token TERM, '+', '-'
	%%
	expression : TERM { return arg1 }
	expression : expression, '+', expression { return arg1 + arg3 }
	expression : expression, '-', expression { return arg1 - arg3 }

Using a "|" shorthand for repeated left-hand sides of rules, we may
reformat this as:

	%token TERM, '+', '-'
	%%
	expression : TERM { return arg1 }
		   | expression, '+', expression { return arg1 + arg3 }
		   | expression, '-', expression { return arg1 - arg3 }

	ArgX above refers to the Xth element of the right-hand side of
the preceding rule.  So, for example, arg1 in "{ return arg1 }" above
refers to TERM - the only right-hand side element of the first rule.
The action "{ return arg1 }" means, "once you find a TERM and have
renamed it as an expression, use the value of TERM as the value for
that expression."  By way of contrast, the action "{ return arg1 +
arg3 }" means, in conjunction with the rule it follows: "When you find
an expression consisting of a sub-expression, a plus operator, and
another sub-expression, use the value of sub-expression 1 + the value
of sub-expression 2 as the value for the expression as a whole."
Technically, the action "{ return arg1 }" for expression : TERM is not
necessary, since the Ibpag2 parser, by default, pushes the value of
the last RHS arg onto the stack.  For epsilon productions (to be
discussed below), it pushes &null.

	One serious problem with this set of specifications is that
the operators '-' and '+' are left associative.  We humans take this
for granted, because correct algebraic grouping is something our
high-school math teachers burned into us.  The computer, though, has
to be told, pedantically, how to group addition and subtraction
expressions.  It has to be explicitly instructed, in other words, to
group expressions like "9 + 32 - 4" as (9 + 32) - 4.  Without
instructions of this kind, the parser does not know, after it has read
"9 + 32" and is looking at a minus sign, whether to shift the minus
sign onto the stack, and eventually try to group as 9 + (32 - 4), or
to reduce "9 + 32" to an expression and group as (9 + 32) - 4.
Although in this case the grouping may not seem to matter, it
sometimes does.  Some operators group right to left.  The unary minus
sign, for example, is one such operator (--4 groups as (- (- 4))).  To
include the unary minus sign in our grammar, we might append yet
another rule:

	%token TERM
	%left '+', '-'
	%right '-'
	%%
	expression : TERM { return arg1 }
		   | expression, '+', expression { return arg1 + arg3 }
		   | expression, '-', expression { return arg1 - arg3 }
		   | '-', expression { return - arg2 }

The trouble with this arrangement is that the minus sign was already
declared as left associative.  To get around the conflict we use a
"dummy" token declaration, and a %prec declaration in the applicable
rule:

	%token TERM
	%left '+', '-'
	%right UMINUS
	%%
	expression : TERM { return arg1 }
		   | expression, '+', expression { return arg1 + arg3 }
		   | expression, '-', expression { return arg1 - arg3 }
		   | '-', expression %prec UMINUS { return - arg2 }

The %prec declaration simply tells the parser that, even though the
rule contains a '-' operator, the rule should be handled as if the
operator were UMINUS.  UMINUS is not actually used as a symbol in the
right-hand side of any rule (hence the designation "dummy").  It is
there simply to make the last rule behave as if the minus sign in the
last rule were different than in the second-to-last rule.

	Let us now add in multiplication and division operators to our
calculator specifications, and see what happens.  Let me reiterate
here that the action "{ return arg1 }" for rule 1 (expression : TERM)
is not strictly necessary, since the default is to push the last RHS
arg onto the value stack:

	%token TERM
	%left '+', '-'
	%left '*', '/'
	%right UMINUS
	%%
	expression : TERM { return arg1 }
		   | expression, '+', expression { return arg1 + arg3 }
		   | expression, '-', expression { return arg1 - arg3 }
		   | expression, '*', expression { return arg1 * arg3 }
		   | expression, '/', expression { return arg1 / arg3 }
		   | '-', expression %prec UMINUS { return - arg2 }

Note that the multiplication and division operators were defined
*after* the addition and subtraction operators.  The reason for this
is that, technically speaking, the grammar itself is ambiguous.  If we
treat all operators identically, the parser will not be able to tell
whether "9 + 1 * 3" should be parsed as (9 + 1) * 3 or as 9 + (1 * 3).
As we all know from our high-school algebra, multiplication has a
higher precedence than addition.  You do the multiplications before
the additions, in other words, no matter where they occur.  To tell
the parser to behave in this same manner, we declare '*' after '+'.
Note that, despite their higher priority, the '*' and '/' operators
are still left associative.  Hence, given "3 / 4 * 7," the parser will
group its input as (3 / 4) * 7.  As a brain teaser, try to figure out
how the parser might group the input "9 + 3 / 4 * 7."  Remember that
higher-precedence rules get done first, but that same-precedence rules
get done according to associativity.

	The only fundamental problem remaining with the above grammar
is that it assumes that the end of the input coincides with the end of
the line.  Is it possible to redefine the language described as
consisting of arbitrary many lines?  The answer to this question is
"yes."  One can simply add another set of productions to the grammar
that state, essentially, that the input language consists of lines
made up of an expression and a carriage return or of nothing.  Nothing
is indicated by the keyword epsilon.  Note that only the first rule
has an action field:

	lines	: lines, expression, '\n'	{ write(arg2) }
		| lines, '\n'
		| epsilon

This rule-series may seem rather abstruse, but it becomes a bit
clearer when you think about what happens on actual input.  If there
is no input (epsilon), nothing gets printed, because lines : epsilon
has no action field.  If the parser sees an expression and a newline,
the parser takes this as an instance of epsilon, plus an expression,
plus a newline.  This, then, becomes the first component of rule 1 if
another expression + newline follows, or of rule two if just a newline
occurs.  Every time an instance of rule 1 occurs, the action "{
write(arg2) }" is executed, i.e. the value of the expression gets
printed.  If this still seems hard to fathom, try walking through
step-by-step.  Even experienced hands may find these sorts of rules
difficult to construct and debug.

	Note that "lines" is now the so-called "start symbol" of our
grammar.  It is, in other words, the goal of every parse.  By default
the left-hand side symbol of the first rule is the start symbol.  This
may be overridden with a %start declaration in the tokens section (on
which, see the sample Ibpag2 input file below).

	With our new, multi-line start symbol in place, the only piece
that needs to be added, in order to make our calculator specification
a full working input to Ibpag2, is a tokenizer.  A tokenizer is a
routine that reads input from a file or from some other stream (e.g.
the user's console), and then segments this input into tokens that its
parser can understand.  In some cases, the tokens must be accompanied
by a literal value.  For example, if we encounter a TERM, we return
TERM, just as it is listed in the %token declaration.  But what is the
literal value of a TERM token?  It could be, for example, 9, or 5, or
700.  The tokenizer returns the symbol TERM, in this case, but then
records that TERM's actual value by setting some global variable.  In
Ibpag2's parser, this variable is assumed to be "iilval."  In the
tokenizer, therefore, one might write

	iilval := (literal value)
	suspend TERM

For literal operators like '+' and '*', there is no need to set
iilval, since their literal value is irrelevant.  One simply returns
these as integers (usually via "suspend ord(c)").

	The tokenizer routine is normally appended to the grammar
after another double percent sign.  Everything after this second
double percent sign is copied literally to the output file.
Alternatively, the tokenizer can be $included via Icon's preprocessor.
Ibpag2 demands that the tokenizer be called iilex, and that it take a
single file argument, that it be a generator, and that it fail when it
reaches end-of-input.  Combined with our "lines" productions, the
addition of an iilex routine to our calculator grammar yields the
following Ibpag2 input file:

	%token TERM
	%left '+', '-'
	%left '*', '/'
	%right UMINUS

	%start lines

	%%

	expression : TERM { return arg1 }
		   | expression, '+', expression { return arg1 + arg3 }
		   | expression, '-', expression { return arg1 - arg3 }
		   | expression, '*', expression { return arg1 * arg3 }
		   | expression, '/', expression { return arg1 / arg3 }
		   | '-', expression %prec UMINUS { return - arg2 }

	lines	   : lines, expression, '\n'	{ write(arg2) }
		   | lines, '\n'
		   | epsilon

	%%

	procedure iilex(infile)

	    local nextchar, c, num

	    nextchar := create !(!infile || "\n" || "\n")
	    c := @nextchar | fail

	    repeat {
		if any(&digits, c) then {
		    if not (\num ||:= c) then
			num := c
		} else {
		    if iilval := \num then {
			suspend TERM
			num := &null
		    }
		    if any('+-*/()\n', c) then {
			iilval := c
			suspend ord(c)
		    } else {
			if not any(' \t', c) then {
			    # deliberate error - will be handled later
			    suspend &null
			}
		    }
		}
		c := @nextchar | break
	    }
	    if iilval := \num then {
		return TERM
		num := &null
	    }

	end

	procedure main()
	    return iiparse(&input, 1)
	end

As noted above, the tokenizer (iilex) must be a generator.  It must
suspend integers either directly (e.g. ord(c)), or else via symbolic
defines like TERM, created by Ibpag2 on the basis of %token, %right,
%left, and %nonassoc declarations.  The tokenizer must fail on end of
input.

	If you like, cut the above code out, place it in a temporary
file, tmp.ibp, and then feed this file to Ibpag2 by typing "ibpag2 -f
tmp.ibp -o tmp.icn."  If your system supports input and output
redirection, type: "ibpag2 < tmp.ibp > tmp.icn."  Ibpag2 will turn
your grammar specifications and actions into a routine called iiparse.
If you look above, you will see that I appended a main procedure that,
in fact, calls iiparse().  Iiparse() takes two arguments: 1) an input
stream, and 2) a switch that, if nonnull, tells the parser to fail
rather than abort on unrecoverable errors.  When Ibpag2 is finished
creating its output file (tmp.icn above), compile that file the way
you would compile any other Icon program (e.g. "icont tmp").  Finally,
run the executable.  You should be able to type in various simple
arithmetic expressions and have the program spit back answers each
time you hit a return.  The only problem you might encounter is that
the parser aborts on erroneous input.

	The issue of erroneous input brings up yet another point of
general Ibpag2 usage.  Normally, if one is processing input, one does
not want to abort on errors, but rather just emit an error message,
and to continue processing - if this is at all possible.  To do this,
Ibpag2 provides a simple but fairly effective mechanism: A reserved
"error" token.

	When Ibpag2 encounters an error, it will remove symbols from
its stack until it has backtracked to a point where the error token is
legal.  It then shifts the error token onto the stack, and tries to
re-start the token stream at the point where it left off, discarding
tokens if necessary in order to get itself resynchronized.  The parser
considers itself resynchronized when it has successfully read and
shifted three tokens after shifting the error token.  Until then it
remains in an error state, and will not output additional error
messages as it discards tokens.

	This explanation may sound a bit abstruse, but in practice it
is turns out to be quite simple.  To implement error handling for our
calculator, we really have to add only one production to the end of
the "lines" section:

	lines	   : lines, expression, '\n'	{ write(arg2) }
		   | lines, '\n'
		   | epsilon
		   | error, '\n'	{
					  write("syntax error; try again:")
					  iierrok
					}

Given the above grammar, the parser will handle errors as follows: If
an error occurs (say it has an expression then an operator on its
stack and sees a newline on the input stream) the parser will throw
out the operator, then check if the error token would be OK in this
state (which it would not).  Then it would throw out the expression.
At this point, the stack is in the ready-to-read-a-lines state - the
state it was in before it read the last expression.  Since "lines" may
consist of error and '\n,' the error token is legal here, and so the
parser pushes error onto the stack, then looks back at the input
stream (where a newline is still waiting).  Since the newline now
completes the rule lines : error, '\n', the parser pushes the newline
onto its stack, then executes the action associated with this
production, i.e. it writes "syntax error; try again:" to the console,
prompting the user for additional input.

	The keyword "iierrok" in the above error production's action
field is there for a subtle, but important, reason: It tells the
parser to consider itself resynchronized, even if three tokens have
not yet been shifted.  If iierrok were not in the action code for this
rule, and the user were to supply more bad input after the prompt,
then the parser would simply discard those tokens, without emitting
another error message.  Why?  Because, as you will recall, the parser
discards tokens after an error, in efforts to resynchronize itself.
Until it reads and shifts three tokens successfully, it considers
itself in an error state, and will not emit additional error messages.
The three-token resync rule is there to prevent a cascade of
irrelevant error messages touched off by a single error.  In our
calculator's case above, though, we are smarter than the parser.  We
know that it is resynchronized as soon as it reduces error, '\n' to
lines.  So if a syntax error occurs on the next token, it should be
reported.  Adding "iierrok" to the action insures that the parser will
do just this.

	In addition to iierrok, there are several other directives
Ibpag2 accepts as part of the action code segments.  These are as
follows:

	iiclearin		clear the current input token
	IIERROR			perform error recovery
	IIACCEPT		simulate an accept action

There are several other directives (all implemented as macros) that
Ibpag2 accepts in GLR mode.  For a discussion of GLR mode, see below,
section 5.  IIERROR in particular, and error recovery in general, work
a bit differently in that mode than they do in Ibpag2's normal (i.e.
LR) mode.

	There are admittedly many other topics that might be covered
here.  This treatment, however, is intended as a general nontechnical
introduction, and not as a complete textbook on parser generation use.
If you want to learn more about this topic, consult the bibliography.
Also, check the UNIX manual pages on the YACC utility (Yet Another
Compiler Compiler).  Ibpag's input format is fairly close (too close,
perhaps) to YACC's.  In fact, most of what is said about YACC in UNIX
documentation can be carried directly over to Ibpag2.  Several salient
differences, though, should be kept in mind:

        1) YACC's "$$ = x" constructs are replaced by "return x" (e.g.
           "$$ = $1 + $3" -> "return $1 + $3" [$1 is a synonym for
	   "arg1", $3 for "arg3", etc.])

        2) all variables within a given action are, by default, local
           to that action; i.e. they cannot be accessed by other
           actions unless you declare them global elsewhere (e.g. in
           the pass-through part of the declarations section %{ ...
           %})

        3) the %union and %type declarations/tags are not needed by
	   Ibpag2 (both for better and for worse)

        4) tokens and symbols are separated from each other by a comma
           in Ibpag2 files (e.g. %token '+', '-' and S : NP, VP)

        5) epsilon is indicated by the keyword "epsilon" (e.g. REL :
           epsilon), and not by an empty RHS

        6) both epsilon and error *may* be declared as %tokens for
           reasons of precedence, although they retain hard-coded
           internal values (-2 and -1, respectively)

        7) all actions must follow the last RHS symbol of the rule
           they apply to (preceded by an optional %prec directive); to
           achieve S : NP { action1 }, VP { action2 }, insert a dummy
           rule: S : NP, dummy, VP { action2 }; dummy : epsilon {
           action1 } ;

        8) YYERROR, YYACCEPT, yyclearin, and yyerrok are the same,
           except they are written IIERROR, IIACCEPT, iiclearin, and
           iierrok (i.e. "ii" replaces "yy")

        9) Ibpag2's input files are tokenized as modified Icon files,
           and, as a consequence, Icon's reserved words must not be
           used as symbols (e.g. "if : if, then" is no go)

I myself find YACC to be ugly.  As a result, Ibpag2 is not an exact
YACC clone.  I would like to underscore the fact that I have no
intention to move in this direction, either.  It's as YACC-like as
it's going to get!

	Both YACC and non-YACC users should note number 9 in the above
list.  Don't use things like "while," "every," "do," etc. as symbols
in your grammar!  Just use the same rules for Ibpag2 nonterminals as
for Icon variables, and you'll be OK.

	For those that just can't bear using anything but a strictly
YACC-conformant system, I've included a preprocessor with the Ibpag2
distribution called (at one user's recommendation) "iacc."  Iacc reads
&input - assumed to be a YACCish grammar - and sends to &output an
Ibpag2-conformant file.  I have not tested this file extensively, and
there are likely to be bugs in the way I've handled the necessary 2
token lookaheads and value stack references.  Give it a whirl, though,
if you are feeling adventurous.  The only reason I personally use Iacc
is that some YACCs (e.g. BSD YACC) have particularly nice debugging
messages and help.  If my grammar is particularly complex, I just run
it through YACC without action code first, then use Iacc to convert it
to Ibpag2 format.  Iacc's output, as I noted, is not meant to be
pretty, so I invariably end up doing a little editing - usually just
respacing a few rules, and re-inserting any comments that I might have
put in the original YACC file.

	In general, Ibpag2 (like YACC) handles epsilon moves and
indirect cycles.  LR-mode shift-reduce conflicts are also handled in
the normal way (i.e. pick the rule with the highest priority, and, in
cases where the priority is the same, check the associativities).  In
contrast to YACC, Ibpag2 flags reduce/reduce conflicts as errors
(since these often conceal deeper precedence problems and just plain
kludges).  Reduce/reduce conflict errors are easily enough remedied,
if need be, via (dummy) precedences.  One can convert these errors to
warnings by specifying -y on the command line.  With the -y option,
reduce/reduce conflicts are resolved in favor of the rule that occurs
first in the grammar.  The -y switch also prevents Ibpag2 from
aborting on shift/reduce conflicts, telling it instead to resolve in
favor of shift.  Basically, -y is a partial YACC compatibility switch.
Normally (i.e. in SLR mode) Ibpag2 is much more finicky than YACC
about conflicts in its grammars.

	Also in contrast to YACC, Ibpag2 supports multiple
simultaneous parsers.  Ibpag2 normally names its main parser routine
iiparse().  By using the -m command-line option, however, you can
override this default behavior, and force Ibpag2 to augment this name
in some uniquely identifiable fashion.  For example, "ibpag2 -m _1 <
tmp.ibp > tmp.icn" will force Ibpag2 to write a parser called
"iiparse_1" to tmp.icn.  Note that, instead of calling iilex, this
iiparse_1() routine will now call iilex_1, and all necessary global
variables will have _1 appended to them (e.g. errors will become
errors_1).  I don't expect that many people will have occasion to use
this feature.  It is there, though, for those that want it.


4.__Debugging

	Constructing and debugging LR(1) family parsers can sometimes
be hair raising, even with a parser generator.  Several precautions
can be taken, however, to minimize the agony.  The first is to declare
all tokens initially as part of a single %token declaration, i.e. with
no precedences, and with the same associativities.  Also, leave out
action code until the grammar seems to be working.  In this stage, you
can even run the grammar through (BSD)YACC or GNU Bison.  All you
would need to do is remove the commas between tokens and symbols, and
place a semicolon at the end of every rule.  During this and all
debugging stages, supply Ibpag2 with a -v command-line switch.  This
will cause Ibpag2 to write a summary of rules, tokens, and its two
state tables to "ibpag2.output" (a bit like GNU Bison, but with a
hard-coded name).  If you get messages about conflicts in your parse
tables (e.g.  "unresolvable reduce/reduce conflict, state 5, token
257, rules 4,5").  This file will tell you what rules these are, and
what token number 257 is.  Use precedences and associativities to
clear these problems up as they arise.  If you are comfortable having
reduce/reduce errors resolved by the order in which the conflicting
rules occur, then use the -y command-line switch.  With -y on the
command line, Ibpag2 will always resolve in favor of the earlier rule.
This option will also cause it to resolve all shift/reduce conflicts
in favor of shift.

	There are certain languages that are not ambiguous that SLR(1)
parsers like Ibpag2 will fail to produce an unambiguous parse table
for.  The classic example is

	expr : lval, '=', rval | rval
	lval : '*', rval       | ID
	rval : lval

C programmers will recognize this as a toy expression grammar with
code for identifiers, assignments, and pointers.  The problem is that
if we feed this grammar to Ibpag2, it will claim that there is a
conflict on lookahead '='.  In truth, there is no ambiguity.  The SLR
parser simply doesn't remember the pathway the parser used to get to
the state it is in when it sees '=' on the input stream.  Whether the
parser gets into this state by seeing '*' plus and ID, or by seeing
just an ID, it knows to turn the ID into an lval.  Then it knows to
turn lval into rval.  At this point, though, it doesn't know whether
to shift the = sign via rule 1, or to turn rval and the preceding '*'
into an lval.  The parser has "forgotten" that the '*' is there
waiting on level down on the stack!

	The solution to this problem is actually quite simple (at
least in concept).  Just provide a unique pathway in the grammar for
the conflicting rules.  In this case, they are rules 1 and 5 (the
first and last):

	expr : lval, '=', rval | rval
	lval : '*', pval | ID
	pval : lval
	rval : lval

Now when the parser sees '*,' it can only have a pval after it.  Never
mind that pval is composed of precisely the same things as rval.  The
point is that the parser generator follows a different route after
seeing '*' than if it starts with ID and no preceding '*'.  Hence it
"remembers" that that the '*' is back on the stack, waiting for the
"lval : '*', pval" rule to apply.  There is no more conflict.

	Go ahead and run these grammars through Ibpag2 if you aren't
sure what is going on.  Remember to declare ID as a token, and to
place "%%" in the appropriate spot!

	If you get your parser up and running, but find that it is not
functioning quite the way you expect, add the following line somewhere
near the start of Ibpag2's output file:

	$define IIDEBUG

If you like, you can add it to the beginning of your Ibpag2 input
file.  Place it in the declarations section (before the first double
percent sign), and surround it by %{ and %}, e.g.:

	%{
	$define IIDEBUG
	%}

This tells Ibpag2 to send $define IIDEBUG straight through to the
output file.

	What defining IIDEBUG does is tell iiparse, once compiled, to
emit profuse debugging messages about the parser's actions, and about
the state of its stacks.  This display will not make a whole lot of
sense to anyone who doesn't understand LR-family parsers, so those who
want to access this feature should perhaps go through a standard
reference like Aho, Sethi, and Ullman [1].

	If, after you are finished debugging your grammar, you find
that Ibpag2's output files are rather large, you may try saving space
by compressing the action and goto tables.  This is accomplished by
invoking Ibpag2 with the -c (compress) option.  Using this option
makes debugging difficult, and makes the parser run a bit more slowly.
It also only works for rather large grammars with long nonterminal
symbol names.  Don't even consider it until the grammar is thoroughly
debugged and you have determined that the output file's size is just
too great for practical use.  Even then, compression may or may not
help, depending on how long your nonterminal names are.  In general,
Ibpag2 is best as a teaching tool, or as a production system for
medium or small grammars.


5.__Using_Ibpag2_with_Non-LR_Grammars

	There may be times when you *want* to parse languages that no
LR-based algorithm can handle.  There may be times, that is, when the
grammar you want to use contains conflicts or ambiguities that are
there by design, and not by oversight.  For example, you may want to
parse a natural language.  Full-blown natural languages involve many
highly ambiguous constructs, and are not LR-parsable.  By invoking it
with the -a option, Ibpag2 can parse or recognize certain natural
languages, or, more practically speaking, certain NL subsets.  The
letter "a" in -a is supposed to stand for "ambiguous," although what
this option really does is put Ibpag2 into a quasi-GLR mode - i.e.
into a kind of "generalized" LR mode in which it can accept non-LR
grammars [4,5].

	User-visible changes to Ibpag2's operation in quasi-GLR mode
(i.e. with the -a option) are as follows:

	1) iiparse() is now a generator
	2) action code can use suspend as well as return
	3) IIERROR places the current thread in an error state (i.e.
	   it doesn't *necessarily* trigger error recovery; see below)
	4) there are two new action-code directives (iiprune and
           iiisolate) and a general define (AUTO_PRUNE)
	5) conflicts due to ambiguities in the grammar no longer
	   result in aborted processing (so, e.g., if you do not
	   specify the -y option on a grammar with reduce/reduce
	   conflicts, Ibpag2 will simply generate a parser capable of
	   producing multiple parses for the same input)

	In quasi-GLR mode, iiparse() should be invoked in a way that
will render multiple results usable, if they are available (e.g.
"every result := iiparse(&input) do...".  Action code is also allowed
to produce more than one value (i.e. to use suspend).  When it does
so, iiparse() creates separate parse threads for each value.  So, for
instance, if your action code for some production suspends both of the
following lists,

	["noun", "will", "gloss: desire"]
	["noun", "will", "gloss: legal document mandating how _
	                  one's possessions are to be disposed _
			  of after one's death"],

iiparse() would create two separate parse threads - one for each
result.  Note that in this case, the syntactic structure of each
thread is the same.  It is their semantics (i.e. the stuff on the
value stack) that differs.

	If you use the iierrok and iiclearin macros in your action
code before suspending any result, their affect persists through all
subseqent suspensions and resulting parse threads.  If you use these
macros after suspending one or more times, however, they are valid
only for the parse thread generated by the next suspension.  By way of
contrast, the IIERROR macro *always* flags only the next parse thread
as erroneous.  Likewise, IIACCEPT always simulates an accept action on
the next suspension only.  IIERROR and IIACCEPT, in other words, never
have any effect on subsequent suspensions and parse threads other than
the one that immediately follows them.  This is true of iierrok and
iiclearin only when used after the first suspension.

	In quasi-GLR mode, IIERROR (number three in the difference
list above) becomes a mechanism for placing the current parse thread
in error mode.  This is similar to, but not quite identical to, how
IIERROR functions in straight LR mode.  In quasi-GLR mode, if other
threads can carry on the parse without error the erroneous parse
thread is quietly clobbered.  Full-blown error recovery only occurs if
all of the other parsers halt as well.  This makes sense if you think
about it.  Why keep erroneous threads around when there are threads
still continuing a valid parse?  For some large interactive systems,
it might be necessary to keep bogus threads around longer, and weed
them out only after a lengthy grading process.  If you are
constructing a system such as this, you'll have to modify Ibpag2's
iiglrpar.lib file.  In particular, you'll need to change the segment
in iiparse() that takes out the trash, so to speak, in such a way that
it does so only if the error count in a given parser either rises
above a specific threshhold or else exceeds the number of errors in
the "most correct" parser by a certain amount.  This is not that hard
to do.  I just don't expect that most parsers people generate with
Ibpag2 will use IIERROR or error recovery in general in so involved a
fashion.

	Iiprune and iiisolate (number 4 above) are used to control the
growth of the parallel parser array.  In order to give straightforward
(read "implementationally trivial") support for action code, Ibpag2
cannot create a parse "forest" in the sense that a standard GLR parser
does.  Instead, it simply duplicates the current parser environment
whenever it encounters a conflict in its action table.  Even if the
conflict turns out to reflect only a local ambiguity, the parsers, by
default, remain separate.  Put differently, Ibpag2's quasi-GLR parser,
by default, makes no direct effort to reduce the size of its parser
arrays or to alter the essentially linear structure of their value and
state stacks.  Size reduction, where necessary and/or desirable, is up
to the programmer.  What the iiprune macro is there to do is to give
the programmer a way of pruning a given thread out of the active
parser list.  Iiisolate allows him or her to prune out every thread
*but* the current one.  AUTO_PRUNE makes the parser behave more like a
standard GLR parser, instructing it to prune parse threads that are
essentially duplicating another parse thread's efforts.  The parser,
though, does not build a parse tree per se, the way most GLR parsers
typically do, but rather manipulates its value stack like a
traditional LR-family parser.

	Iiprune is useful when, for example, the semantics (i.e. your
"action" code segments) determine that a given parse thread is no
longer viable, and you want to signal the syntactic analyzer not to
continue pursuing it.  The difference between iiprune and IIERROR is
that iiprune clobbers the current parser immediately.  IIERROR only
puts it into an error state.  If all active parsers end up in an error
state, and none can shift additional input symbols, then the IIERROR
macro induces error recovery.  Iiprune does not.  NB: iiprune, if used
in action code that suspends multiple results, cancels the current and
remaining results (i.e. it does not clobber parsers already spun off
by previous suspensions by invocation of that same code; it merely
cuts the result sequence).  Iiprune essentially stands in for "fail"
in this situation.  Fail itself can be used in the code, but be warned
that iiparse() will still push *at least one* value onto its value
stack, even if a given action code segment fails.  This keeps the
value stack in sync with the syntax.  To avoid confusion, I recommend
not using "fail" in any action code.

	Iiisolate is useful if, during error recovery, you prompt the
user interactively, or do something else that cannot be elegantly done
in parallel for two or more distinct parse threads.  Iiisolate allows
you to preserve only the the current parse thread, and to clobber the
rest.  Iiisolate can also be useful as a way of making sure that only
one thread carries on the parse in non-error situations.  Suppose that
we have a series of productions:

	sentences  :  sentences sentence '\n'
			{ print_parse(arg2) }
		   |  sentences '\n'
		   |  error '\n'
		   |  epsilon

If we get a sentence with more than one parse, all of the underlying
threads that produced these parses will be active for the next
sentence as well.  In many situations this will not be what we want.
If our desire it to have only one active parse thread at the start of
each sentence, we simply tell our lexical analyzer to suspend two
newlines every time it sees a newline on the input stream.  This
insures that the second rule will always apply right after the first.
We then insert iiisolate directives for both it and the one error
production:

	sentences  :  sentences sentence '\n'
			{ print_parse(arg2) }
		   |  sentences '\n'
			{ iiisolate }
		   |  error '\n'
			{ iiisolate; iierrok }
		   |  epsilon

The effect here is to allow multiple parsers to be generated only
while parsing "sentence".  The iiisolate directive, in other words,
sees to it that no sentence parse will ever begin with multiple active
parsers.  As with LR mode, iierrok clears the error flag for the
(current) parser.

	Note that if you use iiisolate in action code that suspends
multiple results, iiisolate will clobber all parsers but the one
generated by the next suspension.

	If there is no need for close control over the details of the
parser array, and you wish only to clobber parsers that end up doing
the same thing as some other parser (and hence returning identical
values), then just make sure you add "$define AUTO_PRUNE" to the
pass-through code section at the top of the file.  Put differently,
defining AUTO_PRUNE instructs the quasi-GLR parser to weed out parsers
that are in the same state, and which have identical value stacks.
AUTO_PRUNE can often be used in place of iiisolate in situations like
the one discussed just above.  Its only drawback is that it slows
the parser a bit.

	Other than these deviations (action code and iiparse becoming
generators, IIERROR's altered behavior, and the addition of iiprune,
iiisolate, and AUTO_PRUNE), Ibpag2's quasi-GLR mode - at least on the
surface - works pretty much like its straight LR mode.  In fact, if
you take one of your SLR(1) grammars, and run it through Ibpag2 using
the -a option, you probably won't notice any difference in the
resulting automaton unless you do some debugging or perform some
timing tests (the GLR parser is slower, though for straight SLR(1)
grammars not by much).  Even with non-SLR(1) grammars, the quasi-GLR
parser will clip along merrily, using all the same sorts of rules,
action code, and macros that you would typically use in LR mode!


6.__Installing_Ibpag

	If you are a UNIX user, or have a generic "make" utility, you
are in luck.  Just edit Makefile.dist according to the directions
given in that file, rename it as "makefile," then execute "make."
Ibpag2 should be created automatically.  If everything goes smoothly,
then "make install" (su-ing root, if both possible and necessary for
correct installation of the iiparse.icn file).  Check with your system
administrator if you are on a public system, and aren't sure what to
do.

	Please be sure to read the directions in the makefile
carefully, and set DESTDIR and LIBDIR to the directory where you want
the executable and parser file to reside.  Also, make sure the paths
you specify are correct for your Icon executables.

	If you are using some other system - one that lacks "make" -
then shame on your manufacturer :-).  You'll be a bit inconvenienced.
Try typing:

	icont -o ibpag2 follow.icn ibpag2.icn ibreader.icn \
		 ibtokens.icn ibutil.icn ibwriter.icn iohno.icn \
		 outbits.icn slritems.icn slrtbls.icn shrnktbl.icn \
		 version.icn slshupto.icn

The backslashes merely indicate that the next line is a continuation.
The whole thing should, in other words, be on a single line.

	If your operating system support environment variables, and
you have set up your LPATH according to the specifications in the Icon
distribution (see below), then you may copy iiparse.lib and
iiglrpar.lib to some file in your LPATH.  If you do not do this, or if
your OS does not support environment variables, then you must be in
the directory where you keep your Ibpag2 files when you use it, or
else invoke Ibpag2 with the -p dirname option (where dirname is the
directory that holds the iiparse.lib and iiglrpar.lib files that come
with the Ibpag2 distribution).  The .lib files contain template
parsers that are critical to Ibpag2's operation.  Ibpag2 will abort if
it cannot find them.

	If your operating system permits the creation of macros or
batch files, it might be useful to create one that changes
automatically to the Ibpag2 source directory, and runs the executable.
This has the side-benefit of making it easier for Ibapg2 to find the
parser library files, iiparse.lib and iiglrpar.lib.  Under DOS, for
instance, one might create a batch file that says:

	c:
	cd c:\ibpag2
	iconx ibpag2 %1 %2 %3 %4 %5 %6 %7 %8 %9

DOS, it turns out, has to execute Icon files indirectly through iconx,
so this technique has yet another advantage in that it hides the
second level of indirection - although it prevents you from using
input and output redirection.  Naturally, the above example assumes
that Ibpag2 is in c:\ibpag2.

	Ibpag2 assumes the existence on your system, not only of an
Icon interpreter, but also of an up-to-date Icon Program
Library.  There are several routines included in the IPL that Bibleref
uses.  Make sure you (or the local system administrators) have put the
IPL online, and have translated the appropriate object modules.  Set
your IPATH environment variable to point to the place where the object
modules reside.  Set LPATH to point to the modules' source files.
Both IPATH and LPATH are documented in doc directory of the Icon
source tree (ipd224.doc).  If your system does not support environment
variables, copy ximage.icn, options.icn, ebcdic.icn, and escape.icn
from the IPL into the Ibpag2 source directory, and compile them in
with the rest of the Ibpag2 source files, either by adding them to the
SRC variable in the makefile, or by adding them manually to the "icont
-o ..." command line given above.

	If you have any problems installing or using Ibpag2, please
feel free to drop me, Richard Goerwitz, an e-mail message at
goer@midway.uchicago.edu, or (via the post) at:

	5410 S. Ridgewood Ct., 2E
	Chicago, IL   60615


6.__Bibliography

1.  Aho, Alfred V., Sethi, Ravi, and Ullman, Jeffrey D.  Compilers.
    Addison-Wesley: Reading, Massachusetts, second printing, 1988.

2.  Griswold, Ralph E. and Griswold, Madge T.  The Icon Programming
    Language. Prentice-Hall, Inc.: Englewood Cliffs, New Jersey, USA,
    second edition, 1990.

3.  Griswold, Ralph E., Jeffery, Clinton L., and Townsend, Gregg M.
    Version 8.10 of the Icon Programming Language.  Univ. of Arizona
    Icon Project Document 212, 1993.  (obtain via anonymous FTP from
    cs.arizona.edu ~ftp/icon/docs/ipd212.doc)

4.  Tomita, Masaru.  Efficient Parsing for Natural Language.  Boston:
    Kluwer Academic Publishers, c. 1985.

5.  Tomita, Masaru editor.  Generalized LR Parsing.  Boston:  Kluwer
    Academic Publishers, 1991.