The content of this page in no way reflects the opinions, standards,
or policy of the United States Air Force
Academy or the United States government.
Introduction
Welcome to the AdaGOOP home page. AdaGOOP is the Ada Generator of Object-Oriented Parsers. Using AdaGOOP, you can easily generate a parser which builds an object-oriented parse tree. Read here the Ada Letters paper (June 2000) describing AdaGOOP.
AdaGOOP 2005 is an Ada 2005 automatic parser generator built on top of the SCATC versions of aflex and ayacc. As SableCC does, AdaGOOP automatically generates code to build a parse tree, and a traversal of the parse tree using the visitor pattern. It is currently available at: http://adagoop.martincarlisle.com
AdaGOOP is distributed as Ada 2005 source code. To use the tool, simply download and compile the source (adagoop.adb contains the main procedure). AdaGOOP outputs will be run through SCAFLEX and SCAYACC. Source code for these tools is distributed in subfolders of the same names (scaflex.adb and scayacc.adb are the main procedures).
AdaGOOP is executed from a command prompt as follows:
adagoop input_file prefix (e.g. adagoop ada05.g ada05)
Prefix should be a simple name for the project. This name will be used several places in the output files.
The input file to AdaGOOP consists of three parts: token_macros, tokens, and the grammar. Token macros are regular expressions that form parts of tokens. For example,
token_macros
DIGIT [0-9]
INTEGER ({DIGIT}+)
defines two macros, DIGIT and INTEGER, that can be used to create other tokens. Tokens are formed in the same fashion; however, their regular expressions can only contain token macros, not other tokens. The regular expression operators supported are:
[0-9A-FZ] : any 1
character ‘0’-‘9’,‘A’-‘F’,or ‘Z’
[^xyz] : any character except x, y or z
\n : new line
. : any single character
\. : the character ‘.’ (similar for \”, \[, …)
x* : 0 or more occurrences of x
x+ : 1 or more occurrences of x
x? : 0 or 1 occurrences of x
x|y : either x or y
“x*” : literally “x*” (ignore operators)
{MACRO} : replace with definition of MACRO
() : parentheses, as (zy)*
Comments should be defined with the token “ignore”, e.g.
ignore "--"[^\n]*
Both tokens and token macros should be specified 1 per line. Comments may be interspersed with --. This is why the ignore token above must have the -- in quotation marks.
The final section, “grammar” gives an LALR(1) grammar. It should begin with a %start, indicating which non-terminal symbol is the starting symbol of the grammar, e.g.
grammar
%start A
A : A token1 B | token2 | ;
B : token2;
Each grammar rule consists of a non-terminal name, followed by a colon, followed by 1 or more sequences of terminals and non-terminals, separated by ‘|’. Note that a null production is specified by simply having one of these sequences be empty. The rule is terminated with a semicolon. White space is irrelevant. A complete sample file is given in Appendix A.
Throughout this section, “prefix” refers to the command-line argument passed to AdaGOOP. AdaGOOP generates prefix.l, an input file for scaflex. This file contains the specifications of tokens. The output file prefix.y is an input file for scayacc. It contains the grammar, plus code to build an object-oriented parse tree.
Once AdaGOOP has been run, you can create the lexer and parser by running (for our example):
scaflex prefix.l
gnatchop -w prefix.a
gnatchop -w prefix_io.a
gnatchop -w prefix_dfa.a
scayacc prefix.y
gnatchop -w prefix.a
gnatchop -w prefix_goto.a
sed -e 1d prefix_tokens.a > prefix_modified_tokens.a
gnatchop –w prefix_modified_tokens.a
gnatchop -w prefix_shift_reduce.a
Note gnatchop is needed only if your compilation environment (such as GNAT) requires package specifications and bodies to appear in separate files. sed is used only to delete the first line of prefix_tokens.a. We needed to add with context clauses to this file, but they appeared after the package declaration. To resolve this, we added the context clauses, another package declaration, and within a batch script used sed to delete the first package declaration.
For the object-oriented parse tree, AdaGOOP creates an abstract object, Parseable, that will serve as the root of our hierarchy (it has no fields):
type Parseable is abstract tagged null record;
We also create a single object type, Parseable_Token, to handle all terminal symbols:
type Parseable_Token is new
Parseable with record
Line : Natural;
Column : Natural;
Token_String : String_Ptr;
Token_Type : Token; -- enumeration of all tokens
end record;
Both Parseable and Parseable_Token are found in the package Prefix_Model. A production A => B C, where B and C are non-terminals, corresponds to a node in the parse tree containing pointers to the subtrees derived through B and C. We therefore create the following class declaration (which will appear in the package specification a_model.ads):
type
A_nonterminal is new Parseable with record
B_part : access B_nonterminal’Class;
C_part : access C_nonterminal’Class;
end record;
Since we have a single class for all terminal symbols, if we have A => B terminal1 C, we simply add a pointer to a Parseable_Token as the second field of the record. The parse tree then contains all of the information to completely recreate the original source text, with the exception of comment tokens (which are ignored).
To address the issue of multiple productions for a single non-terminal, we insert an abstract class for the non-terminal, and have concrete classes corresponding to each possible production. For example, if we have A => B terminal1 C | D, then AdGOOP outputs the following record declarations in the package A_Model:
type
A_nonterminal is abstract new Parseable
with null record;
type A_nonterminal_Ptr is access all A_nonterminal’Class;
type A_nonterminal1 is new A_nonterminal with record
B_part : access B_nonterminal’Class
Terminal1_Part : Parseable_Token_Ptr;
C_part : access C_nonterminal’Class;
end record;
type A_nonterminal2 is new A_nonterminal with record
D_part : access D_nonterminal’Class;
end record;
The tool also will automatically number the fields in the case where a non-terminal or terminal appears more than once on the right hand side of a production. So, if we had A => B terminal1 B instead, the record would look like:
type A_nonterminal1 is new
A_nonterminal with record
B_part1 : access B_nonterminal’Class
Terminal1_Part : Parseable_Token_Ptr;
B_part2 : access B_nonterminal’Class;
end record;
AdaGOOP also generates code for traversing the parse tree using the Visitor pattern. Each class in the parse tree hierarchy has an acceptor method, which is used to dispatch to the appropriate visitor method:
procedure Acceptor(This :
access Parseable;
I : access prefix_Visitor_Interface.
Visit_prefix_Interface'Class) is abstract;
This automatically generated interface specifies a visitor method that has no additional parameters and does not return a value. It is output to the file prefix_visitor_interface.ads.
As can be seen from the following sample body (automatically generated), Acceptor merely calls the appropriate visitor method found in the interface:
procedure Acceptor(This : access A_nonterminal2;
I : access
Visit_prefix_Interface'Class) is
begin
I.Visit_A_nonterminal2(This);
end Acceptor;
Visit_prefix_Interface contains one visitor method for each leaf of the hierarchy, plus supports both pre-order and post-order depth first traversals by providing “before” and “after” methods that can be overloaded:
limited with
A_Model;
limited with
prefix_Model;
package
prefix_Visitor_Interface is
type
Visit_prefix_Interface is interface;
procedure Visit_Parseable_Token(
I : access
Visit_prefix_Interface;
T : access
prefix_Model.Parseable_Token'Class) is null;
procedure Before_A_nonterminal1(
I : access
Visit_prefix_Interface;
N : access
A_Model.A_nonterminal1'Class) is null;
procedure
Visit_A_nonterminal1(
I : access
Visit_prefix_Interface;
N : access
A_Model.A_nonterminal1'Class) is abstract;
procedure
After_A_nonterminal1(
I : access
Visit_prefix_Interface;
N : access
A_Model.A_nonterminal1'Class) is null;
procedure Before_A_nonterminal2(
I : access
Visit_prefix_Interface;
N : access
A_Model.A_nonterminal2'Class) is null;
procedure
Visit_A_nonterminal2(
I : access
Visit_prefix_Interface;
N : access
A_Model.A_nonterminal2'Class) is abstract;
procedure
After_A_nonterminal2(
I : access
Visit_prefix_Interface;
N : access
A_Model.A_nonterminal2'Class) is null;
…
end
prefix_Visitor_Interface;
A depth first traversal class, DFS, is generated into the package prefix_DFS. DFS implements the visitor interface, and its visit methods perform a depth first traversal of the parse tree. For example, following is the visitor method corresponding to the grammar rule A ® A PROCEDURE B:
procedure Visit_A_nonterminal1(
I : access
DFS;
N : access
A_Model.A_nonterminal1'Class) is
I_Classwide :
access DFS'Class := I;
begin
I_Classwide.Before_A_nonterminal1(N);
N.A_part.Acceptor(I);
I_Classwide.Visit_Parseable_Token(N.PROCEDURE_part);
N.B_part.Acceptor(I);
I_Classwide.After_A_nonterminal1(N);
end
Visit_A_nonterminal1;
This visitor method calls the “before” method (for pre-order traversals), then calls Acceptor for each of the children to dispatch to the appropriate visitor, and finally calls the “after” method (for post-order traversals). Tokens don’t require dispatching, so the visit method can be called directly. It is interesting to note that Ada, unlike Java, does not perform dynamic dispatching by default. Therefore, it is necessary to convert the parameter I to a classwide type before calling Acceptor on each child so that dispatching occurs.
To create your own visitor, you simply create a child class for DFS. In this child class, you override the “before” and “after” methods for those parse tree classes that you wish to process:
limited with A_Model;
with Prefix_DFS;
package DFS_Example is
type DFS is new Prefix_DFS.DFS with null record;
overriding procedure After_A_nonterminal1(
I : access
DFS;
N : access
A_Model.A_nonterminal1'Class);
overriding
procedure Before_A_nonterminal2(
I : access
DFS;
N : access
A_Model.A_nonterminal2'Class);
…
end
DFS_Example;
Creating a main program for a tool generated with AdaGOOP is simple. You need simply call the “run” procedure in the prefix_parser package with a filename, then use get_parse_tree to get the root of resulting parse tree. The depth first traversal is started by calling the method Acceptor on the root of the tree.
with
Ada.Command_Line;
use
Ada.Command_Line;
with
ada.text_io;
use
ada.text_io;
with
prefix_parser;
with
prefix_model;
with
DFS_Print;
with
a_model;
procedure tool is
type
DFS_Access is access all DFS_Print.DFS'Class;
Parse_Tree :
prefix_model.parseable_ptr;
Visitor : DFS_Access
:= new DFS_Print.DFS;
begin
if
argument_count /= 1 then
put_line("usage:
tool filename");
return;
end if;
prefix_parser.run(argument(1));
parse_tree :=
prefix_parser.get_parse_tree;
parse_tree.Acceptor(Visitor);
end
tool;
AdaGOOP is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.