function_grammar SPIRIT
TOP PREVIOUS NEXT

Consider what should happen when the parser encounters expressions such as:


    axpy(a,x,y) = a * x + y
    factorial(x) = (x < 0.1) ? 1 : x * factorial(x - 1) 

The symbol table of recognized functions should be updated and a stack representation of the function assignment should be generated, much as occurs when a variable expression is encountered. For example, the stack representation of axpy(a,x,y)=a*x+y is


    0    variable_node    a
    1    variable_node    x
    2    function_node    mult
    3    variable_node    y
    4    function_node    add
    5    func_def_node    axpy#3

where axpy#3 is the mangled name of the axpy function taking 3 arguments. All these nodes contain pointers to the appropriate entries in the various symbol tables.

Separate actions for function declaration and function definition are required if YAC is to have the ability to re-define functions dynamically. That is, the function definition should be assigned to the function only in the course of evaluating the computational stack.

Public interface of function_grammar

The similar requirements of function_grammar and variable_grammar mean that their public interfaces are also very similar. The only difference lies in which symbol table is const and which is non-const:


    namespace yac {

    struct function_grammar
        : grammar<function_grammar, stack_closure::context_t>
    {
        typedef symbols<shared_ptr<function_node> > function_table_t;
        typedef symbols<double> var_table_t;

        function_table_t & functions;
        var_table_t const & global_vars;

        function_grammar(function_table_t & f, var_table_t const & v)
            : functions(f), global_vars(v)
        {}

        template <typename ScannerT>
        struct definition;
    };

    } // namespace yac

user_function

Before going on to talk about the definition struct of function_grammar itself, it is pertinent to ask, "What's in a function anyway?" Or rather, what's in the user_function that derives from it? Here, then, are the interesting parts of its interface:


    namespace yac {

    struct user_func_expression {
        typedef boost::shared_ptr<spirit::symbols<double> > symbol_table_ptr;

        vector<string> arg_names;
        symbol_table_ptr local_vars;
        stack stk;
    };

    struct user_function : public function
    {
        typedef spirit::symbols<double> symbol_table_t;

        user_function(std::string const & name, size_type const function_arity);
        user_function & operator=(user_func_expression const & ufe);

    private:
        double eval_priv(std::vector<double> const & params) const;

        size_type arity_;
        std::vector<std::string> arg_names_;
        symbol_table_t local_vars_;
        stack stk_;
    };

    } // namespace yac

The definition of eval_priv has already been presented, here to explain just how a user_function is evaluated. It was explained that a user_function contains three separate data stores all of which are used when the function comes to be evaluated for a particular set of input parameters.

Filling the data stores of a user_function is a two-stage process. When the variable is created, at parse time, it's name and arity are set. Storing this variable, in the symbol table of known functions, e.g. aypy#3, will enable the expression_grammar to parse any rhs expression involving aypy#3 that occurs thereafter. The parsing process goes on to add a func_def_node to the virtual machine's stack, representing the assignment of the function definition to the function variable. This assignment occurs when the stack comes to be evaluated, after the parsing is complete. A func_def_node contains two stores: a pointer to the function in the function symbol table and a user_func_expression variable that contains the three data stores described above.

The logic behind the above description is, I hope, transparent. What is interesting from a Spirit point of view is best explained using the example function declaration axpy(a,x,y)=a*x+y. The local_vars_ symbol table of this user_function will contain three variables, a, x and y. This symbol table is created dynamically and will be stored in a user_func_expression in the virtual machine's computational stack. However, this self-same store should also be made available to the expression_grammar variable tasked with parsing a*x+y and generating a stack from it. This requirement will lead to some quite complex use of the Spirit parser tools — the expression_grammar parser must also be created dynamically!

function_grammar::definition

The Spirit representation of the grammar needed to parse a statement such as axpy(a,x,y)=a*x+y is:


    func_def  
        =  func_decl
           [
               // Semantic actions to add the function to
               // the symbol table of known functions.
           ]
        >> '='
           // There is a catch here  —  expression
           // depends on a dynamically-created local_vars
           // store.
        >> expression
           [
               // Semantic actions to build a computational
               // stack that will assign the function definition
               // to the function variable.
           ]
        ;
    func_decl 
        = name >> '(' >> !name_list >> ')';
    name_list 
        = name >> *(',' >> name);

where func_decl is the rule used to parse axpy(a,x,y).

Clearly, parsing the input data is quite straight-forward, but the semantic actions that result are pretty complex. The snippet above illustrates the two-stage nature of the process of applying these actions. However, before considering them any further, let's return to the func_decl rule itself.

Three local variables are filled when parsing a function declaration like foo(a,b), the function name and its argument list which is used to fill both a vector (so that the parameters passed to the function can be assigned to the correct local variable) and as a symbol table. Local variables in the context of a Spirit parser suggest that a closure should be attached to func_decl:


    struct func_closure
        : spirit::closure<func_closure,
                          std::string,
                          std::vector<std::string>,
                          boost::shared_ptr<spirit::symbols<double> >
    {
        member1 name;
        member2 args;
        member3 local_vars;
    };

The local variable symbol table is stored as a shared_ptr to simplify the accounting process that ensures all pointers to its data from the stack representing the function definition remain valid.

An attempt to parse foo(a,b) must ensure that this symbol table is reset for each function parsed. This is done as late as possible (i.e., after foo(), using a lazy function wrapper for shared_ptr::reset.

Thereafter, there is one subtlety that should be considered when parsing a function declaration like foo(a,b). That is, foo(a,a) should be flagged as invalid, causing the parser to fail. Or in other words, the dynamically created func_def.local_vars closure variable should be used in parsing the function declaration itself! (Remember, a Spirit symbol table is a fully fledged parser in its own right.) Fortunately for us, Spirit has a rather cool lazy_p parser that fits the bill perfectly. lazy_p is a lazy parser; that is, it can be created dynamically. The final result is:


    // This rule parses "foo(a,b)", disallowing "foo(a,a)".
    // Note that the closure variables are attached to "func_def"
    // rather than to "func_decl".
    // That's safe because this grammar is not recursive.
    // "func_def" will go on to use these data to construct the
    // final semantic actions.
    func_decl
        =  name
           [
               func_def.name = construct_<std::string>(arg1, arg2)
           ]
        >> ch_p('(')
           [
               reset(func_def.local_vars, new_<var_table_t>())
           ]
        >> !list_p((name - lazy_p(*func_def.local_vars))
                   [
                       push_back(func_def.args,
                                 construct_<std::string>(arg1, arg2)),
                       add_symbol(*func_def.local_vars,
                                  construct_<std::string>(arg1, arg2))
                   ]
                 , ',')
        >> ')';
ALERT! new_ or new in semantic actions?

One very important point to note about the call to reset is that the data is created using Phoenix's new_ lazy function.

It is possible to use new var_table_t here (i.e., the code will compile) because all the information needed to invoke the var_table_t constructor is available at compile time. However, to do so is almost certainly an error.

A call to new will create a variable on the heap at compile time. This variable is then stored until the input data comes to be parsed, whereupon it is passed to shared_ptr::reset(). If more than one function comes to be parsed, then this variable will be passed more than once — disaster!

What I hope is gradually becoming clear is that almost all the difficulties that have been encountered so far have been in specifying exactly what semantic actions should be performed. Thereafter, Spirit makes it almost trivially easy to write the code itself. It really is cool to be Lazy. :-)

This feeling, that the semantic actions themselves are complex but that Spirit makes it easy to write the ensuing code, is reinforced when we come to consider the func_def rule.

Of the two "globally visible" semantic actions that are performed by function_grammar, the first, adding a new function to the symbol table of known functions, should be performed immediately after the func_decl rule has finished. That means that the function name will then be available to the expression_grammar variable that parses the function definition, enabling recursive functions to be defined:


    # x is a double, so testing (x == 0) isn't very clever.
    factorial(x) = (x < 0.1) ? 1 : x * factorial(x - 1) 

Once again, Spirit makes the difficult seem easy:


    func_def
        =  func_decl
           [
               func_def.mangled_name = mangled_name(func_def.name,
                                                    size(func_def.args)),

               add_symbol(var(self.functions),
                          func_def.mangled_name,
                          construct_<user_function>(
                              func_def.name,
                              size(func_def.args))),
           ]
           ...

Functions in the symbol table are accessed using a mangled version of their name to allow functions with the same name but different arity to be overloaded. The above code snippet uses yet another closure variable attached to func_def to store this mangled name. Yup, you've guess it — the mangled_name function, above, is a lazy function too. Finally, construct_ is used to create a wrapper for the user_function constructor. Ain't that just nifty?

We're almost ready to move on to the task of parsing the function definition. Before we can do so, however, we must first create the parser. Again, Spirit makes it easy. The fifth and final member of the func_def closure is a pointer to an expression_grammar variable. Creating the parser uses the new_ lazy function wrapper for operator new:


    struct func_closure
        : spirit::closure<func_closure,
                          ...,
                          boost::shared_ptr<expression_grammar> >
    {
        member5 expr;
    };

    func_def
        =  func_decl
           [
               ...
               // Create the expression_grammar instance that will
               // parse the function definition.
               reset(func_def.expr,
                         new_<expression_grammar>(
                             var(self.functions),
                             var(self.global_vars),
                             *func_def.local_vars))
           ]
           ...

Voilą! An expression_grammar parser that knows about the function's local variables! The final step, parsing the function definition and adding a func_def node to the stack that is 'returned' by function_grammar should now be straightforward. First, we must wrap the dynamically created expression_grammar variable up inside a lazy_p parser. Thereafter, we do no more than stk.push_back(new func_def_node(/* constructor args */)). Lazily, of course:


    func_def
        =  func_decl
           [
               ...
           ]
        >> '='
        >> lazy_p(*func_def.expr)
           [
               // Add a node to the stack which will reset the
               // definition of the named function.

               // The func_def_node constructor:
               // func_def_node(shared_ptr<function>,
               //               user_func_expression const &);
               push_back(self.stk,
                         new_<func_def_node>(
                             *find_symbol(var(self.functions),
                                          func_def.mangled_name),
                             construct_<user_func_expression>(
                                 func_def.args,
                                 func_def.local_vars,
                                 arg1)))
           ]
        ;

func_def_node's constructor takes two arguments. A pointer to the appropriate function in the function symbol table and a user_func_expression variable that contains the function definition. Note that arg1, above, is the closure returned by the expression_grammar that parsed this definition.

The actual code for function_grammar can be found here. Once again, I won't try and convince you that this has been an easy ride, but I think that Spirit and Phoenix have made it straightforward to apply some pretty hairy logic to the problem. Perhaps you think so too?

TOP PREVIOUS NEXT

Valid XHTML 1.1! Valid CSS!