Parse-Gnaw

Parse::Gnaw - An extensible parser. Define grammars using subroutine calls.
Define your own grammar extensions by defining new subroutines. Parse text
in memory or from/to files or other streams.

Gnaw is a Perl module that implements full regular expressions and
full text-parsing grammars in pure Perl, using only subroutine
closures, exception trapping via eval, and basic Perl variables
such as scalars, hashes, and arrays.

Parse::Gnaw does not use regular expressions under the hood.
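
To give a feel for what "closures instead of regular expressions" can
mean, here is a minimal sketch. It is not Parse::Gnaw's code or API:
the helper names (lit_sketch, seq_sketch, alt_sketch) are made up for
illustration, and the real module is certainly more involved. Each
helper returns a closure that takes a string and a position and
returns the new position, or nothing on failure.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; NOT Parse::Gnaw functions.

# Match a fixed string at the current position, using substr, no regex.
sub lit_sketch {
    my ($want) = @_;
    return sub {
        my ($text, $pos) = @_;
        return unless substr($text, $pos, length $want) eq $want;
        return $pos + length $want;
    };
}

# Run matchers one after another, threading the position through.
sub seq_sketch {
    my @parts = @_;
    return sub {
        my ($text, $pos) = @_;
        for my $part (@parts) {
            $pos = $part->($text, $pos);
            return unless defined $pos;
        }
        return $pos;
    };
}

# Try each alternative in order and return the first success.
sub alt_sketch {
    my @choices = @_;
    return sub {
        my ($text, $pos) = @_;
        for my $choice (@choices) {
            my $next = $choice->($text, $pos);
            return $next if defined $next;
        }
        return;
    };
}

# The "grammar" is just the code reference returned by the last call.
my $greeting = seq_sketch(
    lit_sketch('hello'),
    lit_sketch(' '),
    alt_sketch(lit_sketch('world'), lit_sketch('partner')),
);

print( defined($greeting->('hello partner', 0)) ? "match\n" : "no match\n" );
print( defined($greeting->('howdy partner', 0)) ? "match\n" : "no match\n" );
```

The first call prints "match" and the second prints "no match".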

You write your grammar in pure Perl. There is no intermediate
"parser language" that then gets interpreted into something executable.

When you do a "use Parse::Gnaw", the Gnaw module will import a 
number of functions directly into your namespace. Yes, this is 
completely bad form for normal modules. But this is not a normal 
module. The imported subroutines include regular expression and 
parsing functions for matching, quantifiers, literals, 
alternations, character classes, and so on. You build up your 
grammar by calling these functions. The final call will return 
a code reference. This code reference is your grammar.

Because the grammar is a code reference, you run it by calling it.
If it was built with "match", you pass in the string you want to parse.

	use Parse::Gnaw;

	# create the grammar
	my $grammar = match('hello');

	# apply the grammar to a string
	if($grammar->('hello world')) {
		print "match\n";
	} else {
		print "no match\n";
	}

You can also create the grammar and execute it in one step:

	my $texttoparse = "howdy partner";

	if(match('hello', 'world')->($texttoparse)) {
		print "match\n";
	} else {
		print "no match\n";
	}

Note that the above example, translated into Perl's regular expression
syntax, would look something like this:

	my $texttoparse = "howdy partner";

	if($texttoparse =~ m{hello\s*world}) {
		print "match\n";
	} else {
		print "no match\n";
	}

You can build up more complicated grammars fairly easily.
This one looks for a sentence about fruit.

	$grammar = match(
		ql('I would like to buy'),
		some('a', qa('banana apple pear peach'))
	);

	if($grammar->('yes, we have no bananas today')) {
		print "match\n";
	} else {
		print "no match\n";
	}
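
For comparison, the fruit grammar could be written as an ordinary Perl
regular expression, following the same pattern as the hello/world
translation earlier. This is a sketch under assumptions this README
does not spell out: that ql() matches its whitespace-separated words
in order, that qa() matches any one of its words, and that some()
means "one or more". See the pod for the real semantics.

```perl
use strict;
use warnings;

# Assumed reading of the grammar (not confirmed by this README):
#   ql('...')  -> the whitespace-separated words, in order
#   qa('...')  -> any one of the whitespace-separated words
#   some(...)  -> one or more repetitions
my $fruit_re = qr{
    I \s* would \s* like \s* to \s* buy \s*
    (?: a \s* (?: banana | apple | pear | peach ) \s* )+
}x;

for my $text (
    'yes, we have no bananas today',
    'I would like to buy a banana a pear',
) {
    printf "%-40s %s\n", $text, ($text =~ $fruit_re ? 'match' : 'no match');
}
```

The first string reports "no match" and the second reports "match".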

More complicated grammars can be handled by breaking up the grammar
into subroutines which act as rules. Here's an example of a somewhat
complex grammar using subroutines for subrules:


	sub trekname { qa('Jim Captain Spock Bones Doctor Scotty') } 

	sub occupation {a('ditch digger', 'bricklayer', 'mechanic')}
	sub mccoy_job { [ql("I'm a doctor, not a"), occupation, a('!', '.')] }
	sub mccoy_diag { [ "He's", 'dead', ',', trekname, a('!', '.') ] }
	sub mccoy_rant1 { [ql('You green-blooded Vulcan'), a('!', '.') ] }

	sub mccoy_isms { a(mccoy_job, mccoy_diag, mccoy_rant1) }

	sub spock_awe {['Fascinating', ',', trekname, '.']}
	sub spock_logic {['Highly', 'illogical',',', trekname, '.']}
	sub spock_sensors { [ql("It's life ,"), trekname, ql(', but not as we know it .')]}

	sub spock_isms {a(spock_awe, spock_logic, spock_sensors)}

	sub kirk_diplomacy1 {ql('We come in peace .')}
	sub kirk_diplomacy2 {ql('Shoot to kill .')}
	sub kirk_to_scotty {ql('I need warp speed now, Scotty !')}
	sub kirk_to_spock {ql('What is it , Spock ?')}
	sub kirk_to_bones {ql('Just fix him , Bones')}
	sub kirk_solution {ql('Activate ship self-destruct mechanism .')}

	sub kirk_isms {a(
		kirk_diplomacy1, 
		kirk_diplomacy2,
		kirk_to_scotty,
		kirk_to_spock,
		kirk_to_bones,
		kirk_solution
	)}

	sub time_units {qa('minutes hours days weeks')}
	sub scotty_phy101 {ql('Ya kenna change the laws of physics .')}
	sub scotty_estimate {[ ql("I'll have it ready for you in three"), time_units, '.' ]}

	sub scotty_isms { a(scotty_phy101, scotty_estimate) }

	sub alien_isms {'weeboo'}

	sub trek_isms {a(mccoy_isms, spock_isms, kirk_isms, scotty_isms, alien_isms )}

	sub trek_screenplay {some(trek_isms)}

	$grammar = parse(  trek_screenplay );
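
One plain-Perl detail worth knowing when you write rules this way:
a rule can be called as a bare word (no parentheses) only if its
subroutine has already been defined earlier in the file, which is why
the rules above are listed bottom-up. The sketch below shows the same
ordering with made-up rule names that return plain strings rather than
Parse::Gnaw grammar pieces.

```perl
use strict;
use warnings;

# Hypothetical rules for illustration; they return plain strings,
# not Parse::Gnaw grammar pieces.

# Defined first, so later subs may call it as a bare word.
sub unit_sketch { 'minutes' }

# Calls unit_sketch without parentheses, like the rules above do.
sub estimate_sketch { join ' ', 'ready in three', unit_sketch, '.' }

print estimate_sketch(), "\n";
```

This prints "ready in three minutes ." on one line. If unit_sketch
were defined after estimate_sketch instead, "use strict" would reject
the bare word at compile time; writing unit_sketch() with parentheses
avoids the ordering requirement.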


Given the grammar in the above example, you could create some text 
and see if it follows the trek screenplay format this way:


	my $script = <<'SCRIPT';

		What is it, Spock?
		It's life, Jim, but not as we know it.
		We come in peace.
		weeboo
		Shoot to kill.
		weeboo
		I need warp speed now, Scotty!
		I'll have it ready for you in three minutes.
		weeboo
		I need warp speed now, Scotty!
		Ya kenna change the laws of physics.
		weeboo
		weeboo
		Shoot to kill.
		Shoot to kill.
		I'm a doctor, not a bricklayer.
		Highly illogical, Doctor.
		You green-blooded Vulcan.
		Shoot to kill.
		Shoot to kill.
		He's dead, Jim.
		Activate ship self-destruct mechanism.
		Highly illogical, Captain.

	SCRIPT
	;


	$grammar->( $script );

And so on.

See the pod for more information.

	perldoc Parse::Gnaw


BETA RELEASE

Please note that this is a BETA RELEASE. 

It is still a work in progress and the entire package is subject 
to change at any time.

When I believe the package does everything it needs to,
I'll make a production (non-beta) release of Parse::Gnaw.
The revision number will probably be 1.0 or greater for a
production release.

Until a production release is available, please do not
use Parse::Gnaw to build large, complex grammars,
only to have the nuts and bolts under the hood change on you later.


INSTALLATION

To install this module, run the following commands:

	perl Makefile.PL
	make
	make test
	make install

SUPPORT AND DOCUMENTATION

After installing, you can find documentation for this module with the
perldoc command.

    perldoc Parse::Gnaw

You can also look for information at:

    RT, CPAN's request tracker
        http://rt.cpan.org/NoAuth/Bugs.html?Dist=Parse-Gnaw

    AnnoCPAN, Annotated CPAN documentation
        http://annocpan.org/dist/Parse-Gnaw

    CPAN Ratings
        http://cpanratings.perl.org/d/Parse-Gnaw

    Search CPAN
        http://search.cpan.org/dist/Parse-Gnaw


COPYRIGHT AND LICENCE

Copyright (C) 2008 Greg London

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.