To invoke grunk as a parsing library, you need to create an instance
of it, set a configuration, and invoke it with a data source.
Rather than raw streams, grunk uses org.xml.sax.InputSource. This
has the advantage that it accepts a wide variety of source types and
gives a unified way of treating them. We do wish to emphasize the text-based
nature of grunk: it reads the source looking specifically for
end-of-line markers and processes each line in turn, checking it, if desired,
until there are no more recognized structures before moving on to the
next line. This may be altered by changing configuration parameters to allow
for, say, a regular expression that recognizes multiple blank lines.
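To make the "unified way of treating them" concrete, here is a minimal sketch that uses only the JDK and no grunk classes. It builds an InputSource from a character reader and from a byte stream, then reads each one line by line. The class name InputSourceLines and the helper readLines are our own illustrative names, and the loop only approximates the line-at-a-time view described above; it is not grunk's actual reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;

class InputSourceLines {
    // Hypothetical helper: returns the lines of a source, however the InputSource was built.
    static List<String> readLines(InputSource source) throws IOException {
        Reader reader = source.getCharacterStream();
        if (reader == null && source.getByteStream() != null) {
            // A byte stream needs a charset; fall back to UTF-8 if none was declared.
            String enc = source.getEncoding() != null ? source.getEncoding() : "UTF-8";
            reader = new InputStreamReader(source.getByteStream(), enc);
        }
        if (reader == null) {
            throw new IOException("no character or byte stream set on the InputSource");
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            // readLine() treats \n, \r and \r\n alike as end-of-line markers.
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputSource fromReader = new InputSource(new StringReader("alpha\nbeta\r\ngamma"));
        InputSource fromBytes = new InputSource(
                new ByteArrayInputStream("one\ntwo".getBytes(StandardCharsets.UTF_8)));
        System.out.println(readLines(fromReader)); // [alpha, beta, gamma]
        System.out.println(readLines(fromBytes));  // [one, two]
    }
}
```

An InputSource can also be built from a system identifier (a URI string). In that case neither stream is set, and this sketch simply reports an error rather than resolving the URI.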
The parsing interface consists of precisely one method.
grunk(InputSource). This will process the data
and return a java.lang.Object. It assumes that your data
source has been wrapped in a SAX input source. It throws an
ncsa.emerge.grunk.GrunkException, which should be caught and handled,
if grunk cannot parse something or finds a corrupt data source.
Be sure that you set a configuration first.
You will have to make an instance of grunk first. There are three possible constructors to choose from, depending upon your needs.
Grunk(). This relies on the default exporter,
which just means that you will receive a java.lang.String
object consisting of XML. All messages are piped to the console. This
is the most widely-used constructor.
Grunk(ExporterInterface). The argument is an implementation
of the interface for exporting the results. All messages are piped to
the console.
Grunk(ExporterInterface, ErrorMinder). The first
argument is an implementation of the interface for exporting the results.
The second argument is an instance of the ErrorMinder
class. This is intended chiefly for system-level debugging.
Setting a configuration is required for grunk to operate. The various formats are recorded in detail elsewhere, but there are two formats to be aware of: the full format, called TOSCA (for The One Syntax for the Configuration Analyzer), and its lightweight cousin, grunkLite. These may be set directly by invoking the appropriate method:
setConfiguration(InputSource), which assumes that the
configuration is in TOSCA format, or
setGrunkLiteConfiguration(InputSource), which assumes
that the configuration is in grunkLite format. In the latter case,
grunk converts the configuration to TOSCA format before it starts parsing.
As mentioned above, a grunkLite configuration file is transformed
into a TOSCA file. This is done with an XSL transformation. Once you
have a transformation you want to use, you may invoke the appropriate
method in the utility class ncsa.emerge.grunk.configuration.ConfigurationTransformation.
See the javadoc for more information.
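For readers who have not applied an XSL transformation from Java before, the following sketch shows the general mechanism with the JDK's javax.xml.transform API alone. It does not use ConfigurationTransformation, whose methods we do not reproduce here. The stylesheet and the element names lite and full are made up for illustration and have nothing to do with the real grunkLite or TOSCA formats.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

class XslSketch {
    // Hypothetical stylesheet: NOT the real grunkLite-to-TOSCA transformation.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/lite'>"
        + "<full><xsl:value-of select='@name'/></full>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies a stylesheet to XML held in a SAX InputSource and returns the result as text.
    static String transform(InputSource input, String stylesheet) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new SAXSource(input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        InputSource lite = new InputSource(new StringReader("<lite name='demo'/>"));
        System.out.println(transform(lite, STYLESHEET)); // <full>demo</full>
    }
}
```

The transformed text can be wrapped again, for example with new InputSource(new StringReader(result)), which is the argument type that setConfiguration(InputSource) expects.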
It is best to read the javadoc for the exporter interface. Writing a custom exporter
is not hard, since the interface has just one method, writeNode(TreeNodeInterface).
It is unlikely, though, that you will need to do this. Note
that this method returns a java.lang.Object.
This permits you to customize the output of grunk into just about anything
you can figure out how to program. The two examples, XMLExporter
(which is the default that grunk uses unless you tell it otherwise)
and DOMExporter, should be good guides and cover most cases
of interest. They yield a java.lang.String
and an org.w3c.dom.Document respectively, so you must cast the
results before using them.
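The shape of a custom exporter, and the cast it forces on the caller, can be sketched as follows. Please note that Node and Exporter below are stand-ins we define ourselves so that the example compiles on its own. They are not grunk's TreeNodeInterface or ExporterInterface, whose actual methods are documented in the javadoc. The only things taken from the text above are the single writeNode method and its java.lang.Object return type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExporterSketch {
    // Stand-in for grunk's TreeNodeInterface; the real interface's methods may differ.
    interface Node {
        String name();
        List<Node> children();
    }

    // Stand-in for grunk's ExporterInterface: one method, returning Object.
    interface Exporter {
        Object writeNode(Node root);
    }

    // A custom exporter that flattens the tree into a list of slash-separated paths.
    static class PathExporter implements Exporter {
        @Override
        public Object writeNode(Node root) {
            List<String> paths = new ArrayList<>();
            collect(root, "", paths);
            return paths;
        }

        private void collect(Node node, String prefix, List<String> out) {
            String path = prefix + "/" + node.name();
            out.add(path);
            for (Node child : node.children()) {
                collect(child, path, out);
            }
        }
    }

    // Small factory so the demo tree is easy to read.
    static Node node(String name, Node... children) {
        List<Node> kids = Arrays.asList(children);
        return new Node() {
            public String name() { return name; }
            public List<Node> children() { return kids; }
        };
    }

    public static void main(String[] args) {
        Exporter exporter = new PathExporter();
        Object result = exporter.writeNode(node("record", node("id"), node("value")));
        // The caller knows which exporter it used, so it knows what to cast to.
        @SuppressWarnings("unchecked")
        List<String> paths = (List<String>) result;
        System.out.println(paths); // [/record, /record/id, /record/value]
    }
}
```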
Here is an example of how to use ncsa.emerge.grunk.io.DOMExporter
to get a DOM document of your source. All of this should be in a
try ... catch block to intercept
any GrunkException that arises.
We omit this to keep the example readable.
// ... whatever you need up to this point.
// Grunk needs its configuration, say it lives in the file myConfig.grk
InputSource myConfig = new InputSource(new java.io.FileReader("myConfig.grk"));
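// Let's make the grunk instance, telling it to export a DOM document.
Grunk grunk = new Grunk(new DOMExporter());
// Hand it the configuration (use setGrunkLiteConfiguration for a grunkLite file).
grunk.setConfiguration(myConfig);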
// And now for the actual data source, assumed to be in the file mySource.dat
InputSource dataSource = new InputSource(new java.io.FileReader("mySource.dat"));
// Let's grunk it
Document myDomDoc = (Document) grunk.grunk(dataSource);
//... whatever else you need to do with it. You now have your data in
// a DOM document!
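Once you hold an org.w3c.dom.Document, everything after that is ordinary DOM programming. The sketch below uses only the JDK. Since we cannot run grunk here, the document is parsed from a made-up XML string that stands in for grunk's output, and the element names record, id and value are purely hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalk {
    // Lists the names of the elements directly under the document's root element.
    static List<String> childElementNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList kids = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            // Skip text nodes, comments and the like.
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                names.add(kid.getNodeName());
            }
        }
        return names;
    }

    // Parses XML text into a DOM document with the JDK's own parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up XML standing in for what an exporter might produce.
        Document doc = parse("<record><id>7</id><value>ok</value></record>");
        System.out.println(childElementNames(doc)); // [id, value]
    }
}
```

The same parse helper is one way to turn the XML string from the default exporter into a DOM document after the fact, if you did not construct grunk with a DOMExporter.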
Here is a sample barebones invocation. It is assumed that you want
to open a configuration file, open a source file and grunk it. This is
a complete method for doing this. It returns a string, but remember
that the result of grunk(InputSource) must be cast first. It is assumed that you have imported org.xml.sax.InputSource
into your class as well as ncsa.emerge.grunk.*. We catch
the possible error.
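String parseSource(String configFileName, String sourceFileName){
    try{
        Grunk grunk = new Grunk();
        grunk.setConfiguration(new InputSource(new java.io.FileReader(configFileName)));
        return (String)grunk.grunk(new InputSource(new java.io.FileReader(sourceFileName)));
    }catch(GrunkException ge){
        System.out.println("An informative diagnostic message:\n" + ge.getMessage());
    }catch(java.io.FileNotFoundException fe){
        // FileReader throws this checked exception if either file cannot be opened.
        System.out.println("Could not open a file:\n" + fe.getMessage());
    }
    // Reached only if something went wrong; the caller should check for null.
    return null;
}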
It is quite likely that grunk will be running in an environment where
an error should not be communicated directly to the user, for example, if grunk
were invoked from a browser to interpret some downloaded text from a database
query with the aim of converting it into some simple HTML. Here good programming
would dictate that there be an exception to catch and that some reasonable
message be sent to the user, rather than a bewildering stack trace. Grunk
aims at upholding decorum at all times, and this is the reason for having
a class to mind errors. One of its constructors takes a java.io.OutputStream
as its single argument; all console messages will
be sent there.
ErrorMinder(). This assumes that all output should
be sent to java.lang.System.out. When running grunk from
the command line, this is exactly what happens, which is why error messages
go to the console.
ErrorMinder(java.io.OutputStream). All messages from
grunk are sent to the given stream. This may be a file, for instance. The
stream will be flushed, but not closed automatically, when grunk is
done with it, in case further processing is needed.
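The "flushed, but not closed" behavior is worth seeing in action, since it is what lets you keep using the stream afterwards. The sketch below uses only the JDK. MessageSink is a stand-in we wrote to mimic the two constructors described above; it is not the ErrorMinder class, and its report and done methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

class MessageSink {
    private final PrintStream out;

    // Like ErrorMinder(): with no argument, messages go to the console.
    MessageSink() {
        this(System.out);
    }

    // Like ErrorMinder(java.io.OutputStream): messages go to the caller's stream.
    MessageSink(OutputStream target) {
        this.out = new PrintStream(target);
    }

    // Hypothetical method: records one diagnostic message.
    void report(String message) {
        out.println(message);
    }

    // Hypothetical method: flushes pending text but deliberately leaves the target open.
    void done() {
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        MessageSink sink = new MessageSink(captured);
        sink.report("line 3: unrecognized structure");
        sink.done();
        // The stream was not closed, so the caller can keep writing to it.
        captured.write("-- end of run --".getBytes());
        // Nothing reached the console; the caller decides what the user sees.
        System.out.println("Captured " + captured.size() + " bytes of diagnostics.");
    }
}
```

Capturing the messages this way lets a server-side caller log the details and show the user a short, friendly message instead.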