Grunk
Overview
Grunk (for GRammar UNderstanding Kernel) is a
library for parsing and extracting structured metadata from
semi-structured text formats. It is built on a flexible parsing
engine that detects a wide variety of patterns in text and
extracts information from them. Each format is described in a
simple XML configuration from which Grunk builds a parser
at runtime, so adapting Grunk to a new format requires no
coding or compilation step.
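To make the idea of a parser built at runtime from an XML description more concrete, here is a minimal sketch using only the Java standard library. It is not Grunk's API or its configuration schema: the class name ConfigDrivenExtractor, the <format> and <field> elements, and the name and pattern attributes are all hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only: this is NOT Grunk's API or configuration schema.
class ConfigDrivenExtractor {
    // Field name -> compiled pattern, kept in configuration order.
    private final Map<String, Pattern> fields = new LinkedHashMap<>();

    // Builds the extractor at runtime from an XML description of the format.
    // Assumes every <field> carries a name and a pattern with one capture group.
    static ConfigDrivenExtractor fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        ConfigDrivenExtractor extractor = new ConfigDrivenExtractor();
        NodeList nodes = doc.getElementsByTagName("field");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element field = (Element) nodes.item(i);
            extractor.fields.put(
                    field.getAttribute("name"),
                    Pattern.compile(field.getAttribute("pattern"), Pattern.MULTILINE));
        }
        return extractor;
    }

    // Returns the first capture group of each field pattern that matches the text.
    Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : fields.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            if (m.find()) {
                result.put(entry.getKey(), m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String config =
                "<format>"
              + "<field name='title' pattern='^Title:\\s*(.+)$'/>"
              + "<field name='year' pattern='^Year:\\s*(\\d{4})$'/>"
              + "</format>";
        String input = "Title: An Example\nYear: 2004\n";
        // Prints {title=An Example, year=2004}
        System.out.println(fromXml(config).extract(input));
    }
}
```

In this sketch, supporting a new format means editing only the XML string; the Java code is not changed or recompiled, which is the property the paragraph above describes.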
Grunk features:
- Pure Java implementation
- Powerful two-step parser with pattern matching based on Perl 5 regular expressions
- Inline transformations making it possible to parse otherwise tricky syntaxes
- XML-based configuration
- Support for XML output
- Flexible API
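The feature list does not say what the two parsing steps are or how inline transformations are applied. One plausible reading, sketched below with java.util.regex, is that a first step splits the text into records, an inline transformation then normalizes each record (here, folding indented continuation lines), and a second step extracts the fields. The class TwoStepSketch and its patterns are assumptions made for illustration, not Grunk's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not Grunk's API; the two steps are an assumed reading.
class TwoStepSketch {
    // Step 1 delimiter: records are separated by one or more blank lines.
    private static final Pattern RECORD_SEPARATOR = Pattern.compile("\\n\\s*\\n");
    // Inline transformation: a newline followed by indentation marks a continuation line.
    private static final Pattern CONTINUATION = Pattern.compile("\\n[ \\t]+");
    // Step 2: "Key: value" pairs, one per line.
    private static final Pattern FIELD =
            Pattern.compile("^(\\w+):\\s*(.*)$", Pattern.MULTILINE);

    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String block : RECORD_SEPARATOR.split(text.trim())) {
            // Fold continuation lines so that each field sits on a single line.
            String folded = CONTINUATION.matcher(block).replaceAll(" ");
            Map<String, String> record = new LinkedHashMap<>();
            Matcher m = FIELD.matcher(folded);
            while (m.find()) {
                record.put(m.group(1), m.group(2).trim());
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String input =
                "Title: A long\n   wrapped title\nYear: 2004\n\nTitle: Short\nYear: 2005\n";
        // Prints [{Title=A long wrapped title, Year=2004}, {Title=Short, Year=2005}]
        System.out.println(parse(input));
    }
}
```

Without the folding step, the wrapped title would be cut off at the first line break, which shows how a transformation applied between the two steps can make an awkward syntax tractable for simple line-based patterns.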
Look here for documentation.