The Java API for XML Processing (JAXP) makes it easy to process
XML data using applications written in the Java programming language. JAXP
leverages the parser standards SAX (Simple API for XML) and DOM
(Document Object Model) so that you can choose to parse your data as a stream
of events or to build a tree-structured representation of it. The latest versions of
JAXP also support the XSLT (Extensible Stylesheet Language Transformations) standard,
giving you control over the presentation of the data and enabling you to
convert the data to other XML documents or to other formats, such as HTML.
JAXP also provides namespace support, allowing you to work with schemas that
might otherwise have naming conflicts.
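As a rough illustration of the XSLT support described above, the sketch below converts a small price list into an HTML list using the javax.xml.transform classes. The class name PriceListToHtml, the stylesheet, and the sample data are assumptions made for this example only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PriceListToHtml {
    // Hypothetical stylesheet: emits one <li> per coffee element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/priceList'>"
      + "<ul><xsl:for-each select='coffee'>"
      + "<li><xsl:value-of select='name'/>: <xsl:value-of select='price'/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the HTML result.
    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<priceList><coffee><name>Mocha Java</name>"
                   + "<price>11.95</price></coffee></priceList>";
        System.out.println(toHtml(xml));
    }
}
```

The same Transformer could write to a file or an OutputStream by passing a different StreamResult.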
Designed to be flexible, JAXP allows you to use any XML-compliant parser
from within your application. It does this with what is called a pluggability layer, which allows you to plug in an implementation of the SAX or DOM APIs. The
pluggability layer also allows you to plug in an XSL processor, which lets you
transform your XML data in a variety of ways, including the way it is displayed.
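One way to see the pluggability layer at work is to ask each JAXP factory which concrete class it resolved to. The sketch below does only that; the class name PluggabilityDemo is an assumption, and the names printed depend on the JVM and on which implementations are on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

class PluggabilityDemo {
    // Each newInstance() call goes through the pluggability layer and returns
    // whichever implementation is configured for the running JVM.
    static String saxFactoryImpl() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    static String domFactoryImpl() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    static String xsltFactoryImpl() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAX implementation:  " + saxFactoryImpl());
        System.out.println("DOM implementation:  " + domFactoryImpl());
        System.out.println("XSLT implementation: " + xsltFactoryImpl());
    }
}
```

Application code only ever names the abstract factory classes, so a different parser can be selected (for example through the javax.xml.parsers.SAXParserFactory system property) without touching the code.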
Two types of parser are available for parsing XML:
A. Event-based parser (the SAX API)
B. Tree-based parser (the DOM API)
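For contrast with the event-based approach, here is a minimal sketch of the tree-based option: the DOM parser builds the whole document in memory, and the code then walks the tree to look up a price. The class name DomPriceLookup and the priceOf helper are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomPriceLookup {
    // Returns the price text for the named coffee, or null if it is not listed.
    static String priceOf(String xml, String coffeeName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The whole document is parsed into a tree before any lookup happens.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList coffees = doc.getElementsByTagName("coffee");
            for (int i = 0; i < coffees.getLength(); i++) {
                Element coffee = (Element) coffees.item(i);
                String name =
                    coffee.getElementsByTagName("name").item(0).getTextContent();
                if (name.equals(coffeeName)) {
                    return coffee.getElementsByTagName("price").item(0).getTextContent();
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse price list", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<priceList><coffee><name>Mocha Java</name>"
                   + "<price>11.95</price></coffee></priceList>";
        System.out.println(priceOf(xml, "Mocha Java"));
    }
}
```

The tree makes random access easy, at the cost of holding the entire document in memory.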
The SAX API
The Simple API for XML (SAX) defines an API for an event-based
parser. Being event-based means that the parser reads an XML document from
beginning to end, and each time it recognizes a syntax construction, it notifies
the application that is running it. The SAX parser notifies the application by calling
methods from the ContentHandler interface. For example, when the parser
comes to a less than symbol (“<”), it calls the startElement method; when it
comes to character data, it calls the characters method; when it comes to the
less than symbol followed by a slash (“</”), it calls the endElement method; and
so on. To illustrate, let’s walk through what the parser does for each line of an
example XML document. (For simplicity, calls to the method ignorableWhitespace
are not included.)
Consider the following XML to parse.
<priceList>
  <coffee>
    <name>Mocha Java</name>
    <price>11.95</price>
  </coffee>
  <coffee>
    <name>Sumatra</name>
    <price>12.50</price>
  </coffee>
</priceList>
This would be parsed as follows.
<priceList> [parser calls startElement]
  <coffee> [parser calls startElement]
    <name>Mocha Java</name> [parser calls startElement, characters, and endElement]
    <price>11.95</price> [parser calls startElement, characters, and endElement]
  </coffee> [parser calls endElement]
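The walk-through above can be reproduced with a handler that simply records each callback. This is a sketch under a few assumptions: the class name EventTrace is made up, whitespace-only text is skipped to keep the trace short, and a parser is free to deliver one run of text in more than one characters call.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace {
    // Parses the XML text and returns the callbacks in the order they fired.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void characters(char[] buf, int offset, int len) {
                // Skip the whitespace between elements to keep the trace readable.
                String text = new String(buf, offset, len).trim();
                if (!text.isEmpty()) {
                    events.add("characters " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<priceList>\n<coffee>\n<name>Mocha Java</name>\n"
                   + "<price>11.95</price>\n</coffee>\n</priceList>";
        trace(xml).forEach(System.out::println);
    }
}
```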
The default implementations of the methods that the parser calls do nothing, so
you need to write a subclass implementing the appropriate methods to get the
functionality you want. For example, suppose you want to get the price per
pound for Mocha Java. You would write a class extending DefaultHandler (the
default implementation of ContentHandler) in which you write your own implementations
of the methods startElement and characters.
You first need to create a SAXParser object from a SAXParserFactory object. You
then call the method parse on it, passing it the price list and an instance of
your new handler class (with its new implementations of the methods startElement
and characters). In this example, the price list is a file, but the parse method
can also take a variety of other input sources, including an InputStream object,
a URL, and an InputSource object.
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse("priceList.xml", handler);
The result of calling the method parse depends, of course, on how the methods
in handler were implemented. The SAX parser will go through the file
priceList.xml from beginning to end, calling the appropriate methods. In addition
to the methods already mentioned, the parser will call other methods such as
startDocument, endDocument, ignorableWhitespace, and processingInstruction,
but these methods still have their default implementations and thus do nothing.
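To make the point about input sources concrete, the sketch below feeds the same document to the parser once as an InputStream and once as an InputSource. The class InputSourcesDemo and its Counter handler are hypothetical; Counter overrides only startElement, so every other callback keeps its do-nothing default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class InputSourcesDemo {
    // Hypothetical handler that only counts start tags.
    static class Counter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements++;
        }
    }

    // Uses the parse(InputStream, DefaultHandler) overload.
    static int countFromStream(String xml) {
        try (InputStream in =
                 new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            Counter counter = new Counter();
            parser.parse(in, counter);
            return counter.elements;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Uses the parse(InputSource, DefaultHandler) overload.
    static int countFromInputSource(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            Counter counter = new Counter();
            parser.parse(new InputSource(new StringReader(xml)), counter);
            return counter.elements;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<priceList><coffee><name>Sumatra</name></coffee></priceList>";
        System.out.println(countFromStream(xml));
        System.out.println(countFromInputSource(xml));
    }
}
```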
The following method definitions show one way to implement the methods
characters and startElement so that they find the price for Mocha Java and
print it out. Because of the way the SAX parser works, these two methods work
together to look for the name element, the characters “Mocha Java”, and the
price element immediately following Mocha Java. These methods use three
flags to keep track of which conditions have been met. Note that the SAX parser
will have to invoke both methods more than once before the conditions for printing
the price are met.
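// Flags shared by the two callbacks (fields of the DefaultHandler subclass).
private boolean inName = false;
private boolean inMochaJava = false;
private boolean inPrice = false;

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    // qName is used because the default factory is not namespace-aware,
    // in which case localName may be empty.
    if (qName.equals("name")) {
        inName = true;
    } else if (qName.equals("price") && inMochaJava) {
        inPrice = true;
        inName = false;
    }
}

@Override
public void characters(char[] buf, int offset, int len) {
    String s = new String(buf, offset, len);
    if (inName && s.equals("Mocha Java")) {
        // Found the coffee we want; now wait for its price element.
        inMochaJava = true;
        inName = false;
    } else if (inPrice) {
        System.out.println("The price of Mocha Java is: " + s);
        inMochaJava = false;
        inPrice = false;
    }
}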
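Putting the pieces together, the following sketch wraps the two callbacks in a complete DefaultHandler subclass and runs it against the price list. The class name MochaJavaPriceFinder, the price field, and the findPrice helper are assumptions added for illustration; the flag logic is the same as in the methods above, except that the price is stored for the caller instead of being printed inside characters.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MochaJavaPriceFinder extends DefaultHandler {
    private boolean inName;
    private boolean inMochaJava;
    private boolean inPrice;
    private String price; // hypothetical addition: keeps the result for the caller

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        if (qName.equals("name")) {
            inName = true;
        } else if (qName.equals("price") && inMochaJava) {
            inPrice = true;
            inName = false;
        }
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        String s = new String(buf, offset, len);
        if (inName && s.equals("Mocha Java")) {
            inMochaJava = true;
            inName = false;
        } else if (inPrice) {
            price = s;
            inMochaJava = false;
            inPrice = false;
        }
    }

    // Parses the XML text and returns the Mocha Java price, or null if absent.
    static String findPrice(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            MochaJavaPriceFinder handler = new MochaJavaPriceFinder();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.price;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse price list", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<priceList>\n<coffee>\n<name>Mocha Java</name>\n"
                   + "<price>11.95</price>\n</coffee>\n<coffee>\n<name>Sumatra</name>\n"
                   + "<price>12.50</price>\n</coffee>\n</priceList>";
        System.out.println("The price of Mocha Java is: " + findPrice(xml));
    }
}
```

Because a SAX parser may split one run of text across several characters calls, a more robust handler would append the text to a buffer and compare it in endElement.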