Java Parsers Interview Questions and Answers

Why do we need XML parser?
We need XML parser because we do not want to do everything in our application from scratch, and we need some "helper" programs or libraries to do something very low-level but very necessary to us. These low-level but necessary things include checking the well-formedness, validating the document against its DTD or schema (just for validating parsers), resolving character reference, understanding CDATA sections, and so on. XML parsers are just such "helper" programs and they will do all these jobsl. With XML parsers, we are shielded from a lot of these complexicities and we could concentrate ourselves on just programming at high-level through the API's implemented by the parsers, and thus gain programming efficiency.

What is SAX parser? 

An event-based sequential access parser API that only operates on portions of the XML document at any one time.

What is DOM parser ? 

The Document Object Model (DOM) is an application programming interface (API) for valid HTML and well-formed XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated. In the DOM specification, the term "document" is used in the broad sense - increasingly, XML is being used as a way of representing many different kinds of information that may be stored in diverse systems, and much of this would traditionally be seen as data rather than as documents.

What is the difference between a DOMParser and a SAXParser?

DOM parsers and SAX parsers work in different ways.

  • A DOM parser creates a tree structure in memory from the input document and then waits for requests from client. But a SAX parser does not create any internal structure. Instead, it takes the occurrences of components of a input document as events, and tells the client what it reads as it reads through the input document.
  • A DOM parser always serves the client application with the entire document no matter how much is actually needed by the client. But a SAX parser serves the client application always only with pieces of the document at any given time.
  • With DOM parser, method calls in client application have to be explicit and forms a kind of chain. But with SAX, some certain methods (usually overriden by the cient) will be invoked automatically (implicitly) in a way which is called "callback" when some certain events occur. These methods do not have to be called explicitly by the client, though we could call them explicitly.

How do we decide on which parser is good?

Ideally a good parser should be fast (time efficient),space efficient, rich in functionality and easy to use . But in reality, none of the main parsers have all these features at the same time. For example, a DOMParser is rich in functionality (because it creates a DOM tree in memory and allows you to access any part of the document repeatedly and allows you to modify the DOM tree), but it is space inefficient when the document is huge, and it takes a little bit long to learn how to work with it. A SAXParser, however, is much more space efficient in case of big input document (because it creates no internal structure). What's more, it runs faster and is easier to learn than DOMParser because its API is really simple. But from the functionality point of view, it provides less functions which mean that the users themselves have to take care of more, such as creating their own data structures

how does Normalize method works in DOM Parser ? or

Normalization in DOM parsing with java - how does it work?
Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes. This can be used to ensure that the DOM view of a document is the same as if it were saved and re-loaded, and is useful when operations (such as XPointer [XPointer] lookups) that depend on a particular document tree structure are to be used. If the parameter "normalize-characters" of the DOMConfiguration object attached to the Node.ownerDocument is true, this method will also fully normalize the characters of the Text nodes.

This basically means that the following XML element


could be represented like this in a denormalized node:

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this

Element foo
    Text node: "Hello world"

And the same goes for attributes: <foo bar="Hello world"/>, comments, etc.

When DOM consider the given two nodes are equal ?
Two nodes are equal if and only if the following conditions are satisfied:

  • The two nodes are of the same type.
  • The following string attributes are equal: nodeName, localName, namespaceURI, prefix, nodeValue . This is: they are both null, or they have the same length and are character for character identical.
  • The attributes NamedNodeMaps are equal. This is: they are both null, or they have the same length and for each node that exists in one map there is a node that exists in the other map and is equal, although not necessarily at the same index.
  • The childNodes NodeLists are equal. This is: they are both null, or they have the same length and contain equal nodes at the same index. Note that normalization can affect equality; to avoid this, nodes should be normalized before being compared.
Can SAX and DOM parsers be used at the same time?
Yes, of course, because the use of a DOM parser and a SAX parser is independent. For example, if your application needs to work on two XML documents, and does different things on each document, you could use a DOM parser on one document and a SAX parser on another, and then combine the results or make the processings cooperate with each other.