Package com.randomnoun.common
Class XmlUtil
java.lang.Object
com.randomnoun.common.XmlUtil
XML utility functions
- Author:
- knoxg
Nested Class Summary
Nested Classes
- static class: An abstract stack-based XML parser.
- static class: Converts a NodeList into an Iterable, so that it can be used in an enhanced for loop (Java 1.5 and later): for (Node node : nodeList) { ...
- static class: Converts a table into a List of Lists (each top-level list represents a table row, each second-level list represents a table cell).
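The NodeList-to-Iterable idea described above can be illustrated with a small wrapper. This is only a sketch using the standard org.w3c.dom API; the class name NodeListIterableSketch and the helper childNames are hypothetical and are not the nested class provided by XmlUtil.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical wrapper: exposes a NodeList as an Iterable<Node>.
final class NodeListIterableSketch implements Iterable<Node> {
    private final NodeList nodeList;

    NodeListIterableSketch(NodeList nodeList) {
        this.nodeList = nodeList;
    }

    @Override
    public Iterator<Node> iterator() {
        return new Iterator<Node>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < nodeList.getLength();
            }

            @Override
            public Node next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return nodeList.item(index++);
            }
        };
    }

    // Illustration helper: joins the node names of the root element's children.
    static String childNames(String xml) {
        try {
            NodeList children = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getChildNodes();
            StringBuilder sb = new StringBuilder();
            for (Node node : new NodeListIterableSketch(children)) {
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append(node.getNodeName());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames("<r><a/><b/><c/></r>")); // prints: a,b,c
    }
}
```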
Constructor Summary
Constructors -
Method Summary
- static void compact: Remove leading/trailing whitespace from all text nodes in this nodeList.
- static String getCleanXml(InputStream inputStream, boolean isHtml): Clean an HTML inputStream through the tagsoup filter.
- static String getCleanXml(String inputXml, boolean isHtml): Clean some HTML text through the tagsoup filter.
- static String getText: Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.
- static String getTextNonRecursive(Element element): Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.
- static String getTextPreserveElements(Element element, String[] tagNames): Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.
- static String getXmlString(Node node, boolean omitXmlDeclaration): Converts a document node subtree back into an XML string.
- static void processContentHandler(ContentHandler contentHandler, String xmlText): Parse a string of XML text using a SAX contentHandler.
- static Document toDocument: Return a DOM document object from an InputStream.
- static Document toDocument(String text): Return a DOM document object from an XML string.
-
Constructor Details
-
XmlUtil
public XmlUtil()
-
-
Method Details
-
getCleanXml
Clean some HTML text through the tagsoup filter. The returned string is guaranteed to be well-formed XML (and can therefore be used by other tools that expect well-formed XML).
- Parameters:
inputXml - input XML document
isHtml - if true, uses the HTML schema, omits the XML declaration, and uses the html output method
- Throws:
SAXException - if the tagsoup library could not parse the input string
IllegalStateException - if an error occurred reading from a string (should never occur)
-
getCleanXml
Clean an HTML inputStream through the tagsoup filter. The returned string is guaranteed to be well-formed XML (and can therefore be used by other tools that expect well-formed XML).
- Parameters:
inputStream - input XML stream
isHtml - if true, uses the HTML schema, omits the XML declaration, and uses the html output method
- Throws:
SAXException - if the tagsoup library could not parse the input stream
IllegalStateException - if an error occurred reading from the input (should never occur)
-
getText
Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string. Elements are recursed into.
- Parameters:
element - the element that contains, as child nodes, the text to be returned.
- Returns:
the concatenated contents of the Text and CDATA nodes of the specified element.
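The recursive behaviour described above can be sketched with the standard DOM API. This is an illustration only, not the XmlUtil source; the class name GetTextSketch and the methods collectText and parseRoot are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class GetTextSketch {

    // Concatenates Text and CDATA content beneath the element,
    // recursing into child elements (assumed behaviour, per the description).
    static String collectText(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                sb.append(collectText((Element) child));
            }
        }
        return sb.toString();
    }

    // Illustration helper: parses a string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element p = parseRoot("<p>Hello <b>big</b> <![CDATA[<world>]]></p>");
        System.out.println(collectText(p)); // prints: Hello big <world>
    }
}
```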
-
getTextPreserveElements
Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string. Any elements whose tag names are included in the tagNames parameter of this method are also included. Attributes of these tags are also included in the result, but may be reordered.
Self-closing elements (e.g. <br/>) are expanded into opening and closing elements (e.g. <br></br>). Elements are recursed into.
- Parameters:
element - the element that contains, as child nodes, the text to be returned.
- Returns:
the concatenated contents of the Text and CDATA nodes of the specified element, with the preserved elements included.
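A rough sketch of the preserve-elements behaviour, assuming that preserved tags are written back as an opening tag (with attributes), the recursed content, and a closing tag. The class and method names are hypothetical, the real implementation may differ, and this sketch does not re-escape text or attribute values.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class PreserveElementsSketch {

    // Collects text recursively; elements named in tagNames are kept as markup.
    static String collect(Element element, String[] tagNames) {
        List<String> keep = Arrays.asList(tagNames);
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                Element e = (Element) child;
                boolean preserved = keep.contains(e.getTagName());
                if (preserved) {
                    sb.append(openTag(e));
                }
                sb.append(collect(e, tagNames));
                if (preserved) {
                    // Always emit a closing tag, so <br/> becomes <br></br>.
                    sb.append("</").append(e.getTagName()).append('>');
                }
            }
        }
        return sb.toString();
    }

    // Builds the opening tag; attribute order follows the NamedNodeMap,
    // which is not guaranteed to match the source order.
    private static String openTag(Element e) {
        StringBuilder sb = new StringBuilder("<").append(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            sb.append(' ').append(attr.getNodeName())
              .append("=\"").append(attr.getNodeValue()).append('"');
        }
        return sb.append('>').toString();
    }

    // Illustration helper: parses a string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element p = parseRoot("<p>See <a href=\"x.html\">this <i>page</i></a><br/></p>");
        // prints: See <a href="x.html">this page</a><br></br>
        System.out.println(collect(p, new String[] { "a", "br" }));
    }
}
```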
-
getTextNonRecursive
Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string. Elements are not recursed into.
- Parameters:
element - the element that contains, as child nodes, the text to be returned.
- Returns:
the concatenated contents of the Text and CDATA children of the specified element.
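For contrast with the recursive variant, a non-recursive collection only looks at direct children. The following is a sketch under that assumption, with hypothetical names (NonRecursiveTextSketch, directText, parseRoot), not the library's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NonRecursiveTextSketch {

    // Concatenates only the direct Text and CDATA children; child elements are skipped.
    static String directText(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Illustration helper: parses a string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element p = parseRoot("<p>one<b>two</b>three</p>");
        System.out.println(directText(p)); // prints: onethree
    }
}
```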
-
toDocument
Return a DOM document object from an XML string.
- Parameters:
text - the string representation of the XML to parse
- Throws:
SAXException
-
toDocument
Return a DOM document object from an InputStream.
- Parameters:
is - the InputStream containing the XML to parse
- Throws:
SAXException
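Both toDocument variants can be approximated with the JDK's DocumentBuilder. The sketch below is an assumption about the general approach, not the library source; ToDocumentSketch and its parse methods are hypothetical names, and wrapping configuration and I/O failures in IllegalStateException is a choice made here for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ToDocumentSketch {

    // Parses XML held in a string.
    static Document parse(String text) throws SAXException {
        return parse(new InputSource(new StringReader(text)));
    }

    // Parses XML read from a stream; the parser detects the encoding.
    static Document parse(InputStream is) throws SAXException {
        return parse(new InputSource(is));
    }

    private static Document parse(InputSource source) throws SAXException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for in-memory input; surfaced as an unchecked exception.
            throw new IllegalStateException("Could not set up or read the XML input", e);
        }
    }

    public static void main(String[] args) throws SAXException {
        Document fromString = parse("<order id=\"7\"><item/></order>");
        System.out.println(fromString.getDocumentElement().getTagName()); // prints: order

        byte[] bytes = "<order/>".getBytes(StandardCharsets.UTF_8);
        Document fromStream = parse(new ByteArrayInputStream(bytes));
        System.out.println(fromStream.getDocumentElement().getTagName()); // prints: order
    }
}
```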
-
getXmlString
public static String getXmlString(Node node, boolean omitXmlDeclaration) throws TransformerException
Converts a document node subtree back into an XML string.
- Parameters:
node - a DOM node
omitXmlDeclaration - if true, omits the XML declaration from the returned result
- Returns:
the XML for this node
- Throws:
TransformerException - if the transformation to XML failed
IllegalStateException - if the transformer could not be initialised
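Serialising a DOM subtree is commonly done with an identity Transformer, which also matches the exceptions listed above. The following is a sketch of that approach, assuming the standard javax.xml.transform API; GetXmlStringSketch, serialize and parseRoot are hypothetical names and this is not necessarily how XmlUtil does it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class GetXmlStringSketch {

    // Writes the subtree rooted at node to a string using an identity transform.
    static String serialize(Node node, boolean omitXmlDeclaration) throws TransformerException {
        Transformer transformer;
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Could not initialise transformer", e);
        }
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
            omitXmlDeclaration ? "yes" : "no");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    // Illustration helper: parses a string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) throws TransformerException {
        Node root = parseRoot("<a><b>x</b></a>");
        System.out.println(serialize(root, true));  // element markup only
        System.out.println(serialize(root, false)); // preceded by an XML declaration
    }
}
```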
-
compact
Remove leading/trailing whitespace from all text nodes in this nodeList. Will iterate through subnodes recursively.
- Parameters:
node
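The recursive trimming described for compact might look like the sketch below. The parameter type (org.w3c.dom.Node) is an assumption, since the signature is not shown here, and the names CompactSketch, trimTextNodes and parseRoot are hypothetical. Whether the real method removes text nodes that become empty is not stated; this sketch leaves them in place as empty nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CompactSketch {

    // Trims every text node at or below the given node, in place.
    static void trimTextNodes(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().trim());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            trimTextNodes(child);
        }
    }

    // Illustration helper: parses a string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node root = parseRoot("<a>\n  <b>  hi  </b>\n</a>");
        trimTextNodes(root);
        System.out.println("[" + root.getTextContent() + "]"); // prints: [hi]
    }
}
```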
-
processContentHandler
public static void processContentHandler(ContentHandler contentHandler, String xmlText) throws SAXException, IllegalStateException
Parse a string of XML text using a SAX contentHandler. Nothing is returned by this method; it is assumed that the contentHandler supplied maintains its own state as it parses the XML supplied, and that this state can be extracted from the handler afterwards.
- Parameters:
contentHandler - a SAX content handler
xmlText - an XML document (or part thereof)
- Throws:
SAXException - if the document could not be parsed
IllegalStateException - if the parser could not be initialised, or an I/O error occurred (should not happen since we're just dealing with strings)
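The pattern of a stateful handler that is queried after parsing can be sketched with the JDK's SAX API. This is an illustration under assumptions: ContentHandlerSketch, parseWith and ElementCounter are hypothetical names, and the real method may configure the parser differently.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ContentHandlerSketch {

    // Feeds the XML text through a SAX parser into the supplied handler.
    static void parseWith(ContentHandler handler, String xmlText) throws SAXException {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xmlText)));
        } catch (ParserConfigurationException | IOException e) {
            // Not expected when parsing from a string.
            throw new IllegalStateException("Could not initialise parser or read input", e);
        }
    }

    // Example handler that keeps its own state: a count of elements seen.
    static final class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    public static void main(String[] args) throws SAXException {
        ElementCounter counter = new ElementCounter();
        parseWith(counter, "<a><b/><b/></a>");
        System.out.println(counter.count); // prints: 3
    }
}
```

The handler's state (here, the count field) is read after parseWith returns, which mirrors the contract described above.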
-