Class XmlUtil

java.lang.Object
com.randomnoun.common.XmlUtil

public class XmlUtil extends Object
XML utility functions
Author:
knoxg
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
    An abstract stack-based XML parser.
    static class 
    Wraps a NodeList in an Iterable (Java 1.5 and later), so that it can be used in for (Node node : nodeList) { ...
    static class 
    Convert a table into a List of Lists (each top-level list represents a table row, each second-level list represents a table cell).
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static void
    compact(Node node)
    Removes leading and trailing whitespace from all text nodes under this node.
    static String
    getCleanXml(InputStream inputStream, boolean isHtml)
    Clean an HTML inputStream through the tagsoup filter.
    static String
    getCleanXml(String inputXml, boolean isHtml)
    Clean some HTML text through the tagsoup filter.
    static String
    getText(Element element)
    Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.
    static String
    getTextNonRecursive(Element element)
    Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.
    static String
    getTextPreserveElements(Element element, String[] tagNames)
    Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.
    static String
    getXmlString(Node node, boolean omitXmlDeclaration)
    Converts a document node subtree back into an XML string.
    static void
    processContentHandler(ContentHandler contentHandler, String xmlText)
    Parse a string of XML text using a SAX contentHandler.
    static Document
    toDocument(InputStream is)
    Return a DOM document object from an InputStream.
    static Document
    toDocument(String text)
    Return a DOM document object from an XML string.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

  • Method Details

    • getCleanXml

      public static String getCleanXml(String inputXml, boolean isHtml) throws SAXException
      Clean some HTML text through the tagsoup filter. The returned string is guaranteed to be well-formed XML, and can therefore be passed to other tools that expect well-formed XML.
      Parameters:
      inputXml - input XML document
      isHtml - if true, uses the HTML schema, omits the XML declaration, and uses the html method
      Throws:
      SAXException - if the tagsoup library could not parse the input string
      IllegalStateException - if an error occurred reading from a string (should never occur)
    • getCleanXml

      public static String getCleanXml(InputStream inputStream, boolean isHtml) throws SAXException
      Clean an HTML inputStream through the tagsoup filter. The returned string is guaranteed to be well-formed XML, and can therefore be passed to other tools that expect well-formed XML.
      Parameters:
      inputStream - input XML stream
      isHtml - if true, uses the HTML schema, omits the XML declaration, and uses the html method
      Throws:
      SAXException - if the tagsoup library could not parse the input stream
      IllegalStateException - if an error occurred reading from the input stream
    • getText

      public static String getText(Element element)
      Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.

      Elements are recursed into.

      Parameters:
      element - the element that contains, as child nodes, the text to be returned.
      Returns:
      the concatenated contents of all Text and CDATA nodes within the specified element.
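      The recursive behaviour described above can be sketched with the JDK's DOM API. This is not the library's implementation, only an illustration of the documented behaviour; the class GetTextSketch and its parse helper are hypothetical names introduced for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GetTextSketch {

    /** Hypothetical helper: parses a string into a DOM document using the JDK parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Concatenates Text and CDATA content, descending into child elements. */
    static String getText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            switch (child.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(child.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                    sb.append(getText((Element) child)); // recurse into child elements
                    break;
                default:
                    break; // comments, processing instructions etc. are skipped
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element root = parse("<p>Hello <b>big</b> <![CDATA[<world>]]></p>").getDocumentElement();
        System.out.println(getText(root)); // prints "Hello big <world>"
    }
}
```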
    • getTextPreserveElements

      public static String getTextPreserveElements(Element element, String[] tagNames)
      Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string. Elements whose tag names appear in the tagNames parameter are also included in the result.

      Attributes of these tags are also included in the result, but may be reordered.

      Self-closing elements (e.g. <br/>) are expanded into opening and closing elements (e.g. <br></br>).

      Elements are recursed into.

      Parameters:
      element - the element that contains, as child nodes, the text to be returned.
      tagNames - the tag names of the elements to preserve in the returned text.
      Returns:
      the concatenated contents of all Text and CDATA nodes within the specified element, together with the preserved tags.
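      A minimal sketch of the tag-preserving behaviour, assuming exact tag-name matching and no escaping of text or attribute values (the library may differ on both points). PreserveElementsSketch and its helpers are hypothetical names, and only JDK DOM classes are used.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PreserveElementsSketch {

    /** Hypothetical helper: parses a string into a DOM document using the JDK parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String getTextPreserveElements(Element element, String[] tagNames) {
        Set<String> keep = new HashSet<>(Arrays.asList(tagNames));
        StringBuilder sb = new StringBuilder();
        append(element, keep, sb);
        return sb.toString();
    }

    private static void append(Element element, Set<String> keep, StringBuilder sb) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                Element e = (Element) child;
                boolean preserved = keep.contains(e.getTagName());
                if (preserved) {
                    // Attribute order comes from the NamedNodeMap, not from the source text.
                    sb.append('<').append(e.getTagName());
                    NamedNodeMap attrs = e.getAttributes();
                    for (int j = 0; j < attrs.getLength(); j++) {
                        Node a = attrs.item(j);
                        sb.append(' ').append(a.getNodeName())
                          .append("=\"").append(a.getNodeValue()).append('"');
                    }
                    sb.append('>');
                }
                append(e, keep, sb); // text inside non-preserved elements is still collected
                if (preserved) {
                    // Always written as a separate closing tag, so <br/> becomes <br></br>.
                    sb.append("</").append(e.getTagName()).append('>');
                }
            }
        }
    }

    public static void main(String[] args) {
        Element root = parse("<p>one<br/>two <i>three</i> <b class=\"x\">four</b></p>")
            .getDocumentElement();
        // prints: one<br></br>two three <b class="x">four</b>
        System.out.println(getTextPreserveElements(root, new String[] {"br", "b"}));
    }
}
```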
    • getTextNonRecursive

      public static String getTextNonRecursive(Element element)
      Iterates through the child nodes of the specified element, and returns the contents of all Text and CDATA nodes among those nodes, concatenated into a string.

      Elements are not recursed into.

      Parameters:
      element - the element that contains, as child nodes, the text to be returned.
      Returns:
      the concatenated contents of the Text and CDATA nodes that are direct children of the specified element.
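      For contrast with the recursive variants, the non-recursive behaviour can be sketched as a single pass over the direct children. As before, this is an illustration built on the JDK DOM API rather than the library's code, and NonRecursiveTextSketch and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NonRecursiveTextSketch {

    /** Hypothetical helper: parses a string into a DOM document using the JDK parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Concatenates only the Text and CDATA nodes that are direct children. */
    static String getTextNonRecursive(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
            // Child elements are ignored, so their text is not included.
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element root = parse("<p>a<b>b</b>c</p>").getDocumentElement();
        System.out.println(getTextNonRecursive(root)); // prints "ac"
    }
}
```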
    • toDocument

      public static Document toDocument(String text) throws SAXException
      Return a DOM document object from an XML string.
      Parameters:
      text - the string representation of the XML to parse
      Throws:
      SAXException - if the XML could not be parsed
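      One way to build a DOM document from a string with the JDK alone is shown below. It is a sketch of the same idea, not the library's code: ToDocumentSketch and isWellFormed are hypothetical names, and wrapping configuration and I/O failures in IllegalStateException is an assumption of this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ToDocumentSketch {

    /** Parses XML text into a DOM document; malformed input surfaces as SAXException. */
    static Document toDocument(String text) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(text)));
        } catch (ParserConfigurationException | IOException e) {
            // Assumption: neither should occur when reading from an in-memory string.
            throw new IllegalStateException("Could not initialise parser or read input", e);
        }
    }

    /** Hypothetical convenience method showing how a caller might handle SAXException. */
    static boolean isWellFormed(String text) {
        try {
            toDocument(text);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = toDocument("<a><b/></a>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "a"
        System.out.println(isWellFormed("<a><b></a>"));            // prints "false"
    }
}
```

      The JDK's default error handler may also write a parse error message to standard error when the input is malformed.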
    • toDocument

      public static Document toDocument(InputStream is) throws SAXException
      Return a DOM document object from an InputStream.
      Parameters:
      is - the InputStream containing the XML to parse
      Throws:
      SAXException - if the XML could not be parsed
    • getXmlString

      public static String getXmlString(Node node, boolean omitXmlDeclaration) throws TransformerException
      Converts a document node subtree back into an XML string.
      Parameters:
      node - a DOM node
      omitXmlDeclaration - if true, omits the XML declaration from the returned result
      Returns:
      the XML for this node
      Throws:
      TransformerException - if the transformation to XML failed
      IllegalStateException - if the transformer could not be initialised
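      Serialising a node back to text is commonly done with an identity Transformer, and the sketch below shows that approach together with the OMIT_XML_DECLARATION output property. It assumes the JDK's built-in transformer and is not taken from the library's source; XmlStringSketch and roundTrip are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XmlStringSketch {

    /** Serialises a DOM node using an identity transform. */
    static String getXmlString(Node node, boolean omitXmlDeclaration) throws TransformerException {
        Transformer transformer;
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Could not initialise transformer", e);
        }
        transformer.setOutputProperty(
            OutputKeys.OMIT_XML_DECLARATION, omitXmlDeclaration ? "yes" : "no");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(writer));
        return writer.toString();
    }

    /** Hypothetical helper: parses a string, then serialises it again. */
    static String roundTrip(String xml, boolean omitXmlDeclaration) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return getXmlString(doc, omitXmlDeclaration);
        } catch (Exception e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<a><b>x</b></a>", true));  // no XML declaration
        System.out.println(roundTrip("<a><b>x</b></a>", false)); // starts with an XML declaration
    }
}
```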
    • compact

      public static void compact(Node node)
      Removes leading and trailing whitespace from all text nodes under this node. Subnodes are processed recursively.
      Parameters:
      node - the node whose text nodes are to be trimmed
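      The recursive trimming can be sketched as follows. This example assumes that only Text nodes are trimmed (CDATA sections are left alone) and that text nodes which become empty stay in the tree; the library may handle either case differently. CompactSketch and compactedText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompactSketch {

    /** Trims every Text node below the given node, in place. */
    static void compact(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                child.setNodeValue(child.getNodeValue().trim());
            } else {
                compact(child); // recurse into elements and other container nodes
            }
        }
    }

    /** Hypothetical helper: parses, compacts, and returns the remaining text content. */
    static String compactedText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            compact(doc);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compactedText("<a>  x  <b> y </b>\n</a>")); // prints "xy"
    }
}
```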
    • processContentHandler

      public static void processContentHandler(ContentHandler contentHandler, String xmlText) throws SAXException, IllegalStateException
      Parse a string of XML text using a SAX contentHandler. Nothing is returned by this method; it is assumed that the supplied contentHandler maintains its own state as it parses the XML, and that this state can be extracted from the handler afterwards.
      Parameters:
      contentHandler - a SAX content handler
      xmlText - an XML document (or part thereof)
      Throws:
      SAXException - if the document could not be parsed
      IllegalStateException - if the parser could not be initialised, or an I/O error occurred (should not happen since we're just dealing with strings)
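      The pattern described above, where the handler accumulates state that the caller reads after parsing, can be sketched with the JDK's SAX parser. This is an illustration rather than the library's implementation; ContentHandlerSketch, ElementCounter, and countElements are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ContentHandlerSketch {

    /** Hypothetical handler that counts start tags; its state is read after parsing. */
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Feeds the XML text through a SAX parser that reports events to the handler. */
    static void processContentHandler(ContentHandler handler, String xmlText) throws SAXException {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xmlText)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not initialise or run parser", e);
        }
    }

    /** Hypothetical convenience method: parse, then extract the handler's state. */
    static int countElements(String xmlText) {
        ElementCounter counter = new ElementCounter();
        try {
            processContentHandler(counter, xmlText);
        } catch (SAXException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return counter.count;
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><b/></a>")); // prints "3"
    }
}
```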