Friday, February 13, 2009

XSLT output is missing the doctype

Keywords:
xslt identity transform missing doctype Document Type java DOM

Problem:
Running an XSLT 'identity transform', I would expect the XML coming out to be identical to the XML going in ... and it almost is, but the doctype declaration is missing.
Can you tell the XSLT to keep the doctype?

Solution:
Apparently not ... I couldn't find a good reference spelling this out, but as far as I can tell the doctype declaration simply isn't part of the XPath/XSLT 1.0 data model, so the stylesheet never sees it and an identity transform has nothing to copy. The proposed work-arounds are hard-coding the doctype via the doctype-public/doctype-system attributes of the <xsl:output> element (if outputting XML), or emitting it as raw text with <xsl:text disable-output-escaping="yes">.
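The <xsl:output> work-around has a programmatic counterpart in JAXP: the OutputKeys.DOCTYPE_PUBLIC and OutputKeys.DOCTYPE_SYSTEM output properties, which you can fill in from the input document at runtime. Below is a minimal sketch, assuming the result is serialized to a stream rather than kept as a DOM. The "note" document and "note.dtd" are made-up examples, and the external DTD is swapped for an empty one so nothing gets fetched. It runs an identity stylesheet twice, once as-is and once with the input's public/system ids copied into those properties. Note that only the ids carry over: the doctype name in the output comes from the root element, and any internal subset is lost.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class DoctypeOutputProperties {

    // A plain XSLT 1.0 identity stylesheet.
    static final String IDENTITY_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Parses XML from a string; the external DTD (hypothetical note.dtd)
    // is replaced by an empty one so the example never tries to load it.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        DocumentBuilder builder = f.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Runs the identity stylesheet. If copyDoctype is true, the input's
    // public/system ids are set as doctype-public/doctype-system output properties.
    static String transform(String xml, boolean copyDoctype) {
        try {
            Document input = parse(xml);
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(IDENTITY_XSL)));
            DocumentType doctype = input.getDoctype();
            if (copyDoctype && doctype != null) {
                if (doctype.getPublicId() != null) {
                    transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
                }
                if (doctype.getSystemId() != null) {
                    transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
                }
            }
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(input), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE note SYSTEM \"note.dtd\"><note><to>Tove</to></note>";
        System.out.println(transform(xml, false)); // doctype dropped
        System.out.println(transform(xml, true));  // doctype written again
    }
}
```

Setting the properties from Java overrides whatever the stylesheet's own <xsl:output> says, so the stylesheet itself can stay generic.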

If you want to carry the input's doctype across generically, there's no pure XSLT solution. You can add it back after the transform, but even that isn't straightforward with the plain Java DOM API. Here is the code:
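    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMResult;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamSource;
    import org.w3c.dom.DOMImplementation;
    import org.w3c.dom.Document;
    import org.w3c.dom.DocumentType;

    // inputDocument: the parsed source document
    // stylesheet:    the XSLT, e.g. new StreamSource(stylesheetData)
    static Document transformKeepingDoctype(Document inputDocument, StreamSource stylesheet)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(stylesheet);

        DOMSource source = new DOMSource(inputDocument);
        DOMResult result = new DOMResult();
        transformer.transform(source, result);

        // with an empty DOMResult the transformer creates a new Document
        Document outputDocument = (Document) result.getNode();
        DocumentType inputDocType = inputDocument.getDoctype();
        if (inputDocType != null) {
            // you can't importNode() a DocumentType, but you can create a new one
            // via DOMImplementation; from there the insert is easy
            DOMImplementation domImpl = outputDocument.getImplementation();
            DocumentType outputDocType = domImpl.createDocumentType(
                  inputDocType.getName()
                , inputDocType.getPublicId()
                , inputDocType.getSystemId());
            outputDocument.insertBefore(outputDocType, outputDocument.getDocumentElement());
        }
        return outputDocument;
    }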


Notes:
It's interesting that this code is not allowed:
    DocumentType outputDocType = (DocumentType) outputDocument.importNode(inputDocType, true /*deep*/);

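DocumentType turns out to be a special kind of node (not an element): the DOM spec says DocumentType nodes cannot be imported, and importNode() throws a DOMException (NOT_SUPPORTED_ERR) instead.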
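A quick way to see this for yourself is the sketch below. It parses a small made-up document (the "note" root and "note.dtd" are hypothetical, and the external DTD is replaced by an empty one so nothing is fetched), then tries to import its DocumentType into a fresh document. With the JDK's built-in DOM I'd expect a DOMException whose code is NOT_SUPPORTED_ERR. The method returns -1 if the import were to succeed instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class ImportDoctype {

    // Returns the DOMException code raised by importNode() for a DocumentType,
    // or -1 if the import unexpectedly succeeds.
    static int importDoctypeErrorCode() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // swap the (hypothetical) external DTD for an empty one
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document input = builder.parse(new InputSource(new StringReader(
                    "<!DOCTYPE note SYSTEM \"note.dtd\"><note/>")));
            Document output = builder.newDocument();
            DocumentType inputDocType = input.getDoctype();
            try {
                output.importNode(inputDocType, true /*deep*/);
                return -1;
            } catch (DOMException e) {
                return e.code;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int code = importDoctypeErrorCode();
        System.out.println(code == DOMException.NOT_SUPPORTED_ERR
                ? "NOT_SUPPORTED_ERR" : "code " + code);
    }
}
```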

It's also interesting that there's no factory method directly on the Document:
    DocumentType outputDocType = outputDocument.createDocumentType(...); // no such method

The only way to create this kind of node is through the special DOMImplementation object, obtained via the Document.getImplementation() method (or DocumentBuilder.getDOMImplementation()).
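That DOMImplementation route is also how the DOM API expects a brand-new document with a doctype to be built: the DocumentType is created first and then passed to createDocument(). A minimal sketch; the "note" root element and "note.dtd" system id are made-up values for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

public class CreateWithDoctype {

    // Builds an empty <note/> document (hypothetical example) whose doctype
    // is created up front and handed to createDocument().
    static Document newNoteDocument() {
        try {
            DOMImplementation domImpl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
            // the DocumentType exists before any Document does; no public id here
            DocumentType doctype = domImpl.createDocumentType("note", null, "note.dtd");
            // no namespace, root element "note", doctype attached at creation time
            return domImpl.createDocument(null, "note", doctype);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newNoteDocument();
        System.out.println(doc.getDoctype().getName());            // note
        System.out.println(doc.getDocumentElement().getTagName()); // note
    }
}
```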

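Quirks with the implementation? More likely a design choice in the DOM API itself: DOMImplementation.createDocument() takes the DocumentType as an argument, so a doctype is expected to exist before the document it belongs to. I've looked at this issue too long already to dig any deeper :)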