26 Serialization

A processor may output a final result tree as a sequence of octets, although it is not required to be able to do so (see 27 Conformance). Stylesheet authors can use xsl:output declarations to specify how they wish result trees to be serialized. If a processor serializes a final result tree, it must do so as specified by these declarations.

The rules governing the output of the serializer are defined in [XSLT and XQuery Serialization]. The serialization is controlled using a number of serialization parameters. The values of these serialization parameters may be set within the stylesheet, using the xsl:output and xsl:character-map declarations and the xsl:result-document instruction.

<!-- Category: declaration -->
<xsl:output
  name? = eqname
  method? = "xml" | "html" | "xhtml" | "text" | "json" | "adaptive" | eqname
  allow-duplicate-names? = boolean
  build-tree? = boolean
  byte-order-mark? = boolean
  cdata-section-elements? = eqnames
  doctype-public? = string
  doctype-system? = string
  encoding? = string
  escape-uri-attributes? = boolean
  html-version? = decimal
  include-content-type? = boolean
  indent? = boolean
  item-separator? = string
  json-node-output-method? = "xml" | "html" | "xhtml" | "text" | eqname
  media-type? = string
  normalization-form? = "NFC" | "NFD" | "NFKC" | "NFKD" | "fully-normalized" | "none" | nmtoken
  omit-xml-declaration? = boolean
  parameter-document? = uri
  standalone? = boolean | "omit"
  suppress-indentation? = eqnames
  undeclare-prefixes? = boolean
  use-character-maps? = eqnames
  version? = nmtoken />

The xsl:output declaration is optional; if used, it must always appear as a top-level element within a stylesheet module.
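As a minimal sketch, a stylesheet might request indented UTF-8 XML output as shown below. The attribute values and the content of the template are illustrative choices only, not requirements of this specification.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level declaration with no name attribute:
       it contributes to the unnamed output definition -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <!-- Illustrative result content -->
    <report>
      <title>Example</title>
    </report>
  </xsl:template>

</xsl:stylesheet>
```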

A stylesheet may contain multiple xsl:output declarations and may include or import stylesheet modules that also contain xsl:output declarations. The name of an xsl:output declaration is the value of its name attribute, if any.

[Definition: All the xsl:output declarations within a package that share the same name are grouped into a named output definition; those that have no name are grouped into a single unnamed output definition.]

An output definition is scoped to a package. If this is a library package, the output definition applies only to xsl:result-document instructions within the same package. If it is the top-level package, the output definition applies to xsl:result-document instructions within the same package and also to the implicit final result tree.

A stylesheet always includes an unnamed output definition; in the absence of an unnamed xsl:output declaration, the unnamed output definition is equivalent to the one that would be used if the stylesheet contained an xsl:output declaration having no attributes.

A named output definition is used when its name matches the format attribute used in an xsl:result-document element. The unnamed output definition is used when an xsl:result-document element omits the format attribute. It is also used when serializing the principal result.
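The following sketch illustrates how named and unnamed output definitions might be selected. The output definition name plain and the href values are hypothetical names chosen for the example.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Unnamed output definition: used for the principal result and for
       xsl:result-document instructions that omit the format attribute -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Named output definition (hypothetical name "plain") -->
  <xsl:output name="plain" method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Serialized using the named output definition "plain" -->
    <xsl:result-document href="summary.txt" format="plain">
      <xsl:value-of select="count(//*)"/>
    </xsl:result-document>

    <!-- No format attribute: serialized using the unnamed output definition -->
    <xsl:result-document href="copy.xml">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```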

All the xsl:output elements making up an output definition are effectively merged. For those attributes whose values are namespace-sensitive, the merging is done after lexical QNames have been converted into expanded QNames. For the cdata-section-elements and suppress-indentation attributes, the output definition uses the union of the values from all the constituent xsl:output declarations. For the use-character-maps attribute, the output definition uses the concatenation of the sequences of expanded QNames values from all the constituent xsl:output declarations, taking them in order of increasing import precedence, or where several have the same import precedence, in declaration order. For other attributes, the output definition uses the value of that attribute from the xsl:output declaration with the highest import precedence.
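As an illustration of merging, consider two unnamed xsl:output declarations appearing in the same stylesheet module (and therefore having the same import precedence). The element names script and style are arbitrary examples.

```xml
<!-- Both declarations belong to the unnamed output definition -->
<xsl:output cdata-section-elements="script" indent="yes"/>
<xsl:output cdata-section-elements="style" encoding="UTF-8"/>

<!-- Effective output definition after merging:
       cdata-section-elements: script and style (union of both values)
       indent:                 yes   (specified by the first declaration only)
       encoding:               UTF-8 (specified by the second declaration only) -->
```

Because no attribute other than cdata-section-elements is specified by both declarations, there is no conflict between them.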

The parameter-document attribute allows serialization parameters to be supplied in an external document. The external document must contain an output:serialization-parameters element with the format described in Section 3.1 Setting Serialization Parameters by Means of a Data Model Instance of [XSLT and XQuery Serialization], and the parameters are interpreted as described in that specification.

If present, the URI supplied in the parameter-document attribute is dereferenced, after resolution against the base URI of the xsl:output element if it is a relative reference. The parameter document should be read during static analysis of the stylesheet. A serialization error occurs if the result of dereferencing the URI is ill-formed or invalid; but if no document can be found at the specified location, the attribute should be ignored.

A serialization parameter specified in the parameter-document takes precedence over a value supplied directly in the output declaration, except that the values of the cdata-section-elements and suppress-indentation attributes are merged in the same way as when multiple xsl:output declarations are merged.
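A sketch of a parameter document and a declaration that refers to it is shown below. The file name serialization.xml is a hypothetical choice; the element format follows the serialization-parameters vocabulary defined in [XSLT and XQuery Serialization].

```xml
<!-- Contents of the (hypothetical) file serialization.xml -->
<output:serialization-parameters
    xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
  <output:method value="xml"/>
  <output:indent value="yes"/>
  <output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>

<!-- Declaration in the stylesheet. The relative reference is resolved
     against the base URI of the xsl:output element. -->
<xsl:output parameter-document="serialization.xml" indent="no"/>
```

In this sketch the indent parameter is specified in both places, so the value yes from the parameter document would take precedence over the value no on the declaration.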

[ERR XTSE1560] It is a static error if two xsl:output declarations within an output definition specify explicit values for the same attribute (other than cdata-section-elements, suppress-indentation, and use-character-maps), with the values of the attributes being not equal, unless there is another xsl:output declaration within the same output definition that has higher import precedence and that specifies an explicit value for the same attribute.
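For example, the following pair of declarations, assumed to appear in the same stylesheet module with no other unnamed xsl:output declaration of higher import precedence specifying indent, would be expected to raise this error:

```xml
<!-- Same output definition (unnamed), same import precedence,
     conflicting explicit values for indent: static error XTSE1560 -->
<xsl:output indent="yes"/>
<xsl:output indent="no"/>
```

If an importing module also contained an unnamed xsl:output declaration with an explicit indent attribute, that value would be used and the conflict between the two lower-precedence declarations would not be an error.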

The build-tree attribute controls whether the raw principal result or secondary result is converted to a final result tree. The default depends on the value of the method attribute: the default is yes if the method attribute specifies xml, html, xhtml, or text, or if it is omitted; the default is no if the method attribute specifies json or adaptive. A final result tree may be constructed whether or not it is subsequently serialized.

Note:

The default for build-tree may differ for user-defined serialization methods or for serialization methods introduced in future versions of this specification.

Unless the processor implements the XPath 3.1 Feature, the method values json and adaptive must be rejected as invalid, and the attributes allow-duplicate-names and json-node-output-method must be ignored. The meaning of these output methods and serialization parameters is defined in [XSLT and XQuery Serialization 3.1].
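The following sketch shows why the default for build-tree is no with the json method. It assumes a processor that implements the XPath 3.1 Feature; the output definition name json-out and the href value are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- build-tree is omitted, so it defaults to no for method="json" -->
  <xsl:output name="json-out" method="json" indent="yes"/>

  <xsl:template name="xsl:initial-template">
    <xsl:result-document href="data.json" format="json-out">
      <!-- The raw result is a map. It is passed to the serializer as is,
           rather than being wrapped in a document node. -->
      <xsl:sequence select="map{ 'id': 42, 'tags': ['a', 'b'] }"/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```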

If none of the xsl:output declarations within an output definition specifies a value for a particular attribute, then the corresponding serialization parameter takes a default value. The default value depends on the chosen output method.

There are some serialization parameters that apply to some output methods but not to others. For example, the indent attribute has no effect on the text output method. If a value is supplied for an attribute that is inapplicable to the output method, its value is not passed to the serializer. The processor may validate the value of such an attribute, but is not required to do so.

An implementation may allow the attributes of the xsl:output declaration to be overridden, or the default values to be changed, using the API that controls the transformation.

The location to which final result trees are serialized (whether in filestore or elsewhere) is implementation-defined (which in practice may mean that it is controlled using an implementation-defined API). However, these locations must satisfy the constraint that when two final result trees are both created (implicitly or explicitly) using relative URI references in the href attribute of the xsl:result-document instruction, then these relative URI references may be used to construct references from one tree to the other, and such references must remain valid when both result trees are serialized.
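As a sketch of this constraint, the two result trees below refer to each other using relative URI references that mirror the relative href values. The file names are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:result-document href="index.html">
      <html>
        <body>
          <!-- Relative link from index.html to chapters/intro.html -->
          <a href="chapters/intro.html">Introduction</a>
        </body>
      </html>
    </xsl:result-document>

    <xsl:result-document href="chapters/intro.html">
      <html>
        <body>
          <!-- Relative link back from chapters/intro.html to index.html -->
          <a href="../index.html">Back to index</a>
        </body>
      </html>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

Wherever the implementation chooses to write the two serialized documents, the links between them are required to remain valid.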

The method attribute on the xsl:output element identifies the overall method that is to be used for outputting the final result tree.

[ERR XTSE1570] The value must (if present) be a valid EQName. If it is a lexical QName with no prefix, then it identifies a method specified in [XSLT and XQuery Serialization] and must be one of xml, html, xhtml, or text. If it is a lexical QName with a prefix, then the lexical QName is expanded into an expanded QName as described in 5.1.1 Qualified Names; the expanded QName identifies the output method; the behavior in this case is not specified by this document.
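The sketch below shows two equivalent ways of naming a vendor-defined output method. The namespace URI, the method name yaml, and the output definition names are all hypothetical; whether and how a processor supports such a method is outside the scope of this document.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ext="http://example.com/hypothetical-serializer">

  <!-- Lexical QName with a prefix, expanded using the in-scope namespaces -->
  <xsl:output name="by-prefix" method="ext:yaml"/>

  <!-- The same expanded QName written as a URI-qualified name -->
  <xsl:output name="by-eqname"
      method="Q{http://example.com/hypothetical-serializer}yaml"/>

</xsl:stylesheet>
```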

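The default for the method attribute depends on the contents of the tree being serialized, and is chosen as follows. If the document node of the final result tree has an element child, and any text nodes preceding the first element child of the document node of the result tree contain only whitespace characters, then:

If the expanded QName of this first element child has local part html (in lower case) and namespace URI http://www.w3.org/1999/xhtml, then the default output method is normally xhtml. However, if the version attribute of the xsl:stylesheet element of the principal stylesheet module has the value 1.0, then the default output method in these circumstances is xml.

If the expanded QName of this first element child has local part html (in any combination of upper and lower case) and has no namespace URI, then the default output method is html.

In all other cases, the default output method is xml.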

The default output method is used if the selected output definition does not include a method attribute.
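As an illustration, the stylesheet sketched below contains no xsl:output declaration. Because the first element child of the result document node is named html and is in no namespace, the default output method would be expected to be html. The page content is illustrative.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No xsl:output declaration: the output method is chosen by default
       from the content of the final result tree -->
  <xsl:template match="/">
    <html>
      <head><title>Example</title></head>
      <body><p>Hello</p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

If the template instead produced a top-level element with some other name, such as doc, the default output method would be xml.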

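The other attributes on xsl:output provide parameters for the output method. The allowed attributes are those shown in the syntax summary for xsl:output above; their effect is defined in [XSLT and XQuery Serialization].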

If the processor performs serialization, then it must signal any serialization errors that occur. These have the same effect as dynamic errors: that is, the processor must signal the error and must not finish as if the transformation had been successful.

26.1 Character Maps

[Definition: A character map allows a specific character appearing in a text or attribute node in the final result tree to be substituted by a specified string of characters during serialization.] The effect of character maps is defined in [XSLT and XQuery Serialization].

The character map that is supplied as a parameter to the serializer is determined from the xsl:character-map elements referenced from the xsl:output declaration for the selected output definition.

The xsl:character-map element is a declaration that may appear as a child of the xsl:stylesheet element.

<!-- Category: declaration -->
<xsl:character-map
  name = eqname
  use-character-maps? = eqnames >
  <!-- Content: (xsl:output-character*) -->
</xsl:character-map>

The xsl:character-map declaration declares a character map with a name and a set of character mappings. The character mappings are specified by means of xsl:output-character elements contained either directly within the xsl:character-map element, or in further character maps referenced in the use-character-maps attribute.

The required name attribute provides a name for the character map. When a character map is used by an output definition or another character map, the character map with the highest import precedence is used.

The name of a character map is local to the package in which its declaration appears; it may be referenced only from within the same package.

[ERR XTSE1580] It is a static error if a package contains two or more character maps with the same name and the same import precedence, unless it also contains another character map with the same name and higher import precedence.

The optional use-character-maps attribute lists the names of further character maps that are included into this character map.

[ERR XTSE1590] It is a static error if a name in the use-character-maps attribute of the xsl:output or xsl:character-map elements does not match the name attribute of any xsl:character-map in the containing package.

[ERR XTSE1600] It is a static error if a character map references itself, directly or indirectly, via a name in the use-character-maps attribute.

It is not an error if the same character map is referenced more than once, directly or indirectly.
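The following sketch shows a character map that is referenced more than once without forming a cycle. The map names are hypothetical.

```xml
<xsl:output use-character-maps="top"/>

<!-- "base" is reached twice, through "left" and through "right".
     This is permitted. -->
<xsl:character-map name="top" use-character-maps="left right"/>
<xsl:character-map name="left" use-character-maps="base"/>
<xsl:character-map name="right" use-character-maps="base"/>

<xsl:character-map name="base">
  <xsl:output-character character="&#xA9;" string="(c)"/>
</xsl:character-map>

<!-- By contrast, adding use-character-maps="top" to the map named "base"
     would make "top" reference itself indirectly, which is the
     static error XTSE1600. -->
```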

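An output definition, after recursive expansion of character maps referenced via its use-character-maps attribute, may contain several mappings for the same character. In this situation, the last character mapping takes precedence. To establish the ordering, the following rules are used:

Within a single xsl:output declaration, the character maps referenced in the use-character-maps attribute are considered in the order in which they are listed in that attribute. The expansion is depth-first: each referenced character map is fully expanded before the next one is considered.

Within a single xsl:character-map element, the mappings defined in character maps referenced in the use-character-maps attribute are considered before the xsl:output-character elements that are children of the xsl:character-map element itself.

Two xsl:output-character elements that appear as children of the same xsl:character-map element are considered in document order.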
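As a sketch of how a conflict is resolved, suppose two character maps (with the hypothetical names first and second) both map the no-break space, and that character maps named in use-character-maps are considered in the order in which they are listed:

```xml
<xsl:output use-character-maps="first second"/>

<xsl:character-map name="first">
  <xsl:output-character character="&#xA0;" string="&amp;nbsp;"/>
</xsl:character-map>

<xsl:character-map name="second">
  <!-- "second" is listed after "first", so this mapping is the last one
       for U+00A0 and takes precedence -->
  <xsl:output-character character="&#xA0;" string="&amp;#160;"/>
</xsl:character-map>
```

Under that assumption, a no-break space in a text node would be serialized as the six characters of the character reference for 160, not as the nbsp entity reference.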

The xsl:output-character element is defined as follows:

<xsl:output-character
  character = char
  string = string />

The character map that is passed as a parameter to the serializer contains a mapping for the character specified in the character attribute to the string specified in the string attribute.

Character mapping is not applied to characters for which output escaping has been disabled as described in 26.2 Disabling Output Escaping.

If a character is mapped, then it is not subjected to XML or HTML escaping.

Example: Using Character Maps to Generate Non-XML Output

Character maps can be useful when producing serialized output in a format that resembles, but is not strictly conformant to, HTML or XML. For example, when the output is a JSP page, there might be a need to generate the output:

<jsp:setProperty name="user" property="id" value='<%= "id" + idValue %>'/>

Although this output is not well-formed XML or HTML, it is valid in Java Server Pages. This can be achieved by allocating three Unicode characters (which are not needed for any other purpose) to represent the strings <%, %>, and ", for example:

<xsl:character-map name="jsp">
  <xsl:output-character character="«" string="&lt;%"/>   
  <xsl:output-character character="»" string="%&gt;"/>
  <xsl:output-character character="§" string='"'/>
</xsl:character-map>

When this character map is referenced in the xsl:output declaration, the required output can be produced by writing the following in the stylesheet:

<jsp:setProperty name="user" property="id" value='«= §id§ + idValue »'/>

This works on the assumption that when an apostrophe or quotation mark is generated as part of an attribute value by the use of character maps, the serializer will (where possible) use the other choice of delimiter around the attribute value.

Example: Constructing a Composite Character Map

The following example illustrates a composite character map constructed in a modular fashion:

<xsl:output name="htmlDoc" use-character-maps="htmlDoc"/>

<xsl:character-map name="htmlDoc"
  use-character-maps="html-chars doc-entities windows-format"/>
  
<xsl:character-map name="html-chars"
  use-character-maps="latin1 ..."/>

<xsl:character-map name="latin1">
  <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
  <xsl:output-character character="&#161;" string="&amp;iexcl;"/>
  ...
</xsl:character-map>

<xsl:character-map name="doc-entities">
  <xsl:output-character character="&#xE400;" string="&amp;t-and-c;"/>
  <xsl:output-character character="&#xE401;" string="&amp;chap1;"/>
  <xsl:output-character character="&#xE402;" string="&amp;chap2;"/>
  ...
</xsl:character-map>

<xsl:character-map name="windows-format">
  <!-- newlines as CRLF -->
  <xsl:output-character character="&#xA;" string="&#xD;&#xA;"/>

  <!-- tabs as three spaces -->
  <xsl:output-character character="&#x9;" string="   "/>

  <!-- images for special characters -->
  <xsl:output-character character="&#xF001;"
    string="&lt;img src='special1.gif' /&gt;"/>
  <xsl:output-character character="&#xF002;"
    string="&lt;img src='special2.gif' /&gt;"/>
  ...
</xsl:character-map>

Note:

When character maps are used, there is no guarantee that the serialized output will be well-formed XML (or HTML). Furthermore, the fact that the result tree was validated against a schema gives no guarantee that the serialized output will still be valid against the same schema. Conversely, it is possible to use character maps to produce schema-valid output from a result tree that would fail validation.

26.2 Disabling Output Escaping

Normally, when using the XML, HTML, or XHTML output method, the serializer will escape special characters such as & and < when outputting text nodes. This ensures that the output is well-formed. However, it is sometimes convenient to be able to produce output that is almost, but not quite well-formed XML; for example, the output may include ill-formed sections which are intended to be transformed into well-formed XML by a subsequent non-XML-aware process. For this reason, XSLT defines a mechanism for disabling output escaping.

This feature is deprecated.

This is an optional feature: it is not required that an XSLT processor that implements the serialization option should offer the ability to disable output escaping, and there is no conformance level that requires this feature.

This feature requires that the serializer (described in [XSLT and XQuery Serialization]) be extended as follows. Conceptually, the final result tree provides an additional boolean property disable-escaping associated with every character in a text node. When this property is set, the normal action of the serializer to escape special characters such as & and < is suppressed.

An xsl:value-of or xsl:text element may have a disable-output-escaping attribute; the allowed values are yes or no. The default is no; if the value is yes, then every character in the text node generated by evaluating the xsl:value-of or xsl:text element should have the disable-escaping property set.

Example: Disable Output Escaping

For example,

<xsl:text disable-output-escaping="yes">&lt;</xsl:text>

should generate the single character <.

If output escaping is disabled for an xsl:value-of or xsl:text instruction evaluated when temporary output state is in effect, the request to disable output escaping is ignored.

Similarly, if an xsl:value-of or xsl:text instruction specifies that output escaping is to be disabled when writing to a final result tree that is not being serialized, the request to disable output escaping is ignored.
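The sketch below illustrates the first of these cases. The variable name fragment is hypothetical. The xsl:text instruction is evaluated while constructing the value of a variable, that is, in temporary output state, so the request to disable output escaping is ignored.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template name="xsl:initial-template">
    <xsl:variable name="fragment">
      <!-- Temporary output state: disable-output-escaping has no effect -->
      <xsl:text disable-output-escaping="yes">&lt;br/&gt;</xsl:text>
    </xsl:variable>
    <p>
      <!-- The copied text node carries no disable-escaping property,
           so the serializer escapes the special characters as usual -->
      <xsl:copy-of select="$fragment"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

A conforming processor would therefore be expected to write the less-than and greater-than signs in escaped form, and not to emit a br element tag.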

Note:

Furthermore, a request to disable output escaping has no effect when the newly constructed text node is used to form the value of an attribute, comment, processing instruction, or namespace node. This is because the rules for constructing such nodes (see 5.7.2 Constructing Simple Content) cause the text node to be atomized, and the process of atomizing a text node takes no account of the disable-escaping property.

If output escaping is disabled for text within an element that would normally be output using a CDATA section, because the element is listed in the cdata-section-elements, then the relevant text will not be included in a CDATA section. In effect, CDATA is treated as an alternative escaping mechanism, which is disabled by the disable-output-escaping option.

Example: Interaction of Output Escaping and CDATA

For example, if <xsl:output cdata-section-elements="title"/> is specified, then the following instructions:

<title>
  <xsl:text disable-output-escaping="yes">This is not &lt;hr/&gt; 
                                          good coding practice</xsl:text>
</title>

should generate the output:

<title>This is not <hr/> 
       good coding practice</title>

rather than placing the text in a CDATA section.

The disable-output-escaping attribute may be used with the html output method as well as with the xml output method. The text output method ignores the disable-output-escaping attribute, since this method does not perform any output escaping.

A processor will only be able to disable output escaping if it controls how the final result tree is output. This might not always be the case. For example, the result tree might be used as a source tree for another XSLT transformation instead of being output. It is implementation-defined whether (and under what circumstances) disabling output escaping is supported. If disabling output escaping is not supported, any request to disable output escaping is ignored.

If output escaping is disabled for a character that is not representable in the encoding that the processor is using for output, the request to disable output escaping is ignored in respect of that character.

Since disabling output escaping might not work with all implementations and can result in XML that is not well-formed, it should be used only when there is no alternative.

Note:

When disable-output-escaping is used, there is no guarantee that the serialized output will be well-formed XML (or HTML). Furthermore, the fact that the result tree was validated against a schema gives no guarantee that the serialized output will still be valid against the same schema. Conversely, it is possible to use disable-output-escaping to produce schema-valid output from a result tree that would fail validation.

Note:

The facility to define character maps for use during serialization, as described in 26.1 Character Maps, has been produced as an alternative mechanism that can be used in many situations where disabling of output escaping was previously necessary, without the same difficulties.