The data model used by XSLT is the XPath 3.0 and XQuery 3.0 data model (XDM), as defined in [XDM 3.0]. XSLT operates on source, result and stylesheet documents using the same data model.
This section elaborates on some particular features of XDM as it is used by XSLT:
The rules in 4.3 Stripping Whitespace from the Stylesheet and 4.4.2 Stripping Whitespace from a Source Tree make use of the concept of a whitespace text node.
[Definition: A whitespace text node is a text node whose content consists entirely of whitespace characters (that is, #x09, #x0A, #x0D, or #x20).]
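As an illustration only, the definition can be expressed as a stylesheet function. The function name f:is-whitespace-text-node and its namespace URI are hypothetical and not part of XSLT. The sketch relies on \s in the XPath regular expression dialect matching exactly the four characters listed above.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/ns/functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical helper: true if $node is a text node whose content
       consists only of #x09, #x0A, #x0D or #x20 characters -->
  <xsl:function name="f:is-whitespace-text-node" as="xs:boolean">
    <xsl:param name="node" as="node()"/>
    <xsl:sequence
        select="$node instance of text() and matches(string($node), '^\s+$')"/>
  </xsl:function>

</xsl:stylesheet>
```

A call such as f:is-whitespace-text-node(.) could then be used in a predicate, for example //text()[f:is-whitespace-text-node(.)].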
Note:
Features of a source XML document that are not represented in the XDM tree will have no effect on the operation of an XSLT stylesheet. Examples of such features are entity references, CDATA sections, character references, whitespace within element tags, and the choice of single or double quotes around attribute values.
The XDM data model defined in [XDM 3.0] is capable of representing either an XML 1.0 document (conforming to [XML 1.0] and [Namespaces in XML]) or an XML 1.1 document (conforming to [XML 1.1] and [Namespaces in XML 1.1]), and it makes no distinction between the two. In principle, therefore, XSLT 3.0 can be used with either of these XML versions.
Construction of the XDM tree is outside the scope of this specification, so XSLT 3.0 places no formal requirements on an XSLT processor to accept input from either XML 1.0 documents or XML 1.1 documents or both. This specification does define a serialization capability (see 26 Serialization), though from a conformance point of view it is an optional feature. Although facilities are described for serializing the XDM tree as either XML 1.0 or XML 1.1 (and controlling the choice), there is again no formal requirement on an XSLT processor to support either or both of these XML versions as serialization targets.
Because the XDM tree is the same whether the original document was XML 1.0 or XML 1.1, the semantics of XSLT processing do not depend on the version of XML used by the original document. There is no reason in principle why all the input and output documents used in a single transformation must conform to the same version of XML.
Some of the syntactic constructs in XSLT 3.0 and XPath 3.0, for example the productions Char (defined in XML) and NCName (defined in XML Namespaces), are defined by reference to the XML and XML Namespaces specifications. There are slight variations between the XML 1.0 and XML 1.1 versions of these productions (and, indeed, between different editions of XML 1.0). Implementations may support any version; it is recommended that an XSLT 3.0 processor that implements the 1.1 versions should also provide a mode that supports the 1.0 versions. It is thus implementation-defined which versions and editions of XML and XML Namespaces are supported by the implementation.
Note:
The specification referenced as [Namespaces in XML] was actually published without a version number.
The current version of [XML Schema 1.1 Part 2] references the XML 1.1 specifications, but the previous version ([XML Schema Part 2], that is, XSD 1.0) remains in widespread use, and references only XML 1.0. With processors lacking support for XSD 1.1, therefore, datatypes such as xs:NCName and xs:ID may be constrained by the XML 1.0 rules, and not allow the full range of values permitted by XML 1.1. It is recommended that implementers wishing to support XML 1.1 should consult [XML Schema 1.0 and XML 1.1] for guidance.
XSLT 3.0 requires a processor to support XDM 3.0 as defined in [XDM 3.0], augmented with support for maps as described in 21 Maps.
A processor may also provide a user option to support XDM 3.1 as defined in [XDM 3.1], in which case it must do so as defined in 27.7 XPath 3.1 Feature.
Note:
The essential differences between XDM 3.0 (with the extensions defined in this specification) and XDM 3.1 are that XDM 3.1 adds support for arrays, and for the xs:numeric union type.
A processor may also provide a user option to support versions of XDM later than 3.1, in which case the way it does so is implementation-defined.
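A stylesheet that needs to run under either level of support can test which one is available. The following is a minimal sketch, assuming the xsl:xpath-version system property reports a value such as 3.0 or 3.1; the variable name has-xpath31 and the result element are illustrative only.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:array="http://www.w3.org/2005/xpath-functions/array"
    exclude-result-prefixes="xs array">

  <!-- Static variable (illustrative name): true when the processor
       reports XPath 3.1 or later -->
  <xsl:variable name="has-xpath31" static="yes" as="xs:boolean"
      select="xs:decimal(system-property('xsl:xpath-version')) ge 3.1"/>

  <xsl:template name="xsl:initial-template">
    <result>
      <!-- The array syntax is only compiled when use-when is true -->
      <xsl:value-of use-when="$has-xpath31"
          select="array:size([1, 2, 3])"/>
      <xsl:value-of use-when="not($has-xpath31)"
          select="'arrays not supported'"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Because elements excluded by use-when are ignored during static analysis, the array constructor should not cause a syntax error in a processor that supports only XPath 3.0.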
The tree representing the stylesheet is preprocessed as follows:
All comments and processing instructions are removed.
Any text nodes that are now adjacent to each other are merged.
Any whitespace text node that satisfies both the following conditions is removed from the tree:
The parent of the text node is not an xsl:text element.
The text node does not have an ancestor element that has an xml:space attribute with a value of preserve, unless there is a closer ancestor element having an xml:space attribute with a value of default.
Any whitespace text node whose parent is one of the following elements is removed from the tree, regardless of any xml:space attributes:
xsl:accumulator
xsl:analyze-string
xsl:apply-imports
xsl:apply-templates
xsl:attribute-set
xsl:call-template
xsl:character-map
xsl:choose
xsl:evaluate
xsl:fork
xsl:merge
xsl:merge-source
xsl:mode
xsl:next-iteration
xsl:next-match
xsl:override
xsl:package
xsl:stylesheet
xsl:transform
xsl:use-package
Any whitespace text node whose immediate following-sibling node is an xsl:param, xsl:sort, xsl:context-item, or xsl:on-completion element is removed from the tree, regardless of any xml:space attributes.
Any whitespace text node whose immediate preceding-sibling node is an xsl:catch element is removed from the tree, regardless of any xml:space attributes.
[ERR XTSE0260] Within an XSLT element that is required to be empty, any content other than comments or processing instructions, including any whitespace text node preserved using the xml:space="preserve" attribute, is a static error.
Note:
Using xml:space="preserve" in parts of the stylesheet that contain sequence constructors will cause whitespace text nodes in that part of the stylesheet to be copied to the result of the sequence constructor. When the result of the sequence constructor is used to form the content of an element, this can cause errors if such text nodes are followed by attribute nodes generated using xsl:attribute.
Note:
If an xml:space attribute is specified on a literal result element, it will be copied to the result tree in the same way as any other attribute.
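The stylesheet preprocessing rules above can be illustrated with a small sketch. The source element names person, first, last and note are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="person">
    <!-- The line breaks and indentation between these instructions are
         whitespace text nodes and are removed from the stylesheet tree -->
    <name>
      <xsl:value-of select="first"/>
      <!-- This single space is kept because its parent is xsl:text -->
      <xsl:text> </xsl:text>
      <xsl:value-of select="last"/>
    </name>
  </xsl:template>

  <!-- With xml:space="preserve", whitespace text nodes inside this
       template are retained and copied to the result -->
  <xsl:template match="note" xml:space="preserve">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

The whitespace between the two templates is a child of xsl:stylesheet, so it is removed regardless of any xml:space attributes.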
Source documents supplied as input to a transformation may be subject to preprocessing. Two kinds of preprocessing are defined: stripping of type annotations (see 4.4.1 Stripping Type Annotations from a Source Tree), and stripping of whitespace text nodes (see 4.4.2 Stripping Whitespace from a Source Tree).
Stripping of type annotations happens before stripping of whitespace text nodes.
The source documents to which this applies are as follows:
The document containing the global context item if it is a node;
Any documents containing a node present in the initial match selection;
Any document containing a node that is returned by the functions document, doc [FO30], or collection [FO30];
Any document read using xsl:source-document.
Note:
This list excludes documents passed as the values of stylesheet parameters or parameters of the initial named template or initial function, trees created by functions such as parse-xml [FO30], parse-xml-fragment, analyze-string [FO30], or json-to-xml, and values returned from extension functions.
If a node other than a document node is supplied (for example as the global context item), then the preprocessing is applied to the entire document containing that node. If several nodes within the same document are supplied (for example as nodes in the initial match selection, or as nodes returned by the collection [FO30] function), then the preprocessing is only applied to that document once.
If a whitespace text node is supplied (for example as the global context item) and the rules cause this node to be stripped from its containing tree, then the behavior is as if this node had not been supplied (which may cause an error, for example if a global context item is required).
The rules determining whether or not stripping of annotations and/or whitespace happens are defined at the level of a package. Declarations within a library package only affect the handling of documents loaded using a call on the document, doc [FO30], or collection [FO30] functions or an evaluation of an xsl:source-document instruction appearing lexically within the same package. Declarations within the top-level package also affect the processing of the global context item and the initial match selection.
The semantics of the document, doc [FO30], and collection [FO30] functions are formally defined in terms of mappings from URIs to document nodes maintained within the dynamic context (see 5.3.3 Initializing the Dynamic Context). The effect of the declarations that control stripping of type annotations and whitespace is therefore to modify this mapping (so it now maps the URI to a stripped document). The modification applies to the dynamic context for calls to these functions appearing within a particular package; each package therefore has a different set of mappings. This means that when two calls to the doc [FO30] function appear in different packages, specifying the same absolute URI, then in general different documents are returned. An implementation may return the same document for two such calls if it is able to determine that the effect of the annotation and whitespace stripping rules in both packages is the same.
The effect of dynamic calls to the document, doc [FO30], and collection [FO30] functions is defined in the same way as for other functions with dependencies on the dynamic context. As described in 5.3.4 Additional Dynamic Context Components used by XSLT, named function references (such as doc#1) and calls on function-lookup [FO30] (for example, function-lookup(xs:QName("fn:doc"), 1)) are defined to retain the XPath static and dynamic context at the point of invocation as part of the closure of the resulting function item, and to use this preserved context when a dynamic function call is subsequently made using the function item.
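The following sketch shows a named function reference whose dynamic call is governed by the package in which the reference was created. It assumes a processor that supports higher-order functions (an optional feature); the variable name load and the URI data.xml are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Whitespace stripping declared in this package -->
  <xsl:strip-space elements="*"/>

  <!-- The function item captures the context of this package,
       including the URI-to-document mapping modified by the
       declaration above -->
  <xsl:variable name="load"
      as="function(xs:string?) as document-node()?"
      select="doc#1"/>

  <xsl:template name="xsl:initial-template">
    <!-- 'data.xml' is a hypothetical URI; the dynamic call uses the
         context preserved in $load -->
    <xsl:copy-of select="$load('data.xml')"/>
  </xsl:template>

</xsl:stylesheet>
```

If $load were passed to a function in another package and called there, the document returned would still be expected to reflect the stripping rules of the package shown here.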
[Definition: The term type annotation is used in this specification to refer to the value returned by the dm:type-name accessor of a node: see Section 5.14 type-name Accessor [DM30].]
There is sometimes a requirement to write stylesheets that produce the same results whether or not the source documents have been validated against a schema. To achieve this, an option is provided to remove any type annotations on element and attribute nodes in a source tree, replacing them with an annotation of xs:untyped in the case of element nodes, and xs:untypedAtomic in the case of attribute nodes.
Such stripping of type annotations can be requested by specifying input-type-annotations="strip" on the xsl:package element. This attribute has three permitted values: strip, preserve, and unspecified. The default value is unspecified.
The input-type-annotations attribute may also be specified on the xsl:stylesheet element; if it is specified at this level then it must be consistent for all stylesheet modules within the same package.
[ERR XTSE0265] It is a static error if there is a stylesheet module in a package that specifies input-type-annotations="strip" and another stylesheet module that specifies input-type-annotations="preserve", or if a stylesheet module specifies the value strip or preserve and the same value is not specified on the xsl:package element of the containing package.
When type annotations are stripped, the following changes are made to the source tree:
The type annotation of every element node is changed to xs:untyped.
The type annotation of every attribute node is changed to xs:untypedAtomic.
The typed value of every element and attribute node is set to be the same as its string value, as an instance of xs:untypedAtomic.
The is-nilled property of every element node is set to false.
The values of the is-id and is-idrefs properties are not changed.
Note:
Stripping type annotations does not necessarily return the document to the state it would be in had validation not taken place. In particular, any defaulted elements and attributes that were added to the tree by the validation process will still be present, and elements and attributes validated as IDs will still be accessible using the id [FO30] function.
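As a minimal sketch of how this option might be used, the following stylesheet requests stripping and then converts values explicitly, so that the arithmetic does not depend on whether the source was validated. The element names price and price-with-tax and the multiplier are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    input-type-annotations="strip"
    exclude-result-prefixes="xs">

  <xsl:template match="price">
    <!-- After stripping, the typed value of price is xs:untypedAtomic,
         so the conversion to xs:decimal is made explicit -->
    <price-with-tax>
      <xsl:value-of select="xs:decimal(.) * 1.2"/>
    </price-with-tax>
  </xsl:template>

</xsl:stylesheet>
```

Without the explicit xs:decimal conversion, an untyped value used in arithmetic would be converted to xs:double, which may give a differently formatted result from a schema-validated xs:decimal.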
A source tree supplied as input to the transformation process may contain whitespace text nodes that are of no interest, and that do not need to be retained by the transformation. Conceptually, an XSLT processor makes a copy of the source tree from which unwanted whitespace text nodes have been removed. This process is referred to as whitespace stripping.
The stripping process takes as input a set of element names whose child whitespace text nodes are to be preserved.
The way in which this set of element names is established using the xsl:strip-space and xsl:preserve-space declarations is described later in this section.
The stripping process that applies for a particular package is determined by the xsl:strip-space and xsl:preserve-space declarations within that package.
A whitespace text node is preserved if either of the following apply:
The element name of the parent of the text node is in the set of whitespace-preserving element names.
An ancestor element of the text node has an xml:space attribute with a value of preserve, and no closer ancestor element has xml:space with a value of default.
Otherwise, the whitespace text node is stripped.
The xml:space attributes are not removed from the tree.
<!-- Category: declaration -->
<xsl:strip-space
elements = tokens />
<!-- Category: declaration -->
<xsl:preserve-space
elements = tokens />
The set of whitespace-preserving element names is specified by xsl:strip-space and xsl:preserve-space declarations. Whether an element name is included in the set of whitespace-preserving names is determined by the best match among all the xsl:strip-space or xsl:preserve-space declarations: it is included if and only if there is no match or the best match is an xsl:preserve-space element. The xsl:strip-space and xsl:preserve-space elements each have an elements attribute whose value is a whitespace-separated list of NameTests [XP30]; an element name matches an xsl:strip-space or xsl:preserve-space element if it matches one of the NameTests [XP30]. An element matches a NameTest [XP30] if and only if the NameTest [XP30] would be true for the element as an XPath node test.
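A common arrangement, sketched below, strips whitespace text nodes everywhere except within a few named elements. The element names pre and code and the result element remaining are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Strip whitespace text nodes from every element ... -->
  <xsl:strip-space elements="*"/>

  <!-- ... except children of pre and code, for which the named test
       is a better match than * -->
  <xsl:preserve-space elements="pre code"/>

  <xsl:template match="/">
    <!-- Counts the whitespace-only text nodes that survive stripping -->
    <remaining>
      <xsl:value-of select="count(//text()[not(normalize-space())])"/>
    </remaining>
  </xsl:template>

</xsl:stylesheet>
```

With neither declaration present there is no match for any element name, so every element name is in the whitespace-preserving set and nothing is stripped.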
[ERR XTSE0270] It is a static error if, within any package, the same NameTest [XP30] appears in both an xsl:strip-space and an xsl:preserve-space declaration and both have the same import precedence. Two NameTests are considered the same if they match the same set of names (which can be determined by comparing them after expanding namespace prefixes to URIs).
Otherwise, when more than one xsl:strip-space and xsl:preserve-space element within the relevant package matches, the best matching element is determined by the best matching NameTest [XP30]. The rules are similar to those for template rules:
First, any match with lower import precedence than another match is ignored.
Next, any match that has a lower default priority than the default priority of another match is ignored.
If several matches have the same default priority (which can only happen if one of the NameTests takes the form *:local and the other takes the form prefix:*), then the declaration that appears last in declaration order is used.
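The conflict resolution rules above can be illustrated as follows. This is a sketch in which the doc prefix, its namespace URI, and the element names title and para are hypothetical; all declarations are assumed to have the same import precedence.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="http://example.com/ns/doc">

  <!-- Default priority -0.5: matches every element -->
  <xsl:strip-space elements="*"/>

  <!-- Both NameTests below have default priority -0.25,
       and both match doc:title -->
  <xsl:preserve-space elements="doc:*"/>
  <xsl:strip-space elements="*:title"/>

  <!-- Default priority 0: the best match for doc:para -->
  <xsl:preserve-space elements="doc:para"/>

</xsl:stylesheet>
```

Under these declarations, whitespace text node children of doc:para are preserved (priority 0), and those of other elements in the doc namespace are preserved by doc:*. For doc:title the two tests of equal priority both match, so the declaration appearing last, the xsl:strip-space for *:title, is used and the whitespace is stripped.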
If an element in a source document has a type annotation that is a simple type or a complex type with simple content, then any whitespace text nodes among its children are preserved, regardless of any xsl:strip-space declarations. The reason for this is that stripping a whitespace text node from an element with simple content could make the element invalid: for example, it could cause the minLength facet to be violated.
Stripping of type annotations happens before stripping of whitespace text nodes, so this situation will not occur if input-type-annotations="strip" is specified.
Note:
In [XDM 3.0], processes are described for constructing an XDM tree from an Infoset or from a PSVI. Those processes deal with whitespace according to their own rules, and the provisions in this section apply to the resulting tree. In practice this means that elements that are defined in a DTD or a Schema to contain element-only content will have whitespace text nodes stripped, regardless of the xsl:strip-space and xsl:preserve-space declarations in the stylesheet.
However, source trees are not necessarily constructed using those processes; indeed, they are not necessarily constructed by parsing XML documents. Nothing in the XSLT specification constrains how the source tree is constructed, or what happens to whitespace text nodes during its construction. The provisions in this section relate only to whitespace text nodes that are present in the tree supplied as input to the XSLT processor. The XSLT processor cannot preserve whitespace text nodes unless they were actually present in the supplied tree.
The mapping from the Infoset to the XDM data model, described in [XDM 3.0], does not retain attribute types. This means, for example, that an attribute described in the DTD as having attribute type NMTOKENS will be annotated in the XDM tree as xs:untypedAtomic rather than xs:NMTOKENS, and its typed value will consist of a single xs:untypedAtomic value rather than a sequence of xs:NMTOKEN values.
Attributes with a DTD-derived type of ID, IDREF, or IDREFS will be marked in the XDM tree as having the is-id or is-idrefs properties. It is these properties, rather than any type annotation, that are examined by the functions id [FO30] and idref [FO30] described in [Functions and Operators 3.0].
The data model for nodes in a document that is being streamed is no different from the standard XDM data model, in that it contains the same objects (nodes) with the same properties and relationships. The facilities for streaming do not change the data model; instead they impose rules that limit the ability of stylesheets to navigate the data model.
A useful way to visualize streaming is to suppose that at any point in time, there is a current position in the streamed input document which may be the start or end of the document, the start or end tag of an element, or a text, comment, or processing instruction node. From this position, the stylesheet has access to the following information:
Properties intrinsic to the node, such as its name, its base URI, its type annotation, and its is-id and is-idrefs properties.
The ancestors of the node (but navigation downwards from the ancestors is not permitted).
The attributes of the node, and the attributes of its ancestors. For each such attribute, all the properties of the node including its string value and typed value are available, but there are limitations that restrict navigation from the attribute node to other nodes in the document.
The in-scope namespace bindings of the node.
In the case of attributes, text nodes, comments, and processing instructions, the string value and typed value of the node.
In the case of element nodes, whether or not the element has children. This information is obtained by calling the has-children [FO30] function. This implies that the processor performs look-ahead (limited to a single token) to determine whether the start tag is immediately followed by a matching end tag.
In the case of document nodes, details of unparsed entities in the document. This information is obtained by calling the unparsed-entity-uri and unparsed-entity-public-id functions. A processor might enable this by reading the DTD as soon as the document is opened. Since comments and processing instructions that precede the DOCTYPE declaration are available as children of the document node, this also implies that a streaming processor needs sufficient memory to hold these comments and processing instructions until the start tag of the first element is encountered. Information about unparsed entities remains available for the duration of processing, in the same way as attributes of ancestor elements.
The children and other descendants of a node are not accessible except as a by-product of changing the current position in the document. The same applies to properties of an element or document node that require examination of the node’s descendants, that is, the string value and typed value. This is enforced by means of a rule that only one expression requiring downward navigation from a node is permitted.
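The restriction to a single downward selection can be sketched as follows, assuming a processor that supports the optional streaming feature. The element and attribute names item, price, quantity, id and line are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <xsl:template match="item">
    <!-- An attribute of the current node plus one downward
         selection (price): intended to be streamable -->
    <line id="{@id}">
      <xsl:value-of select="price"/>
    </line>

    <!-- Two downward selections from the same node would not be
         guaranteed streamable:
         <xsl:value-of select="price * quantity"/> -->
  </xsl:template>

</xsl:stylesheet>
```

Where both children are genuinely needed, one approach is to take an in-memory copy of the item element first (for example with copy-of(.)) and navigate within that copy, at the cost of buffering one item at a time.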
Information about the type of a node is in general considered a property intrinsic to the node, and is available without advancing the input stream. There is an exception for an expression of the form (/) instance of document-node(element(invoice)). This is not guaranteed streamable, because it requires reading ahead to check that the document node has only one element child. However, a processor that knows that the parser delivering the document stream is only capable of delivering well-formed documents may use this knowledge (along with the limited look-ahead needed to get the name of the outermost element) to make this expression streamable.
A streaming processor is not required to read any more of the source document than is needed to generate correct stylesheet output. It is not required to read the full source document merely in order to satisfy the requirement imposed by the XML Recommendation that an XML Processor must report violations of well-formedness in the input.
More detailed rules are defined in 19 Streamability.
Two new data structures have been added to the data model: maps and arrays. Both are defined in XPath 3.1, but maps are also available in XSLT processors that only support XPath 3.0 (see 21 Maps).
Streaming facilities in this specification are, for the most part, relevant only to streamed processing of XML trees, and not to other structures such as sequences, maps and arrays, which will typically be held in memory unless the processor is capable of avoiding this.
Maps, however, play an important role in enabling streamed applications to be written. For example, a map can be used as the data structure maintained by an accumulator (see 18.2 Accumulators) to remember information that has been retrieved from a streamed document, given that it is not possible to revisit the same nodes later. There is also a special streamability rule for map constructor expressions (see 21.6 Maps and Streaming) that allows such an expression to make multiple downward selections in the streamed input document: for example one can write map{'authors':data(author), 'editors':data(editor)}, which gathers the values of these two elements, or sets of elements, from the input stream, regardless of the order in which they appear, even if they are interleaved.
The rules for creating maps and arrays are designed to ensure that the entries in a map, and the members of an array, cannot contain nodes from a streamed document. This is achieved by the way in which the streamability properties of the relevant expressions and functions are defined.
By contrast, sequences can and often do contain nodes from streamed documents, and a major purpose of the rules for streamability is to make this possible.
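The use of a map as the value maintained by an accumulator can be sketched as follows. This example is not itself declared streamable; for streamed input the mode and the accumulator would additionally need to be declared streamable and to satisfy the rules in 18.2 Accumulators and 19 Streamability, which this sketch does not attempt to demonstrate. The accumulator name element-counts and the result element names are hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs map">

  <xsl:mode use-accumulators="element-counts"
      on-no-match="shallow-skip"/>

  <!-- The accumulator value is a map from element local name
       to the number of occurrences seen so far -->
  <xsl:accumulator name="element-counts"
      as="map(xs:string, xs:integer)"
      initial-value="map{}">
    <xsl:accumulator-rule match="*"
        select="map:put($value, local-name(),
                        if (map:contains($value, local-name()))
                        then $value(local-name()) + 1
                        else 1)"/>
  </xsl:accumulator>

  <xsl:template match="/">
    <counts>
      <!-- Value of the accumulator after the whole document
           has been traversed -->
      <xsl:variable name="totals"
          select="accumulator-after('element-counts')"/>
      <xsl:for-each select="map:keys($totals)">
        <count name="{.}" value="{$totals(.)}"/>
      </xsl:for-each>
    </counts>
  </xsl:template>

</xsl:stylesheet>
```

The order in which map:keys returns the keys is implementation-dependent, so the count elements may appear in any order unless they are sorted.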
The XDM data model (see [XDM 3.0]) leaves it to the host language to define limits. This section describes the limits that apply to XSLT.
Limits on some primitive datatypes are defined in [XML Schema Part 2]. Other limits, listed below, are implementation-defined. Note that this does not necessarily mean that each limit must be a simple constant: it may vary depending on environmental factors such as available resources.
The following limits are implementation-defined:
For the xs:decimal type, the maximum number of decimal digits (the totalDigits facet). This must be at least 18 digits. (Note, however, that support for the full value range of xs:unsignedLong requires 20 digits.)
For the types xs:date, xs:time, xs:dateTime, xs:gYear, and xs:gYearMonth: the range of values of the year component, which must be at least +0001 to +9999; and the maximum number of fractional second digits, which must be at least 3.
For the xs:duration type: the maximum absolute values of the years, months, days, hours, minutes, and seconds components.
For the xs:yearMonthDuration type: the maximum absolute value, expressed as an integer number of months.
For the xs:dayTimeDuration type: the maximum absolute value, expressed as a decimal number of seconds.
For the types xs:string, xs:hexBinary, xs:base64Binary, xs:QName, xs:anyURI, xs:NOTATION, and types derived from them: the maximum length of the value.
For sequences, the maximum number of items in a sequence.
For backwards compatibility reasons, XSLT 3.0 continues to support the disable-output-escaping feature introduced in XSLT 1.0. This is an optional feature and implementations are not required to support it. A new facility, that of named character maps (see 26.1 Character Maps), was introduced in XSLT 2.0. It provides similar capabilities to disable-output-escaping, but without distorting the data model.
If an implementation supports the disable-output-escaping attribute of xsl:text and xsl:value-of (see 26.2 Disabling Output Escaping), then the data model for trees constructed by the processor is augmented with a boolean value representing the value of this property. This boolean value, however, can be set only within a final result tree that is being passed to the serializer.
Conceptually, each character in a text node on such a result tree has a boolean property indicating whether the serializer is to disable the normal rules for escaping of special characters (for example, outputting of & as &amp;) in respect of this character.
Note:
In practice, the nodes in a final result tree will often be streamed directly from the XSLT processor to the serializer. In such an implementation, disable-output-escaping can be viewed not so much as a property stored with nodes in the tree, but rather as additional information passed across the interface between the XSLT processor and the serializer.
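The two facilities can be compared in a short sketch. The character map name raw-break and the choice of the private-use character #xE001 are illustrative; a processor is not required to support disable-output-escaping, and both facilities take effect only when the result is serialized.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Maps a private-use character to a string that the serializer
       writes without escaping -->
  <xsl:character-map name="raw-break">
    <xsl:output-character character="&#xE001;" string="&lt;br&gt;"/>
  </xsl:character-map>

  <xsl:output method="xml" use-character-maps="raw-break"/>

  <xsl:template name="xsl:initial-template">
    <p>
      <!-- Character map: the data model holds an ordinary character -->
      <xsl:text>first line&#xE001;second line</xsl:text>
      <!-- Optional feature: relies on the extra boolean property
           described above -->
      <xsl:text disable-output-escaping="yes">&lt;br&gt;</xsl:text>
      <xsl:text>third line</xsl:text>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

With a processor that supports both facilities, each approach is intended to place the literal characters <br> in the serialized output; the character map achieves this without augmenting the data model.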