2 Concepts

2.1 Terminology

For a full glossary of terms, see C Glossary.

[Definition: The software responsible for transforming source trees into result trees using an XSLT stylesheet is referred to as the processor. This is sometimes expanded to XSLT processor to avoid any confusion with other processors, for example an XML processor.]

[Definition: A specific product that performs the functions of an XSLT processor is referred to as an implementation.]

[Definition: The term tree is used (as in [XDM 3.0]) to refer to the aggregate consisting of a parentless node together with all its descendant nodes, plus all their attributes and namespaces.]

Note:

The use of the term tree in this document does not imply the use of a data structure in memory that holds the entire contents of the document at one time. It implies rather a logical view of the XML input and output in which elements have a hierarchic relationship to each other. When a source document is being processed in a streaming manner, access to the nodes in this tree is constrained, but it is still viewed and described as a tree.

The output of a transformation consists of the following:

  1. [Definition: A principal result: this can be any sequence of items (as defined in [XDM 3.0]).] The principal result is the value returned by the function or template in the stylesheet that is nominated as the entry point, as described in 2.3 Initiating a Transformation.

  2. [Definition: Zero or more secondary results: each secondary result can be any sequence of items (as defined in [XDM 3.0]).] A secondary result is the value returned by evaluating the body of an xsl:result-document instruction.

  3. Zero or more messages. Messages are generated by the xsl:message and xsl:assert instructions, and are described in 23.1 Messages and 23.2 Assertions.

  4. Static or dynamic errors: see 2.14 Error Handling.

The principal result and the secondary results may be post-processed as described in 2.3.6 Post-processing the Raw Result.

[Definition: The term result tree is used to refer to any tree constructed by instructions in the stylesheet. A result tree is either a final result tree or a temporary tree.]

[Definition: A final result tree is a result tree that forms part of the output of a transformation: specifically, a tree built by post-processing the items in the principal result or in a secondary result. Once created, the contents of a final result tree are not accessible within the stylesheet itself.] Any final result tree may be serialized as described in 26 Serialization.

[Definition: The term source tree means any tree provided as input to the transformation. This includes the document containing the global context item if any, documents containing nodes present in the initial match selection, documents containing nodes supplied as the values of stylesheet parameters, documents obtained from the results of functions such as document, docFO30, and collectionFO30, documents read using the xsl:source-document instruction, and documents returned by extension functions or extension instructions. In the context of a particular XSLT instruction, the term source tree means any tree provided as input to that instruction; this may be a source tree of the transformation as a whole, or it may be a temporary tree produced during the course of the transformation.]

[Definition: The term temporary tree means any tree that is neither a source tree nor a final result tree.] Temporary trees are used to hold intermediate results during the execution of the transformation.

The use of the term “tree” in phrases such as source tree, result tree, and temporary tree is not confined to documents that the processor materializes in memory in their entirety. The processor may, and in some cases must, use streaming techniques to limit the amount of memory used to hold source and result documents. When streaming is used, the nodes of the tree may never all be in memory at the same time, but at an abstract level the information is still modeled as a tree of nodes, and the document is therefore still described as a tree. Unless otherwise stated, the term “tree” refers to a tree rooted at a parentless node: that is, the term does not include subtrees of larger trees. Every node therefore belongs to exactly one tree.

In this specification the phrases must, must not, should, should not, may, required, and recommended, when used in normative text and rendered in capitals, are to be interpreted as described in [RFC2119].

Where the phrase must, must not, or required relates to the behavior of the XSLT processor, then an implementation is not conformant unless it behaves as specified, subject to the more detailed rules in 27 Conformance.

Where the phrase must, must not, or required relates to a stylesheet then the processor must enforce this constraint on stylesheets by reporting an error if the constraint is not satisfied.

Where the phrase should, should not, or recommended relates to a stylesheet then a processor may produce warning messages if the constraint is not satisfied, but must not treat this as an error.

[Definition: In this specification, the term implementation-defined refers to a feature where the implementation is allowed some flexibility, and where the choices made by the implementation must be described in documentation that accompanies any conformance claim.]

[Definition: The term implementation-dependent refers to a feature where the behavior may vary from one implementation to another, and where the vendor is not expected to provide a full specification of the behavior.] (This might apply, for example, to limits on the size of source documents that can be transformed.)

In all cases where this specification leaves the behavior implementation-defined or implementation-dependent, the implementation has the option of providing mechanisms that allow the user to influence the behavior.

A paragraph labeled as a Note or described as an example is non-normative.

Many terms used in this document are defined in the XPath specification [XPath 3.0] or the XDM specification [XDM 3.0]. Particular attention is drawn to the following:

2.2 Notation

[Definition: An XSLT element is an element in the XSLT namespace whose syntax and semantics are defined in this specification.] For a non-normative list of XSLT elements, see D Element Syntax Summary.

In this document the specification of each XSLT element is preceded by a summary of its syntax in the form of a model for elements of that element type. A full list of all these specifications can be found in D Element Syntax Summary. The meaning of the syntax summary notation is as follows:

Example: Syntax Notation

This example illustrates the notation used to describe XSLT elements.

<!-- Category: instruction -->
<xsl:example-element
  select = expression
  debug? = boolean
  validation? = { "strict" | "lax" } >
  <!-- Content: ((xsl:variable | xsl:param)*, xsl:sequence) -->
</xsl:example-element>

This example defines a (non-existent) element xsl:example-element. The element is classified as an instruction. It takes the following attributes:

  1. A mandatory select attribute, whose value is an XPath expression

  2. An optional debug attribute, whose value must be yes, true, or 1 to indicate true, or no, false, or 0 to indicate false.

  3. An optional validation attribute, whose value must be strict or lax; the curly brackets indicate that the value can be defined as an attribute value template, allowing a value such as validation="{$val}", where the variable val is evaluated to yield "strict" or "lax" at run-time.

The content of an xsl:example-element instruction is defined to be a sequence of zero or more xsl:variable and xsl:param elements, followed by an xsl:sequence element.

[ERR XTSE0010] It is a static error if an XSLT-defined element is used in a context where it is not permitted, if a required attribute is omitted, or if the content of the element does not correspond to the content that is allowed for the element.

The rules in the element syntax summary (both for the element structure and for its attributes) apply to the stylesheet content after preprocessing as described in 3.13 Stylesheet Preprocessing.

Attributes are validated as follows. These rules apply to the value of the attribute after removing leading and trailing whitespace.

Special rules apply if the construct appears in part of the stylesheet that is processed with forwards compatible behavior: see 3.10 Forwards Compatible Processing.

[Definition: Some constructs defined in this specification are described as being deprecated. The use of this term implies that stylesheet authors should not use the construct, and that the construct may be removed in a later version of this specification.]

Note:

This specification includes a non-normative XML Schema for XSLT stylesheet modules (see H Schemas for XSLT 3.0 Stylesheets). The syntax summaries described in this section are normative.

XSLT defines a set of standard functions which are additional to those defined in [Functions and Operators 3.0]. A list of these functions appears in G.2 List of XSLT-defined functions. The signatures of these functions are described using the same notation as used in [Functions and Operators 3.0]. The names of many of these functions are in the standard function namespace.

2.3 Initiating a Transformation

This document does not specify any application programming interfaces or other interfaces for initiating a transformation. This section, however, describes the information that is supplied when a transformation is initiated. Except where otherwise indicated, the information is required.

The execution of a stylesheet necessarily involves two activities: static analysis and dynamic evaluation. Static analysis consists of those tasks that can be performed by inspection of the stylesheet alone, including the binding of static variables, the evaluation of [xsl:]use-when expressions (see 3.13.1 Conditional Element Inclusion), and shadow attributes (see 3.13.2 Shadow Attributes) and detection of static errors. Dynamic evaluation consists of tasks which in general cannot be carried out until a source document is available.

Dynamic evaluation is further divided into two activities: priming the stylesheet, and invoking a selected component.

Implementations may allow static analysis and dynamic evaluation to be initiated independently, so that the cost of static analysis can be amortized over multiple transformations using the same stylesheet. Implementations may also allow priming of a stylesheet and invocation of components to be initiated independently, in which case a single act of priming the stylesheet may be followed by a series of independent component invocations. Although this specification does not require such a separation, this section distinguishes information that is needed before static analysis can proceed, information that is needed to prime the stylesheet, and information that is needed when invoking components.

The language is designed to allow the static analysis of each package to be performed independently of other packages, with only basic knowledge of the properties of components made available by used packages. Beyond this, the specification leaves it to implementations to decide how to organize this process. When packages are not used explicitly, the entire stylesheet is treated as a single package.

2.3.1 Information needed for Static Analysis

The following information is needed prior to static analysis of a package:

Conceptually, the output of the static analysis of a package is an object which might be referred to (without constraining the implementation) as a compiled package. Prior to dynamic evaluation, all the compiled packages needed for execution must be checked for consistency, and component references must be resolved. This process may be referred to, again without constraining the implementation, as linking.

2.3.2 Priming a Stylesheet

The information needed when priming a stylesheet is as follows:

  • A set (possibly empty) of values for non-static stylesheet parameters (see 9.5 Global Variables and Parameters). These values are available for use within expressions in the stylesheet. As a minimum, values must be supplied for any parameters declared with the attribute required="yes".

    A supplied value is converted if necessary to the declared type of the stylesheet parameter using the function conversion rules.

    Note:

    Non-static stylesheet parameters are implicitly public, which ensures that all the parameters in the stylesheet for which values can be supplied externally have distinct names. Static parameters, by contrast, are local to a package.

  • [Definition: An item that acts as the global context item for the transformation. This item acts as the context item when evaluating the select expression or sequence constructor of a global variable declaration within the top-level package, as described in 5.3.3.1 Maintaining Position: the Focus. The global context item may also be available in a named template when the stylesheet is invoked as described in 2.3.4 Call-Template Invocation].

    Note:

    In previous releases of this specification, a single node was typically supplied to represent the source document for the transformation. This node was used as the target node for the implicit call on xsl:apply-templates used to start the transformation process (now called the initial match selection), and the root node of the containing tree was used as the context item for evaluation of global variables (now called the global context item). This relationship between the initial match selection and the global context item is likely to be found for compatibility reasons in a transformation API designed to work with earlier versions of this specification, but it is no longer a necessary relationship; the two values can in principle be completely independent of each other.

    Stylesheet authors wanting to write code that can be invoked using legacy APIs should not rely on the caller being able to supply different values for the initial match selection and the global context item.

    The value given to the global context item (and the values given to stylesheet parameters) cannot be nodes in a streamed document. This rule ensures that all global variables can freely navigate within the relevant tree, with no constraints imposed by the streamability rules.

    The global context item is potentially used when initializing global variables and parameters. If the initialization of any global variables or parameter depends on the context item, a dynamic error can occur if the context item is absent. It is implementation-defined whether this error occurs during priming of the stylesheet or subsequently when the variable is referenced; and it is implementation-defined whether the error occurs at all if the variable or parameter is never referenced. The error can be suppressed by use of xsl:try and xsl:catch within the sequence constructor used to initialize the variable or parameter. It cannot be suppressed by use of xsl:try around a reference to the global variable.

    In a library package, the context item, context position, and context size used for evaluation of global variables will be absent, and the evaluation of any expression that references these values will result in a dynamic error. This will also be the case in the top-level package if no global context item is supplied.

    Note:

    If a context item is available within a global variable declaration, then the context position and context size will always be 1 (one).

    Note:

    For maximum reusability of code, it is best to avoid use of the context item when initializing global variables and parameters. Instead, all external information should be supplied using named stylesheet parameters. Especially when these use namespaces to avoid conflicts, there is then no risk of confusion between the information supplied externally to different packages.

    When a stylesheet parameter is defined in a library package, it is possible for a using package to supply a value for the parameter by overriding the parameter declaration within an xsl:override element. If the using package is the top-level package then the overriding declaration can refer to the global context item.

  • A mechanism for obtaining a document node and a media type, given an absolute URI. The total set of available documents (modeled as a mapping from URIs to document nodes) forms part of the context for evaluating XPath expressions, specifically the docFO30 function. The XSLT document function additionally requires the media type of the resource representation, for use in interpreting any fragment identifier present within a URI Reference.

    Note:

    The set of documents that are available to the stylesheet is implementation-dependent, as is the processing that is carried out to construct a tree representing the resource retrieved using a given URI. Some possible ways of constructing a document (specifically, rules for constructing a document from an Infoset or from a PSVI) are described in [XDM 3.0].

Once a stylesheet is primed, the values of global variables remain stable through all component invocations. In addition, priming a stylesheet creates an execution scopeFO30 during which the dynamic context and all calls on deterministicFO30 functions remain stable; for example two calls on the current-dateTimeFO30 function within an execution scope are defined to return the same result.

Parameters passed to the transformation by the client application when a stylesheet is primed are matched against stylesheet parameters (see 9.5 Global Variables and Parameters), not against the template parameters of any template executed during the course of the transformation.

[ERR XTDE0050] It is a dynamic error if a stylesheet declares a visible stylesheet parameter that is explicitly or implicitly mandatory, and no value for this parameter is supplied when the stylesheet is primed. A stylesheet parameter is visible if it is not masked by another global variable or parameter with the same name and higher import precedence. If the parameter is a static parameter then the value must be supplied prior to the static analysis phase.

2.3.3 Apply-Templates Invocation

[Definition: A stylesheet may be evaluated by supplying a value to be processed, together with an initial mode. The value (which can be any sequence of items) is referred to as the initial match selection. The processing then corresponds to the effect of the xsl:apply-templates instruction.]

The initial match selection will often be a single document node, traditionally called the source document of the transformation; but in general, it can be any sequence. If the initial match selection is an empty sequence, the result of the transformation will be empty, since no template rules are evaluated.

Processing proceeds by finding the template rules that match the items in the initial match selection, and evaluating these template rules with a focus based on the initial match selection. The template rules are evaluated in final output state.

The following information is needed when dynamic evaluation is to start with a template rule:

  • The initial match selection. An API that chooses to maintain compatibility with previous versions of this specification should allow a method of invocation in which a singleton node is provided, which is then used in two ways: the node itself acts as the initial match selection, and the root node of the containing tree acts as the global context item.

  • Optionally, an initial mode.

    [Definition: The initial mode is the mode used to select template rules for processing items in the initial match selection when apply-templates invocation is used to initiate a transformation.]

    In searching for the template rule that best matches the items in the initial match selection, the processor considers only those rules that apply to the initial mode.

    If no initial mode is supplied explicitly, then the initial mode is that named in the default-mode attribute of the (explicit or implicit) xsl:package element of the top-level package or in the absence of such an attribute, the unnamed mode.

    [ERR XTDE0044] It is a dynamic error if the invocation of the stylesheet specifies an initial mode when no initial match selection is supplied (either explicitly, or defaulted to the global context item).

    A (named or unnamed) mode M is eligible as an initial mode if one of the following conditions applies, where P is the top-level package of the stylesheet:

    1. M is explicitly declared in an xsl:mode declaration within P, and has public or final visibility (either by virtue of its visibility attribute, or by virtue of an xsl:expose declaration).

    2. M is the unnamed mode.

    3. M is named in the default-mode attribute of the (explicit or implicit) xsl:package element of P.

    4. M is declared in a package used by P, and is given public or final visibility in P by means of an xsl:accept declaration.

    5. The effective value of the declared-modes attribute of the explicit or implicit xsl:package element of P is no, and M appears as a mode-name in the mode attribute of a template rule declared within P.

    [ERR XTDE0045] It is a dynamic error if the invocation of the stylesheet specifies an initial mode and the specified mode is not eligible as an initial mode (as defined above).

  • Parameters, which will be passed to the template rules used to process items in the input sequence. The parameters consist of two sets of (QName, value) pairs, one set for tunnel parameters and one for non-tunnel parameters, in which the QName identifies the name of a parameter and the value provides the value of the parameter. Either or both sets of parameters may be empty. The effect is the same as when a template is invoked using xsl:apply-templates with an xsl:with-param child specifying tunnel="yes" or tunnel="no" as appropriate. If a parameter is supplied that is not declared or used, the value is simply ignored. These parameters are not used to set stylesheet parameters.

    A supplied value is converted if necessary to the declared type of the template parameter using the function conversion rules.

  • Details of how the result of the initial template is to be returned. For details, see 2.3.6 Post-processing the Raw Result

The raw result of the invocation is the result of processing the supplied input sequence as if by a call on xsl:apply-templates in the specified mode: specifically, each item in the input sequence is processed by selecting and evaluating the best matching template rule, and converting the result (if necessary) to the type declared in the as attribute of that template using the function conversion rules; and the results of processing each item are then concatenated into a single sequence, respecting the order of items in the input sequence.

Note:

If the initial mode is declared-streamable, then a streaming processor should allow some or all of the items in the initial match selection to be nodes supplied in streamable form, and any nodes that are supplied in this form must then be processed using streaming.

Since the global context item cannot be a streamed node, in cases where the transformation is to proceed by applying streamable templates to a streamed input document, the global context item must either be absent, or must be something that differs from the initial match selection.

Note:

The design of the API for invoking a transformation should provide some means for users to designate the unnamed mode as the initial mode in cases where it is not the default mode.

It is a dynamic error [see ERR XTDE0700] if the template rule selected for processing any item in the initial match selection defines a template parameter that specifies required="yes" and no value is supplied for that parameter.

Note:

A stylesheet can process further source documents in addition to those supplied when the transformation is invoked. These additional documents can be loaded using the functions document (see 20.1 fn:document) or docFO30 or collectionFO30 (see [Functions and Operators 3.0]), or using the xsl:source-document instruction; alternatively, they can be supplied as stylesheet parameters (see 9.5 Global Variables and Parameters), or returned as the result of an extension function (see 24.1 Extension Functions).

2.3.4 Call-Template Invocation

[Definition: A stylesheet may be evaluated by selecting a named template to be evaluated; this is referred to as the initial named template.] The effect is analogous to the effect of executing an xsl:call-template instruction. The following information is needed in this case:

  • Optionally, the name of the initial named template which is to be executed as the entry point to the transformation. If no template name is supplied, the default template name is xsl:initial-template. The selected template must exist within the stylesheet.

  • Optionally, a context item for evaluation of this named template, defaulting to the global context item if it exists. This is constrained by any xsl:context-item element appearing within the selected xsl:template element. The initial named template is evaluated with a singleton focus based on this context item if it exists, or with an absent focus otherwise.

  • Parameters, which will be passed to the selected template rule. The parameters consist of two sets of (QName, value) pairs, one set for tunnel parameters and one for non-tunnel parameters, in which the QName identifies the name of a parameter and the value provides the value of the parameter. Either or both sets of parameters may be empty. The effect is the same as when a template is invoked using xsl:call-template with an xsl:with-param child specifying tunnel="yes" or tunnel="no" as appropriate. If a parameter is supplied that is not declared or used, the value is simply ignored. These parameters are not used to set stylesheet parameters.

    A supplied value is converted if necessary to the declared type of the template parameter using the function conversion rules.

  • Details of how the result of the initial named template is to be returned. For details, see 2.3.6 Post-processing the Raw Result

The raw result of the invocation is the result of evaluating the initial named template, after conversion of the result to the type declared in the as attribute of that template using the function conversion rules, if such conversion is necessary.

The initial named template is evaluated in final output state.

[ERR XTDE0040] It is a dynamic error if the invocation of the stylesheet specifies a template name that does not match the expanded QName of a named template defined in the stylesheet, whose visibility is public or final.

It is a dynamic error [see ERR XTDE0700] if the initial named template, or any of the template rules invoked to process items in the initial match selection, defines a template parameter that specifies required="yes" and no value is supplied for that parameter.

2.3.5 Function Call Invocation

[Definition: A stylesheet may be evaluated by calling a named stylesheet function, referred to as the initial function.] The following additional information is needed in this case:

  • The name and arity of a stylesheet function which is to be executed as the entry point to the transformation.

    Note:

    In the design of a concrete API, the arity may be inferred from the length of the parameter list.

  • A list of values to act as parameters to the initial function. The number of values in the list must be the same as the arity of the function.

    A supplied value is converted if necessary to the declared type of the function parameter using the function conversion rules.

  • Details of how the result of the initial function is to be returned. For details, see 2.3.6 Post-processing the Raw Result

The raw result of the invocation is the result of evaluating the initial function, after conversion of the result to the type declared in the as attribute of that function using the function conversion rules, if such conversion is necessary.

Note:

The initial function (like all stylesheet functions) is evaluated with an absent focus.

If the initial function is declared-streamable, a streaming processor should allow the value of the first argument to be supplied in streamable form, and if it is supplied in this form, then it must be processed using streaming.

[ERR XTDE0041] It is a dynamic error if the invocation of the stylesheet specifies a function name and arity that does not match the expanded QName and arity of a named stylesheet function defined in the stylesheet, whose visibility is public or final.

When a transformation is invoked by calling an initial function, the entire transformation executes in temporary output state, which means that calls on xsl:result-document are not permitted.

2.3.6 Post-processing the Raw Result

There are three ways the result of a transformation may be delivered. (This applies both to the principal result, described here, and also to secondary results, generated using xsl:result-document.)

  1. The raw result (a sequence of values) may be returned directly to the calling application.

  2. A result tree may be constructed from the raw result. By default, a result tree is constructed if the build-tree attribute of the unnamed output definition has the effective value yes. An API for invoking transformations may allow this setting to be overridden by the calling application. If result tree construction is requested, it is performed as described in 2.3.6.1 Result Tree Construction.

  3. Alternatively, the raw result may be serialized as described in 2.3.6.2 Serializing the Result. The decision whether or not to serialize the result is determined by the rules of transformation API provided by the processor, and is not influenced by anything in the stylesheet.

Note:

This specification does not constrain the design of application programming interfaces or the choice of defaults. In previous versions of this specification, result tree construction was a mandatory process, while serialization was optional. When invoking stylesheet functions directly, however, result tree construction and serialization may be inappropriate as defaults. These considerations may affect the design of APIs.

In previous versions of XSLT, results were delivered either in serialized form (as a character or byte stream), or as a tree. In the latter case processors typically would use either their own tree representation, or a standardized tree representation such as the W3C Document Object Model (DOM) (see [DOM Level 2]), adapted to the data structures offered by the programming language in which the API is defined. To deliver a raw result, processors need to define a representation not only of XDM nodes but also of sequences, atomic values, maps and even functions. As with the return of a simple tree, this may involve a trade-off between strict fidelity to the XDM data model and usability in the particular programming language environment. It is not a requirement that an API should return results in a way that exposes every property of the XDM data model; for example there may be APIs that do not expose the precise type annotation of a returned node or atomic value, or that fail to expose the base URI or document URI of a node, or that provide no way of determining whether two nodes in the result sequence are the same node in the sense of the XPath is operator. The way in which maps and functions (and where XPath 3.1 is supported, arrays) are returned requires careful design choices. It is recommended that an API should be capable of returning any XDM value without error, and that there should be minimal loss of information if the raw results output by one transformation are subsequently used as input to another transformation.

2.3.6.1 Result Tree Construction

If a result tree is to be constructed from the raw result, then this is done by applying the rules for the process of sequence normalizationSER30 as defined in [XSLT and XQuery Serialization]. This process takes as input the serialization parameters defined in the unnamed output definition of the top-level package; though the only parameter that is actually used by this process is item-separator. In particular, sequence normalization is carried out regardless of any method attribute in the unnamed output definition.

The sequence normalization process either returns a document node, or raises a serialization error. The content of the document node is not necessarily well-formed (the document node may have any number of element or text nodes among its children).

Note:

More specifically, the process raises a serialization error if any item in the raw result is an attribute node, a namespace node, or a function (including a map, but not an array: arrays are flattened).

The tree that is constructed is referred to as a final result tree.

If the raw result is an empty sequence, the final result tree will consist of a document node with no children.

The base URI of the document node is set to the base output URI.

Note:

The item-separator property has no effect if the raw result of the transformation is a sequence of length zero or one, which in practice will often be the case, especially in a traditional scenario such as transformation of an XML document to HTML.

If there is no item-separator, then a single space is inserted between adjacent atomic values; for example if the raw result is the sequence 1 to 5, then sequence normalization produces a tree comprising a document node with a single child, the child being a text node with the string value 1 2 3 4 5.

If there is an item-separator, then it is used not only between adjacent atomic values, but between any pair of items in the raw result. For example if the raw result is a sequence of two element nodes A and B, and the item-separator is a comma, then the result of sequence normalization will be a document node with three children: a copy of A, a text node whose string value is a single comma, and a copy of B.

2.3.6.2 Serializing the Result

See 2.7 Parsing and Serialization.

The raw result may optionally be serialized as described in 26 Serialization. The serialization is controlled by the serialization parameters defined in the unnamed output definition of the top-level package.

Note:

The first phase of serialization, called sequence normalizationSER30, takes place for some output methods but not others. For example, if the json output method (defined in [XSLT and XQuery Serialization 3.1]) is selected, then the process of constructing a tree is bypassed.

The effect of serialization is to generate a sequence of octets, representing the serialized result in some character encoding. The processor’s API may define mechanisms enabling this sequence of octets to be written to persistent storage at some location. The default location is the location identified by the base output URI.

In previous versions of this specification it was stated that when the raw result of the initial template or function is an empty sequence, a result tree should be produced if and only if the transformation generates no secondary results (that is, if it does not invoke xsl:result-document). This provision is most likely to have a noticeable effect if the transformation produces serialized results, and these results are written to persistent storage: the effect is then that a transformation producing an empty principal result will overwrite any existing content at the base output URI location if and only if the transformation produces no other output. Processor APIs offering backwards compatibility with earlier versions of XSLT must respect this behavior, but there is no requirement for new processor APIs to do so.

[Definition:  The base output URI is a URI to be used as the base URI when resolving a relative URI reference allocated to a final result tree. If the transformation generates more than one final result tree, then typically each one will be allocated a URI relative to this base URI.] The way in which a base output URI is established is implementation-defined. Each invocation of the stylesheet may supply a different base output URI. It is acceptable for the base output URI to be absent, provided no constructs (such as xsl:result-document) are evaluated that depend on the value of the base output URI.

Note:

It will often be convenient for the base output URI to be the same as the location to which the principal result document is serialized, but this relationship is not a necessary one.

2.4 Instructions

The main executable components of a stylesheet are templates and functions. The body of a template or function is a sequence constructor, which is a sequence of elements and text nodes that can be evaluated to produce a result.

A sequence constructor is a sequence of sibling nodes in the stylesheet, each of which is either an XSLT instruction, a literal result element, a text node, or an extension instruction.

[Definition: An instruction is either an XSLT instruction or an extension instruction.]

[Definition: An XSLT instruction is an XSLT element whose syntax summary in this specification contains the annotation <!-- category: instruction -->.]

Extension instructions are described in 24.2 Extension Instructions.

The main categories of XSLT instruction are as follows:

2.5 Rule-Based Processing

The classic method of executing an XSLT transformation is to apply template rules to the root node of an input document (see 2.3.3 Apply-Templates Invocation). The operation of applying templates to a node searches the stylesheet for the best matching template rule for that node. This template rule is then evaluated. A common coding pattern, especially when XSLT is used to convert XML documents into display formats such as HTML, is to have one template rule for each kind of element in the source document, and for that template rule to generate some appropriate markup elements, and to apply templates recursively to its own children. The effect is to perform a recursive traversal of the source tree, in which each node is processed using the best-fit template rule for that node. The final result of the transformation is then the tree produced by this recursive process. This result can then be optionally serialized (see 2.3.6 Post-processing the Raw Result).

Example: Example of Rule-Based Processing

This example uses rule-based processing to convert a simple XML input document into an HTML output document.

The input document takes the form:

<PERSONAE PLAY="OTHELLO">
    <TITLE>Dramatis Personae</TITLE>
    <PERSONA>DUKE OF VENICE</PERSONA>
    <PERSONA>BRABANTIO, a senator.</PERSONA>
    <PERSONA>Other Senators.</PERSONA>
    <PERSONA>GRATIANO, brother to Brabantio.</PERSONA>
    <PERSONA>LODOVICO, kinsman to Brabantio.</PERSONA>
    <PERSONA>OTHELLO, a noble Moor in the service of the Venetian state.</PERSONA>
    <PERSONA>CASSIO, his lieutenant.</PERSONA>
    <PERSONA>IAGO, his ancient.</PERSONA>
    <PERSONA>RODERIGO, a Venetian gentleman.</PERSONA>
    <PERSONA>MONTANO, Othello's predecessor in the government of Cyprus.</PERSONA>
    <PERSONA>Clown, servant to Othello. </PERSONA>
    <PERSONA>DESDEMONA, daughter to Brabantio and wife to Othello.</PERSONA>
    <PERSONA>EMILIA, wife to Iago.</PERSONA>
    <PERSONA>BIANCA, mistress to Cassio.</PERSONA>
    <PERSONA>Sailor, Messenger, Herald, Officers, 
             Gentlemen, Musicians, and Attendants.</PERSONA>
  </PERSONAE>

The stylesheet to render this as HTML can be written as a set of template rules:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    expand-text="yes">
    
 <xsl:strip-space elements="PERSONAE"/>   
    
 <xsl:template match="PERSONAE">
   <html>
     <head>
       <title>The Cast of {@PLAY}</title>
     </head>
     <body>
       <xsl:apply-templates/>
     </body>
   </html>
 </xsl:template>
 
 <xsl:template match="TITLE">
   <h1>{.}</h1>
 </xsl:template>
 
 <xsl:template match="PERSONA[count(tokenize(., ',') = 2]">
   <p><b>{substring-before(., ',')}</b>: {substring-after(., ',')}</p>
 </xsl:template> 

 <xsl:template match="PERSONA">
   <p><b>{.}</b></p>
 </xsl:template>

</xsl:stylesheet>

There are four template rules here:

  • The first rule matches the outermost element, named PERSONAE (it could equally have used match="/" to match the document node). The effect of this rule is to create the skeleton of the output HTML page. Technically, the body of the template is a sequence constructor comprising a single literal result element (the html element); this in turn contains a sequence constructor comprising two literal result elements (the head and body elements). The head element is populated with a literal title element whose content is computed as a mixture of fixed and variable text using a text value template. The body element is populated by evaluating an xsl:apply-templates instruction.

    The effect of the xsl:apply-templates instruction is to process the children of the PERSONAE element in the source tree: that is, the TITLE and PERSONA elements. (It would also process any whitespace text node children, but these have been stripped by virtue of the xsl:strip-space declaration.) Each of these child elements is processed by the best matching template rule for that element, which will be one of the other three rules in the stylesheet.

  • The template rule for the TITLE element outputs an h1 element to the HTML result document, and populates this with the value of ".", the context item. That is, it copies the text content of the TITLE element to the output h1 element.

  • The last two rules match PERSONA element. The first rule matches PERSONA elements whose text content contains exactly one comma; the second rule matches all PERSONA elements, but it has lower priority than the first rule, so in practice it only applies to PERSONA elements that contain no comma or multiple commas.

    For both rules the body of the rule is a sequence constructor containing a single literal result element, the p element. These literal result elements contain further sequence constructors comprising literal result elements and text nodes. In each of these examples the text nodes are in the form of a text value template: in general this is a combination of fixed text together with XPath expressions enclosed in curly braces, which are evaluated to form the content of the containing literal result element.

[Definition: A stylesheet contains a set of template rules (see 6 Template Rules). A template rule has three parts: a pattern that is matched against selected items (often but not necessarily nodes), a (possibly empty) set of template parameters, and a sequence constructor that is evaluated to produce a sequence of items.] In many cases these items are newly constructed nodes, which are then written to a result tree.

2.6 The Evaluation Context

The results of some expressions and instructions in a stylesheet may depend on information provided contextually. This context information is divided into two categories: the static context, which is known during static analysis of the stylesheet, and the dynamic context, which is not known until the stylesheet is evaluated. Although information in the static context is known at analysis time, it is sometimes used during stylesheet evaluation.

Some context information can be set by means of declarations within the stylesheet itself. For example, the namespace bindings used for any XPath expression are determined by the namespace declarations present in containing elements in the stylesheet. Other information may be supplied externally or implicitly: an example is the current date and time.

The context information used in processing an XSLT stylesheet includes as a subset all the context information required when evaluating XPath expressions. The XPath 3.0 specification defines a static and dynamic context that the host language (in this case, XSLT) may initialize, which affects the results of XPath expressions used in that context. XSLT augments the context with additional information: this additional information is used firstly by XSLT constructs outside the scope of XPath (for example, the xsl:sort element), and secondly, by functions that are defined in the XSLT specification (such as key and current-group) that are available for use in XPath expressions appearing within a stylesheet.

The static context for an expression or other construct in a stylesheet is determined by the place in which it appears lexically. The details vary for different components of the static context, but in general, elements within a stylesheet module affect the static context for their descendant elements within the same stylesheet module.

The dynamic context is maintained as a stack. When an instruction or expression is evaluated, it may add dynamic context information to the stack; when evaluation is complete, the dynamic context reverts to its previous state. An expression that accesses information from the dynamic context always uses the value at the top of the stack.

The most commonly used component of the dynamic context is the context item. This is an implicit variable whose value is the item currently being processed (it may be a node, an atomic value, or a function item). The value of the context item can be referenced within an XPath expression using the expression . (dot).

Full details of the static and dynamic context are provided in 5.3 The Static and Dynamic Context.

2.7 Parsing and Serialization

An XSLT stylesheet describes a process that constructs a set of results from a set of inputs. The inputs are the data provided at stylesheet invocation, as described in 2.3 Initiating a Transformation. The results include the principal result (an arbitrary sequence), which is the result of the initial component invocation, together with any secondary results produced using xsl:result-document instructions.

The stylesheet does not describe how a source tree is constructed. Some possible ways of constructing source trees are described in [XDM 3.0]. Frequently an implementation will operate in conjunction with an XML parser (or more strictly, in the terminology of [XML 1.0], an XML processor), to build a source tree from an input XML document. An implementation may also provide an application programming interface allowing the tree to be constructed directly, or allowing it to be supplied in the form of a DOM Document object (see [DOM Level 2]). This is outside the scope of this specification. Users should be aware, however, that since the input to the transformation is a tree conforming to the XDM data model as described in [XDM 3.0], constructs that might exist in the original XML document, or in the DOM, but which are not within the scope of the data model, cannot be processed by the stylesheet and cannot be guaranteed to remain unchanged in the transformation output. Such constructs include CDATA section boundaries, the use of entity references, and the DOCTYPE declaration and internal DTD subset.

[Definition: A frequent requirement is to output a final result tree as an XML document (or in other formats such as HTML). This process is referred to as serialization.]

Like parsing, serialization is not part of the transformation process, and it is not required that an XSLT processor must be able to perform serialization. However, for pragmatic reasons, this specification describes declarations (the xsl:output element and the xsl:character-map declarations, see 26 Serialization), and attributes on the xsl:result-document instruction, that allow a stylesheet to specify the desired properties of a serialized output file. When serialization is not being performed, either because the implementation does not support the serialization option, or because the user is executing the transformation in a way that does not invoke serialization, then the content of the xsl:output and xsl:character-map declarations has no effect. Under these circumstances the processor may report any errors in an xsl:output or xsl:character-map declaration, or in the serialization attributes of xsl:result-document, but is not required to do so.

2.8 Packages and Modules

In previous versions of the XSLT language, it has been possible to structure a stylesheet as a collection of modules, using the xsl:include and xsl:import declarations to express the dependency of one module on others.

In XSLT 3.0 an additional layer of modularization of stylesheet code is enabled through the introduction of packages. A package is a collection of stylesheet modules with a controlled interface to the packages that use it: for example, it defines which functions and templates defined in the package are visible to callers, which are purely internal, and which are not only public but capable of being overridden by other functions and templates supplied by the using package.

Packages are introduced with several motivations, which broadly divide into two categories:

  1. Software engineering benefits: greater re-use of code, greater robustness through ease of testing, controlled evolution of code in response to new requirements, ability to deliver code that users cannot see or modify.

  2. Efficiency benefits: the ability to avoid compiling libraries repeatedly when they are used in multiple stylesheets, and to avoid holding multiple copies of the same library in memory simultaneously.

Packages are designed to allow separate compilation: that is, a package can be compiled independently of the packages that use it. This specification does not define a process model for compilation, or expand on what it means to compile different packages independently. Nor does it mandate that implementations offer any feature along these lines. It merely defines language features that are designed to make separate compilation of packages possible.

To achieve this, packages (unlike modules):

A package is defined in XSLT by means of an XML document whose outermost element is an xsl:package element. This is referred to as the package manifest. The xsl:package element has optional child elements xsl:use-package and xsl:expose describing properties of the package. The package manifest may refer to an external top-level stylesheet module using an xsl:include or xsl:import declaration, or it may contain the body of a stylesheet module inline (the two approaches can also be mixed).

Although this specification defines packages as constructs written using a defined XSLT syntax, implementations may provide mechanisms that allow packages to be written using other languages (for example, XQuery).

When no packages are explicitly defined, the entire stylesheet is treated as a single package; the effect is as if the xsl:stylesheet or xsl:transform element of the principal stylesheet module were replaced by an xsl:package element with no other information in the package manifest.

2.9 Extensibility

XSLT defines a number of features that allow the language to be extended by implementers, or, if implementers choose to provide the capability, by users. These features have been designed, so far as possible, so that they can be used without sacrificing interoperability. Extensions other than those explicitly defined in this specification are not permitted.

These features are all based on XML namespaces; namespaces are used to ensure that the extensions provided by one implementer do not clash with those of a different implementer.

The most common way of extending the language is by providing additional functions, which can be invoked from XPath expressions. These are known as extension functions, and are described in 24.1 Extension Functions.

It is also permissible to extend the language by providing new instructions. These are referred to as extension instructions, and are described in 24.2 Extension Instructions. A stylesheet that uses extension instructions in a particular namespace must declare that it is doing so by using the [xsl:]extension-element-prefixes attribute.

Extension instructions and extension functions defined according to these rules may be provided by the implementer of the XSLT processor, and the implementer may also provide facilities to allow users to create further extension instructions and extension functions.

This specification defines how extension instructions and extension functions are invoked, but the facilities for creating new extension instructions and extension functions are implementation-defined. For further details, see 24 Extensibility and Fallback.

The XSLT language can also be extended by the use of extension attributes (see 3.2 Extension Attributes), and by means of user-defined data elements (see 3.7.3 User-defined Data Elements).

2.10 Stylesheets and XML Schemas

An XSLT stylesheet can make use of information from a schema. An XSLT transformation can take place in the absence of a schema (and, indeed, in the absence of a DTD), but where the source document has undergone schema validity assessment, the XSLT processor has access to the type information associated with individual nodes, not merely to the untyped text.

Information from a schema can be used both statically (when the stylesheet is compiled), and dynamically (during evaluation of the stylesheet to transform a source document).

There are places within a stylesheet, and within XPath expressions and patterns in a stylesheet, where it is possible to refer to named type definitions in a schema, or to element and attribute declarations. For example, it is possible to declare the types expected for the parameters of a function. This is done using a SequenceType.

[Definition: A SequenceType constrains the type and number of items in a sequence. The term is used both to denote the concept, and to refer to the syntactic form in which sequence types are expressed in the XPath grammar: specifically SequenceTypeXP30 in [XPath 3.0], or SequenceTypeXP31 in [XPath 3.1], depending on whether or not the XPath 3.1 Feature is implemented.]

[Definition: Type definitions and element and attribute declarations are referred to collectively as schema components.]

[Definition: The schema components that may be referenced by name in a package are referred to as the in-scope schema components.]

The set of in-scope schema components may vary between one package and another, but as explained in 3.15 Importing Schema Components, the schema components used in different packages must be consistent with each other.

The conformance rules for XSLT 3.0, defined in 27 Conformance, distinguish between a basic XSLT processor and a schema-aware XSLT processor. As the names suggest, a basic XSLT processor does not support the features of XSLT that require access to schema information, either statically or dynamically. A stylesheet that works with a basic XSLT processor will produce the same results with a schema-aware XSLT processor provided that the source documents are untyped (that is, they are not validated against a schema). However, if source documents are validated against a schema then the results may be different from the case where they are not validated. Some constructs that work on untyped data may fail with typed data (for example, an attribute of type xs:date cannot be used as an argument of the substringFO30 function) and other constructs may produce different results depending on the datatype (for example, given the element <product price="10.00" discount="2.00"/>, the expression @price gt @discount will return true if the attributes have type xs:decimal, but will return false if they are untyped).

There is a standard set of type definitions that are always available as in-scope schema components in every stylesheet. These are defined in 3.14 Built-in Types.

The remainder of this section describes facilities that are available only with a schema-aware XSLT processor.

Additional schema components (type definitions, element declarations, and attribute declarations) may be added to the in-scope schema components by means of the xsl:import-schema declaration in a stylesheet.

The xsl:import-schema declaration may reference an external schema document by means of a URI, or it may contain an inline xs:schema element.

It is only necessary to import a schema explicitly if one or more of its schema components are referenced explicitly by name in the stylesheet; it is not necessary to import a schema merely because the stylesheet is used to process a source document that has been assessed against that schema. It is possible to make use of the information resulting from schema assessment (for example, the fact that a particular attribute holds a date) even if no schema has been imported by the stylesheet.

Importing a schema does not of itself say anything about the type of the source document that the stylesheet is expected to process. The imported type definitions can be used for temporary nodes or for nodes on a result tree just as much as for nodes in source documents. It is possible to make assertions about the type of an input document by means of tests within the stylesheet. For example:

Example: Asserting the Required Type of the Source Document
<xsl:mode typed="lax"/>
<xsl:global-context-item use="required"
            as="document-node(schema-element(my:invoice))"/>

This example will cause the transformation to fail with an error message, unless the global context item is valid against the top-level element declaration my:invoice, and has been annotated as such.

The setting typed="lax" further ensures that in any match pattern for a template rule in this mode, an element name that corresponds to the name of an element declaration in the schema is taken as referring to elements validated against that declaration: for example, match="employee" will only match a validated employee element. Selecting this option enables the XSLT processor to do more compile-time type-checking against the schema, for example it allows the processor to produce warning or error messages when path expressions contain misspelt element names, or confuse an element with an attribute.

It is also true that importing a schema does not of itself say anything about the structure of the result tree. It is possible to request validation of a result tree against the schema by using the xsl:result-document instruction, for example:

Example: Requesting Validation of the Result Document
<xsl:template match="/">
  <xsl:result-document validation="strict">
    <xhtml:html>
      <xsl:apply-templates/>
    </xhtml:html>
  </xsl:result-document>
</xsl:template>
               

This example will cause the transformation to fail with an error message unless the document element of the result document is valid against the top-level element declaration xhtml:html.

It is possible that a source document may contain nodes whose type annotation is not one of the types imported by the stylesheet. This creates a potential problem because in the case of an expression such as data(.) instance of xs:integer the system needs to know whether the type named in the type annotation of the context node is derived by restriction from the type xs:integer. This information is not explicitly available in an XDM tree, as defined in [XDM 3.0]. The implementation may choose one of several strategies for dealing with this situation:

  1. The processor may signal a dynamic error if a source document is found to contain a type annotation that is not known to the processor.

  2. The processor may maintain additional metadata, beyond that described in [XDM 3.0], that allows the source document to be processed as if all the necessary schema information had been imported using xsl:import-schema. Such metadata might be held in the data structure representing the source document itself, or it might be held in a system catalog or repository.

  3. The processor may be configured to use a fixed set of schemas, which are automatically used to validate all source documents before they can be supplied as input to a transformation. In this case it is impossible for a source document to have a type annotation that the processor is not aware of.

  4. The processor may be configured to treat the source document as if no schema processing had been performed, that is, effectively to strip all type annotations from elements and attributes on input, marking them instead as having type xs:untyped and xs:untypedAtomic respectively.

Where a stylesheet author chooses to make assertions about the types of nodes or of variables and parameters, it is possible for an XSLT processor to perform static analysis of the stylesheet (that is, analysis in the absence of any source document). Such analysis may reveal errors that would otherwise not be discovered until the transformation is actually executed. An XSLT processor is not required to perform such static type-checking. Under some circumstances (see 2.14 Error Handling) type errors that are detected early may be reported as static errors. In addition an implementation may report any condition found during static analysis as a warning, provided that this does not prevent the stylesheet being evaluated as described by this specification.

A stylesheet can also control the type annotations of nodes that it constructs in a result tree. This can be done in a number of ways.

When nodes in a temporary tree are validated, type information is available for use by operations carried out on the temporary tree, in the same way as for a source document that has undergone schema assessment.

For details of how validation of element and attribute nodes works, see 25.4 Validation.

2.11 Streaming

[Definition: The term streaming refers to a manner of processing in which XML documents (such as source and result documents) are not represented by a complete tree of nodes occupying memory proportional to document size, but instead are processed “on the fly” as a sequence of events, similar in concept to the stream of events notified by an XML parser to represent markup in lexical XML.]

[Definition: A streamed document is a source tree that is processed using streaming, that is, without constructing a complete tree of nodes in memory.]

[Definition: A streamed node is a node in a streamed document.]

Many processors implementing earlier versions of this specification have adopted an architecture that allows streaming of the result tree directly to a serializer, without first materializing the complete result tree in memory. Streaming of the source tree, however, has proved to be more difficult without subsetting the language. This has created a situation where documents exceeding the capacity of virtual memory could not be transformed. XSLT 3.0 therefore introduces facilities allowing stylesheets to be written in a way that makes streaming of source documents possible, without excessive reliance on processor-specific optimization techniques.

Streaming achieves two important objectives: it allows large documents to be transformed without requiring correspondingly large amounts of memory; and it allows the processor to start producing output before it has finished receiving its input, thus reducing latency.

This specification does not attempt to legislate precisely which implementation techniques fall under the definition of streaming, and which do not. A number of techniques are available that reduce memory requirements, while still requiring a degree of buffering, or allocation of memory to partial results. A stylesheet that requests streaming of a source document is indicating that the processor should avoid assuming that the entire source document will fit in memory; in return, the stylesheet must be written in a way that makes streaming possible. This specification does not attempt to describe the algorithms that the processor should actually use, or to impose quantitative constraints on the resources that these algorithms should consume.

Nothing in this specification, nor in its predecessors [XSLT 1.0] and [XSLT 2.0], prevents a processor using streaming whenever it sees an opportunity to do so. However, experience has shown that in order to achieve streaming, it is often necessary to write stylesheet code in such a way as to make this possible. Therefore, XSLT 3.0 provides explicit constructs allowing the stylesheet author to request streaming, and defines explicit static constraints on the structure of the code which are designed to make streaming possible.

A processor that claims conformance with the streaming option offers a guarantee that when streaming is requested for a source document, and when the stylesheet conforms to the rules that make the processing guaranteed-streamable, then an algorithm will be adopted in which memory consumption is either completely independent of document size, or increases only very slowly as document size increases, allowing documents to be processed that are orders-of-magnitude larger than the physical memory available. A processor that does not claim conformance with the streaming option must still process a stylesheet and deliver the correct results, but is not required to use streaming algorithms, and may therefore fail with out-of-memory errors when presented with large source documents.

Apart from the fact that there are constructs to request streaming, and rules that must be followed to guarantee that streaming is possible, the language has been designed so there are as few differences as possible between streaming and non-streaming evaluation. The semantics of the language continue to be expressed in terms of the XDM data model, which is substantively unchanged; but readers must take care to observe that when terms like “node” and “axis” are used, the concepts are completely abstract and may have no direct representation in the run-time execution environment.

Streamed processing of a document can be initiated in one of three ways:

The rules for streamability, which are defined in detail in 19 Streamability, impose two main constraints:

The second restriction, that only one visit to the children is allowed, means that XSLT code that was not designed with streaming in mind will often need to be rewritten to make it streamable. In many cases it is possible to do this using a technique sometimes called windowing or burst-mode streaming (note this is not quite the same meaning as windowing in XQuery 3.0). Many XML documents consist of a large number of elements, each of manageable size, representing transactions or business objects where each such element can be processed independently: in such cases, an effective design pattern is to write a streaming transformation that takes a snapshot of each element in turn, processing the snapshot using the full power of the XSLT language. Each snapshot is a tree built in memory and is therefore fully navigable. For details see the snapshot and copy-of functions.

The new facility of accumulators allows applications complete control over how much information is retained (and by implication, how much memory is required) in the course of a pass over a streamed document. An accumulator computes a value for every node in a streamed document: or more accurately, two values, one for the first visit to a node (before visiting its descendants), and a second value for the second visit to the node (after visiting the descendants). The computation is structured in such a way that the value for a given node can depend only on the value for the previous node in document order together with the data available when positioned at the current node (for example, the attribute values). Based on the well-established fold operation of functional programming languages, accumulators provide the convenience and economy of mutable variables while remaining within the constraints of a purely declarative processing model.

When streaming is initiated, for example using the xsl:source-document instruction, it is necessary to declare which accumulators are applicable to the streamed document.

Streaming applications often fall into one of the following categories:

There are important classes of application in which streaming is possible only if multiple streams can be processed in parallel. This specification therefore provides facilities:

  1. allowing multiple sorted input sequences to be merged into one sorted output sequence (the xsl:merge instruction)

  2. allowing multiple output sequences to be generated during a single pass of an input sequence (the xsl:fork instruction).

These facilities have been designed in such a way that they can readily be implemented using streaming, that is, without materializing the input or output sequences in memory.

2.12 Streamed Validation

Streaming can be combined with schema-aware processing: that is, the streamed input to a transformation can be subjected to on-the-fly validation, a process which typically accepts an input stream from the XML parser and delivers an output stream (of type-annotated nodes) to the transformation processor. The XSD specification is designed so that validation is, with one or two exceptions, a streamable process. The exceptions include:

Applications that need to run in finite memory may therefore need to avoid these XSD features, or to use them with care.

XSD is designed so that the intended type of an element (the “governing type”) can be determined as soon as the start tag of the element is encountered: the process of validation checks whether the content of the element actually conforms to this type, and by the time the end tag is encountered, the process will have established either that the element is valid against the governing type, or that it is invalid.

By default, dynamic errors occurring during streamed processing are fatal: they typically cause the transformation to fail immediately. XSLT 3.0 introduces the ability to catch dynamic errors and recover from them. Schema invalidity, however, is treated as a dynamic error of the instruction that processes the entire input stream, so after a validation failure, no further processing of that input stream is possible.

In consequence, a streamed validator that is running in tandem with a streamed transformation can present the transformer with element nodes that carry a provisional type annotation representing the type that the element will have if it turns out to be valid. As soon as a node is encountered that violates this assumption, the validator should stop the flow of data to the transformer, so that the transformer never sees invalid data. This allows the stylesheet code to be compiled with the assumption of type-safety: at run-time, all nodes seen by the transformation will conform to their XSLT-declared types (for example, a type declared implicitly using match="schema-element(invoice)" on an xsl:template element).

A streamed transformation that only accesses part of the input document (for example, a header at the start of a document) is not required to continue reading once the data it needs has been read. This means that XML well-formedness or validity errors occurring in the unread part of the input stream may go undetected.

2.13 Streaming of non-XML data

The facilities in this specification designed to enable large data sets to be processed in a streaming manner are oriented almost entirely to XML data. This does not mean that there is never a requirement to stream non-XML data, or that the Working Group has ignored this requirement; rather, the Working Group has concluded that for the most part, streaming of non-XML data can be achieved by implementations without the need for specific language features in XSLT.

To make streamed processing of unparsed text files easier, the function unparsed-text-linesFO30 has been introduced. This is not only more convenient for stylesheet authors than reading the entire input using the unparsed-textFO30 function and then tokenizing the result, it is also easier for implementations to optimize, allowing each line of text to be discarded from memory after it has been processed.

For all functions that access external data, including document, docFO30, collectionFO30, unparsed-textFO30, unparsed-text-linesFO30, and (in XPath 3.1) json-docFO31, the requirements on determinism can now be relaxed using implementation-defined configuration options. This is significant because it means that when a transformation reads the same external resource more than once, it becomes legitimate for the contents of the resource to be different on different invocations, and this eliminates the need for the processor to cache the contents of the resource in memory.

In the XDM data model, every value is a sequence, and (as with most functional programming languages), processing of sequences of items is pervasive throughout the XSLT and XPath languages and their function library. Good performance of a functional programming language often depends on sequence-based operations being pipelined, and being evaluated in a lazy fashion (that is, many operations process items in a sequence one at a time, in order; and many operations can deliver a result without processing the entire sequence). The semantics of XSLT and XPath permit pipelined and lazy evaluation (for example, the error handling semantics are carefully written to ensure this), but they do not require it: the details are left to implementations. Pipelined processing of a sequence is not the same thing as streamed processing of a tree, and where the XSLT specification talks of operations being “guaranteed streamable”, this is always referring to processing of trees, not of sequences.

The facilities for streaming of XML trees include operations such as copy-of and snapshot which are able to take a sequence of streamed nodes as input, and produce a sequence of in-memory (unstreamed) nodes as output. It is also possible to generate a sequence of strings or other atomic values through the process of atomization. The actual memory usage of a streamed XSLT application may depend significantly on whether the processing of the resulting sequence of in-memory nodes or atomic values is pipelined or not. The specification, however, has nothing to say on this matter: it is considered an area where implementers can exercise their discretion and ingenuity.

Streaming of JSON input receives little attention in this specification. One can envisage an implementation of the json-to-xml function in which the XML delivered by the function consists of streamed nodes; but the Working Group has not researched the feasibility of such an implementation in any detail.

2.14 Error Handling

[Definition: An error that can be detected by examining a stylesheet before execution starts (that is, before the source document and values of stylesheet parameters are available) is referred to as a static error.]

Generally, errors in the structure of the stylesheet, or in the syntax of XPath expressions contained in the stylesheet, are classified as static errors. Where this specification states that an element in the stylesheet must or must not appear in a certain position, or that it must or must not have a particular attribute, or that an attribute must or must not have a value satisfying specified conditions, then any contravention of this rule is a static error unless otherwise specified.

A processor must provide a mode of operation that takes a (possibly erroneous) stylesheet package as input and enables the user to determine whether or not that package contains any static errors.

Note:

The manner in which static errors are reported, and the behavior when there are multiple static errors, are left as design choices for the implementer. It is recommended that the error codes defined in this specification should be made available in any diagnostics.

A processor may also provide a mode of operation in which static errors in parts of the stylesheet that are not evaluated can go unreported.

Note:

For example, when operating in this mode, a processor might report static errors in a template rule only if the input document contains nodes that match that template rule. Such a mode of operation can provide performance benefits when large and well-tested stylesheets are used to process source documents that might only use a small part of the XML vocabulary that the stylesheet is designed to handle.

[Definition: An error that is not capable of detection until a source document is being transformed is referred to as a dynamic error.]

When a dynamic error occurs, and is not caught using xsl:catch, the processor must signal the error, and the transformation fails.

Because different implementations may optimize execution of the stylesheet in different ways, the detection of dynamic errors is to some degree implementation-dependent. In cases where an implementation is able to produce a principal result or secondary result without evaluating a particular construct, the implementation is never required to evaluate that construct solely in order to determine whether doing so causes a dynamic error. For example, if a variable is declared but never referenced, an implementation may choose whether or not to evaluate the variable declaration, which means that if evaluating the variable declaration causes a dynamic error, some implementations will signal this error and others will not.

There are some cases where this specification requires that a construct must not be evaluated: for example, the content of an xsl:if instruction must not be evaluated if the test condition is false. This means that an implementation must not signal any dynamic errors that would arise if the construct were evaluated.

An implementation may signal a dynamic error before any source document is available, but only if it can determine that the error would be signaled for every possible source document and every possible set of parameter values. For example, some circularity errors fall into this category: see 9.11 Circular Definitions.

There are also some dynamic errors where the specification gives a processor license to signal the error during the analysis phase even if the construct might never be executed; an example is the use of an invalid QName as a literal argument to a function such as key, or the use of an invalid regular expression in the regex attribute of the xsl:analyze-string instruction.

A dynamic error is also signaled during the static analysis phase if the error occurs during evaluation of a static expression.

The XPath specification states (see Section 2.3.1 Kinds of Errors XP30) that if any expression (at any level) can be evaluated during the analysis phase (because all its explicit operands are known and it has no dependencies on the dynamic context), then any error in performing this evaluation may be reported as a static error. For XPath expressions used in an XSLT stylesheet, however, any such errors must not be reported as static errors in the stylesheet unless they would occur in every possible evaluation of that stylesheet; instead, they must be signaled as dynamic errors, and signaled only if the XPath expression is actually evaluated.

Example: Errors in Constant Subexpressions

An XPath processor may report statically that the expression 1 div 0 fails with a “divide by zero” error. But suppose this XPath expression occurs in an XSLT construct such as:

<xsl:choose>
  <xsl:when test="system-property('xsl:version') = '1.0'">
    <xsl:value-of select="1 div 0"/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="xs:double('INF')"/>
  </xsl:otherwise>
</xsl:choose>

Then the XSLT processor must not report an error, because the relevant XPath construct appears in a context where it will never be executed by an XSLT 2.0 or 3.0 processor. (An XSLT 1.0 processor will execute this code successfully, returning positive infinity, because it uses double arithmetic rather than decimal arithmetic.)

[Definition: Certain errors are classified as type errors. A type error occurs when the value supplied as input to an operation is of the wrong type for that operation, for example when an integer is supplied to an operation that expects a node.] If a type error occurs in an instruction that is actually evaluated, then it must be signaled in the same way as a dynamic error. Alternatively, an implementation may signal a type error during the analysis phase in the same way as a static error, even if it occurs in part of the stylesheet that is never evaluated, provided it can establish that execution of a particular construct would never succeed.

It is implementation-defined whether type errors are signaled statically.

Example: A Type Error

The following construct contains a type error, because 42 is not allowed as the value of the select expression of the xsl:number instruction (it must be a node). An implementation may optionally signal this as a static error, even though the offending instruction will never be evaluated, and the type error would therefore never be signaled as a dynamic error.

<xsl:if test="false()">
  <xsl:number select="42"/>
</xsl:if>

On the other hand, in the following example it is not possible to determine statically whether the operand of xsl:number will have a suitable dynamic type. An implementation may produce a warning in such cases, but it must not treat it as an error.

<xsl:template match="para">
  <xsl:param name="p" as="item()"/>
  <xsl:number select="$p"/>
</xsl:template>

If more than one error arises, an implementation is not required to signal any errors other than the first one that it detects. It is implementation-dependent which of the several errors is signaled. This applies both to static errors and to dynamic errors. An implementation is allowed to signal more than one error, but if any errors have been signaled, it must not finish as if the transformation were successful.

When a transformation signals one or more dynamic errors, the final state of any persistent resources updated by the transformation is implementation-dependent. Implementations are not required to restore such resources to their initial state. In particular, where a transformation produces multiple result documents, it is possible that one or more serialized result documents may be written successfully before the transformation terminates, but the application cannot rely on this behavior.

Everything said above about error handling applies equally to errors in evaluating XSLT instructions, and errors in evaluating XPath expressions. Static errors and dynamic errors may occur in both cases.

[Definition: If a transformation has successfully produced a principal result or secondary result, it is still possible that errors may occur in serializing that result . For example, it may be impossible to serialize the result using the encoding selected by the user. Such an error is referred to as a serialization error.] If the processor performs serialization, then it must do so as specified in 26 Serialization, and in particular it must signal any serialization errors that occur.

Errors are identified by a QName. For errors defined in this specification, the namespace of the QName is always http://www.w3.org/2005/xqt-errors (and is therefore not given explicitly), while the local part is an 8-character code in the form PPSSNNNN. Here PP is always XT (meaning XSLT), and SS is one of SE (static error), DE (dynamic error), or TE (type error). Note that the allocation of an error to one of these categories is purely for convenience and carries no normative implications about the way the error is handled. Many errors, for example, can be reported either dynamically or statically. These error codes are used to label error conditions in this specification, and are summarized in E Summary of Error Conditions.

Errors defined in related specifications ([XPath 3.0], [Functions and Operators 3.0] [XSLT and XQuery Serialization]) use QNames with a similar structure, in the same namespace. When errors occur in processing XPath expressions, an XSLT processor should use the original error code reported by the XPath processor, unless a more specific XSLT error code is available.

Implementations must use the codes defined in these specifications when signaling dynamic errors, to ensure that xsl:catch behaves in an interoperable way across implementations. Stylesheet authors should note, however, that there are many examples of errors where more than one rule in this specification is violated, and where the processor therefore has discretion in deciding which error code to associate with the condition: there is therefore no guarantee that different processors will always use the same error code for the same erroneous input.

Additional errors defined by an implementation (or by an application) may use QNames in an implementation-defined (or user-defined) namespace without risk of collision.