XSLT 3.0 introduces a number of constructs that are specifically designed to enable streamed applications to be written, but which are also useful in their own right; it also includes some features that are very specialized to streaming.
xsl:source-document Instruction
<!-- Category: instruction -->
<xsl:source-document
href = { uri }
streamable? = boolean
use-accumulators? = tokens
validation? = "strict" | "lax" | "preserve" | "strip"
type? = eqname >
<!-- Content: sequence-constructor -->
</xsl:source-document>
The xsl:source-document instruction reads a source document whose URI is supplied, and processes the content of the document by evaluating the contained sequence constructor.
The streamable attribute (default "no") allows streamed processing to be requested.
For example, if a document represents a book holding a sequence of chapters, then the following code can be used to split the book into multiple XML files, one per chapter, without holding the entire book in memory at one time:
<xsl:source-document streamable="yes" href="book.xml">
  <xsl:for-each select="book">
    <xsl:for-each select="chapter">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:for-each>
</xsl:source-document>
Note:
In earlier drafts of this specification the xsl:source-document element was named xsl:stream. The instruction has been generalised to handle both streamed and unstreamed input.
The document to be read is determined by the effective value of the href
attribute (which is defined as
an attribute value template).
This must be a valid URI reference.
If it is an absolute URI reference, it is used as is; if it is a relative URI
reference, it is made absolute by resolving it against the base URI of the
xsl:source-document
element. The process of obtaining a
document node given a URI is the same as for the doc
FO30 function.
However, unlike the doc
FO30 function, the
xsl:source-document
instruction offers no guarantee that the resulting
document will be stable (that is, that multiple calls specifying the same URI will
return the same document).
Specifically, if an xsl:source-document
instruction is evaluated several
times (or if different xsl:source-document
instructions are evaluated) with
the same URI (after making it absolute) as the
value of the href
attribute, it is implementation-dependent whether the
same nodes or different nodes are returned on each occasion; it is also possible that
the actual document content will be different.
Note:
A different node will necessarily be returned if there are differences in attributes such as validation, type, streamable, or use-accumulators, or if the calls are in different packages with variations in the rules for whitespace stripping or stripping of type annotations.
The result of the xsl:source-document
instruction is the same as the result
of the following (non-streaming) process:
The source document is read from the supplied URI and parsed to form a tree of nodes in the XDM data model.
The contained sequence constructor is evaluated with the root node of this tree
as the context item, and with the context
position and context size set to one; and the resulting sequence is returned as
the result of the xsl:source-document
instruction.
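As a minimal sketch of the unstreamed case, the fragment below omits the streamable attribute, so the sequence constructor is free to return nodes from the source document itself. The file name book.xml and the chapter element are illustrative assumptions carried over from the earlier example, and the fragment assumes an enclosing stylesheet that binds the xsl prefix.

```xml
<!-- streamable defaults to "no", so the document is read as an ordinary tree -->
<xsl:source-document href="book.xml">
  <!-- The context item is the document node, with position and size both 1 -->
  <xsl:sequence select="//chapter"/>
</xsl:source-document>
```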
The xsl:source-document instruction is guaranteed-streamable if both of the following conditions are satisfied:
It is declared-streamable, by specifying streamable="yes".
The contained sequence constructor is grounded, as assessed using the streamability analysis in 19 Streamability.
The consequences of being or not being guaranteed streamable depend on the processor conformance level, and are explained in 19.10 Streamability Guarantees.
The use-accumulators
attribute defines the
set of accumulators that are applicable to the document, as explained in
18.2.2 Applicability of Accumulators.
Note:
The following notes apply specifically to streamed processing.
The rules for guaranteed streamability ensure that the sequence constructor (and therefore the xsl:source-document instruction) cannot return any nodes from the streamed document. For example, it cannot contain the instruction <xsl:sequence select="//chapter"/>. If nodes from this document are to be returned, they must first be copied, for example by using the xsl:copy-of instruction or by calling the copy-of or snapshot functions.
Because the xsl:source-document
instruction cannot (if it satisfies the rules for guaranteed
streamability) return nodes from the streamed document, any nodes it
does return will be conventional (unstreamed) nodes that can be processed without
restriction. For example, if xsl:source-document
is invoked within a
stylesheet function
f:firstChapter
, and the sequence constructor consists of the
instruction <xsl:copy-of select="//chapter"/>
, then the calling
code can manipulate the resulting chapter
elements as ordinary trees
rooted at parentless element nodes.
If the sequence constructor in an
xsl:source-document
instruction were to return nodes from the document
for which streaming has been requested, the instruction would not be guaranteed
streamable. Processors which support the streaming feature would then not be
required to process it in a streaming manner, and this specification imposes no
restrictions on the processing of the nodes returned. (The ability of a streaming
processor to handle such stylesheets in a streaming manner might, of course,
depend on how the nodes returned are processed, but those details are out of scope
for this specification.)
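The f:firstChapter case described in these notes might be written roughly as follows. This is only a sketch: the binding of the f prefix, the parameter name uri, and the declared return type are assumptions made for illustration, and the fragment is assumed to sit inside a stylesheet that also binds the xsl and xs prefixes.

```xml
<xsl:function name="f:firstChapter" as="element(chapter)*">
  <!-- Hypothetical parameter: the URI of the book to be read -->
  <xsl:param name="uri" as="xs:string"/>
  <xsl:source-document streamable="yes" href="{$uri}">
    <!-- Copying yields ordinary (unstreamed) parentless chapter elements -->
    <xsl:copy-of select="//chapter"/>
  </xsl:source-document>
</xsl:function>
```

Because the function returns copies rather than streamed nodes, a caller could go on to evaluate an expression such as f:firstChapter('book.xml')[1]/title without any streaming restrictions.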
The validation
and type
attributes of
xsl:source-document
may be used to control schema validation of the
input document. They have the same effect as the
corresponding attributes of the xsl:copy-of
instruction when
applied to a document node, except that
when streamable="yes"
is specified,
the copy that is produced is itself a
streamed document. The process is described in more detail in 25.4.2 Validating Document Nodes.
These two attributes are both optional, and if one is specified then the other must be omitted ([see ERR XTSE1505]).
The presence of a validation
or type
attribute on an
xsl:source-document
instruction causes any
input-type-annotations
attribute to have no effect on any document
read using that instruction.
Note:
In effect, setting validation
to strict
or
lax
, or supplying the type
attribute, requests
document-level validation of the input as it is read. Setting
validation="preserve"
indicates that if the incoming document
contains type annotations (for example, produced by validating the output of a
previous step in a streaming pipeline) then they should be retained, while the
value strip
indicates that any such type annotations should be
dropped.
It is a consequence of the way validation is defined in XSD that the type
annotation of an element node can be determined during the processing of its
start tag, although the actual validity of the element is not known until the
end tag is encountered. When validation is requested, a streamed document
should not present data to the stylesheet except to the extent that such data
could form the leading part of a valid document. If the document proves to be
invalid, the processor should not pass invalid data to the stylesheet to be
processed, but should immediately signal the appropriate error. For the
purposes of xsl:try
and xsl:catch
, this error
can only be caught at the level of the xsl:source-document
instruction
that initiated validation, not at a finer level. If validation errors are
caught in this way, any output that has been computed up to the point of the
error is not added to the final result tree; the mechanisms to achieve this may
use memory, which may reduce the efficacy of streaming.
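A hedged sketch of catching a validation failure at the level of the xsl:source-document instruction is shown below. It assumes a schema-aware processor, a schema for book.xml that has been made available to the stylesheet, and an invented invalid-input element used to report the failure.

```xml
<xsl:try>
  <!-- Validation is requested as the document is read -->
  <xsl:source-document streamable="yes" href="book.xml" validation="strict">
    <xsl:copy-of select="book/chapter/title"/>
  </xsl:source-document>
  <!-- With no errors attribute, xsl:catch handles any dynamic error -->
  <xsl:catch>
    <invalid-input/>
  </xsl:catch>
</xsl:try>
```

If the error is caught, none of the title copies computed before the failure appear in the result, which is why a processor may need to buffer them.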
The analysis of guaranteed streamability (see 19 Streamability)
takes no account of information that might be obtained from a schema-aware
static analysis of the stylesheet. Implementations may, however, be able to use
streaming strategies for stylesheets that are not guaranteed-streamable, by
taking advantage of such information. For example, an implementation might be
able to treat the expression .//title
as striding rather than crawling if it can
establish from knowledge of the schema that two title
elements
will never be nested one inside the
other.
xsl:source-document
The xsl:source-document
instruction can be used to initiate processing of
a document using streaming with a variety of coding styles, illustrated in the
examples below.
xsl:source-document
with Aggregate Functions
The following example computes the number of transactions in a transaction file.
Input:
<transactions>
  <transaction value="12.51"/>
  <transaction value="3.99"/>
</transactions>
Stylesheet code:
<xsl:source-document streamable="yes" href="transactions.xml">
  <count>
    <xsl:value-of select="count(transactions/transaction)"/>
  </count>
</xsl:source-document>
Result:
<count>2</count>
Analysis:
The literal result element count
has the same sweep as the
xsl:value-of
instruction.
The xsl:value-of
instruction has the same sweep as its
select
expression.
The call to count has the same sweep as its argument.
The argument to count is a RelativePathExpr. Under the rules in 19.8.8.8 Streamability of Path Expressions, this expression is striding and consuming. The call on count is therefore grounded and consuming.
The entire body of the xsl:source-document
instruction is therefore
grounded and consuming.
The following example computes the highest-value transaction in the same input file:
<xsl:source-document streamable="yes" href="transactions.xml">
  <maxValue>
    <xsl:value-of select="max(transactions/transaction/@value)"/>
  </maxValue>
</xsl:source-document>
Result:
<maxValue>12.51</maxValue>
Analysis:
The literal result element maxValue
has the same sweep as
the xsl:value-of
instruction.
The xsl:value-of
instruction has the same sweep as its
select
expression.
The call to max has the same sweep as its argument.
The argument to max is a RelativePathExpr whose two operands are the RelativePathExpr transactions/transaction and the AxisStep @value. The left-hand operand transactions/transaction has striding posture. The right-hand operand @value, given that it appears in a node value context, is motionless. The RelativePathExpr argument to max is therefore consuming.
The entire body of the xsl:source-document
instruction is
therefore consuming.
To compute both the count and the maximum value in a single pass over the input, several approaches are possible. The simplest is to use maps (map constructors are exempt from the usual rule that multiple downward selections are not allowed):
<xsl:source-document streamable="yes" href="transactions.xml">
  <xsl:variable name="tally" select="map{
      'count': count(transactions/transaction),
      'max': max(transactions/transaction/@value)}"/>
  <value count="{$tally('count')}" max="{$tally('max')}"/>
</xsl:source-document>
Other options include the use of xsl:fork
, or multiple xsl:accumulator
declarations, one for each value to be computed.
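The xsl:fork alternative could be sketched as follows, using the same transactions.xml input. The wrapper element name summary is invented for this illustration; each xsl:sequence child of xsl:fork is a separate branch that makes its own downward selection during the same pass over the input.

```xml
<xsl:source-document streamable="yes" href="transactions.xml">
  <summary>
    <xsl:fork>
      <!-- Branch 1: number of transactions -->
      <xsl:sequence>
        <count>
          <xsl:value-of select="count(transactions/transaction)"/>
        </count>
      </xsl:sequence>
      <!-- Branch 2: highest transaction value -->
      <xsl:sequence>
        <max>
          <xsl:value-of select="max(transactions/transaction/@value)"/>
        </max>
      </xsl:sequence>
    </xsl:fork>
  </summary>
</xsl:source-document>
```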
This example displays a list of the chapter titles extracted from each book in a collection of books.
Each input document is assumed to have a structure such as:
<book>
  <chapter number-of-pages="18">
    <title>The first chapter of book A</title>
    ...
  </chapter>
  <chapter number-of-pages="15">
    <title>The second chapter of book A</title>
    ...
  </chapter>
  <chapter number-of-pages="12">
    <title>The third chapter of book A</title>
    ...
  </chapter>
</book>
Stylesheet code:
<chapter-titles>
  <xsl:for-each select="uri-collection('books')">
    <xsl:source-document streamable="yes" href="{.}">
      <xsl:for-each select="book">
        <xsl:for-each select="chapter">
          <title><xsl:value-of select="title"/></title>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:source-document>
  </xsl:for-each>
</chapter-titles>
Output:
<chapter-titles>
  <title>The first chapter of book A</title>
  <title>The second chapter of book A</title>
  ...
  <title>The first chapter of book B</title>
  ...
</chapter-titles>
Note:
This example uses the function uri-collection FO30 to obtain the document URIs of all the documents in a collection, so that each one can be processed in turn using xsl:source-document.
This example assumes that the input is a book with multiple chapters, as shown in the previous example, with the page count for each chapter given as an attribute of the chapter. The transformation determines the starting page number for each chapter by accumulating the page counts for previous chapters, and rounding up to an odd number if necessary.
<chapter-start-page>
  <xsl:source-document streamable="yes" href="book.xml">
    <xsl:iterate select="book/chapter">
      <xsl:param name="start-page" select="1"/>
      <chapter title="{title}" start-page="{$start-page}"/>
      <xsl:next-iteration>
        <xsl:with-param name="start-page"
            select="$start-page + @number-of-pages + (@number-of-pages mod 2)"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:source-document>
</chapter-start-page>
Output:
<chapter-start-page>
  <chapter title="The first chapter of book A" start-page="1"/>
  <chapter title="The second chapter of book A" start-page="19"/>
  <chapter title="The third chapter of book A" start-page="35"/>
  ...
</chapter-start-page>
This example assumes that the input is a book with multiple chapters, and that each chapter belongs to a part, which is present as an attribute of the chapter (for example, chapters 1-4 might constitute Part 1, the next three chapters forming Part 2, and so on):
<book>
  <chapter part="1">
    <title>The first chapter of book A</title>
    ...
  </chapter>
  <chapter part="1">
    <title>The second chapter of book A</title>
    ...
  </chapter>
  ...
  <chapter part="2">
    <title>The fifth chapter of book A</title>
    ...
  </chapter>
</book>
The transformation copies the full text of the chapters, creating an extra level of hierarchy for the parts.
<book>
  <xsl:source-document streamable="yes" href="book.xml">
    <xsl:for-each select="book">
      <xsl:for-each-group select="chapter" group-adjacent="data(@part)">
        <part number="{current-grouping-key()}">
          <xsl:copy-of select="current-group()"/>
        </part>
      </xsl:for-each-group>
    </xsl:for-each>
  </xsl:source-document>
</book>
Output:
<book>
  <part number="1">
    <chapter part="1">
      <title>The first chapter of book A</title>
      ...
    </chapter>
    <chapter part="1">
      <title>The second chapter of book A</title>
      ...
    </chapter>
    ...
  </part>
  <part number="2">
    <chapter part="2">
      <title>The fifth chapter of book A</title>
      ...
    </chapter>
    ...
  </part>
</book>
This example copies an XML document while deleting all the ednote
elements at any level of the tree, together with their descendants. This
example is a complete stylesheet, which is intended to be evaluated by
nominating main
as the initial named template.
The use of on-no-match="shallow-copy" in the xsl:mode declaration means that the built-in template rule copies nodes unchanged, except where overridden by a user-defined template rule.
<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="delete-ednotes" streamable="yes" on-no-match="shallow-copy"/>

  <xsl:template name="main">
    <xsl:source-document streamable="yes" href="book.xml">
      <xsl:apply-templates mode="delete-ednotes"/>
    </xsl:source-document>
  </xsl:template>

  <xsl:template match="ednote" mode="delete-ednotes"/>

</xsl:transform>
Additional template rules could be added to process other elements and
attributes in the same pass through the data: for example, to modify the value
of a last-updated
attribute (wherever it appears) to the current
date and time, the following rule suffices:
<xsl:template match="@last-updated" mode="delete-ednotes">
  <xsl:attribute name="last-updated" select="current-dateTime()"/>
</xsl:template>
The stream-available function determines, as far as possible, whether a document is available for streamed processing using xsl:source-document.
This function is nondeterministic FO30, context-dependent FO30, and focus-independent FO30. It depends on available documents.
The intent of the stream-available
function is to allow a stylesheet author to determine,
before calling xsl:source-document
with streamable="yes"
and
with a particular URI as the value of its href
attribute, whether a document is available at that location for streamed processing.
If the $uri argument is an empty sequence then the function returns false.
If the function returns true
then the caller can conclude that the following conditions are true:
The supplied URI is valid;
A resource can be retrieved at that URI;
An XML representation of the resource can be delivered, which is well-formed at least to the extent that some initial sequence of octets can be decoded into characters and matched against the production:
prolog (EmptyElemTag | STag )
as defined in the XML 1.0 or XML 1.1 Recommendation.
Note:
That is, the XML is well-formed at least as far as the end of the first element start tag; to establish this, a parser will typically retrieve any external entities referenced in the Doctype declaration or DTD.
If the function returns false, the caller can conclude that either one of the above conditions is not satisfied, or the processor detected some other condition that would prevent a call on xsl:source-document with streamable="yes" executing successfully.
Like xsl:source-document
itself, the function is not deterministic, which means that multiple calls during
the execution
of a stylesheet will not necessarily return the same result. The caller cannot make
any inferences about the point in time at which
the input conditions for stream-available
are present, and in particular there is no guarantee that because
stream-available
returns true, xsl:source-document
will necessarily succeed.
The value of the $uri
argument must be a URI in the form of a string. If it is a relative URI,
it is resolved relative to the static base URI of the function call.
If the URI is invalid, such that a call on doc-available
FO30 would signal an error, then
stream-available
signals the same error: [ERR FODC0005] FO30.
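A typical guard using stream-available might look like the sketch below. The file name book.xml and the wording of the message are illustrative assumptions; as noted above, a true result does not guarantee that the subsequent xsl:source-document will succeed.

```xml
<xsl:choose>
  <xsl:when test="stream-available('book.xml')">
    <!-- The check passed, so attempt streamed processing -->
    <xsl:source-document streamable="yes" href="book.xml">
      <xsl:value-of select="count(book/chapter)"/>
    </xsl:source-document>
  </xsl:when>
  <xsl:otherwise>
    <xsl:message>book.xml is not available for streamed processing</xsl:message>
  </xsl:otherwise>
</xsl:choose>
```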
Accumulators are introduced in XSLT 3.0 to enable data that is read during streamed processing of a document to be accumulated, processed or retained for later use. However, they may equally be used with non-streamed processing.
[Definition: An
accumulator defines a series of
values associated with the nodes of the tree. If an accumulator is
applicable to a particular tree, then for each node in the tree, other than
attribute and namespace nodes, there will be two values available, called the
pre-descent and post-descent values. These two values are available via a pair of
functions, accumulator-before
and
accumulator-after
.]
There are two ways the values of an accumulator can be
established for a given tree: they can be computed by evaluating the rules appearing
in the xsl:accumulator
declaration, or they can be copied from the
corresponding nodes in a different tree. The second approach (copying the values)
is
available via the snapshot
and copy-of
functions, or by use of the xsl:copy-of
instruction specifying
copy-accumulators="yes"
. Accumulator values are also copied during
the implicit invocation of the snapshot function performed by the
xsl:merge
instruction.
Note:
Accumulators can apply to trees rooted at any kind of node. But because they are most often applied to trees rooted at a document node, this section sometimes refers to the “document” to which an accumulator applies; use of this term should be taken to include all trees whether or not they are rooted at a document node.
Accumulators can apply to trees rooted at nodes (such as text nodes) that cannot have children, though this serves no useful purpose. In the case of a tree rooted at an attribute or namespace node, there is no way to obtain the value of the accumulator.
The following sections give first, the syntax rules for defining an accumulator; then an informal description of the semantics; then a more formal definition; and finally, examples. But to illustrate the concept intuitively, the following simple example shows how an accumulator can be used for numbering of nodes:
This example assumes document input in which figure
elements can
appear within chapter
elements (which we assume are not nested), and
the requirement is to render the figures with a caption that includes the figure
number within its containing chapter.
When the document is processed using streaming, the xsl:number
instruction is not available, so a solution using accumulators is needed.
The required accumulator can be defined and used like this:
<xsl:accumulator name="figNr" as="xs:integer" initial-value="0" streamable="yes">
  <xsl:accumulator-rule match="chapter" select="0"/>
  <xsl:accumulator-rule match="figure" select="$value + 1"/>
</xsl:accumulator>

<xsl:mode streamable="yes"/>

<xsl:template match="figure">
  <xsl:apply-templates/>
  <p>Figure <xsl:value-of select="accumulator-before('figNr')"/></p>
</xsl:template>
<!-- Category: declaration -->
<xsl:accumulator
name = eqname
initial-value = expression
as? = sequence-type
streamable? = boolean >
<!-- Content: xsl:accumulator-rule+ -->
</xsl:accumulator>
<xsl:accumulator-rule
match = pattern
phase? = "start" | "end"
select? = expression >
<!-- Content: sequence-constructor -->
</xsl:accumulator-rule>
An xsl:accumulator
element is a declaration of an accumulator. The
name
attribute defines the name of the accumulator. The value of
the name
attribute is an EQName,
which is expanded as described in 5.1.1 Qualified Names.
An xsl:accumulator
declaration can only appear as a top-level element in a stylesheet module.
The functions accumulator-before
and accumulator-after
return, respectively, the
value of the accumulator before visiting the descendants of a given node, and the
value after visiting the descendants of a node. Each of these functions takes a single argument, the name of the
accumulator, and the function applies implicitly to the context node. The
type of the return value (for both functions) is determined by the as
attribute of the xsl:accumulator
element.
[Definition: The functions
accumulator-before
and
accumulator-after
are referred to as the
accumulator functions.]
For constructs that use accumulators to be guaranteed-streamable:
The accumulator-before
function for a streamed node can
be called at any time the node is available (it behaves like other
properties of the node such as name
FO30 or
base-uri
FO30).
The accumulator-after
function, however, is restricted
to appear after any instruction that reads the descendants
of the node in question. The constraints are expressed as static rules: see
19.8.9.1 Streamability of the accumulator-after Function for more details.
The initial value of the accumulator is obtained by evaluating the expression in
the initial-value
attribute. This
attribute is mandatory. The expression in the
initial-value
attribute is evaluated with a singleton focus based on the root node of
the streamed input tree to which the accumulator is being applied.
The values of the accumulator for individual nodes in a tree are obtained by
applying the xsl:accumulator-rule
rules contained within the
xsl:accumulator
declaration, as described in subsequent
sections. The match
attribute of
xsl:accumulator-rule
is a pattern which
determines which nodes trigger execution of the rule; the phase
attribute indicates whether the rule fires before descendants are processed
(phase="start"
, which is the default), or after descendants are
processed (phase="end"
).
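To illustrate the two phases, the following sketch declares an accumulator that tracks the nesting depth of section elements. The accumulator name depth and the section element are hypothetical, and the fragment assumes a stylesheet that binds the xsl and xs prefixes.

```xml
<xsl:accumulator name="depth" as="xs:integer" initial-value="0">
  <!-- Fires before the descendants of a section are processed -->
  <xsl:accumulator-rule match="section" phase="start" select="$value + 1"/>
  <!-- Fires after the descendants of a section have been processed -->
  <xsl:accumulator-rule match="section" phase="end" select="$value - 1"/>
</xsl:accumulator>
```

With this declaration, accumulator-before('depth') evaluated at a section gives its nesting level counting the section itself, while accumulator-after('depth') at the same node gives the level of its enclosing sections.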
The select
attribute and the contained sequence constructor of the
xsl:accumulator-rule
element are mutually exclusive: if the
select
attribute is present then the sequence constructor must be
empty. The expression in the select
attribute of xsl:accumulator-rule
or the contained sequence constructor
is evaluated with a static context that follows the normal rules for expressions
in stylesheets, except that:
An additional variable is present in the context. The name of this variable
is value
(in no namespace), and its type is the type that
appears in the as
attribute of the
xsl:accumulator
declaration.
The context item for evaluation of the expression or sequence constructor will always be a node
that matches the pattern in the
match
attribute.
The result of both the initial-value
and select
expressions (or contained sequence
constructor) is converted to the type declared in the as
attribute by applying the function conversion rules. A
type error occurs if conversion is not
possible. The as
attribute defaults to item()*
.
The effect of the streamable
attribute is defined in 18.2.9 Streamability of Accumulators.
It is not the case that every accumulator is applicable to every tree. The details depend on how the accumulator is declared, and how the tree is created. The rules are as follows:
An accumulator is applicable to a tree unless otherwise specified in these rules.
(For example, when a document is read using the document
,
doc
FO30, or collection
FO30 functions,
all accumulators are applicable. Similarly, all accumulators are applicable
to a temporary tree created using xsl:variable
.)
Regardless of the rules below, an accumulator is not applicable to a streamed document
unless the accumulator is declared with streamable="yes"
. (The converse
does not apply: for unstreamed documents, accumulators are applicable regardless
of the value of the streamable
attribute.)
For a document read using the
xsl:source-document
instruction, the accumulators that are applicable
are those determined by the use-accumulators
attribute of that instruction.
For a document read using the for-each-source
attribute of an
xsl:merge-source
child of an xsl:merge
instruction,
the accumulators that are applicable are those determined by the use-accumulators
attribute of the xsl:merge-source
element.
For a document containing nodes supplied in the
initial match selection, the accumulators that are
applicable are those determined by the xsl:mode
declaration of the initial mode. This means that in the
absence of an xsl:mode
declaration, no accumulators are applicable.
For a tree T created by copying a node in a tree S
using the copy-of
or snapshot
functions, or the instruction xsl:copy-of
with
copy-accumulators="yes"
, an accumulator is applicable to
T if and only if it is applicable to S.
If an accumulator is not applicable to the tree containing the context item, calls
to the functions accumulator-before
and
accumulator-after
, supplying the name of that accumulator,
will fail with a dynamic error.
Note:
The reason that accumulators are not automatically applicable to every streamed document is to avoid the cost of evaluating them, and to avoid the possibility of dynamic errors occurring if they are not designed to work with a particular document structure.
In the case of unstreamed documents, there are no compelling reasons to restrict which accumulators are applicable, because an implementation can avoid the cost of evaluating every accumulator against every document by evaluating the accumulator lazily, for example, by only evaluating the accumulator for a particular tree the first time its value is requested for a node in that tree. In the interests of orthogonality, however, restricting the applicable accumulators works in the same way for streamable and non-streamable documents.
The value of the use-accumulators attribute of xsl:source-document, xsl:merge-source, or xsl:mode must be either a whitespace-separated list of EQNames, or the special token #all. The list may be empty, and the default value is an empty list. Every EQName in the list must be the name of an accumulator, visible in the containing package, and declared with streamable="yes". The value #all indicates that all accumulators that are visible in the containing package are applicable (except that for a streamable input document, an accumulator is not applicable unless it specifies streamable="yes").
[ERR XTSE3300] It is a static error if the list of
accumulator names contains an invalid token, contains the same
token more than once, or contains the token #all
along with any
other value; or if any token (other than
#all
) is not the name of a declared-streamable accumulator visible in the containing
package.
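As a sketch of how an accumulator is made applicable to a streamed document, the fragment below names the figNr accumulator from the earlier example in use-accumulators. It assumes, purely for illustration, that figure elements are direct children of chapter in book.xml, and the output element figure with its number attribute is invented here.

```xml
<xsl:source-document streamable="yes" href="book.xml" use-accumulators="figNr">
  <xsl:for-each select="book/chapter/figure">
    <!-- Without use-accumulators="figNr" this call would fail dynamically -->
    <figure number="{accumulator-before('figNr')}"/>
  </xsl:for-each>
</xsl:source-document>
```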
This section describes how accumulator values are
established by evaluating the rules in an xsl:accumulator
declaration. This process does not apply to trees created with accumulator
values copied from another document, for example by using the
copy-of
or snapshot
functions.
Informally, an accumulator is evaluated by traversing a tree, as follows.
Each node is visited twice, once before processing its descendants, and once after processing its descendants. For consistency, this applies even to leaf nodes: each is visited twice. Attribute and namespace nodes, however, are not visited.
Before the traversal starts, a variable (called the accumulator variable) is
initialized to the value of the expression given as the initial-value
attribute.
On each node visit, the xsl:accumulator-rule
elements are
examined to see if there is a matching rule. For a match to occur, the pattern in
the match
attribute must match the node, and the phase
attribute must be start
if this is the first visit, and
end
if it is the second visit. If there is a matching rule, then a
new value is computed for the accumulator variable using the expression contained
in that rule’s select
attribute or the contained sequence constructor. If there is more than
one matching rule, the last in document order is used. If there is no matching
rule, the value of the accumulator variable does not change.
Each node is labeled with a pre-descent value for the accumulator, which is the value of the accumulator variable immediately after processing the first visit to that node, and with a post-descent value for the accumulator, which is the value of the accumulator variable immediately after processing the second visit.
The function accumulator-before
delivers
the pre-descent value of the accumulator at the context node; the function
accumulator-after
delivers the post-descent value of the
accumulator at the context node.
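The traversal can be traced by hand for the figNr accumulator declared earlier (initial value 0, reset to 0 at the start of each chapter, incremented at the start of each figure). The input below is a hypothetical document, annotated with the pre-descent and post-descent value at each element; whitespace text nodes match no rule and leave the value unchanged.

```xml
<book>            <!-- before: 0, after: 1 -->
  <chapter>       <!-- before: 0, after: 2 -->
    <figure/>     <!-- before: 1, after: 1 -->
    <figure/>     <!-- before: 2, after: 2 -->
  </chapter>
  <chapter>       <!-- before: 0, after: 1 -->
    <figure/>     <!-- before: 1, after: 1 -->
  </chapter>
</book>
```

The post-descent value of book is 1 rather than 3 because the rule for chapter resets the accumulator variable, and no rule fires on the end visits.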
Although this description is expressed in procedural terms, it can be seen that the two values of the accumulator for any given node depend only on the node and its preceding and (in the case of the post-descent value) descendant nodes. Calculation of both values is therefore deterministic and free of side-effects; moreover, it is clear that the values can be computed during a streaming pass of a document, provided that the rules themselves use only information that is available without repositioning the input stream.
It is permitted for the select
expression of an accumulator rule, or the contained
sequence constructor, to invoke an accumulator function. For a streamable accumulator, the rules ensure that
a rule with phase="start"
cannot call the
accumulator-after
function. When such function calls
exist in an accumulator rule, they impose a dependency of one accumulator on
another, and create the possibility of cyclic dependencies. Processors are
allowed to report a cyclic dependency as a static error if they can detect it statically.
Failing this, processors are allowed to fail catastrophically in the event of a
cycle, in the same way as they might fail in the event of infinite function or
template recursion. Catastrophic failure might manifest itself, for example, as
a stack overflow, or as non-termination of the transformation.
This section describes how accumulator values are
established by evaluating the rules in an xsl:accumulator
declaration. This process does not apply to trees created with accumulator
values copied from another document, for example by using the
copy-of
or snapshot
functions.
[Definition: A traversal of a tree is a sequence of traversal events.]
[Definition: A traversal
event (shortened to event in this section) is a pair
comprising a phase (start or end) and a node.] It is modelled as a map
with two entries: map{"phase": p, "node": n}
where p is the string
"start"
or "end"
and n
is a node.
The traversal of a tree contains two traversal events for each node in the tree, other than attribute and namespace nodes. One of these events (the “start event”) has phase = "start", the other (the “end event”) has phase = "end".
The order of traversal events within a traversal is such that, given any two nodes M and N with start/end events denoted by M0, M1, N0, and N1:
For any node N, N0 precedes N1;
If M is an ancestor of N then M0 precedes N0 and N1 precedes M1;
If M is on the preceding axis of N then M1 precedes N0.
The accumulator defines a (private) delta function Δ. The delta function computes the value of the accumulator for one traversal event in terms of its value for the previous traversal event. The function is defined as follows:
The signature of Δ is function ($old-value as T,
$event as map(*)) as T
, where T is the sequence type
declared in the as
attribute of the accumulator
declaration;
The implementation of the function is equivalent to the following algorithm:
Let R be the set of xsl:accumulator-rule
elements among the children of the accumulator declaration whose
phase
attribute equals $event("phase")
and whose match
attribute is a pattern that matches $event("node")
If R is empty, return $old-value
Let Q be the xsl:accumulator-rule
in
R that is last in document order
Return the value of the expression in the select
attribute of Q, or the
contained sequence constructor, evaluating this with a
singleton focus set to
$event("node")
and with a dynamic context that binds
the variable whose name is $value
(in no namespace) to the value
$old-value
.
Note:
The argument names old-value
and event
are used here purely for definitional purposes; these names are not
available for use within the select
expression or contained sequence
constructor.
For every node N, other than attribute and namespace nodes, the accumulator defines a pre-descent value B(N) and a post-descent value A(N) whose values are as follows:
Let T be the traversal of the tree rooted at fn:root(N).
Let SB be the subsequence of T starting at the first event in T and ending with the start event for node N (that is, the event map{ "phase":"start", "node":N }).
Let SA be the subsequence of T starting at the first event in T and ending with the end event for node N (that is, the event map{ "phase":"end", "node":N }).
Let Z be the result of evaluating the expression contained in the initial-value attribute of the xsl:accumulator declaration, evaluated with a singleton focus based on root(N).
Then the pre-descent value B(N) is the value of fn:fold-left(SB, Z, Δ), and the post-descent value A(N) is the value of fn:fold-left(SA, Z, Δ).
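The formal model can be sketched directly in XSLT for one specific accumulator. The stylesheet below is illustrative only: the namespace bound to the prefix f, the function names f:traversal, f:delta and f:prefix, and the sample data are all assumptions made for this sketch. It hard-codes Δ for a single rule (match="section", phase="start", select="$value + 1") with an initial value of 0, and it relies on the processor supporting higher-order functions.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:accumulator-model"
    exclude-result-prefixes="#all">

  <!-- Hypothetical sample input, held in a temporary tree -->
  <xsl:variable name="doc">
    <doc>
      <section><section/><section/></section>
      <section/>
    </doc>
  </xsl:variable>

  <!-- T: a start event, then the events of each child in order, then an end
       event. The child axis never selects attribute or namespace nodes. -->
  <xsl:function name="f:traversal" as="map(*)*">
    <xsl:param name="n" as="node()"/>
    <xsl:sequence select="
        map{'phase': 'start', 'node': $n},
        $n/node() ! f:traversal(.),
        map{'phase': 'end', 'node': $n}"/>
  </xsl:function>

  <!-- Delta for the single rule: match="section" phase="start" select="$value + 1" -->
  <xsl:function name="f:delta" as="xs:integer">
    <xsl:param name="old-value" as="xs:integer"/>
    <xsl:param name="event" as="map(*)"/>
    <xsl:sequence select="
        if ($event('phase') eq 'start' and exists($event('node')[self::section]))
        then $old-value + 1
        else $old-value"/>
  </xsl:function>

  <!-- SB (phase 'start') or SA (phase 'end'): the events of T up to and
       including the event for node $n in the given phase -->
  <xsl:function name="f:prefix" as="map(*)*">
    <xsl:param name="n" as="node()"/>
    <xsl:param name="phase" as="xs:string"/>
    <xsl:variable name="T" select="f:traversal(root($n))"/>
    <xsl:variable name="end" select="
        (for $i in 1 to count($T)
         return if ($T[$i]('phase') eq $phase and $T[$i]('node') is $n)
                then $i else ())[1]"/>
    <xsl:sequence select="subsequence($T, 1, $end)"/>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <values>
      <xsl:for-each select="$doc//section">
        <!-- B(N) and A(N), each computed as a left fold over a prefix of T -->
        <section before="{fold-left(f:prefix(., 'start'), 0, f:delta#2)}"
                 after="{fold-left(f:prefix(., 'end'), 0, f:delta#2)}"/>
      </xsl:for-each>
    </values>
  </xsl:template>

</xsl:stylesheet>
```

For the first section in the sample data, the pre-descent value counts only its own start event, while the post-descent value also counts the start events of its two nested sections. The sketch recomputes the traversal for every node and is intended to mirror the definition, not to suggest an efficient implementation.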
If a dynamic error occurs when evaluating the initial-value
expression
of xsl:accumulator
, or the select
expression of xsl:accumulator-rule
,
then the error is signaled as an error from any subsequent call on accumulator-before
or accumulator-after
that references the accumulator. If no such call on accumulator-before
or accumulator-after
happens, then the error goes unreported.
Note:
In the above rule, the phrase subsequent call is to be understood in terms of functional dependency; that is, a call to
accumulator-before
or accumulator-after
signals an error if the accumulator value at the node in question is
functionally dependent on a computation that fails with a dynamic error.
Note:
Particularly in the case of streamed accumulators, this may mean that the implementation
has to “hold back” the error
until the next time the accumulator is referenced, to give applications the opportunity
to catch the error using xsl:try
and xsl:catch
in a predictable way.
Note:
Errors that occur during the evaluation of the pattern in the match
attribute of
xsl:accumulator-rule
are handled as described in 5.5.4 Errors in Patterns:
specifically, the pattern does not match the relevant node, and no error is signaled.
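As an illustration of the error rule, the following sketch catches an error raised during accumulator evaluation at the point where the accumulator is referenced. The accumulator name total-weight and the sample data are hypothetical. The input is held in a temporary tree so that the stylesheet can be invoked through its initial template without a source document.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:err="http://www.w3.org/2005/xqt-errors"
    exclude-result-prefixes="#all">

  <!-- Hypothetical input: the second weight is not a valid integer -->
  <xsl:variable name="order">
    <order>
      <item weight="3"/>
      <item weight="heavy"/>
    </order>
  </xsl:variable>

  <xsl:accumulator name="total-weight" as="xs:integer" initial-value="0">
    <!-- xs:integer('heavy') fails with a dynamic error during rule evaluation -->
    <xsl:accumulator-rule match="item" select="$value + xs:integer(@weight)"/>
  </xsl:accumulator>

  <xsl:template name="xsl:initial-template">
    <xsl:try>
      <total>
        <!-- The failure surfaces here, at the reference to the accumulator -->
        <xsl:value-of select="$order/order/accumulator-after('total-weight')"/>
      </total>
      <xsl:catch errors="*">
        <total error="{$err:code}"/>
      </xsl:catch>
    </xsl:try>
  </xsl:template>

</xsl:stylesheet>
```

If no instruction ever referenced total-weight, the invalid weight attribute would go unreported, as the rule above states.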
Returns the pre-descent value of the selected accumulator at the context node.
This function is deterministic, context-dependent, and focus-dependent (terms defined in FO30).
The $name
argument specifies the name of the accumulator. The value of the argument must be a string containing an EQName. If it is a lexical QName, then it is expanded as described in
5.1.1 Qualified Names (no prefix means no namespace).
The function returns the pre-descent value B(N) of the selected accumulator where N is the context node, as defined in 18.2.4 Formal Model for Accumulators.
If the context item is a node in a streamed document, then the accumulator
must be declared with streamable="yes"
.
Note:
The converse is not true: an accumulator declared to be streamable is available on both streamed and unstreamed nodes.
[ERR XTDE3340] It is a dynamic error if the value of the first
argument to the accumulator-before
or
accumulator-after
function is not a valid
EQName, or if there is no namespace declaration in scope
for the prefix of the QName, or if the name obtained by expanding the QName is not
the same as the expanded name of any xsl:accumulator
declaration
appearing in the package in which the function
call appears. If the processor is able to detect the error statically (for
example, when the argument is supplied as a string literal), then the processor
may optionally signal this as a static error.
[ERR XTDE3350] It is a dynamic error to call the
accumulator-before
or
accumulator-after
function when there is no context item.
[ERR XTTE3360] It is a type error to call the
accumulator-before
or
accumulator-after
function when the context item is not a node, or when it is an attribute or namespace
node.
[ERR XTDE3362] It is a dynamic error to call the
accumulator-before
or
accumulator-after
function when the context
item is a node in a tree to which the selected accumulator is not
applicable (including the case where it is not applicable
because the document is streamed and the accumulator is not
declared with streamable="yes"
). Implementations
may raise this error but are not required to do so,
if they are capable of streaming documents without imposing this restriction.
[ERR XTDE3400] It is an error if there is a cyclic set of dependencies among accumulators such that the (pre- or post-descent) value of an accumulator depends directly or indirectly on itself. A processor may report this as a static error if it can be detected statically. Alternatively a processor may report this as a dynamic error. As a further option, a processor may fail catastrophically when this error occurs.
The accumulator-before
function can be applied to a node whether or not the accumulator
has a phase="start"
rule for that node. In effect, there is a phase="start"
rule
for every node, where the default rule is to leave the accumulator value unchanged;
the
accumulator-before
function delivers the value of the accumulator after processing
the explicit or implicit phase="start"
rule.
Given the accumulator:
<xsl:accumulator name="a" initial-value="0"> <xsl:accumulator-rule match="section" select="$value + 1"/> </xsl:accumulator>
and the template rule:
<xsl:template match="section"> <xsl:value-of select="accumulator-before('a')"/> <xsl:apply-templates/> </xsl:template>
The stylesheet will precede the output from processing each section with a section number that runs sequentially 1, 2, 3... irrespective of the nesting of sections.
Returns the post-descent value of the selected accumulator at the context node.
This function is deterministic, context-dependent, and focus-dependent (terms defined in FO30).
The $name
argument specifies the name of the accumulator. The value of the argument must be a string containing an EQName. If it is a
lexical QName, then it is expanded as
described in 5.1.1 Qualified Names (no prefix means no namespace).
The function returns the post-descent value A(N) of the selected accumulator where N is the context node, as defined in 18.2.4 Formal Model for Accumulators.
If the context item is a node in a streamed document, then the accumulator
must be declared with streamable="yes"
.
Note:
The converse is not true: an accumulator declared to be streamable is available on both streamed and unstreamed nodes.
The following errors apply: [see ERR XTDE3340], [see ERR XTDE3350], [see ERR XTTE3360], [see ERR XTDE3362], [see ERR XTDE3400].
For constraints on the use of accumulator-after
when streaming, see
19.8.9.1 Streamability of the accumulator-after Function.
The accumulator-after
function can be applied to a node whether or not the accumulator
has a phase="end"
rule for that node. In effect, there is a phase="end"
rule
for every node, where the default rule is to leave the accumulator value unchanged;
the
accumulator-after
function delivers the value of the accumulator after processing
the explicit or implicit phase="end"
rule.
Given the accumulator:
<xsl:accumulator name="w" initial-value="0" streamable="true" as="xs:integer"> <xsl:accumulator-rule match="text()" select="$value + count(tokenize(.))"/> </xsl:accumulator>
and the template rule:
<xsl:template match="section"> <xsl:apply-templates/> (words: <xsl:value-of select="accumulator-after('w') - accumulator-before('w')"/>) </xsl:template>
The stylesheet will output at the end of each section a (crude) count of the number of words in that section.
Note: the call on tokenize(.) relies on XPath 3.1, which introduces the single-argument form of the tokenize function.
If a package contains more than one
xsl:accumulator
declaration with a particular name, then the
one with the highest import precedence is used.
[ERR XTSE3350] It is a static error for a package to contain two or more non-hidden accumulators with the same expanded QName and the same import precedence, unless there is another accumulator with the same expanded QName, and a higher import precedence.
Accumulators cannot be referenced from, or overridden in, a different package from the one in which they are declared.
An accumulator is guaranteed-streamable if it satisfies all the following conditions:
The xsl:accumulator
declaration has the attribute
streamable="yes"
.
In every contained xsl:accumulator-rule
, the pattern in the match
attribute is
a motionless pattern (see 19.8.10 Classifying Patterns).
The expression in the
initial-value
attribute is grounded and
motionless.
The expression in the select
attribute or contained
sequence constructor is grounded and
motionless.
Specifying streamable="yes"
on an
xsl:accumulator
element declares an intent that the
accumulator should be streamable, either
because it is guaranteed-streamable, or because it takes
advantage of streamability extensions offered by a particular
processor. The consequences of declaring the accumulator to be
streamable when it is not in fact guaranteed streamable depend on the conformance
level of the processor, and are explained in 19.10 Streamability Guarantees.
When an accumulator is declared to be streamable, the
stylesheet author must ensure that the accumulator function
accumulator-after
is only called at appropriate points in
the processing, as explained in 19.8.9.1 Streamability of the accumulator-after Function.
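The following sketch shows two accumulators written with these conditions in mind. The document name orders.xml, the element names orders, order and amount, and the mode name scan are hypothetical. The second accumulator follows the same pattern as other examples in this section that match a text node and read its value. Whether a particular processor accepts the stylesheet as streamable depends on its support for the streaming feature.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all">

  <!-- Traverses the streamed input without producing any output -->
  <xsl:mode name="scan" streamable="yes" on-no-match="shallow-skip"/>

  <!-- The match pattern is motionless and the select expression uses only $value -->
  <xsl:accumulator name="order-count" as="xs:integer"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="order" select="$value + 1"/>
  </xsl:accumulator>

  <!-- Reads the amount while positioned at its text node, which has no
       descendants. A rule such as
         match="order" select="$value + xs:decimal(amount)"
       would have to read ahead to a child element and so would not be
       guaranteed-streamable. -->
  <xsl:accumulator name="order-total" as="xs:decimal"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="order/amount/text()"
                          select="$value + xs:decimal(.)"/>
  </xsl:accumulator>

  <xsl:template name="xsl:initial-template">
    <xsl:source-document href="orders.xml" streamable="yes"
                         use-accumulators="order-count order-total">
      <!-- Consume the document first; the final values are read afterwards -->
      <xsl:apply-templates mode="scan"/>
      <summary>
        <orders><xsl:value-of select="accumulator-after('order-count')"/></orders>
        <total><xsl:value-of select="accumulator-after('order-total')"/></total>
      </summary>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

The sketch assumes that each amount element contains a single text node holding a valid decimal.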
When nodes (including streamed nodes) are copied using the
snapshot
or copy-of
functions, or
using the xsl:copy-of
instruction with the attribute
copy-accumulators="yes"
, then the pre-descent and post-descent
values of accumulators for that tree are not determined by traversing the tree as
described in 18.2.3 Informal Model for Accumulators and 18.2.4 Formal Model for Accumulators. Instead the values are the same as the values
on the corresponding nodes of the source tree.
This applies also to the implicit invocation of the snapshot
function that happens during the evaluation of xsl:merge
.
If an accumulator is not applicable to a tree S, then it is also not applicable to any tree formed by copying nodes from S using the above methods.
Note:
During streamed processing, accumulator values will typically be computed “on
the fly”; when the copy-of
or
snapshot
functions are applied to a streamed node, the
computed accumulator values for the streamed document will typically be
materialized and saved as part of the copy.
Accumulator values for a non-streamed document will often be computed lazily,
that is, they will not be computed unless and until they are needed. A call on
copy-of
or snapshot
on a
non-streamed document whose accumulator values have not yet been computed can
then be handled in a variety of ways. The implementation might interpret the
call on copy-of
or snapshot
as a
trigger causing the accumulator values to be computed; or it might retain a
link between the nodes of the copied tree and the nodes of the original tree,
so that a request for accumulator values on the copied tree can trigger
computation of accumulator values for the original tree.
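A minimal sketch of this behavior, assuming a hypothetical book.xml whose book element contains non-nested chapter elements, each with a title child and some p descendants: the chapters are copied one at a time from the streamed input, and the accumulator value is then read from each copy.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all">

  <xsl:accumulator name="chapter-nr" as="xs:integer"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="chapter" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template name="xsl:initial-template">
    <xsl:source-document href="book.xml" streamable="yes"
                         use-accumulators="chapter-nr">
      <chapters>
        <!-- Each item selected here is a parentless in-memory copy -->
        <xsl:for-each select="copy-of(book/chapter)">
          <!-- The copy can be navigated freely, and the accumulator value is
               the one held by the corresponding streamed chapter element -->
          <chapter nr="{accumulator-before('chapter-nr')}"
                   title="{title}"
                   paragraphs="{count(.//p)}"/>
        </xsl:for-each>
      </chapters>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

If the values were instead recomputed by traversing each copy in isolation, every chapter would report the same number. Taking the values from the source tree is what allows the numbering to run across chapters.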
Consider an XHTML document in which the title of the document is represented by
the content of a title
element appearing as a child of the
head
element, which in turn appears as a child of the
html
element. Suppose that we want to process the document in
streaming mode, and that we want to avoid outputting the content of the
h1
element if it is the same as the document title.
This can be achieved by remembering the value of the title in an accumulator variable.
<xsl:accumulator name="firstTitle" as="xs:string?" initial-value="()" streamable="yes"> <xsl:accumulator-rule match="/html/head/title/text()" select="string(.)"/> </xsl:accumulator>
Subsequently, while processing an h1
element appearing later in
the document, the value can be referenced:
<xsl:template match="h1"> <xsl:variable name="firstTitle" select="accumulator-before('firstTitle')"/> <xsl:variable name="thisTitle" select="string(.)"/> <xsl:if test="$thisTitle ne $firstTitle"> <div class="heading-1"><xsl:value-of select="$thisTitle"/></div> </xsl:if> </xsl:template>
Suppose that there is a requirement to output, at the end of the HTML rendition of a document, a paragraph giving the total number of words in the document.
An accumulator can be used to maintain a (crude) word count as follows:
<xsl:accumulator name="word-count" as="xs:integer" initial-value="0"> <xsl:accumulator-rule match="text()" select="$value + count(tokenize(.))"/> </xsl:accumulator>
Note: the call on tokenize#1 relies on XPath 3.1, which introduces the single-argument form of the tokenize function.
The final value can be output at the end of the document:
<xsl:template match="/"> <xsl:apply-templates/> <p>Word count: <xsl:value-of select="accumulator-after('word-count')"/></p> </xsl:template>
Consider a document in which section
elements are nested within
section
elements to arbitrary depth, and there is a requirement
to render the document with hierarchic section numbers of the form
3.5.1.4
.
The current section number can be maintained in an accumulator in the form of a sequence of integers, managed as a stack. The number of integers represents the current level of nesting, and the value of each integer represents the number of preceding sibling sections encountered at that level. For convenience the first item in the sequence represents the top of the stack.
<xsl:accumulator name="section-nr" as="xs:integer*" initial-value="0"> <xsl:accumulator-rule match="section" phase="start" select="0, head($value)+1, tail($value)"/> <xsl:accumulator-rule match="section" phase="end" select="tail($value) (:pop:)"/> </xsl:accumulator>
To illustrate this, consider the values after processing a series of start and end tags:
events | accumulator value | required section number |
---|---|---|
<section> | 0, 1 | 1 |
<section> | 0, 1, 1 | 1.1 |
</section> | 1, 1 | |
<section> | 0, 2, 1 | 1.2 |
</section> | 2, 1 | |
<section> | 0, 3, 1 | 1.3 |
<section> | 0, 1, 3, 1 | 1.3.1 |
</section> | 1, 3, 1 | |
<section> | 0, 2, 3, 1 | 1.3.2 |
</section> | 2, 3, 1 | |
</section> | 3, 1 | |
</section> | 1 | |
The section number for a section can thus be generated as:
<xsl:template match="section"> <p> <xsl:value-of select="reverse(tail(accumulator-before('section-nr')))" separator="."/> </p> <xsl:apply-templates/> </xsl:template>
<xsl:accumulator name="histogram" as="map(xs:string, xs:integer)" initial-value="map{}"> <xsl:accumulator-rule match="book"> <xsl:choose> <xsl:when test="map:contains($value, @publisher)"> <xsl:sequence select="map:put($value, string(@publisher), $value(@publisher)+1)"/> </xsl:when> <xsl:otherwise> <xsl:sequence select="map:put($value, string(@publisher), 1)"/> </xsl:otherwise> </xsl:choose> </xsl:accumulator-rule> </xsl:accumulator>
The contained sequence constructor
is
evaluated with the variable $value
set to the current value, and
with the context node as the node being visited.
Note:
In the two calls on map:put()
, it is necessary to explicitly
convert @publisher
to an xs:string
value, because
this is the declared type of the keys in the result map. Relying on
atomization would produce keys of type xs:untypedAtomic
, which
would not satisfy the declared type of the map.
The accumulated histogram might be displayed as follows:
<xsl:source-document streamable="yes" href="booklist.xml"> ..... <h1>Number of books, by publisher</h1> <table> <thead> <th>Publisher</th> <th>Number of books</th> </thead> <tbody> <xsl:variable name="histogram" select="accumulator-after('histogram')"/> <xsl:for-each select="map:keys($histogram)"> <tr> <td><xsl:value-of select="."/></td> <td><xsl:value-of select="$histogram(.)"/></td> </tr> </xsl:for-each> </tbody> </table> </xsl:source-document>
Returns a deep copy of the sequence supplied as the $input
argument, or of the
context item if the argument is absent.
fn:copy-of($input as item()*) as item()*
The zero-argument form of this function is nondeterministic, focus-dependent, and context-independent (terms defined in FO30).
The one-argument form of this function is nondeterministic, focus-independent, and context-independent (terms defined in FO30).
The zero-argument form of this function is defined so that copy-of()
returns the value of internal:copy-item(.)
, where internal:copy-item
(which
exists only for the purpose of this exposition) is defined below. Informally, copy-of()
copies the context item.
The single argument form of this function is defined in terms of the
internal:copy-item
as follows: copy-of($input)
is equivalent
to $input ! internal:copy-item(.)
. Informally, copy-of($input)
copies each item in the
input sequence in turn.
The internal:copy-item
function is defined as follows:
<xsl:function name="internal:copy-item" as="item()" new-each-time="maybe"> <xsl:param name="input" as="item()"/> <xsl:copy-of select="$input" copy-namespaces="yes" copy-accumulators="yes" validation="preserve"/> </xsl:function>
The streamability analysis, however, is different: see 19.8.9 Classifying Calls to Built-In Functions.
The use of new-each-time="maybe"
in the above definition means that
if the internal:copy-item
function is called more than once with the same node as argument
(whether or not these calls are part of the same call on copy-of
), then it is implementation-dependent whether each
call returns the same node, or whether multiple calls return different nodes.
Returning the original node, however, is not allowed, except as an optimization when
the processor
can determine that this is equivalent.
Note:
One case where such optimization might be possible is when the copy is immediately atomized.
The copy-of
function is available for use (and is primarily
intended for use) when a source document is processed using streaming. It can also
be
used when not streaming. The effect,
when applied to element and document nodes,
is to take a copy of the subtree rooted at the
current node, and to make this available as a normal tree: one that can be processed
without
any of the restrictions that apply while streaming, for example only being able to
process children once. The copy, of course, does not include siblings or ancestors
of
the context node, so any attempt to navigate to siblings or ancestors will result
in an
empty sequence being returned.
All nodes in the result sequence will be parentless.
If atomic values or functions (including maps and arrays) are present in the input sequence, they will be included unchanged at the corresponding position of the result sequence.
Accumulator values are taken from the copied document as described in 18.2.10 Copying Accumulator Values.
Using copy-of()
while streaming:
This example copies from the source document all employees who work in marketing and
are based in Dubai. Because there are two accesses using the child axis, it is not
possible to do this without buffering each employee in memory, which can be achieved
using the copy-of
function.
<xsl:source-document streamable="yes" href="employees.xml"> <xsl:sequence select="copy-of(employees/employee) [department='Marketing' and location='Dubai']"/> </xsl:source-document>
Returns a copy of a sequence, retaining copies of the ancestors and descendants of any node in the input sequence, together with their attributes and namespaces.
fn:snapshot($input as item()*) as item()*
The zero-argument form of this function is nondeterministic, focus-dependent, and context-independent (terms defined in FO30).
The one-argument form of this function is nondeterministic, focus-independent, and context-independent (terms defined in FO30).
The zero-argument form of this function is defined so that snapshot()
returns the value of internal:snapshot-item(.)
, where internal:snapshot-item
(which
exists only for the purpose of this exposition) is defined below. Informally, snapshot()
takes a snapshot of the context item.
The single argument form of this function is defined in terms of the
internal:snapshot-item
as follows: snapshot($input)
is equivalent
to $input ! internal:snapshot-item(.)
. Informally, snapshot($input)
takes a snapshot of each item in the
input sequence in turn.
The internal:snapshot-item
function behaves as follows:
If the supplied item is an atomic value or a function item (including maps and arrays), then it returns that item unchanged.
If the supplied item is a node, then it returns a snapshot of that node, as defined below.
[Definition: A snapshot of a node N
is a deep copy of N, as produced by the xsl:copy-of
instruction with copy-namespaces
set to yes
,
copy-accumulators
set to yes
, and
validation
set to preserve
, with the additional property
that for every ancestor of N, the copy also has a corresponding ancestor
whose name, node-kind, and base URI are the same as the corresponding ancestor of
N, and that has copies of the attributes, namespaces and accumulator values of the
corresponding ancestor of N. But the ancestor has a type annotation of
xs:anyType
, has the properties nilled
,
is-id
, and is-idref
set to false, and has no children
other than the child that is a copy of N or one of its
ancestors.]
If the function is called more than once with the same argument, it is implementation-dependent whether each
call returns the same node, or whether multiple calls return different nodes. That
is,
the result of the expression snapshot($X) is snapshot($X)
is implementation-dependent.
Except for the effect on accumulators, the internal:snapshot-item
function can be expressed
as follows:
<xsl:function name="internal:snapshot-item" as="item()">
  <xsl:param name="input" as="item()"/>
  <xsl:apply-templates select="$input" mode="internal:snapshot"/>
</xsl:function>

<!-- for atomic values and function items, return the item unchanged -->
<xsl:template match="." mode="internal:snapshot" priority="1">
  <xsl:sequence select="."/>
</xsl:template>

<!-- for a document node, or any other root node, return a deep copy -->
<xsl:template match="root()" mode="internal:snapshot" priority="5">
  <xsl:copy-of select="."/>
</xsl:template>

<!-- for an element, comment, text node, or processing instruction: -->
<xsl:template match="node()" mode="internal:snapshot" as="node()" priority="3">
  <xsl:sequence select="internal:graft-to-parent(., .., function($n){$n/node()})"/>
</xsl:template>

<!-- for an attribute: -->
<xsl:template match="@*" mode="internal:snapshot" as="attribute()" priority="3">
  <xsl:variable name="name" select="node-name(.)"/>
  <xsl:sequence select="internal:graft-to-parent(., .., function($n){$n/@*[node-name(.) = $name]})"/>
</xsl:template>

<!-- for a namespace node: -->
<xsl:template match="namespace-node()" mode="internal:snapshot" as="namespace-node()" priority="3">
  <xsl:variable name="name" select="local-name(.)"/>
  <xsl:sequence select="internal:graft-to-parent(., .., function($n){$n/namespace-node()[local-name(.) = $name]})"/>
</xsl:template>

<!-- make a copy C of a supplied node N, grafting it to a shallow copy of C's original parent, and returning the copy C -->
<xsl:function name="internal:graft-to-parent" as="node()">
  <xsl:param name="n" as="node()"/>
  <xsl:param name="original-parent" as="node()?"/>
  <xsl:param name="down-function" as="function(node()) as node()"/>
  <xsl:choose>
    <xsl:when test="exists($original-parent)">
      <xsl:variable name="p" as="node()">
        <xsl:copy select="$original-parent">
          <xsl:copy-of select="@*"/>
          <xsl:copy-of select="$n"/>
        </xsl:copy>
      </xsl:variable>
      <xsl:variable name="copied-parent" select="internal:graft-to-parent($p, $original-parent/.., function($n){$n/node()})"/>
      <xsl:sequence select="$down-function($copied-parent)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$n"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
The snapshot
function is available for use (and is primarily
intended for use) when a source document is processed using streaming. It can also
be
used when not streaming. The effect is to take a copy of the subtree rooted at the
current node, along with copies of the ancestors and their attributes, and to make
this
available as a normal tree, that can be processed without any of the restrictions
that
apply while streaming, for example only being able to process children once. The copy,
of course, does not include siblings of the context node or of its ancestors, so any
attempt to navigate to these siblings will result in an empty sequence being
returned.
For parentless nodes, the effect of snapshot($x)
is identical to the effect
of copy-of($x)
.
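The difference between the two functions can be seen without streaming. The following sketch applies both to the same element of a temporary tree. The sample data and the result element names copied and snapshotted are hypothetical.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all">

  <xsl:output indent="yes"/>

  <!-- Hypothetical sample data -->
  <xsl:variable name="doc">
    <locations>
      <location code="DXB" name="Dubai">
        <employee id="e1"/>
        <employee id="e2"/>
      </location>
    </locations>
  </xsl:variable>

  <xsl:template name="xsl:initial-template">
    <xsl:variable name="e"
                  select="$doc/locations/location/employee[@id = 'e1']"/>
    <result>
      <!-- copy-of: the copy is parentless, so the parent axis selects nothing -->
      <copied has-parent="{exists(copy-of($e)/..)}"/>
      <!-- snapshot: the ancestors and their attributes are retained, but the
           siblings of the original node are not -->
      <snapshotted parent-code="{snapshot($e)/../@code}"
                   employees-under-parent="{count(snapshot($e)/../employee)}"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Following the definitions above, the copied element should report no parent, while the snapshot should expose the code attribute of its location ancestor and exactly one employee child beneath it.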
Using snapshot()
while streaming:
This example copies from the source document all employees who work in marketing and
are based in Dubai. It assumes that employees are grouped by location. Because there
are two accesses using the child axis (referencing department
and
salary
), it is not possible to do this without buffering each
employee in memory. The snapshot
function is used in preference
to the simpler copy-of
so that access to attributes of the
parent location
element remains possible.
<xsl:source-document streamable="yes" href="employees.xml"> <xsl:for-each select="snapshot(locations/location[@name='Dubai'] /employee)[department='Marketing']"> <employee> <location code="{../@code}"/> <salary value="{salary}"/> </employee> </xsl:for-each> </xsl:source-document>