[Definition: A sort key specification is a sequence of one or more adjacent xsl:sort elements which together define rules for sorting the items in an input sequence to form a sorted sequence.]
[Definition: Within a sort key specification, each xsl:sort element defines one sort key component.] The first xsl:sort element specifies the primary component of the sort key specification, the second xsl:sort element specifies the secondary component of the sort key specification, and so on.
A sort key specification may occur immediately within an xsl:apply-templates, xsl:for-each, xsl:perform-sort, or xsl:for-each-group element.
Note:
When used within xsl:for-each, xsl:for-each-group, or xsl:perform-sort, xsl:sort elements must occur before any other children.
xsl:sort Element
<xsl:sort
select? = expression
lang? = { language }
order? = { "ascending" | "descending" }
collation? = { uri }
stable? = { boolean }
case-order? = { "upper-first" | "lower-first" }
data-type? = { "text" | "number" | eqname } >
<!-- Content: sequence-constructor -->
</xsl:sort>
The xsl:sort element defines a sort key component. A sort key component specifies how a sort key value is to be computed for each item in the sequence being sorted, and also how two sort key values are to be compared.
The value of a sort key component is determined either by its select attribute or by the contained sequence constructor. If neither is present, the default is select=".", which has the effect of sorting on the actual value of the item if it is an atomic value, or on the typed-value of the item if it is a node. If a select attribute is present, its value must be an XPath expression.
[ERR XTSE1015] It is a static error if an xsl:sort element with a select attribute has non-empty content.
Those attributes of the xsl:sort elements whose values are attribute value templates are evaluated using the same focus as is used to evaluate the select attribute of the containing instruction (specifically, xsl:apply-templates, xsl:for-each, xsl:for-each-group, or xsl:perform-sort).
The stable attribute is permitted only on the first xsl:sort element within a sort key specification.
[ERR XTSE1017] It is a static error if an xsl:sort element other than the first in a sequence of sibling xsl:sort elements has a stable attribute.
[Definition: A sort key specification is said to be stable if its first xsl:sort element has no stable attribute, or has a stable attribute whose effective value is yes.]
[Definition: The sequence to be sorted is referred to as the initial sequence.]
[Definition: The sequence after sorting as defined by the xsl:sort elements is referred to as the sorted sequence.]
[Definition: For each item in the initial sequence, a value is computed for each sort key component within the sort key specification. The value computed for an item by using the Nth sort key component is referred to as the Nth sort key value of that item.]
The items in the initial sequence are ordered into a sorted sequence by comparing their sort key values. The relative position of two items A and B in the sorted sequence is determined as follows. The first sort key value of A is compared with the first sort key value of B, according to the rules of the first sort key component. If, under these rules, A is less than B, then A will precede B in the sorted sequence, unless the order attribute of this sort key component specifies descending, in which case B will precede A in the sorted sequence. If, however, the relevant sort key values compare equal, then the second sort key value of A is compared with the second sort key value of B, according to the rules of the second sort key component. This continues until two sort key values are found that compare unequal. If all the sort key values compare equal, and the sort key specification is stable, then A will precede B in the sorted sequence if and only if A preceded B in the initial sequence. If all the sort key values compare equal, and the sort key specification is not stable, then the relative order of A and B in the sorted sequence is implementation-dependent.
Note:
If two items have equal sort key values, and the sort is stable, then their order in the sorted sequence will be the same as their order in the initial sequence, regardless of whether order="descending" was specified on any or all of the sort key components.
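As a minimal sketch of these ordering rules, the stylesheet below sorts on two sort key components. The sales and sale element names and the region and amount attributes are assumptions made purely for illustration; they do not come from this specification.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical input: <sales><sale region="..." amount="..."/>...</sales> -->
  <xsl:template match="sales">
    <sorted>
      <xsl:for-each select="sale">
        <!-- Primary component: region, ascending (the default order).
             stable may appear only on this first xsl:sort. -->
        <xsl:sort select="@region" stable="yes"/>
        <!-- Secondary component: amount compared as xs:decimal, largest first.
             It is consulted only when two regions compare equal. -->
        <xsl:sort select="xs:decimal(@amount)" order="descending"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </sorted>
  </xsl:template>
</xsl:stylesheet>
```

Two sale elements with the same region and the same amount keep their relative order from the initial sequence, because the sort key specification is stable; the descending order on the second component does not reverse them.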
The Nth sort key value is computed by evaluating either the select attribute or the contained sequence constructor of the Nth xsl:sort element, or the expression . (dot) if neither is present. This evaluation is done with the focus set as follows:
The context item is the item in the initial sequence whose sort key value is being computed.
The context position is the position of that item in the initial sequence.
The context size is the size of the initial sequence.
Note:
As in any other XPath expression, the current function may be used within the select expression of xsl:sort to refer to the item that is the context item for the expression as a whole; that is, the item whose sort key value is being computed.
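The focus rules above can be illustrated with a sort key that depends on the context position. The following sketch assumes a hypothetical function name, local:reverse, in an example namespace; it sorts a sequence on position() in descending order and so returns the sequence reversed. It is intended only to show how the focus is set, since the standard reverse function already does the same job.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:local="http://example.com/local"
    exclude-result-prefixes="local">

  <!-- local:reverse and its namespace are illustrative assumptions -->
  <xsl:function name="local:reverse" as="item()*">
    <xsl:param name="in" as="item()*"/>
    <xsl:perform-sort select="$in">
      <!-- position() is the position of the item in the initial sequence,
           so sorting it in descending order reverses that sequence -->
      <xsl:sort select="position()" order="descending"/>
    </xsl:perform-sort>
  </xsl:function>
</xsl:stylesheet>
```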
The sort key values are atomized, and are then compared. The way they are compared depends on their datatype, as described in the next section.
It is possible to force the system to compare sort key values using the rules for a particular datatype by including a cast as part of the sort key component. For example, <xsl:sort select="xs:date(@dob)"/> will force the attributes to be compared as dates. In the absence of such a cast, the sort key values are compared using the rules appropriate to their datatype. Any values of type xs:untypedAtomic are cast to xs:string.
For backwards compatibility with XSLT 1.0, the data-type attribute remains available. If this has the effective value text, the atomized sort key values are converted to strings before being compared. If it has the effective value number, the atomized sort key values are converted to doubles before being compared. The conversion is done by using the string FO30 or number FO30 function as appropriate. If the data-type attribute has any other effective value, then this value must be an EQName denoting an expanded QName with a non-absent namespace, and the effect of the attribute is implementation-defined.
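The difference between the data-type attribute and an explicit cast can be sketched as follows. The items and item element names and the price attribute are hypothetical, chosen only for this illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical input: <items><item price="9.5"/>...</items> -->
  <xsl:template match="items">
    <result>
      <by-data-type>
        <xsl:for-each select="item">
          <!-- XSLT 1.0 style: each value is converted with the number function,
               so a non-numeric price becomes NaN rather than an error -->
          <xsl:sort select="@price" data-type="number"/>
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </by-data-type>
      <by-cast>
        <xsl:for-each select="item">
          <!-- Explicit cast: values are compared as xs:decimal;
               a non-numeric price makes the cast fail with a dynamic error -->
          <xsl:sort select="xs:decimal(@price)"/>
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </by-cast>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Which form is preferable depends on whether malformed values should be tolerated (data-type="number") or reported (the cast).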
[ERR XTTE1020] If any sort key value, after atomization and any type conversion required by the data-type attribute, is a sequence containing more than one item, then the effect depends on whether the xsl:sort element is processed with XSLT 1.0 behavior. With XSLT 1.0 behavior, the effective sort key value is the first item in the sequence. In other cases, this is a type error.
The set of sort key values (after any conversion) is first divided into two categories: empty values, and ordinary values. The empty sort key values represent those items where the sort key value is an empty sequence. These values are considered for sorting purposes to be equal to each other, but less than any other value. The remaining values are classified as ordinary values.
[ERR XTDE1030] It is a dynamic error if, for any sort key component, the set of sort key values evaluated for all the items in the initial sequence, after any type conversion requested, contains a pair of ordinary values for which the result of the XPath lt operator is an error. If the processor is able to detect the error statically, it may optionally signal it as a static error.
Note:
The above error condition may occur if the values to be sorted are of a type that does not support ordering (for example, xs:QName) or if the sequence is heterogeneous (for example, if it contains both strings and numbers). The error can generally be prevented by invoking a cast or constructor function within the sort key component.
The error condition is subject to the usual caveat that a processor is not required to evaluate any expression solely in order to determine whether it raises an error. For example, if there are several sort key components, then a processor is not required to evaluate or compare minor sort key values unless the corresponding major sort key values are equal.
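As a hedged illustration of preventing XTDE1030, the sketch below sorts a heterogeneous sequence by converting every sort key value to a string. The literal values and the sorted wrapper element are assumptions for the example; the use of the xsl:initial-template entry point assumes a processor that supports invocation without a source document.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="xsl:initial-template">
    <sorted>
      <!-- The sequence mixes xs:integer, xs:string and xs:decimal values,
           so sorting on "." could raise XTDE1030 when lt compares
           a number with a string -->
      <xsl:perform-sort select="(10, 'pear', 2.5, 'apple')">
        <!-- string(.) makes every sort key value an xs:string,
             a type that supports ordering -->
        <xsl:sort select="string(.)"/>
      </xsl:perform-sort>
    </sorted>
  </xsl:template>
</xsl:stylesheet>
```

The values are then ordered as strings under whatever collation the processor selects, so 10 is compared as the string "10" rather than as a number.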
In general, comparison of two ordinary values is performed according to the rules of the XPath lt operator. To ensure a total ordering, the same implementation of the lt operator must be used for all the comparisons: the one that is chosen is the one appropriate to the most specific type to which all the values can be converted by subtype substitution and/or type promotion. For example, if the sequence contains both xs:decimal and xs:double values, then the values are compared using xs:double comparison, even when comparing two xs:decimal values. NaN values, for sorting purposes, are considered to be equal to each other, and less than any other numeric value. Special rules also apply to the xs:string and xs:anyURI types, and types derived by restriction therefrom, as described in the next section.
The rules given in this section apply when comparing values whose type is xs:string or a type derived by restriction from xs:string, or whose type is xs:anyURI or a type derived by restriction from xs:anyURI.
[Definition: Facilities in XSLT 3.0 and XPath 3.0 that require strings to be ordered rely on the concept of a named collation. A collation is a set of rules that determine whether two strings are equal, and if not, which of them is to be sorted before the other.] A collation is identified by a URI, but the manner in which this URI is associated with an actual rule or algorithm is largely implementation-defined.
For more information about collations, see Section 5.3 Comparison of strings FO30 in [Functions and Operators 3.0]. Some specifications, for example [UNICODE TR10], use the term “collation” to describe rules that can be tailored or parameterized for various purposes. In this specification, a collation URI refers to a collation in which all such parameters have already been fixed. Therefore, if a collation URI is specified, other attributes such as case-order and lang are ignored.
Every implementation must recognize the collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint, which provides the ability to compare strings based on the Unicode codepoint values of the characters in the string.
Furthermore, every implementation must recognize collation URIs representing tailorings of the Unicode Collation Algorithm (UCA), as described in 13.4 The Unicode Collation Algorithm. Although this form of collation URI must be recognized, implementations are not required to support every possible tailoring.
If the xsl:sort element has a collation attribute, then the strings are compared according to the rules for the named collation: that is, they are compared using the XPath function call compare($a, $b, $collation).
If the effective value of the collation attribute of xsl:sort is a relative URI, then it is resolved against the base URI of the xsl:sort element.
[ERR XTDE1035] It is a dynamic error if the collation attribute of xsl:sort (after resolving against the base URI) is not a URI that is recognized by the implementation as referring to a collation.
Note:
It is entirely for the implementation to determine whether it recognizes a particular collation URI. For example, if the implementation allows collation URIs to contain parameters in the query part of the URI, it is the implementation that determines whether a URI containing an unknown or invalid parameter is or is not a recognized collation URI. The fact that this situation is described as an error thus does not prevent an implementation applying a fallback collation if it chooses to do so.
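A minimal sketch of the collation attribute, using the codepoint collation URI that every implementation must recognize. The names and name elements are hypothetical input chosen for the example.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <names><name>...</name>...</names> -->
  <xsl:template match="names">
    <names>
      <xsl:for-each select="name">
        <!-- Strings are compared by Unicode codepoint;
             any lang or case-order attribute would be ignored here -->
        <xsl:sort select="string(.)"
                  collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </names>
  </xsl:template>
</xsl:stylesheet>
```

Under this collation a name beginning with an upper-case ASCII letter, such as Zoe, sorts before one beginning with a lower-case ASCII letter, such as adam, because upper-case ASCII letters have lower codepoints.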
The lang and case-order attributes are ignored if a collation attribute is present. But in the absence of a collation attribute, these attributes provide input to an implementation-defined algorithm to locate a suitable collation:
The lang attribute indicates that a collation suitable for a particular natural language should be used. The effective value of the attribute must either be a string in the value space of xs:language, or a zero-length string. Supplying the zero-length string has the same effect as omitting the attribute. If a language is requested that is not supported, the processor may use a fallback language identified by removing successive hyphen-separated suffixes from the supplied value until a supported language code is obtained; failing this, the processor behaves as if the lang attribute were omitted.
Note:
The fallback algorithm described above is identical to the rules in RFC4647 Basic Filtering used in BCP 47, and is specified in [RFC4647] in greater detail.
The case-order attribute indicates whether the desired collation should sort upper-case letters before lower-case or vice versa. The effective value of the attribute must be either lower-first (indicating that lower-case letters precede upper-case letters in the collating sequence) or upper-first (indicating that upper-case letters precede lower-case).
When lower-first is requested, the returned collation should have the property that when two strings differ only in the case of one or more characters, then a string in which the first differing character is lower-case should precede a string in which the corresponding character is title-case, which should in turn precede a string in which the corresponding character is upper-case. When upper-first is requested, the returned collation should have the property that when two strings differ only in the case of one or more characters, then a string in which the first differing character is upper-case should precede a string in which the corresponding character is title-case, which should in turn precede a string in which the corresponding character is lower-case.
So, for example, if lang="en", then A a B b are sorted with case-order="upper-first" and a A b B are sorted with case-order="lower-first".
As a further example, if lower-first is requested, then a sorted sequence might be “MacAndrew, macintosh, macIntosh, Macintosh, MacIntosh, macintoshes, Macintoshes, McIntosh”. If upper-first is requested, the same sequence would sort as “MacAndrew, MacIntosh, Macintosh, macIntosh, macintosh, MacIntoshes, macintoshes, McIntosh”.
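The lang and case-order attributes might be used as in the following sketch. The words and word element names are assumptions for illustration, and the collation actually chosen remains implementation-defined.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <words><word>...</word>...</words> -->
  <xsl:template match="words">
    <words>
      <xsl:for-each select="word">
        <!-- No collation attribute is present, so lang and case-order
             are passed to the processor's algorithm for locating a collation -->
        <xsl:sort select="string(.)" lang="en" case-order="lower-first"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```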
If none of the collation, lang, or case-order attributes is present, the collation is chosen in an implementation-defined way. It is not required that the default collation for sorting should be the same as the default collation used when evaluating XPath expressions, as described in 5.3.1 Initializing the Static Context and 3.7.1 The default-collation Attribute.
Note:
It is usually appropriate, when sorting, to use a strong collation, that is, one that takes account of secondary differences (accents) and tertiary differences (case) between strings that are otherwise equal. A weak collation, which ignores such differences, may be more suitable when comparing strings for equality.
Useful background information on international sorting is provided in [UNICODE TR10]. The case-order attribute may be interpreted as described in section 6.6 of [UNICODE TR10].
<!-- Category: instruction -->
<xsl:perform-sort
select? = expression >
<!-- Content: (xsl:sort+, sequence-constructor) -->
</xsl:perform-sort>
The xsl:perform-sort instruction is used to return a sorted sequence.
The initial sequence is obtained either by evaluating the select attribute or by evaluating the contained sequence constructor (but not both). If there is no select attribute and no sequence constructor then the initial sequence (and therefore, the sorted sequence) is an empty sequence.
[ERR XTSE1040] It is a static error if an xsl:perform-sort instruction with a select attribute has any content other than xsl:sort and xsl:fallback instructions.
The result of the xsl:perform-sort instruction is the result of sorting its initial sequence using its contained sort key specification.
The following stylesheet function sorts a sequence of atomic values using the value itself as the sort key.
<xsl:function name="local:sort" as="xs:anyAtomicType*">
  <xsl:param name="in" as="xs:anyAtomicType*"/>
  <xsl:perform-sort select="$in">
    <xsl:sort select="."/>
  </xsl:perform-sort>
</xsl:function>
The following example defines a function that sorts books by price, and uses this function to output the five books that have the lowest prices:
<xsl:function name="bib:books-by-price" as="schema-element(bib:book)*">
  <xsl:param name="in" as="schema-element(bib:book)*"/>
  <xsl:perform-sort select="$in">
    <xsl:sort select="xs:decimal(bib:price)"/>
  </xsl:perform-sort>
</xsl:function>
...
<xsl:copy-of select="bib:books-by-price(//bib:book)[position() = 1 to 5]"/>
When used within xsl:for-each or xsl:apply-templates, a sort key specification indicates that the sequence of items selected by that instruction is to be processed in sorted order, not in the order of the supplied sequence.
For example, suppose an employee database has the form
<employees>
  <employee>
    <name>
      <given>James</given>
      <family>Clark</family>
    </name>
    ...
  </employee>
</employees>
Then a list of employees sorted by name could be generated using:
<xsl:template match="employees">
  <ul>
    <xsl:apply-templates select="employee">
      <xsl:sort select="name/family"/>
      <xsl:sort select="name/given"/>
    </xsl:apply-templates>
  </ul>
</xsl:template>
<xsl:template match="employee">
  <li>
    <xsl:value-of select="name/given"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/family"/>
  </li>
</xsl:template>
When used within xsl:for-each-group, a sort key specification indicates the order in which the groups are to be processed. For the effect of xsl:for-each-group, see 14 Grouping.
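As a sketch of sorting groups, the following stylesheet extends the employee example. It assumes that each employee element also has a department child; that element, and the departments and department result elements, are hypothetical and not part of the example database shown earlier.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="employees">
    <departments>
      <!-- department is an assumed child element of employee -->
      <xsl:for-each-group select="employee" group-by="department">
        <!-- This xsl:sort orders the groups, here by their grouping key -->
        <xsl:sort select="current-grouping-key()"/>
        <department name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <!-- These xsl:sort elements order the members of one group -->
            <xsl:sort select="name/family"/>
            <xsl:sort select="name/given"/>
            <employee>
              <xsl:value-of select="name/given, name/family"/>
            </employee>
          </xsl:for-each>
        </department>
      </xsl:for-each-group>
    </departments>
  </xsl:template>
</xsl:stylesheet>
```

The outer sort key specification affects only the order in which groups are processed; ordering within a group needs its own sort key specification, as the inner xsl:for-each shows.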
The description of the Unicode Collation Algorithm in this section is technically identical to the description found in [XPath 3.1]. The description here is to be used by a processor that does not implement the XPath 3.1 Feature; if the processor does implement the XPath 3.1 Feature, the description in [XPath 3.1] applies.
XSLT 3.0 defines a family of collation URIs representing tailorings of the Unicode Collation Algorithm (UCA) as defined in [UNICODE TR10]. The parameters used for tailoring the UCA are based on the parameters defined in the Locale Data Markup Language (LDML), defined in [UNICODE TR35].
This family of URIs uses the scheme and path http://www.w3.org/2013/collation/UCA followed by an optional query part. The query part, if present, consists of a question mark followed by a sequence of zero or more semicolon-separated parameters. Each parameter is a keyword-value pair, the keyword and value being separated by an equals sign.
All implementations must recognize URIs in this family. This applies to all places where collations are used, including (for example) the xsl:sort, xsl:key, xsl:for-each-group, and xsl:merge-key elements, the [xsl:]default-collation attribute, and the collation argument of functions such as contains FO30, max FO30, and collation-key.
If the fallback parameter is present with the value no, then the implementation must either use a collation that conforms with the rules in the Unicode specifications for the requested tailoring, or fail with a static or dynamic error indicating that it does not provide the collation (the error code should be the same as if the collation URI were not recognized). If the fallback parameter is omitted or takes the value yes, and if the collation URI is well-formed according to the rules in this section, then the implementation must accept the collation URI, and should use the available collation that most closely reflects the user’s intentions. For example, if the collation URI requested is http://www.w3.org/2013/collation/UCA?lang=se;fallback=yes and the implementation does not include a fully conformant version of the UCA tailored for Swedish, then it may choose to use a Swedish collation that is known to differ from the UCA definition, or one whose conformance has not been established. It might even, as a last resort, fall back to using codepoint collation.
If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not defined in this specification then the meaning is implementation-defined. If the implementation recognizes the meaning of the keyword and value then it should interpret it accordingly; if it does not recognize the keyword or value then if the fallback parameter is present with the value no it should reject the collation as unsupported, otherwise it should ignore the unrecognized parameter.
The following query parameters are defined. If any parameter is absent, the default is implementation-defined except where otherwise stated. The meaning given for each parameter is non-normative; the normative specification is found in [UNICODE TR35].
Keyword | Values | Meaning |
---|---|---|
fallback | yes \| no (default yes) | Determines whether the processor uses a fallback collation if a conformant collation is not available. |
lang | language code, as defined for the lang attribute of xsl:sort | The language whose collation conventions are to be used. |
version | string | The version number of the UCA to be used. |
strength | primary \| secondary \| tertiary \| quaternary \| identical, or 1 \| 2 \| 3 \| 4 \| 5 as synonyms | The collation strength as defined in UCA. Primary strength takes only the base form of the character into account (so A=a=Â=â); secondary strength ignores case but considers accents and diacritics as significant (so A=a and Â=â but â!=a); tertiary considers case as significant (A!=a!=Â!=â); quaternary considers spaces and punctuation that would otherwise be ignored (for example data-base=database). |
maxVariable | space \| punct \| symbol \| currency (default punct) | Indicates that all characters in the specified group and earlier groups are treated as "noise" characters to be handled as defined by the alternate parameter. For example, maxVariable=punct indicates that characters classified as whitespace or punctuation get this treatment. |
alternate | non-ignorable \| shifted \| blanked (default non-ignorable) | Controls the handling of characters such as spaces and hyphens; specifically, the “noise” characters in the groups selected by the maxVariable parameter. The value non-ignorable indicates that such characters are treated as distinct at the primary level (so data base sorts before datatype); shifted indicates that they are used to differentiate two strings only at the quaternary level, and blanked indicates that they are taken into account only at the identical level. |
backwards | yes \| no (default no) | The value backwards=yes indicates that the last accent in the search term is the most significant. |
normalization | yes \| no (default no) | Indicates whether search terms are converted to normalization form D. |
caseLevel | yes \| no (default no) | When used with primary strength, setting caseLevel=yes has the effect of ignoring accents while taking account of case. |
caseFirst | upper \| lower | Indicates whether upper-case precedes lower-case or vice versa. |
numeric | yes \| no (default no) | When numeric=yes is specified, a sequence of consecutive digits is interpreted as a number, for example chap2 sorts before chap12. |
reorder | a comma-separated sequence of reorder codes, where a reorder code is one of space, punct, symbol, currency, digit, or a four-letter script code defined in [ISO 15924 Register], the register of scripts maintained by the Unicode Consortium in its capacity as registration authority for [ISO 15924]. | Determines the relative ordering of text in different scripts; for example the value digit,Grek,Latn indicates that digits precede Greek letters, which precede Latin letters. |
Note:
This list excludes parameters that are inconvenient to express in a URI, or that are applicable only to substring matching.
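A UCA collation URI built from these parameters might be used on xsl:sort as in the sketch below. The chapters and chapter element names and the id attribute are hypothetical; the particular combination of parameters is an illustration, not a recommendation.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <chapters><chapter id="chap12"/><chapter id="chap2"/></chapters> -->
  <xsl:template match="chapters">
    <chapters>
      <xsl:for-each select="chapter">
        <!-- Parameters are separated by semicolons after the question mark.
             numeric=yes asks for runs of digits to be compared as numbers;
             fallback is omitted, so it takes its default value yes -->
        <xsl:sort select="string(@id)"
                  collation="http://www.w3.org/2013/collation/UCA?lang=en;strength=primary;numeric=yes"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </chapters>
  </xsl:template>
</xsl:stylesheet>
```

With a collation that honours numeric=yes, chap2 sorts before chap12. Because fallback defaults to yes, a processor without a conformant tailoring may substitute the closest collation it has, so adding fallback=no is the way to require conformance or an error.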