13 Sorting

[Definition: A sort key specification is a sequence of one or more adjacent xsl:sort elements which together define rules for sorting the items in an input sequence to form a sorted sequence.]

[Definition: Within a sort key specification, each xsl:sort element defines one sort key component.] The first xsl:sort element specifies the primary component of the sort key specification, the second xsl:sort element specifies the secondary component of the sort key specification, and so on.

A sort key specification may occur immediately within an xsl:apply-templates, xsl:for-each, xsl:perform-sort, or xsl:for-each-group element.

Note:

When used within xsl:for-each, xsl:for-each-group, or xsl:perform-sort, xsl:sort elements must occur before any other children.

13.1 The xsl:sort Element

<xsl:sort
  select? = expression
  lang? = { language }
  order? = { "ascending" | "descending" }
  collation? = { uri }
  stable? = { boolean }
  case-order? = { "upper-first" | "lower-first" }
  data-type? = { "text" | "number" | eqname } >
  <!-- Content: sequence-constructor -->
</xsl:sort>

The xsl:sort element defines a sort key component. A sort key component specifies how a sort key value is to be computed for each item in the sequence being sorted, and also how two sort key values are to be compared.

The value of a sort key component is determined either by its select attribute or by the contained sequence constructor. If neither is present, the default is select=".", which has the effect of sorting on the actual value of the item if it is an atomic value, or on the typed-value of the item if it is a node. If a select attribute is present, its value must be an XPath expression.

[ERR XTSE1015] It is a static error if an xsl:sort element with a select attribute has non-empty content.

Those attributes of the xsl:sort elements whose values are attribute value templates are evaluated using the same focus as is used to evaluate the select attribute of the containing instruction (specifically, xsl:apply-templates, xsl:for-each, xsl:for-each-group, or xsl:perform-sort).

The stable attribute is permitted only on the first xsl:sort element within a sort key specification.

[ERR XTSE1017] It is a static error if an xsl:sort element other than the first in a sequence of sibling xsl:sort elements has a stable attribute.

[Definition: A sort key specification is said to be stable if its first xsl:sort element has no stable attribute, or has a stable attribute whose effective value is yes.]

13.1.1 The Sorting Process

[Definition: The sequence to be sorted is referred to as the initial sequence.]

[Definition: The sequence after sorting as defined by the xsl:sort elements is referred to as the sorted sequence.]

[Definition:  For each item in the initial sequence, a value is computed for each sort key component within the sort key specification. The value computed for an item by using the Nth sort key component is referred to as the Nth sort key value of that item.]

The items in the initial sequence are ordered into a sorted sequence by comparing their sort key values. The relative position of two items A and B in the sorted sequence is determined as follows. The first sort key value of A is compared with the first sort key value of B, according to the rules of the first sort key component. If, under these rules, A is less than B, then A will precede B in the sorted sequence, unless the order attribute of this sort key component specifies descending, in which case B will precede A in the sorted sequence. If, however, the relevant sort key values compare equal, then the second sort key value of A is compared with the second sort key value of B, according to the rules of the second sort key component. This continues until two sort key values are found that compare unequal. If all the sort key values compare equal, and the sort key specification is stable, then A will precede B in the sorted sequence if and only if A preceded B in the initial sequence. If all the sort key values compare equal, and the sort key specification is not stable, then the relative order of A and B in the sorted sequence is implementation-dependent.

Note:

If two items have equal sort key values, and the sort is stable, then their order in the sorted sequence will be the same as their order in the initial sequence, regardless of whether order="descending" was specified on any or all of the sort key components.
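The comparison rules above can be modelled roughly in Python. This is a sketch, not a definitive implementation. Each sort key component is represented as a key function plus a descending flag, and the names SortKey and xslt_sort are hypothetical. The sketch relies on the fact that Python's sorted() is stable: items whose sort key values all compare equal keep their initial order, even when a component is descending. It models only the stable case. When the sort key specification is not stable, the tie order is implementation-dependent, and the sketch does not attempt to model that.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Any, Callable, List, Sequence


@dataclass
class SortKey:
    """One sort key component (hypothetical model of an xsl:sort element)."""
    key: Callable[[Any], Any]   # computes the sort key value of an item
    descending: bool = False    # order="descending"


def xslt_sort(items: Sequence[Any], keys: Sequence[SortKey]) -> List[Any]:
    """Order items by their 1st, 2nd, ... sort key values, in turn."""

    def compare(a: Any, b: Any) -> int:
        for component in keys:
            ka, kb = component.key(a), component.key(b)
            if ka == kb:
                continue  # equal: move on to the next sort key component
            result = -1 if ka < kb else 1
            return -result if component.descending else result
        return 0  # all sort key values equal

    # sorted() is stable: items that compare equal keep their initial order.
    return sorted(items, key=cmp_to_key(compare))


rows = [
    {"id": 1, "dept": "B", "salary": 100},
    {"id": 2, "dept": "A", "salary": 100},
    {"id": 3, "dept": "B", "salary": 200},
    {"id": 4, "dept": "A", "salary": 100},
]
by_dept_then_salary_desc = xslt_sort(
    rows, [SortKey(lambda r: r["dept"]), SortKey(lambda r: r["salary"], descending=True)]
)
print([r["id"] for r in by_dept_then_salary_desc])  # [2, 4, 3, 1]
```

Sorting the same rows with only SortKey(lambda r: r["dept"], descending=True) yields ids 1, 3, 2, 4. The B rows come first, but within each department the initial order is kept, as the note above describes.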

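The Nth sort key value is computed by evaluating either the select attribute or the contained sequence constructor of the Nth xsl:sort element, or the expression . (dot) if neither is present. This evaluation is done with the focus set as follows:

  • The context item is the item in the initial sequence whose sort key value is being computed.

  • The context position is the position of that item in the initial sequence.

  • The context size is the size of the initial sequence.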

Note:

As in any other XPath expression, the current function may be used within the select expression of xsl:sort to refer to the item that is the context item for the expression as a whole; that is, the item whose sort key value is being computed.
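The focus for a sort key gives each item its position in the initial sequence and the size of that sequence. One consequence is that a sort key of position() with order="descending" reverses the initial sequence. The hypothetical Python helper below is a simplified sketch, not how any particular processor is structured. It passes the context item, position and size to the key function as (item, position, size) to make this visible.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical key signature: (context item, context position, context size).
FocusKey = Callable[[Any, int, int], Any]


def sort_with_focus(items: Sequence[Any], key: FocusKey, descending: bool = False) -> List[Any]:
    size = len(items)
    # Compute each sort key value once, with 1-based positions as in XPath.
    keyed = [(key(item, position, size), item)
             for position, item in enumerate(items, start=1)]
    # reverse=True keeps sort stability in Python, matching a stable xsl:sort.
    keyed.sort(key=lambda pair: pair[0], reverse=descending)
    return [item for _, item in keyed]


# Similar in spirit to <xsl:sort select="position()" order="descending"/>
print(sort_with_focus(["a", "b", "c"], lambda item, pos, size: pos, descending=True))
# ['c', 'b', 'a']
```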

The sort key values are atomized, and are then compared. The way they are compared depends on their datatype, as described in the next section.

13.1.2 Comparing Sort Key Values

It is possible to force the system to compare sort key values using the rules for a particular datatype by including a cast as part of the sort key component. For example, <xsl:sort select="xs:date(@dob)"/> will force the attributes to be compared as dates. In the absence of such a cast, the sort key values are compared using the rules appropriate to their datatype. Any values of type xs:untypedAtomic are cast to xs:string.

For backwards compatibility with XSLT 1.0, the data-type attribute remains available. If this has the effective value text, the atomized sort key values are converted to strings before being compared. If it has the effective value number, the atomized sort key values are converted to doubles before being compared. The conversion is done by using the string or number function defined in [Functions and Operators 3.0], as appropriate. If the data-type attribute has any other effective value, then this value must be an EQName denoting an expanded QName with a non-absent namespace, and the effect of the attribute is implementation-defined.
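The difference between the two data-type values shows up in a small Python sketch. It assumes the sort key values are already atomized strings. Here str() stands in for the string function, and the hypothetical to_number helper approximates number() by returning NaN when conversion fails. Python's float() accepts some lexical forms (such as "inf") that XPath does not, so this is only an approximation.

```python
import math


def to_number(value) -> float:
    """Approximation of XPath number(): NaN when the value is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


values = ["10", "9", "100", "2"]
as_text = sorted(values, key=str)          # like data-type="text"
as_number = sorted(values, key=to_number)  # like data-type="number"
print(as_text)    # ['10', '100', '2', '9']
print(as_number)  # ['2', '9', '10', '100']
```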

[ERR XTTE1020] If any sort key value, after atomization and any type conversion required by the data-type attribute, is a sequence containing more than one item, then the effect depends on whether the xsl:sort element is processed with XSLT 1.0 behavior. With XSLT 1.0 behavior, the effective sort key value is the first item in the sequence. In other cases, this is a type error.
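The following minimal Python sketch models this rule. It assumes an atomized sort key value is represented as a list, and XsltTypeError and effective_sort_key are hypothetical names. For example, sorting books on select="author" when a book has two author children produces two items as that book's sort key value, and only XSLT 1.0 behavior turns this into "use the first author".

```python
from typing import Any, List


class XsltTypeError(Exception):
    """Stands in for the type error XTTE1020 (hypothetical class)."""


def effective_sort_key(value: List[Any], xslt1_behavior: bool) -> List[Any]:
    """Return the sort key value to use: an empty list or a single item."""
    if len(value) <= 1:
        return value
    if xslt1_behavior:
        return value[:1]  # XSLT 1.0 behavior: keep only the first item
    raise XsltTypeError("XTTE1020: sort key value contains more than one item")
```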

The set of sort key values (after any conversion) is first divided into two categories: empty values, and ordinary values. The empty sort key values represent those items where the sort key value is an empty sequence. These values are considered for sorting purposes to be equal to each other, but less than any other value. The remaining values are classified as ordinary values.
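This classification can be realised in Python by mapping each sort key value to a tuple whose first member separates empty values from ordinary ones. The sketch assumes each atomized sort key value is a list of zero or one items, and the helper name empty_first_key is hypothetical.

```python
from typing import Any, List, Tuple


def empty_first_key(value: List[Any]) -> Tuple[int, Any]:
    """(0, None) for an empty sort key value, (1, item) for an ordinary one.

    All empty values compare equal to each other and less than every
    ordinary value. Python never compares None with an ordinary item here,
    because the first tuple members already differ.
    """
    if not value:
        return (0, None)
    return (1, value[0])


records = [("a", ["Smith"]), ("b", []), ("c", ["Jones"]), ("d", [])]
ascending = [rid for rid, _ in sorted(records, key=lambda r: empty_first_key(r[1]))]
print(ascending)  # ['b', 'd', 'c', 'a']
```

Empty values are less than every ordinary value, so a descending sort places them last. Sorting the same records with reverse=True gives a, c, b, d, and the two empty values keep their initial order.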

[ERR XTDE1030] It is a dynamic error if, for any sort key component, the set of sort key values evaluated for all the items in the initial sequence, after any type conversion requested, contains a pair of ordinary values for which the result of the XPath lt operator is an error. If the processor is able to detect the error statically, it may optionally signal it as a static error.

Note:

The above error condition may occur if the values to be sorted are of a type that does not support ordering (for example, xs:QName) or if the sequence is heterogeneous (for example, if it contains both strings and numbers). The error can generally be prevented by invoking a cast or constructor function within the sort key component.

The error condition is subject to the usual caveat that a processor is not required to evaluate any expression solely in order to determine whether it raises an error. For example, if there are several sort key components, then a processor is not required to evaluate or compare minor sort key values unless the corresponding major sort key values are equal.
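The error condition has a close parallel in Python. Comparing a str with an int raises TypeError, much as the XPath lt operator raises an error for a string and a number. Complex numbers, which have no ordering, play the role of a type such as xs:QName. The check_orderable helper below is a hypothetical sketch. It tests every pair of values, which, as just noted, a real processor is not obliged to do.

```python
from itertools import combinations
from typing import Any, Sequence


def check_orderable(values: Sequence[Any]) -> None:
    """Raise if some pair of ordinary sort key values cannot be ordered."""
    for a, b in combinations(values, 2):
        try:
            _ = a < b
        except TypeError as exc:
            raise ValueError(f"XTDE1030: cannot order {a!r} and {b!r}") from exc


check_orderable(["10", "2"])                  # all strings: no error
check_orderable([str(v) for v in ["10", 2]])  # casting first avoids the error
```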

In general, comparison of two ordinary values is performed according to the rules of the XPath lt operator. To ensure a total ordering, the same implementation of the lt operator must be used for all the comparisons: the one that is chosen is the one appropriate to the most specific type to which all the values can be converted by subtype substitution and/or type promotion. For example, if the sequence contains both xs:decimal and xs:double values, then the values are compared using xs:double comparison, even when comparing two xs:decimal values. NaN values, for sorting purposes, are considered to be equal to each other, and less than any other numeric value. Special rules also apply to the xs:string and xs:anyURI types, and types derived by restriction therefrom, as described in the next section.
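The following Python sketch models two of these rules. It assumes xs:decimal values are represented by decimal.Decimal and xs:double values by float, and the function name sort_numbers is hypothetical. If any double is present, every value is compared as a double. Two decimals that differ only beyond double precision then compare equal and keep their initial order. NaN sorts before every other number.

```python
import math
from decimal import Decimal
from typing import List, Union

Number = Union[Decimal, float]  # Decimal ~ xs:decimal, float ~ xs:double


def sort_numbers(values: List[Number]) -> List[Number]:
    # Choose one comparison for all values: double if any double is present.
    use_double = any(isinstance(v, float) for v in values)

    def key(v: Number):
        x = float(v) if use_double else v
        if isinstance(x, float) and math.isnan(x):
            return (0, 0)  # every NaN is equal, and less than any number
        return (1, x)

    return sorted(values, key=key)  # stable: equal values keep their order


small = Decimal("0.1000000000000000000001")
mixed = sort_numbers([small, 2.5, Decimal("0.1"), math.nan])
decimals_only = sort_numbers([small, Decimal("0.1")])
```

In mixed, the two decimals both convert to the double nearest 0.1, so they keep their initial order. In decimals_only, decimal comparison is exact, so Decimal("0.1") comes first.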

13.1.3 Sorting Using Collations

The rules given in this section apply when comparing values whose type is xs:string or a type derived by restriction from xs:string, or whose type is xs:anyURI or a type derived by restriction from xs:anyURI.

[Definition: Facilities in XSLT 3.0 and XPath 3.0 that require strings to be ordered rely on the concept of a named collation. A collation is a set of rules that determine whether two strings are equal, and if not, which of them is to be sorted before the other.] A collation is identified by a URI, but the manner in which this URI is associated with an actual rule or algorithm is largely implementation-defined.

For more information about collations, see Section 5.3 Comparison of strings in [Functions and Operators 3.0]. Some specifications, for example [UNICODE TR10], use the term “collation” to describe rules that can be tailored or parameterized for various purposes. In this specification, a collation URI refers to a collation in which all such parameters have already been fixed. Therefore, if a collation URI is specified, other attributes such as case-order and lang are ignored.

Every implementation must recognize the collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint, which provides the ability to compare strings based on the Unicode codepoint values of the characters in the string.
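Codepoint collation corresponds closely to Python's built-in string comparison, which compares strings code point by code point. The sketch below uses a hypothetical function, codepoint_compare, that returns -1, 0, or 1 in the manner of compare($a, $b, $collation). It also shows that code point order differs from UTF-16 code unit order for characters outside the Basic Multilingual Plane.

```python
def codepoint_compare(a: str, b: str) -> int:
    """-1, 0 or 1, comparing a and b by Unicode code point values."""
    return (a > b) - (a < b)


print(codepoint_compare("Z", "a"))   # -1: U+005A precedes U+0061
print(codepoint_compare("é", "z"))   # 1: U+00E9 follows U+007A

# U+FFFF precedes U+10000 by code point...
print(codepoint_compare("\uffff", "\U00010000"))  # -1
# ...but not by UTF-16 code units, where U+10000 starts with 0xD800.
print("\uffff".encode("utf-16-be") > "\U00010000".encode("utf-16-be"))  # True
```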

Furthermore, every implementation must recognize collation URIs representing tailorings of the Unicode Collation Algorithm (UCA), as described in 13.4 The Unicode Collation Algorithm. Although this form of collation URI must be recognized, implementations are not required to support every possible tailoring.

If the xsl:sort element has a collation attribute, then the strings are compared according to the rules for the named collation: that is, they are compared using the XPath function call compare($a, $b, $collation).

If the effective value of the collation attribute of xsl:sort is a relative URI, then it is resolved against the base URI of the xsl:sort element.

[ERR XTDE1035] It is a dynamic error if the collation attribute of xsl:sort (after resolving against the base URI) is not a URI that is recognized by the implementation as referring to a collation.
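The resolution and recognition steps can be sketched in Python as follows. urllib.parse.urljoin performs the relative-URI resolution. The set of recognized collations, the base URIs, and the function name resolve_collation are illustrative assumptions, since which URIs are recognized is up to the implementation.

```python
from urllib.parse import urljoin

CODEPOINT = "http://www.w3.org/2005/xpath-functions/collation/codepoint"
UCA = "http://www.w3.org/2013/collation/UCA"


def resolve_collation(effective_value: str, base_uri: str) -> str:
    """Resolve a collation attribute value and check that it is recognized."""
    uri = urljoin(base_uri, effective_value)
    # This imaginary processor recognizes only codepoint and UCA-family URIs.
    if uri == CODEPOINT or uri.partition("?")[0] == UCA:
        return uri
    raise LookupError(f"XTDE1035: unrecognized collation URI {uri}")


print(resolve_collation("codepoint", "http://www.w3.org/2005/xpath-functions/collation/"))
# http://www.w3.org/2005/xpath-functions/collation/codepoint
```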

Note:

It is entirely for the implementation to determine whether it recognizes a particular collation URI. For example, if the implementation allows collation URIs to contain parameters in the query part of the URI, it is the implementation that determines whether a URI containing an unknown or invalid parameter is or is not a recognized collation URI. The fact that this situation is described as an error thus does not prevent an implementation applying a fallback collation if it chooses to do so.

The lang and case-order attributes are ignored if a collation attribute is present. But in the absence of a collation attribute, these attributes provide input to an implementation-defined algorithm to locate a suitable collation:

  • The lang attribute indicates that a collation suitable for a particular natural language should be used. The effective value of the attribute must either be a string in the value space of xs:language, or a zero-length string. Supplying the zero-length string has the same effect as omitting the attribute. If a language is requested that is not supported, the processor may use a fallback language identified by removing successive hyphen-separated suffixes from the supplied value until a supported language code is obtained; failing this, the processor behaves as if the lang attribute were omitted.

    Note:

    The fallback algorithm described above is identical to the rules in RFC4647 Basic Filtering used in BCP 47, and is specified in [RFC4647] in greater detail.

  • The case-order attribute indicates whether the desired collation should sort upper-case letters before lower-case or vice versa. The effective value of the attribute must be either lower-first (indicating that lower-case letters precede upper-case letters in the collating sequence) or upper-first (indicating that upper-case letters precede lower-case).

    When lower-first is requested, the returned collation should have the property that when two strings differ only in the case of one or more characters, then a string in which the first differing character is lower-case should precede a string in which the corresponding character is title-case, which should in turn precede a string in which the corresponding character is upper-case. When upper-first is requested, the returned collation should have the property that when two strings differ only in the case of one or more characters, then a string in which the first differing character is upper-case should precede a string in which the corresponding character is title-case, which should in turn precede a string in which the corresponding character is lower-case.

    So, for example, if lang="en", then A a B b are sorted with case-order="upper-first" and a A b B are sorted with case-order="lower-first".

    As a further example, if lower-first is requested, then a sorted sequence might be “MacAndrew, macintosh, macIntosh, Macintosh, MacIntosh, macintoshes, Macintoshes, McIntosh”. If upper-first is requested, the same sequence would sort as “MacAndrew, MacIntosh, Macintosh, macIntosh, macintosh, Macintoshes, macintoshes, McIntosh”.
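The two mechanisms above can be approximated in Python. lang_fallback truncates hyphen-separated suffixes as described for the lang attribute. case_order_key is a toy collation key that compares strings case-insensitively by code point, then breaks ties on the case of the first differing character. Both names are hypothetical. The toy key ignores accents and everything else a real collation would handle, but it reproduces the lower-first ordering in the example above.

```python
from typing import Iterable, Optional, Tuple


def lang_fallback(requested: str, supported: Iterable[str]) -> Optional[str]:
    """Drop hyphen-separated suffixes until a supported language is found.

    None means no match (or lang=""): behave as if lang were omitted.
    """
    available = {tag.lower() for tag in supported}  # tags are case-insensitive
    tag = requested
    while tag:
        if tag.lower() in available:
            return tag
        tag = tag.rpartition("-")[0]  # remove the last subtag ("" if none left)
    return None


def case_rank(ch: str) -> int:
    """0 = lower-case, 1 = title-case or uncased, 2 = upper-case."""
    if ch.islower():
        return 0
    if ch.isupper():
        return 2
    return 1


def case_order_key(s: str, case_order: str) -> Tuple[str, Tuple[int, ...]]:
    """Toy key: case-insensitive comparison first, then case-order tie-break."""
    ranks = [case_rank(ch) for ch in s]
    if case_order == "upper-first":
        ranks = [2 - r for r in ranks]  # upper < title < lower
    return (s.casefold(), tuple(ranks))


words = ["McIntosh", "macintosh", "MacIntosh", "Macintoshes",
         "MacAndrew", "macIntosh", "Macintosh", "macintoshes"]
print(sorted(words, key=lambda w: case_order_key(w, "lower-first")))
print(lang_fallback("en-GB-oxendict", ["en", "fr"]))  # en
```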

If none of the collation, lang, or case-order attributes is present, the collation is chosen in an implementation-defined way. It is not required that the default collation for sorting should be the same as the default collation used when evaluating XPath expressions, as described in 5.3.1 Initializing the Static Context and 3.7.1 The default-collation Attribute.

Note:

It is usually appropriate, when sorting, to use a strong collation, that is, one that takes account of secondary differences (accents) and tertiary differences (case) between strings that are otherwise equal. A weak collation, which ignores such differences, may be more suitable when comparing strings for equality.

Useful background information on international sorting is provided in [UNICODE TR10]. The case-order attribute may be interpreted as described in section 6.6 of [UNICODE TR10].

13.2 Creating a Sorted Sequence

<!-- Category: instruction -->
<xsl:perform-sort
  select? = expression >
  <!-- Content: (xsl:sort+, sequence-constructor) -->
</xsl:perform-sort>

The xsl:perform-sort instruction is used to return a sorted sequence.

The initial sequence is obtained either by evaluating the select attribute or by evaluating the contained sequence constructor (but not both). If there is no select attribute and no sequence constructor then the initial sequence (and therefore, the sorted sequence) is an empty sequence.

[ERR XTSE1040] It is a static error if an xsl:perform-sort instruction with a select attribute has any content other than xsl:sort and xsl:fallback instructions.

The result of the xsl:perform-sort instruction is the result of sorting its initial sequence using its contained sort key specification.

Example: Sorting a Sequence of Atomic Values

The following stylesheet function sorts a sequence of atomic values using the value itself as the sort key.

<xsl:function name="local:sort" 
          as="xs:anyAtomicType*">
  <xsl:param name="in" as="xs:anyAtomicType*"/>
  <xsl:perform-sort select="$in">
    <xsl:sort select="."/>
  </xsl:perform-sort>
</xsl:function>

Example: Writing a Function to Perform a Sort

The following example defines a function that sorts books by price, and uses this function to output the five books that have the lowest prices:

<xsl:function name="bib:books-by-price" 
          as="schema-element(bib:book)*">
  <xsl:param name="in" as="schema-element(bib:book)*"/>
  <xsl:perform-sort select="$in">
    <xsl:sort select="xs:decimal(bib:price)"/>
  </xsl:perform-sort>
</xsl:function>
   ...
   <xsl:copy-of select="bib:books-by-price(//bib:book)
                             [position() = 1 to 5]"/>

13.3 Processing a Sequence in Sorted Order

When used within xsl:for-each or xsl:apply-templates, a sort key specification indicates that the sequence of items selected by that instruction is to be processed in sorted order, not in the order of the supplied sequence.

Example: Processing Elements in Sorted Order

For example, suppose an employee database has the form:

<employees>
  <employee>
    <name>
      <given>James</given>
      <family>Clark</family>
    </name>
    ...
  </employee>
</employees>

Then a list of employees sorted by name could be generated using:

<xsl:template match="employees">
  <ul>
    <xsl:apply-templates select="employee">
      <xsl:sort select="name/family"/>
      <xsl:sort select="name/given"/>
    </xsl:apply-templates>
  </ul>
</xsl:template>

<xsl:template match="employee">
  <li>
    <xsl:value-of select="name/given"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/family"/>
  </li>
</xsl:template>

When used within xsl:for-each-group, a sort key specification indicates the order in which the groups are to be processed. For the effect of xsl:for-each-group, see 14 Grouping.

13.4 The Unicode Collation Algorithm

The description of the Unicode Collation Algorithm in this section is technically identical to the description found in [XPath 3.1]. The description here is to be used by a processor that does not implement the XPath 3.1 Feature; if the processor does implement the XPath 3.1 Feature, the description in [XPath 3.1] applies.

XSLT 3.0 defines a family of collation URIs representing tailorings of the Unicode Collation Algorithm (UCA) as defined in [UNICODE TR10]. The parameters used for tailoring the UCA are based on the parameters defined in the Locale Data Markup Language (LDML), defined in [UNICODE TR35].

This family of URIs uses the scheme and path http://www.w3.org/2013/collation/UCA followed by an optional query part. The query part, if present, consists of a question mark followed by a sequence of zero or more semicolon-separated parameters. Each parameter is a keyword-value pair, the keyword and value being separated by an equals sign.

All implementations must recognize URIs in this family. This applies to all places where collations are used, including (for example) the xsl:sort, xsl:key, xsl:for-each-group, and xsl:merge-key elements, the [xsl:]default-collation attribute, and the collation argument of functions such as contains, max, and collation-key. If the fallback parameter is present with the value no, then the implementation must either use a collation that conforms with the rules in the Unicode specifications for the requested tailoring, or fail with a static or dynamic error indicating that it does not provide the collation (the error code should be the same as if the collation URI were not recognized). If the fallback parameter is omitted or takes the value yes, and if the collation URI is well-formed according to the rules in this section, then the implementation must accept the collation URI, and should use the available collation that most closely reflects the user’s intentions. For example, if the collation URI requested is http://www.w3.org/2013/collation/UCA?lang=sv;fallback=yes and the implementation does not include a fully conformant version of the UCA tailored for Swedish, then it may choose to use a Swedish collation that is known to differ from the UCA definition, or one whose conformance has not been established. It might even, as a last resort, fall back to using codepoint collation.

If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not defined in this specification then the meaning is implementation-defined. If the implementation recognizes the meaning of the keyword and value then it should interpret it accordingly; if it does not recognize the keyword or value then if the fallback parameter is present with the value no it should reject the collation as unsupported, otherwise it should ignore the unrecognized parameter.
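These syntax rules can be sketched in Python as follows. parse_uca_uri and apply_fallback are hypothetical helpers. In particular, apply_fallback only checks keywords against a caller-supplied set, whereas a real implementation would also validate the values.

```python
from typing import Dict, Set

UCA_BASE = "http://www.w3.org/2013/collation/UCA"


def parse_uca_uri(uri: str) -> Dict[str, str]:
    """Split a UCA-family collation URI into keyword -> value parameters."""
    base, _, query = uri.partition("?")
    if base != UCA_BASE:
        raise ValueError(f"not a UCA collation URI: {uri}")
    params: Dict[str, str] = {}
    for part in query.split(";"):
        if not part:
            continue  # zero parameters (or an empty segment)
        keyword, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"parameter is not keyword=value: {part!r}")
        params[keyword] = value  # a repeated keyword: the last one wins
    return params


def apply_fallback(params: Dict[str, str], recognized: Set[str]) -> Dict[str, str]:
    """Ignore unrecognized keywords, or reject them when fallback=no.

    `recognized` should include "fallback" itself.
    """
    strict = params.get("fallback", "yes") == "no"
    kept: Dict[str, str] = {}
    for keyword, value in params.items():
        if keyword in recognized:
            kept[keyword] = value
        elif strict:
            raise LookupError(f"unsupported collation parameter {keyword!r}")
        # otherwise the unrecognized parameter is silently ignored
    return kept
```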

The following query parameters are defined. If any parameter is absent, the default is implementation-defined except where otherwise stated. The meaning given for each parameter is non-normative; the normative specification is found in [UNICODE TR35].

Options for the Unicode Collation Algorithm

  • fallback. Values: yes | no (default yes). Determines whether the processor uses a fallback collation if a conformant collation is not available.

  • lang. Values: a language code, as defined for the lang attribute of xsl:sort. The language whose collation conventions are to be used.

  • version. Values: a string. The version number of the UCA to be used.

  • strength. Values: primary | secondary | tertiary | quaternary | identical, or 1 | 2 | 3 | 4 | 5 as synonyms. The collation strength as defined in UCA. Primary strength takes only the base form of the character into account (so A=a=Â=â). Secondary strength ignores case but considers accents and diacritics as significant (so A=a and Â=â, but â!=a). Tertiary strength considers case as significant (A!=a!=Â!=â). Quaternary strength considers spaces and punctuation that would otherwise be ignored (for example, with alternate=shifted, data-base=database at tertiary strength but not at quaternary strength).

  • maxVariable. Values: space | punct | symbol | currency (default punct). Indicates that all characters in the specified group and earlier groups are treated as "noise" characters to be handled as defined by the alternate parameter. For example, maxVariable=punct indicates that characters classified as whitespace or punctuation get this treatment.

  • alternate. Values: non-ignorable | shifted | blanked (default non-ignorable). Controls the handling of characters such as spaces and hyphens; specifically, the “noise” characters in the groups selected by the maxVariable parameter. The value non-ignorable indicates that such characters are treated as distinct at the primary level (so data base sorts before datatype). The value shifted indicates that they are used to differentiate two strings only at the quaternary level. The value blanked indicates that they are taken into account only at the identical level.

  • backwards. Values: yes | no (default no). The value backwards=yes indicates that the last accent in the string is the most significant.

  • normalization. Values: yes | no (default no). Indicates whether strings are converted to normalization form D.

  • caseLevel. Values: yes | no (default no). When used with primary strength, setting caseLevel=yes has the effect of ignoring accents while taking account of case.

  • caseFirst. Values: upper | lower. Indicates whether upper-case precedes lower-case or vice versa.

  • numeric. Values: yes | no (default no). When numeric=yes is specified, a sequence of consecutive digits is interpreted as a number; for example, chap2 sorts before chap12.

  • reorder. Values: a comma-separated sequence of reorder codes, where a reorder code is one of space, punct, symbol, currency, digit, or a four-letter script code defined in [ISO 15924 Register], the register of scripts maintained by the Unicode Consortium in its capacity as registration authority for [ISO 15924]. Determines the relative ordering of text in different scripts; for example, the value digit,Grek,Latn indicates that digits precede Greek letters, which precede Latin letters.
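Two of these parameters can be approximated in Python to make their meaning concrete. numeric_key imitates numeric=yes by comparing runs of ASCII digits as integers. strength_key produces equality keys that mirror the A/a/Â/â examples given for strength. Both are hypothetical toys built on the standard re and unicodedata modules, and neither computes real UCA collation weights.

```python
import re
import unicodedata
from typing import List, Union


def numeric_key(s: str) -> List[Union[str, int]]:
    """Split 'chap12' into ['chap', 12, ''] so digit runs compare as numbers."""
    parts = re.split(r"([0-9]+)", s)
    # re.split with a capturing group puts the digit runs at odd indices.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def strength_key(s: str, strength: str) -> str:
    """Toy equality key for primary, secondary and tertiary strength."""
    decomposed = unicodedata.normalize("NFD", s)  # Â -> A + combining circumflex
    if strength == "primary":
        # Base letters only: drop accents and ignore case.
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        return base.casefold()
    if strength == "secondary":
        return decomposed.casefold()  # accents count, case does not
    return decomposed  # tertiary: accents and case both count


chapters = ["chap12", "chap2", "chap1"]
print(sorted(chapters))                   # ['chap1', 'chap12', 'chap2']
print(sorted(chapters, key=numeric_key))  # ['chap1', 'chap2', 'chap12']
```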

Note:

This list excludes parameters that are inconvenient to express in a URI, or that are applicable only to substring matching.