14 Grouping

The facilities described in this section are designed to allow items in a sequence to be grouped based on common values; for example, they allow grouping of elements having the same value for a particular attribute, or elements with the same name, or elements with common values for any other expression. Since grouping identifies items with duplicate values, the same facilities also allow selection of the distinct values in a sequence of items, that is, the elimination of duplicates.

Note:

Simple elimination of duplicates can also be achieved using the distinct-values function: see [Functions and Operators 3.0].
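As a minimal sketch of duplicate elimination through grouping, each distinct value can be emitted once by outputting the grouping key of every group. The city elements and the country attribute are assumed here purely for illustration.

```xml
<!-- Assumes the context node has city children carrying a country attribute. -->
<xsl:for-each-group select="city" group-by="@country">
  <!-- One group per distinct country; the grouping key is that country's value. -->
  <country><xsl:value-of select="current-grouping-key()"/></country>
</xsl:for-each-group>
```

A comparable list could be obtained with distinct-values(city/@country). The order of the values returned by that function is implementation-dependent, whereas groups are processed in order of first appearance unless they are sorted.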

In addition these facilities allow grouping based on sequential position, for example selecting groups of adjacent para elements. The facilities also provide an easy way to do fixed-size grouping, for example identifying groups of three adjacent nodes, which is useful when arranging data in multiple columns.
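Fixed-size grouping can be sketched with group-adjacent and a key computed from the context position. The items and item element names below are hypothetical; the technique relies only on position() being the position of the item in population order while the grouping key is evaluated.

```xml
<!-- Arranges hypothetical item elements into table rows of three cells. -->
<xsl:template match="items">
  <table>
    <!-- (position() - 1) idiv 3 yields 0, 0, 0, 1, 1, 1, 2, ... so each
         run of three adjacent items shares one grouping key. -->
    <xsl:for-each-group select="item" group-adjacent="(position() - 1) idiv 3">
      <tr>
        <xsl:for-each select="current-group()">
          <td><xsl:value-of select="."/></td>
        </xsl:for-each>
      </tr>
    </xsl:for-each-group>
  </table>
</xsl:template>
```

The final row holds fewer than three cells when the number of items is not a multiple of three.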

For each group of items identified, it is possible to evaluate a sequence constructor for the group. Grouping can be nested to multiple levels: once groups have been identified, the items in the current group can themselves be divided into sub-groups.

It is also possible for one item to participate in more than one group.

14.1 The xsl:for-each-group Element

<!-- Category: instruction -->
<xsl:for-each-group
  select = expression
  group-by? = expression
  group-adjacent? = expression
  group-starting-with? = pattern
  group-ending-with? = pattern
  composite? = boolean
  collation? = { uri } >
  <!-- Content: (xsl:sort*, sequence-constructor) -->
</xsl:for-each-group>

This element is an instruction that may be used anywhere within a sequence constructor.

[Definition: The xsl:for-each-group instruction allocates the items in an input sequence into groups of items (that is, it establishes a collection of sequences) based either on common values of a grouping key, or on a pattern that the initial or final item in a group must match.] The sequence constructor that forms the content of the xsl:for-each-group instruction is evaluated once for each of these groups.

[Definition: The sequence of items to be grouped, which is referred to as the population, is determined by evaluating the XPath expression contained in the select attribute.]

[Definition: The population is treated as a sequence; the order of items in this sequence is referred to as population order.]

A group is never empty. If the population is empty, the number of groups will be zero.

The assignment of items to groups depends on the group-by, group-adjacent, group-starting-with, and group-ending-with attributes.

[ERR XTSE1080] These four attributes are mutually exclusive: it is a static error if none of these four attributes is present or if more than one of them is present.

[ERR XTSE1090] It is a static error to specify the collation attribute or the composite attribute if neither the group-by attribute nor group-adjacent attribute is specified.

[Definition: If either of the group-by or group-adjacent attributes is present, then for each item in the population a set of grouping keys is calculated, as follows: the expression contained in the group-by or group-adjacent attribute is evaluated; the result is atomized; and any xs:untypedAtomic values are cast to xs:string. If composite="yes" is specified, there is a single grouping key whose value is the resulting sequence; otherwise, there is a set of grouping keys, consisting of the distinct atomic values present in the result sequence.]

When calculating grouping keys for an item in the population, the expression contained in the group-by or group-adjacent attribute is evaluated with that item as the context item, with its position in population order as the context position, and with the size of the population as the context size.

If the group-by attribute is present, and if the composite attribute is omitted or takes the value no, then an item in the population may have multiple grouping keys: that is, the group-by expression evaluates to a sequence, and each item in the sequence is treated as a separate grouping key. The item is included in as many groups as there are distinct grouping keys (which may be zero).

If the group-adjacent attribute is used, and if the composite attribute is omitted or takes the value no, then each item in the population must have exactly one grouping key value.

[ERR XTTE1100] It is a type error if the result of evaluating the group-adjacent expression is an empty sequence or a sequence containing more than one item, unless composite="yes" is specified.
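One way to avoid this type error is to make sure the group-adjacent expression always yields exactly one value. The following sketch assumes p elements that may or may not carry a class attribute; both names are illustrative.

```xml
<!-- group-adjacent="@class" would raise XTTE1100 for a p element that has
     no class attribute, because the grouping key would be an empty sequence.
     string(@class) always yields exactly one xs:string, which is the
     zero-length string when the attribute is absent. -->
<xsl:for-each-group select="p" group-adjacent="string(@class)">
  <div class="{current-grouping-key()}">
    <xsl:copy-of select="current-group()"/>
  </div>
</xsl:for-each-group>
```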

Grouping keys are compared using the rules for the deep-equal function (see [Functions and Operators 3.0]). This means that values of type xs:untypedAtomic will be cast to xs:string before the comparison, and that items that are not comparable using the eq operator are considered to be not equal, that is, they are allocated to different groups. It also means that the value NaN is considered equal to itself. If the values are strings, or untyped atomic values, then if there is a collation attribute the values are compared using the collation specified as the effective value of the collation attribute, resolved if relative against the base URI of the xsl:for-each-group element. If there is no collation attribute then the default collation is used.

[ERR XTDE1110] It is a dynamic error if the collation URI specified to xsl:for-each-group (after resolving against the base URI) is a collation that is not recognized by the implementation. (For notes, [see ERR XTDE1035].)

For more information on collations, see 13.1.3 Sorting Using Collations.
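The following sketch shows the collation attribute in use. The UCA collation URI and its strength parameter are assumptions about what a given processor recognizes; any collation URI known to the implementation could be substituted. The city elements and country attribute are illustrative.

```xml
<!-- Assumed collation: at secondary strength, strings that differ only in
     case are expected to compare equal, so "France" and "FRANCE" would
     fall into the same group. -->
<xsl:for-each-group select="city" group-by="@country"
    collation="http://www.w3.org/2013/collation/UCA?strength=secondary">
  <country name="{current-grouping-key()}" cities="{count(current-group())}"/>
</xsl:for-each-group>
```

The name attribute takes the spelling found on the initial item of each group, since that item supplies the current grouping key.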

The way in which an xsl:for-each-group element is evaluated depends on which of the four group-defining attributes is present:

In all cases the order of items within each group is predictable, and reflects the original population order, in that the items are processed in population order and each item is appended at the end of zero or more groups.

Note:

As always, a different algorithm may be used if it achieves the same effect.

[Definition: For each group, the item within the group that is first in population order is known as the initial item of the group.]

The sequence constructor contained in the xsl:for-each-group element is evaluated once for each of the groups, in processing order. The sequences that result are concatenated, in processing order, to form the result of the xsl:for-each-group element. Within the sequence constructor, the context item is the initial item of the relevant group, the context position is the position of this group in the processing order of the groups, and the context size is the number of groups. This has the effect that within the sequence constructor, a call on position() takes successive values 1, 2, ... last().
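Because position() and last() refer to groups rather than to individual items, they can be used to punctuate a list of groups. A small sketch, again assuming illustrative city elements with a country attribute:

```xml
<xsl:for-each-group select="city" group-by="@country">
  <!-- position() is the position of the group in processing order;
       last() is the total number of groups. -->
  <xsl:value-of select="current-grouping-key()"/>
  <xsl:if test="position() ne last()">, </xsl:if>
</xsl:for-each-group>
```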

14.2 Accessing Information about the Current Group Value

Two pieces of information are available during the processing of each group (that is, while evaluating the sequence constructor contained in the xsl:for-each-group instruction, and also while evaluating the sort key of a group as expressed by the select attribute or sequence constructor of an xsl:sort child of the xsl:for-each-group element):

Information about the current group and the current grouping key is held in the dynamic context, and is available using the current-group and current-grouping-key functions respectively.

In XSLT 2.0, the current group and the current grouping key were passed unchanged through calls of xsl:apply-templates and xsl:call-template, and also xsl:apply-imports and xsl:next-match. This behavior is retained in XSLT 3.0 except in the case where streaming is in use: specifically, if the xsl:apply-templates, xsl:call-template, xsl:apply-imports, or xsl:next-match instruction occurs within a declared-streamable construct (typically, within an xsl:source-document instruction, or within a streamable template rule), then the current group and current grouping key are set to absent in the called template. The reason for this is to allow the streamability of an xsl:for-each-group instruction to be assessed statically, as described in 19.8.4.19 Streamability of xsl:for-each-group.

14.2.1 fn:current-group

Summary

Returns the group currently being processed by an xsl:for-each-group instruction.

Signature

fn:current-group() as item()*

Properties

This function is deterministic, context-dependent, and focus-independent, as those terms are defined in [Functions and Operators 3.0].

Rules

The evaluation context for XPath expressions includes a component called the current group, which is a sequence.

The function current-group returns the sequence of items making up the current group.

The current group is bound during evaluation of the xsl:for-each-group instruction. If no xsl:for-each-group instruction is being evaluated, the current group will be absent: that is, any reference to it will cause a dynamic error.

The effect of invocation constructs on the current group is as follows:

  • If the invocation construct is contained within a declared-streamable construct (for example, if it is within an xsl:source-document instruction with the attribute streamable="yes", or within a streamable template), then the invocation construct sets the current group to absent. In this situation the scope of the current group is effectively static; it can only be referenced within the body of the xsl:for-each-group instruction to which it applies.

  • If the invocation construct is a (static or dynamic) function call, then the invocation construct sets the current group to absent.

  • Otherwise the invocation construct leaves the current group unchanged. In this situation the scope of the current group is effectively dynamic: it can be referenced within called templates and attribute sets.
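The difference between template calls and function calls can be sketched as follows, for the non-streaming case. The template name, the function name, the f namespace URI, and the cities source structure are all hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="f">

  <!-- A called template sees the caller's current group, because
       xsl:call-template leaves the current group unchanged. -->
  <xsl:template name="group-total">
    <total><xsl:value-of select="sum(current-group()/@pop)"/></total>
  </xsl:template>

  <!-- A function call sets the current group to absent, so the group
       is passed explicitly as an argument instead. -->
  <xsl:function name="f:total">
    <xsl:param name="group" as="element()*"/>
    <xsl:sequence select="sum($group/@pop)"/>
  </xsl:function>

  <xsl:template match="cities">
    <xsl:for-each-group select="city" group-by="@country">
      <xsl:call-template name="group-total"/>
      <total><xsl:value-of select="f:total(current-group())"/></total>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Calling current-group() inside the body of f:total would instead fail with the dynamic error XTDE1061.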

The current group is initially absent during the evaluation of global variables and stylesheet parameters, during the evaluation of the use attribute or contained sequence constructor of xsl:key, and during the evaluation of the initial-value attribute of xsl:accumulator and the select attribute or contained sequence constructor of xsl:accumulator-rule.

Error Conditions

[ERR XTSE1060] It is a static error if the current-group function is used within a pattern.

[ERR XTDE1061] It is a dynamic error if the current-group function is used when the current group is absent, or when it is invoked in the course of evaluating a pattern. The error may be reported statically if it can be detected statically.

Notes

Like other XSLT extensions to the dynamic evaluation context, the current group is not retained as part of the closure of a function value. This means that the expression current-group#0 is valid and returns a function value, but any invocation of this function will fail with a dynamic error [see ERR XTDE1061].

14.2.2 fn:current-grouping-key

Summary

Returns the grouping key of the group currently being processed using the xsl:for-each-group instruction.

Signature

fn:current-grouping-key() as xs:anyAtomicType*

Properties

This function is deterministic, context-dependent, and focus-independent, as those terms are defined in [Functions and Operators 3.0].

Rules

The evaluation context for XPath expressions includes a component called the current grouping key, which is a sequence of atomic values. The current grouping key is the grouping key shared in common by all the items within the current group.

The function current-grouping-key returns the current grouping key.

The current grouping key is bound during evaluation of an xsl:for-each-group instruction that has a group-by or group-adjacent attribute. If no xsl:for-each-group instruction is being evaluated, the current grouping key will be absent, which means that any reference to it causes a dynamic error. The current grouping key is also set to absent during the evaluation of an xsl:for-each-group instruction with a group-starting-with or group-ending-with attribute.

The effect of invocation constructs on the current grouping key is as follows:

  • If the invocation construct is contained within a declared-streamable construct (for example, if it is within an xsl:source-document instruction with the attribute streamable="yes", or within a streamable template), then the invocation construct sets the current grouping key to absent. In this situation the scope of the current grouping key is effectively static; it can only be referenced within the body of the xsl:for-each-group instruction to which it applies.

  • If the invocation construct is a (static or dynamic) function call, then the invocation construct sets the current grouping key to absent.

  • Otherwise the invocation construct leaves the current grouping key unchanged. In this situation the scope of the current grouping key is effectively dynamic: it can be referenced within called templates and attribute sets.

The current grouping key is initially absent during the evaluation of global variables and stylesheet parameters, during the evaluation of the use attribute or contained sequence constructor of xsl:key, and during the evaluation of the initial-value attribute of xsl:accumulator and the select attribute or contained sequence constructor of xsl:accumulator-rule.

While an xsl:for-each-group instruction with a group-by or group-adjacent attribute is being evaluated, the current grouping key will be a single atomic value if composite="no" is specified (explicitly or implicitly), or a sequence of atomic values if composite="yes" is specified.

At other times, the current grouping key will be absent.

The grouping keys of all items in a group are not necessarily identical. For example, one might be an xs:float while another is a numerically equal xs:decimal. The current-grouping-key function returns the grouping key of the initial item in the group, after atomization and casting of xs:untypedAtomic values to xs:string.

The function takes no arguments.

Error Conditions

[ERR XTSE1070] It is a static error if the current-grouping-key function is used within a pattern.

[ERR XTDE1071] It is a dynamic error if the current-grouping-key function is used when the current grouping key is absent, or when it is invoked in the course of evaluating a pattern. The error may be reported statically if it can be detected statically.

Notes

Like other XSLT extensions to the dynamic evaluation context, the current grouping key is not retained as part of the closure of a function value. This means that the expression current-grouping-key#0 is valid and returns a function value, but any invocation of this function will fail with a dynamic error [see ERR XTDE1071].

14.3 Ordering among Groups

[Definition: There is a total ordering among groups referred to as the order of first appearance. A group G is defined to precede a group H in order of first appearance if the initial item of G precedes the initial item of H in population order. If two groups G and H have the same initial item (because the item is in both groups) then G precedes H if the grouping key of G precedes the grouping key of H in the sequence that results from evaluating the group-by expression of this initial item.]

[Definition: There is another total ordering among groups referred to as processing order. If group R precedes group S in processing order, then in the result sequence returned by the xsl:for-each-group instruction the items generated by processing group R will precede the items generated by processing group S.]

If there are no xsl:sort elements immediately within the xsl:for-each-group element, the processing order of the groups is the order of first appearance.

Otherwise, the xsl:sort elements immediately within the xsl:for-each-group element define the processing order of the groups (see 13 Sorting). They do not affect the order of items within each group. Multiple sort key components are allowed, and are evaluated in major-to-minor order. If two groups have the same values for all their sort key components, they are processed in order of first appearance if the sort key specification is stable, otherwise in an implementation-dependent order.

The select expression of an xsl:sort element is evaluated once for each group. During this evaluation, the context item is the initial item of the group, the context position is the position of this item within the set of initial items (that is, one item for each group in the population) in population order, the context size is the number of groups, the current group is the group whose sort key value is being determined, and the current grouping key is the grouping key for that group. If the xsl:for-each-group instruction uses the group-starting-with or group-ending-with attributes, then the current grouping key is absent.

Example: Sorting Groups

For example, this means that if the grouping key is @category, you can sort the groups in order of their grouping key by writing <xsl:sort select="current-grouping-key()"/>; or you can sort the groups in order of size by writing <xsl:sort select="count(current-group())"/>.
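Placed in a complete instruction, the two sort keys might be combined as in the following sketch. The city elements and country attribute are assumed for illustration.

```xml
<xsl:for-each-group select="city" group-by="@country">
  <!-- Major sort key: largest group first. -->
  <xsl:sort select="count(current-group())" order="descending"/>
  <!-- Minor sort key: groups of equal size are ordered by grouping key. -->
  <xsl:sort select="current-grouping-key()"/>
  <country name="{current-grouping-key()}" cities="{count(current-group())}"/>
</xsl:for-each-group>
```

The xsl:sort elements change only the processing order of the groups; the items within each group remain in population order.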

14.4 Examples of Grouping

Example: Grouping Nodes based on Common Values

The following example groups a list of nodes based on common values. The resulting groups are numbered but not sorted (they appear in order of first appearance), and a total is calculated for each group.

Source XML document:

<cities>
  <city name="Milano"  country="Italia"      pop="5"/>
  <city name="Paris"   country="France"      pop="7"/>
  <city name="München" country="Deutschland" pop="4"/>
  <city name="Lyon"    country="France"      pop="2"/>
  <city name="Venezia" country="Italia"      pop="1"/>
</cities>

More specifically, the aim is to produce a four-column table containing one row for each distinct country. The four columns are to contain: first, a sequence number giving the number of the row; second, the name of the country; third, a comma-separated alphabetical list of the city names within that country; and fourth, the sum of the pop attribute for the cities in that country.

Desired output:

<table>
  <tr>
    <th>Position</th>
    <th>Country</th>
    <th>List of Cities</th>
    <th>Population</th>
  </tr>
  <tr>
    <td>1</td>
    <td>Italia</td>
    <td>Milano, Venezia</td>
    <td>6</td>
  </tr>
  <tr>
    <td>2</td>
    <td>France</td>
    <td>Lyon, Paris</td>
    <td>9</td>
  </tr>  
  <tr>
    <td>3</td>
    <td>Deutschland</td>
    <td>München</td>
    <td>4</td>
  </tr>  
</table>

Solution:

<table xsl:version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <tr>
    <th>Position</th>
    <th>Country</th>
    <th>List of Cities</th>
    <th>Population</th>
  </tr>
  <xsl:for-each-group select="cities/city" group-by="@country">
    <tr>
      <td><xsl:value-of select="position()"/></td>
      <td><xsl:value-of select="current-grouping-key()"/></td>
      <td>
        <xsl:for-each select="current-group()/@name">
          <xsl:sort select="."/>
          <xsl:if test="position() ne 1">, </xsl:if>
          <xsl:value-of select="."/>
        </xsl:for-each>  
      </td>
      <td><xsl:value-of select="sum(current-group()/@pop)"/></td>
    </tr>
  </xsl:for-each-group>
</table>

 

Example: A Composite Grouping Key

Sometimes it is necessary to use a composite grouping key: for example, suppose the source document is similar to the one used in the previous examples, but allows multiple entries for the same country and city, such as:

<cities>
  <city name="Milano"  country="Italia"  year="1950"   pop="5.23"/>
  <city name="Milano"  country="Italia"  year="1960"   pop="5.29"/>  
  <city name="Padova"  country="Italia"  year="1950"   pop="0.69"/>
  <city name="Padova"  country="Italia"  year="1960"   pop="0.93"/>    
  <city name="Paris"   country="France"  year="1951"   pop="7.2"/>
  <city name="Paris"   country="France"  year="1961"   pop="7.6"/>
</cities>

Now suppose we want to list the average value of @pop for each (country, name) combination. One way to handle this is to concatenate the parts of the key, for example <xsl:for-each-group select="cities/city" group-by="concat(@country, '/', @name)">. A second solution is to nest one xsl:for-each-group element directly inside another. XSLT 3.0 introduces a third option, which is to define the grouping key as composite:

<xsl:for-each-group select="cities/city" 
                    group-by="@name, @country" 
                    composite="yes">
  <p>
    <xsl:value-of select="current-grouping-key()[1] || ', ' ||
                          current-grouping-key()[2] || ': ' || 
                          avg(current-group()/@pop)"/>
  </p>
</xsl:for-each-group>

Note:

The string concatenation operator || is new in XPath 3.0.
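For comparison, the nested approach mentioned above might look like the following sketch. The variable name country is introduced here for illustration: it preserves the outer grouping key, because inside the inner instruction current-grouping-key() refers to the inner group.

```xml
<xsl:for-each-group select="cities/city" group-by="@country">
  <!-- Save the outer key before it is hidden by the inner instruction. -->
  <xsl:variable name="country" select="current-grouping-key()"/>
  <!-- Sub-group the cities of this country by name. -->
  <xsl:for-each-group select="current-group()" group-by="@name">
    <p>
      <xsl:value-of select="current-grouping-key() || ', ' ||
                            $country || ': ' ||
                            avg(current-group()/@pop)"/>
    </p>
  </xsl:for-each-group>
</xsl:for-each-group>
```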

 

Example: Identifying a Group by its Initial Element

The next example identifies a group not by the presence of a common value, but rather by adjacency in document order. A group consists of an h2 element, followed by all the p elements up to the next h2 element.

Source XML document:

<body>
  <h2>Introduction</h2>
  <p>XSLT is used to write stylesheets.</p>
  <p>XQuery is used to query XML databases.</p>
  <h2>What is a stylesheet?</h2>
  <p>A stylesheet is an XML document used to define a transformation.</p>
  <p>Stylesheets may be written in XSLT.</p>
  <p>XSLT 2.0 introduces new grouping constructs.</p>
</body>

Desired output:

<chapter>
  <section title="Introduction">
    <para>XSLT is used to write stylesheets.</para>
    <para>XQuery is used to query XML databases.</para>
  </section> 
  <section title="What is a stylesheet?">
    <para>A stylesheet is an XML document used to define a transformation.</para>
    <para>Stylesheets may be written in XSLT.</para>
    <para>XSLT 2.0 introduces new grouping constructs.</para>
  </section>
</chapter>

Solution:

<xsl:template match="body">
  <chapter>
    <xsl:for-each-group select="*" group-starting-with="h2">
      <section title="{self::h2}">
        <xsl:for-each select="current-group()[self::p]">
          <para><xsl:value-of select="."/></para>
        </xsl:for-each> 
      </section>
    </xsl:for-each-group>
  </chapter>
</xsl:template>

The use of title="{self::h2}" rather than title="{.}" is to handle the case where the first element is not an h2 element.

 

Example: Identifying a Group by its Final Element

The next example illustrates how a group of related elements can be identified by the last element in the group, rather than the first. Here the absence of the attribute continued="yes" indicates the end of the group.

Source XML document:

<doc>
  <page continued="yes">Some text</page>
  <page continued="yes">More text</page>    
  <page>Yet more text</page>
  <page continued="yes">Some words</page>
  <page continued="yes">More words</page>    
  <page>Yet more words</page>        
</doc>

Desired output:

<doc>
  <pageset>
    <page>Some text</page>
    <page>More text</page>    
    <page>Yet more text</page>
  </pageset>
  <pageset>
    <page>Some words</page>
    <page>More words</page>    
    <page>Yet more words</page>
  </pageset>
</doc>

Solution:

<xsl:template match="doc">
<doc>
  <xsl:for-each-group select="*" 
                      group-ending-with="page[not(@continued='yes')]">
    <pageset>
      <xsl:for-each select="current-group()">
        <page><xsl:value-of select="."/></page>
      </xsl:for-each> 
    </pageset>
  </xsl:for-each-group>
</doc>
</xsl:template>

 

Example: Adding an Element to Several Groups

The next example shows how an item can be added to multiple groups. Book titles will be added to one group for each indexing term marked up within the title.

Source XML document:

<titles>
    <title>A Beginner's Guide to <ix>Java</ix></title>
    <title>Learning <ix>XML</ix></title>
    <title>Using <ix>XML</ix> with <ix>Java</ix></title>
</titles>

Desired output:

<h2>Java</h2>
    <p>A Beginner's Guide to Java</p>
    <p>Using XML with Java</p>
<h2>XML</h2>
    <p>Learning XML</p>
    <p>Using XML with Java</p>

Solution:

<xsl:template match="titles">
    <xsl:for-each-group select="title" group-by="ix">
      <h2><xsl:value-of select="current-grouping-key()"/></h2>
      <xsl:for-each select="current-group()">
        <p><xsl:value-of select="."/></p>
      </xsl:for-each>
    </xsl:for-each-group>
</xsl:template>

 

Example: Grouping Alternating Sequences of Elements

In this example, the membership of a node within a group is based both on adjacency of the nodes in document order, and on common values. In this case, the grouping key is a boolean condition, true or false, so the effect is that a grouping establishes a maximal sequence of nodes for which the condition is true, followed by a maximal sequence for which it is false, and so on.

Source XML document:

<p>Do <em>not</em>:
    <ul>
    <li>talk,</li>
    <li>eat, or</li>
    <li>use your mobile telephone</li>
    </ul>
    while you are in the cinema.</p>

Desired output:

<p>Do <em>not</em>:</p>
    <ul>
    <li>talk,</li>
    <li>eat, or</li>
    <li>use your mobile telephone</li>
    </ul>
    <p>while you are in the cinema.</p>

Solution:

This requires creating a p element around the maximal sequence of sibling nodes that does not include a ul or ol element.

This can be done by using group-adjacent, with a grouping key that is true if the element is a ul or ol element, and false otherwise:

<xsl:template match="p">
    <xsl:for-each-group select="node()" 
            group-adjacent="self::ul or self::ol">
        <xsl:choose>
            <xsl:when test="current-grouping-key()">
                <xsl:copy-of select="current-group()"/>  
            </xsl:when>
            <xsl:otherwise>
                <p>
                    <xsl:copy-of select="current-group()"/>
                </p>
            </xsl:otherwise>  
        </xsl:choose>
    </xsl:for-each-group>
</xsl:template>

14.5 Non-Transitivity

If the population contains values of different numeric types that differ from each other by small amounts, then the eq operator is not transitive, because of rounding effects occurring during type promotion. It is thus possible to have three values A, B, and C among the grouping keys of the population such that A eq B, B eq C, but A ne C.

For example, this arises when computing

<xsl:for-each-group group-by="." select="
             xs:float('1.0'),
             xs:decimal('1.0000000000100000000001'),
             xs:double('1.00000000001')"/>

because the values of type xs:float and xs:double both compare equal to the value of type xs:decimal but not equal to each other.
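The pairwise comparisons can be checked directly with a sketch such as the following. The variable names are illustrative, and the xs prefix is assumed to be bound to the XML Schema namespace.

```xml
<xsl:variable name="f" select="xs:float('1.0')"/>
<xsl:variable name="d" select="xs:decimal('1.0000000000100000000001')"/>
<xsl:variable name="b" select="xs:double('1.00000000001')"/>
<!-- Per the explanation above, the first two comparisons are expected
     to be true and the third false. -->
<xsl:value-of select="$f eq $d, $d eq $b, $f eq $b"/>
```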

In this situation the results must be equivalent to the results obtained by the following algorithm:

The effect of these rules is that (a) every item in a non-singleton group has a grouping key that is equal to that of at least one other item in that group, (b) for any two distinct groups, there is at least one pair of items (one from each group) whose grouping keys are not equal to each other.