[xquery-talk] Multiple output via a stream of filters

Michael Kay mike at saxonica.com
Tue Jan 14 00:42:51 PST 2014



> I only read the ginormous XML once... I apply the 7 filters to each
> node read and it gets allocated to one of the 7 output buckets (how's
> that for a semantically neutral term).
> 

This is known within the XSL WG as the "coloured widgets" problem, after a streaming use case put forward by Oliver Becker. (The problem: given an input document containing widgets of different colours, produce N output documents, one for each colour present in the file. There are two variants: one where the set of colours is known statically, and one where it is dynamic.) The XSLT 3.0 streaming solution for the static case is:

<xsl:stream href="widgets.xml">
  <xsl:fork>
    <xsl:sequence>
      <xsl:result-document href="red.xml">
        <xsl:sequence select="*/widget[@colour='red']"/>
      </xsl:result-document>
    </xsl:sequence>
    <xsl:sequence>
      <xsl:result-document href="blue.xml">
        <xsl:sequence select="*/widget[@colour='blue']"/>
      </xsl:result-document>
    </xsl:sequence>
    <xsl:sequence>
      <xsl:result-document href="green.xml">
        <xsl:sequence select="*/widget[@colour='green']"/>
      </xsl:result-document>
    </xsl:sequence>
  </xsl:fork>
</xsl:stream>

A streaming processor is required to evaluate this in a single pass of the input document; the three "prongs" of the xsl:fork are effectively executed in parallel.

I mention this purely for academic interest, since there is no implementation available, unless you count the one I wrote last week.

I don't think XSLT 3.0 currently has an equivalent solution for the dynamic case, where the colours are not known in advance. The normal solution would use xsl:for-each-group with "group-by", but this is not streamable.
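
For comparison, here is a minimal non-streaming sketch of the dynamic case using xsl:for-each-group in XSLT 2.0. The wrapper element name "widgets" and the convention of naming each output file after its colour are assumptions made for illustration; they are not part of the original use case. Because grouping needs the whole input tree in memory, this approach would not suit a very large document.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Non-streaming: the whole input is built as a tree before grouping. -->
  <xsl:template match="/">
    <!-- One group per distinct colour actually present in the input. -->
    <xsl:for-each-group select="*/widget" group-by="@colour">
      <!-- Assumed naming convention: red.xml, blue.xml, and so on. -->
      <xsl:result-document href="{current-grouping-key()}.xml">
        <!-- "widgets" is a hypothetical wrapper element. -->
        <widgets>
          <xsl:copy-of select="current-group()"/>
        </widgets>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>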

Michael Kay
Saxonica



