[xquery-talk] solutions to an Excel XML processing problem

Michael Kay mhk at mhk.me.uk
Thu Aug 24 01:11:43 PDT 2006


It's a classical positional grouping problem, as discussed in

http://www.ximep-2006.org/papers/Paper-Kay.pdf

though it's not actually one of the use cases presented in that paper.

In practice I've usually found it's more convenient for subsequent
processing to convert to a form (F) in which the absent cells are still
absent, but all the cells that are present have an explicit column number.
You can of course get from F to your preferred target form (G) quite simply
using something like

for $i in 1 to max(F/c/@i)
return (F/c[@i=$i], <c/>)[1]

(the hardest bit is deleting the @i attributes, which I left out because
it's absurdly difficult in XQuery 1.0 if the cells can have arbitrary
content)

Getting to form F can be done as

for $col in $row/c
return
  if ($col/@i) then $col
  else 
    let $p = $col/preceding-sibling::c,
        $p1 = $p[@i][last()]
        $pos = $p1/@i + count($p) - count($p1/preceding-sibling::c)
    return <c i="{$pos}">{$col/node()}</c>

(untested) or of course by a recursive function. This is assuming that the
preceding-sibling axis is supported - it's still an optional feature,
unfortunately.


In XSLT 2.0 (which is what I actually use for this problem, frequently!)
it's

<xsl:for-each-group select="c" group-starting-with="c[@i]">
  <xsl:for-each select="current-group()">
    <c i="{current-group()[1]/@i + position() - 1}">
      <xsl:copy-of select="child::node()"/>
    </c>
  </xsl:for-each>
</xsl:for-each-group>

Michael Kay
http://www.saxonica.com/




> -----Original Message-----
> From: talk-bounces at xquery.com 
> [mailto:talk-bounces at xquery.com] On Behalf Of Daniela Florescu
> Sent: 23 August 2006 23:33
> To: XQuery talk
> Subject: [xquery-talk] solutions to an Excel XML processing problem
> 
> Dear all,
> 
> I am trying to find various solutions to the following XML to 
> XML transformation problem.
>   It is taken out of real world Excel spreadsheets data processing.
> 
> (Input) The data is composed of cells and some of the cells 
> are empty (no content).
> This version of the data is in compressed form, i.e. does not 
> contain the empty cells and is using the attribute @i to 
> express jumps to the  next non empty cell (using the 
> position). It looks like this:
> =====================================================================
> 
> 
> <row cnt="30">
> <c i="3">third</c>
> <c>fouth</c>
> <c>fifth</c>
> <c i="10">tenth</c>
> <c>eleventh</c>
> <c i="20">twentieth</c>
> <c>twenty-one</c>
> <c>twenty-two</c>
> <c>twenty-three</c>
> </row>
> 
> 
> (Ouput) Write an XQuery to transform the XML data in the 
> uncompressed form that includes the empty cells, as follows:
> =============================================================
> 
> <row cnt="30">
>   <c/>
>    <c/>
> <c>third</c>
> <c>fouth</c>
> <c>fifth</c>
>   <c/>
>   <c/>
> <c/>
>   <c/>
> <c>tenth</c>
> <c>eleventh</c>
>   <c/>
>   <c/>
> <c/>
>   <c/>
> <c/>
>   <c/>
> <c/>
>   <c/>
> <c i="20">twentieth</c>
> <c>twenty-one</c>
> <c>twenty-two</c>
> <c>twenty-three</c>
>   ...
> </row>
> 
> 
> I know it can be done in a variety of ways, I am just looking 
> for the "best" (i.e. simplest) XQuery to achieve this purpose 
> that people on this list can help me find.
> 
> Thanks in advance,
> Dana
> 
> _______________________________________________
> talk at xquery.com
> http://xquery.com/mailman/listinfo/talk



More information about the talk mailing list