[xquery-talk] solutions to an Excel XML processing problem
Michael Kay
mhk at mhk.me.uk
Thu Aug 24 01:11:43 PDT 2006
It's a classical positional grouping problem, as discussed in
http://www.ximep-2006.org/papers/Paper-Kay.pdf
though it's not actually one of the use cases presented in that paper.
In practice I've usually found it's more convenient for subsequent
processing to convert to a form (F) in which the absent cells are still
absent, but all the cells that are present have an explicit column number.
You can of course get from F to your preferred target form (G) quite simply
using something like
for $i in 1 to max(F/c/@i)
return (F/c[@i=$i], <c/>)[1]
(the hardest bit is deleting the @i attributes, which I left out because
it's absurdly difficult in XQuery 1.0 if the cells can have arbitrary
content)
Getting to form F can be done as
for $col in $row/c
return
if ($col/@i) then $col
else
let $p = $col/preceding-sibling::c,
$p1 = $p[@i][last()]
$pos = $p1/@i + count($p) - count($p1/preceding-sibling::c)
return <c i="{$pos}">{$col/node()}</c>
(untested) or of course by a recursive function. This is assuming that the
preceding-sibling axis is supported - it's still an optional feature,
unfortunately.
In XSLT 2.0 (which is what I actually use for this problem, frequently!)
it's
<xsl:for-each-group select="c" group-starting-with="c[@i]">
<xsl:for-each select="current-group()">
<c i="{current-group()[1]/@i + position() - 1}">
<xsl:copy-of select="child::node()"/>
</c>
</xsl:for-each>
</xsl:for-each-group>
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: talk-bounces at xquery.com
> [mailto:talk-bounces at xquery.com] On Behalf Of Daniela Florescu
> Sent: 23 August 2006 23:33
> To: XQuery talk
> Subject: [xquery-talk] solutions to an Excel XML processing problem
>
> Dear all,
>
> I am trying to find various solutions to the following XML to
> XML transformation problem.
> It is taken out of real world Excel spreadsheets data processing.
>
> (Input) The data is composed of cells and some of the cells
> are empty (no content).
> This version of the data is in compressed form, i.e. does not
> contain the empty cells and is using the attribute @i to
> express jumps to the next non empty cell (using the
> position). It looks like this:
> =====================================================================
>
>
> <row cnt="30">
> <c i="3">third</c>
> <c>fouth</c>
> <c>fifth</c>
> <c i="10">tenth</c>
> <c>eleventh</c>
> <c i="20">twentieth</c>
> <c>twenty-one</c>
> <c>twenty-two</c>
> <c>twenty-three</c>
> </row>
>
>
> (Ouput) Write an XQuery to transform the XML data in the
> uncompressed form that includes the empty cells, as follows:
> =============================================================
>
> <row cnt="30">
> <c/>
> <c/>
> <c>third</c>
> <c>fouth</c>
> <c>fifth</c>
> <c/>
> <c/>
> <c/>
> <c/>
> <c>tenth</c>
> <c>eleventh</c>
> <c/>
> <c/>
> <c/>
> <c/>
> <c/>
> <c/>
> <c/>
> <c/>
> <c i="20">twentieth</c>
> <c>twenty-one</c>
> <c>twenty-two</c>
> <c>twenty-three</c>
> ...
> </row>
>
>
> I know it can be done in a variety of ways, I am just looking
> for the "best" (i.e. simplest) XQuery to achieve this purpose
> that people on this list can help me find.
>
> Thanks in advance,
> Dana
>
> _______________________________________________
> talk at xquery.com
> http://xquery.com/mailman/listinfo/talk
More information about the talk
mailing list