[xquery-talk] Removing douplicate elements

Michael Kay mhk at mhk.me.uk
Fri Aug 12 00:14:36 PDT 2005


The reason this can't use tail-call optimization is that the result of the
recursive call is combined with another value using the "and" operator - so
the recursive call isn't a tail call. It's quite possible to write a
non-recursive version of notIn, for example using "every $x in $list
satisfies not(deep-equal($pivot, $x)"

However, your algorithm is executing deep-equal n^2/2 times, which is going
to give horrible performance as the files get bigger. Improving it isn't
very easy. I think I would write a function that computes a checksum, such
that two elements that are deep-equal will always have the same checksum;
then sort by the checksum, and only use deep-equals to compare elements with
the same checksum. But choosing a good checksum algorithm probably depends
on knowing something about your data.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: talk-bounces at xquery.com 
> [mailto:talk-bounces at xquery.com] On Behalf Of EXTERNAL Kruse 
> Peter (Praktikant;CR/AEA1-Si)
> Sent: 10 August 2005 13:37
> To: talk at xquery.com
> Subject: [xquery-talk] Removing douplicate elements
> 
> Hi folks
> 
> I have an xml file with douplicate elements like this
> 
> <connect>
> 	<a start="luigi" target="mario"/>
> 	<b start="luigi" target="princess"/>
> 	<a start="princess" target="mario"/>
> 	<a start="joshi" target="luigi"/>
> 	<a start="luigi" target="mario"/>
> 	<b start="luigi" target="mario"/>
> </connect>
> 
> I want to remove all douplicates, so my result would look like:
> 
> <connect>
> 	<a start="luigi" target="mario"/>
> 	<b start="luigi" target="princess"/>
> 	<a start="princess" target="mario"/>
> 	<a start="joshi" target="luigi"/>
> 	<b start="luigi" target="mario"/>
> </connect>
> 
> 
> I managed to develope the following:
> -------------------
> declare function local:notIn($pivot as node(),$list as node()*) as
> xs:boolean {
> 	if (count($list)=0) then 
> 		true() 
> 	else
> 		(not(	deep-equal($pivot,$list[1]))
> 		and local:notIn($pivot, $list[position()>1]) )
> };
> 
> for $c at $cpos in $connections
> 	where local:notIn($c,$connections[position()<$c])
> 	return $c
> -------------------
> 
> With small files, this works, but with larger files, i get 
> 
> Error on line 20 of file:/.../connections.xq:
>   Too many nested function calls. May be due to infinite recursion.
> Query processing failed: Run-time errors were reported
> 
> 
> 
> Any hints ?
> 
> Thanks a lot
> 	Peter
> 
> _______________________________________________
> talk at xquery.com
> http://xquery.com/mailman/listinfo/talk
> 




More information about the talk mailing list