[xquery-talk] joining multiple queries into a single one

Wed Apr 12 09:32:16 PDT 2006

I must admit that the advice to avoid "//" is something I trot out almost by
habit, having been telling XSLT users this for about six years. XSLT
processors generally work on documents read from filestore or from a
processing pipeline, so there's no opportunity to build an index and amortize
the cost of building it over many transformations.

In fact if you run the query //A on Saxon, you not only incur the cost of
scanning the whole document to find all the A elements, you incur the cost
of building an index as well, because the software reckons that if you do it
once you're quite likely to do it again. If you don't do it again, you get
hit twice!
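
As a rough illustration of that trade-off, here is a toy model in Python. It
is not Saxon's actual code: the class name LazyIndexedDoc, its counters, and
the sample document are all made up for the example. The first descendant
search scans every element and also writes an index entry for each one; later
searches are answered from the index without touching the tree again.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: three <A> elements at different depths.
SAMPLE = "<root><A/><x><A/><y><A/></y></x></root>"


class LazyIndexedDoc:
    """Toy document that builds a name index on the first '//name' search.

    This only models the cost pattern described above; it makes no claim
    about how any real processor is implemented.
    """

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)
        self.index = None               # built lazily on first search
        self.nodes_visited = 0          # cost of scanning the tree
        self.index_entries_written = 0  # extra cost of building the index

    def descendants_named(self, name):
        """Rough equivalent of //name, evaluated from the document root."""
        if self.index is None:
            self.index = {}
            for elem in self.root.iter():  # full scan, root included
                self.nodes_visited += 1
                self.index.setdefault(elem.tag, []).append(elem)
                self.index_entries_written += 1
        return self.index.get(name, [])


doc = LazyIndexedDoc(SAMPLE)

first = doc.descendants_named("A")
cost_after_first = (doc.nodes_visited, doc.index_entries_written)

second = doc.descendants_named("A")
cost_after_second = (doc.nodes_visited, doc.index_entries_written)

print(len(first), cost_after_first, cost_after_second)
```

If the second search never happens, the index entries written during the first
one are pure overhead, which is the "hit twice" case. If it does happen, both
counters stay where they were and the scan is not repeated.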

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Jeff Dexter [mailto:jeff.dexter at rainingdata.com] 
> Sent: 12 April 2006 00:06
> To: talk at xquery.com
> Cc: 'Titash Neogi'; 'Jason Hunter'; 'Michael Kay'
> Subject: RE: [xquery-talk] joining multiple queries into a single one
> 
> The moral of the story is that XQuery implementations vary greatly and
> one has to be very careful about general advice on optimization without
> first considering the engine on which the query is working.
> 
> Remember that there are (at least) two very different types of people
> on this list: implementers and users. The former should be really
> careful giving out advice without letting the users know whether or
> not it's particular to their implementation and what their
> implementation may be. The latter should be very careful with some of
> the advice given because, like we see in this thread, it's often
> contradictory and may not apply to the engine or database you're using
> at the moment.
> 
> So, that all aside, I'll play by my own rules, and state that in my
> engine (TigerLogic), I agree with both Jason and Michael, but under
> quite different circumstances.
> 
> In terms of the core database engine, there's actually little
> difference between //A/B and /A/B unless of course <A> elements are
> peppered throughout your document. If they are it's best to be
> explicit if only to limit the search space of your query to a subset
> of the <A> elements. If you don't have a specific target, then you
> stick with //A and let the optimizer have at it. In any case database
> engines typically know a lot about the data they contain, so in our
> case //A/B and /A/B are basically equivalent both in terms of what
> they end up searching and the cost to determine it. No index joins or
> anything like that.
> 
> One could also argue that when you get to the point where your XQuery
> spans 10 pages or so being explicit with paths and type declarations
> will help you maintain your sanity as you debug and run your queries
> over ever-changing collections of XML documents, but again engines
> offer differing degrees of static typing so that may or may not help
> in all instances.
> 
> Where I agree with Michael is in the case of our streaming engine,
> which knows absolutely nothing about the XML it's about to query,
> since it's coming from the web, the file system, some random ftp site,
> etc. Streaming processors need to scan data as it's reported, so
> explicit paths can help terminate searches early, eliminate entire
> sections of content from the search space... the list goes on. In this
> case specifying //A literally means /descendant-or-self::node()/A,
> which boils down to "look for <A> everywhere", and limits the
> potential of a streaming processor to optimize its search path. This
> is especially the case when a large number of paths appear in a query
> as the processor may need to do a lot of matching.
> 
> Anyway, I don't want to besmirch any of the sound advice these
> gentlemen are offering but users, especially beginners, have to be
> careful that they're following the best practices for the engine(s)
> they're currently using.
> 
> Jeff Dexter.
> www.rainingdata.com
> 
> 
> -----Original Message-----
> From: talk-bounces at xquery.com [mailto:talk-bounces at xquery.com]
> On Behalf Of Jason Hunter
> Sent: Tuesday, April 11, 2006 3:25 PM
> To: Michael Kay
> Cc: talk at xquery.com; 'Titash Neogi'
> Subject: Re: [xquery-talk] joining multiple queries into a single one
> 
> Michael Kay wrote:
> > Thirdly, the "task" seems to be the outermost element in your
> > document. Use /task rather than //task to avoid a search of the
> > whole document. [This might not apply to eXist: but it never does
> > any harm to give the system more information to narrow the search.]
> 
> Actually on an indexed system giving more information to "narrow the 
> search" can look to the system like more criteria to worry about.  :)
> 
> For example, a query for //foo is simple if you have an index that
> knows the placement of foo elements.  Doing /a/b/c/foo likely requires
> joining between indexes to ensure you only get foo elements in the
> right locations.
> 
> -jh-