[xquery-talk] joining multiple queries into a single one

Jeff Dexter jeff.dexter at rainingdata.com
Tue Apr 11 17:06:03 PDT 2006


The moral of the story is that XQuery implementations vary greatly and
one has to be very careful about general advice on optimization without
first considering the engine on which the query is working.

Remember that there are (at least) two very different types of people on
this list: implementers and users. The former should be careful, when
giving out advice, to let users know whether it's particular to their
implementation and which implementation that is. The latter should be
very careful with some of the advice given because, as we see in this
thread, it's often contradictory and may not apply to the engine or
database they're using at the moment.

So, all that aside, I'll play by my own rules and state that for my
engine (TigerLogic) I agree with both Jason and Michael, but under
quite different circumstances.

In terms of the core database engine, there's actually little difference
between //A/B and /A/B unless, of course, <A> elements are peppered
throughout your document. If they are, it's best to be explicit, if only
to limit the search space of your query to a subset of the <A> elements.
If you don't have a specific target, stick with //A and let the
optimizer have at it. In any case, database engines typically know a lot
about the data they contain, so in our case //A/B and /A/B are basically
equivalent, both in terms of what they end up searching and the cost to
determine it. No index joins or anything like that.
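To make the "engine knows its data" point concrete, here is a minimal Python sketch of one common technique, a path summary in the spirit of a DataGuide: the set of distinct root-to-element label paths in a document. It is an assumption for illustration, not how TigerLogic works, and the helpers path_summary and resolve are hypothetical. Resolving a query against the summary shows which concrete paths it can reach.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Collect every distinct root-to-element label path in a document.

    Hypothetical helper: a bare-bones path summary in the spirit of a
    DataGuide, not the data structure of any particular engine.
    """
    paths = set()

    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return paths


def resolve(summary, steps, anywhere):
    """Return the concrete label paths that a simple query can reach.

    steps:    element names linked by the child axis, e.g. ["A", "B"]
    anywhere: True for //A/B (first step at any depth), False for /A/B
    """
    n = len(steps)
    return {
        path
        for path in summary
        if path[-n:] == tuple(steps) and (anywhere or len(path) == n)
    }
```

For "<A><B/><D><B/></D></A>", resolving ["A", "B"] with anywhere=True or anywhere=False gives the same set, {("A", "B")}, so either spelling costs the same. For "<A><B/><D><A><B/></A></D></A>", where <A> is peppered through the document, the // form also reaches ("A", "D", "A", "B") and the explicit path really does narrow the search.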

One could also argue that once your XQuery spans 10 pages or so, being
explicit with paths and type declarations will help you maintain your
sanity as you debug and run your queries over ever-changing collections
of XML documents. But again, engines offer differing degrees of static
typing, so that may or may not help in every case.

Where I agree with Michael is in the case of our streaming engine, which
knows absolutely nothing about the XML it's about to query, since it's
coming from the web, the file system, some random FTP site, etc.
Streaming processors need to scan data as it is reported, so explicit
paths can help terminate searches early, eliminate entire sections of
content from the search space... the list goes on. In this case,
specifying //A literally means /descendant-or-self::node()/A, which
boils down to "look for <A> everywhere" and limits the potential of a
streaming processor to optimize its search path. This is especially
true when a large number of paths appear in a query, as the processor
may need to do a lot of matching.
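A minimal Python sketch of why explicit paths help a streaming processor, using the standard library's xml.etree.ElementTree.iterparse as a stand-in for a streaming engine (an assumption; it is not TigerLogic's streaming engine). The helper stream_count is hypothetical and understands only two path shapes: /A/B/... (child steps from the root) and //X (a single descendant step). It counts how many start tags it had to examine.

```python
import io
import xml.etree.ElementTree as ET


def stream_count(xml_text, path):
    """Count elements matching `path` in a single forward pass over parse events.

    Hypothetical helper that understands only two path shapes:
      "/A/B/..."  child steps starting at the document root
      "//X"       a single descendant-or-self step
    Returns (match_count, start_tags_examined).
    """
    descendant = path.startswith("//")
    steps = [path[2:]] if descendant else path.strip("/").split("/")
    stack = []  # tags of the currently open elements, root first
    matches = examined = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "end":
            stack.pop()
            continue
        examined += 1
        stack.append(elem.tag)
        if descendant:
            # //X: an X may appear anywhere, so every start tag must be checked.
            if elem.tag == steps[0]:
                matches += 1
            continue
        depth = len(stack)
        if depth == len(steps) and stack == steps:
            matches += 1
        if depth == 1 and (elem.tag != steps[0] or len(steps) == 1):
            # Only the root can match the first step. If it doesn't, nothing
            # can; if the whole path is one step (e.g. /task), the root was the
            # only candidate. Either way the rest of the stream can be skipped.
            break
    return matches, examined
```

For "<task><step/><step><sub/></step></task>", stream_count(doc, "/task") returns (1, 1), stopping after the root's start tag, while stream_count(doc, "//task") returns (1, 4) because every element is a candidate. On a large document the gap grows with the document size.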

Anyway, I don't want to besmirch any of the sound advice these gentlemen
are offering, but users, especially beginners, have to be careful that
they're following the best practices for the engine(s) they're currently
using.

Jeff Dexter.
www.rainingdata.com


-----Original Message-----
From: talk-bounces at xquery.com [mailto:talk-bounces at xquery.com] On Behalf Of Jason Hunter
Sent: Tuesday, April 11, 2006 3:25 PM
To: Michael Kay
Cc: talk at xquery.com; 'Titash Neogi'
Subject: Re: [xquery-talk] joining multiple queries into a single one

Michael Kay wrote:
> Thirdly, the "task" seems to be the outermost element in your document.
> Use /task rather than //task to avoid a search of the whole document.
> [This might not apply to eXist: but it never does any harm to give the
> system more information to narrow the search.]

Actually, on an indexed system, giving more information to "narrow the
search" can look to the system like more criteria to worry about.  :)

For example, a query for //foo is simple if you have an index that knows
the placement of foo elements.  Doing /a/b/c/foo likely requires joining
between indexes to ensure you only get foo elements in the right
locations.

-jh-
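Jason's index-join point can be sketched in Python with a hypothetical element-name index that stores a (start, end, depth) region label for every element, a labeling scheme commonly used for structural joins. It is an assumption here, not how any engine mentioned in this thread is built, and build_tag_index, descendant_lookup and child_path_lookup are hypothetical names. //foo is a single index probe, whereas /a/b/c/foo probes one list per step and joins each step to its parent. The join is a naive nested loop for clarity; production systems generally use more efficient stack- or merge-based joins.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_tag_index(xml_text):
    """Map each tag to the (start, end, depth) regions of its elements.

    Labels come from one counter bumped on entry and on exit, so a node's
    descendants have start values strictly between its own start and end.
    """
    index = defaultdict(list)
    counter = 0

    def walk(elem, depth):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            walk(child, depth + 1)
        end = counter
        counter += 1
        index[elem.tag].append((start, end, depth))

    walk(ET.fromstring(xml_text), 1)
    return index


def descendant_lookup(index, tag):
    """//tag: a single index probe, no join needed."""
    return sorted(index.get(tag, []))


def child_path_lookup(index, steps):
    """/s1/s2/...: one index probe per step plus a parent/child join.

    Naive nested-loop join for clarity; returns (regions, comparisons).
    """
    current = [r for r in index.get(steps[0], []) if r[2] == 1]  # root only
    comparisons = 0
    for tag in steps[1:]:
        matched = []
        for child in index.get(tag, []):
            for parent in current:
                comparisons += 1
                inside = parent[0] < child[0] < parent[1]
                if inside and child[2] == parent[2] + 1:
                    matched.append(child)
                    break
        current = matched
    return sorted(current), comparisons
```

For "<a><foo/><b><c><foo/></c></b></a>", descendant_lookup(idx, "foo") returns both foo regions from one probe, while child_path_lookup(idx, ["a", "b", "c", "foo"]) returns only the nested one, ([(5, 6, 4)], 4), after four parent/child comparisons across four index lists.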
