[xquery-talk] outer join between 2 sequences
adam.retter at googlemail.com
Sun Sep 28 06:13:28 PDT 2014
So after a bit more coffee and a bit of research, it seems to me that
the only way you are going to get this to be fast is to use
a hash-based lookup for one of your sequences. Something like a
HashMap or BloomFilter would do the job, see:
I have done some experiments with the above in Scala. A HashMap
works nicely here in terms of performance because it is simple and
available, and in your case with integers you do not need to worry
about duplicates because the value of a number is its identity.
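
To make the idea concrete, here is a rough sketch of a hash-based outer
join. It is in Python rather than the Scala I experimented with, purely
for illustration. The function name outer_join and the choice of a full
outer join (pair every value with its match in the other sequence, or
with nothing when there is none) are my assumptions about what is
wanted, so adjust to taste.

```python
def outer_join(left, right):
    """Full outer join of two integer sequences on value equality.

    Hypothetical helper for illustration: one sequence is indexed in a
    hash table (a dict), so each lookup is O(1) on average instead of a
    scan of the whole other sequence.
    """
    # Index the right-hand sequence: value -> "has it been matched yet?"
    # Duplicates collapse, since an integer's value is its identity.
    right_index = dict.fromkeys(right, False)

    rows = []
    for value in left:
        if value in right_index:
            right_index[value] = True
            rows.append((value, value))   # present on both sides
        else:
            rows.append((value, None))    # left only

    # Whatever was never matched exists only on the right.
    for value, matched in right_index.items():
        if not matched:
            rows.append((None, value))
    return rows


if __name__ == "__main__":
    for row in outer_join([1, 2, 3, 4], [3, 4, 5, 6]):
        print(row)
```

The whole join is then roughly linear in the combined length of the two
sequences, rather than the product of their lengths that you get from
comparing every item against every other item.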
So, if you don't want to implement a HashMap or BloomFilter in XQuery,
what is one to do? Well, XQuery 3.1 will introduce the map data type,
and some implementations already support it. If you understand the
implementation (or even hazard a guess), there is a good chance that
the XQuery map may in fact be a HashMap under the covers.
I won't say too much more about this as I have been discussing it with
Wolfgang this morning, and I think he will shortly post you a very
fast example when using XQuery 3.1 Maps in eXist...
On 28 September 2014 13:29, Adam Retter <adam.retter at googlemail.com> wrote:
>> I think that question is very implementation specific. If all of your
>> data is in RAM, as your dataset is relatively small and these are just
>> numbers, I would expect performance to be excellent. I am not sure
>> that sorting will make much of a difference, but it depends on the
>> implementation and how it initiates the search for a false comparison
>> in a large sequence.
> I retract any previous statements that may have alluded to good
> performance. A quick test on Saxon and eXist shows that this is a very
> slow problem. I am trying to see if there is not a short-cut that
> could be taken, I will come back to you...
> Adam Retter
> skype: adam.retter
> tweet: adamretter