[xquery-talk] Multiple output via a stream of filters

David Lee dlee at calldei.com
Mon Jan 13 15:54:45 PST 2014


If you're running in eXist then pure XQuery is probably as good as or better than anything else.
Could you expand on your problem? I don't know eXist that well, but I can't think offhand of a better solution,
unless there is a shortcut to know ahead of time which transforms to apply ...
Do make sure you iterate over the docs *once* and apply the N transforms to each doc as you go ...
E.g.

for $d in $docs
   for $transform in $transforms ...

Not the other way around
(i.e. DON'T do
    for $transform in $transforms
        for $d in $docs
   ...
)
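
A rough sketch of that loop order in XQuery 3.0, assuming an engine that supports inline functions. The sample documents and the two filter functions are made up for illustration; in eXist, $docs would more likely come from collection() on your own path.

```xquery
xquery version "3.0";

(: Hypothetical stand-ins for the real data; replace with your own collection. :)
let $docs := (
  <doc id="1"><item type="a">x</item></doc>,
  <doc id="2"><item type="b">y</item></doc>
)

(: Hypothetical filters: each one takes a document element and returns
   the nodes it wants to keep. :)
let $transforms := (
  function($d as element()) as element()* { $d/item[@type = "a"] },
  function($d as element()) as element()* { $d/item[@type = "b"] }
)

return
  (: Outer loop over documents: each document is visited once. :)
  for $d in $docs
  (: Inner loop over transforms: all N are applied while $d is in hand. :)
  for $t at $i in $transforms
  return
    <result doc="{$d/@id}" transform="{$i}">{ $t($d) }</result>
```

Each document is touched once and the short list of transforms is re-walked per document. The results can be regrouped by the transform attribute afterwards if you want one output per filter.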

I don't know eXist that well, but typically once a document is fetched into memory in an XML DB it can stay cached.
But if you are loading too many docs the cache will fill up and the docs will have to be reloaded on each pass.

That is assuming that your documents are bigger than the transforms.


----------------------------------------
David A. Lee
dlee at calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Ihe Onwuka [mailto:ihe.onwuka at gmail.com] 
Sent: Monday, January 13, 2014 1:13 PM
To: David Lee
Cc: talk at x-query.com
Subject: Re: [xquery-talk] Multiple output via a stream of filters

The documents are in an eXist database hence I was expecting and think I need an XQuery solution but am open to other approaches.

On Mon, Jan 13, 2014 at 8:40 PM, David Lee <dlee at calldei.com> wrote:
> This is the type of problem xmlsh and XProc were designed for ...
> What engine are you using? I personally prefer designing with lots of small programs instead of a monolith. This is practical only if the startup overhead for each is small and preferably if in-memory data can be passed between steps. XProc, xmlsh, and most XQuery database engines support this model efficiently. I find it so much easier to write and debug if I can work in small transformations and let the framework do the plumbing for me.
>
>
> Sent from my iPad (excuse the terseness) David A Lee dlee at calldei.com
>
>
>> On Jan 13, 2014, at 11:12 AM, "Ihe Onwuka" <ihe.onwuka at gmail.com> wrote:
>>
>> I am running through about a gigabyte worth of xml documents.
>>
>> The ideal processing scenario is to offer each node in the sequence 
>> to a list of filters and augment different XML documents (or 
>> different branches of one encompassing document) based on the outcome 
>> of the filter.
>>
>> If anyone has seen the example used to illustrate continuation 
>> passing style in Chapter 8 of the Little Schemer that is exactly what 
>> I have in mind (albeit not necessarily in continuation passing style).
>>
>> What I am doing at the moment is cycling through the nodes n times 
>> where n is the number of filters I am applying. Clearly sub-optimal.
>> However it is not a priority to what I am actually doing (which is 
>> simply to get the result rather than to do so performantly) so I am 
>> not quite motivated enough to figure out how to do this.
>>
>> Hence I am asking instead what others have done in a similar scenario.
>> I suppose some sort of customised HOF entailing head/tail recursion 
>> over the sequence and accepting a list of filter functions, would be 
>> the likely form a solution would take.
>> _______________________________________________
>> talk at x-query.com
>> http://x-query.com/mailman/listinfo/talk


