[xquery-talk] Multiple output via a stream of filters
dlee at calldei.com
Mon Jan 13 16:03:15 PST 2014
XML Databases tend to be good at this (keeping things cached, avoiding unnecessary parsing and serialization, etc).
If it is an N×M problem, that may be fine unless it becomes too slow. Then you might want to look into ways of optimizing ...
Generally, though (I'm not sure about eXist, but I suspect it follows the general pattern of XML databases), taking a document and applying multiple functions/XQuery/XSLT to it is just fine. The system will keep the document in memory, and usually the setup time to exit one function and apply the next is small compared to the cost of the functions themselves. It is also much easier than one big, complicated function/transform. XML DBs also tend to cache functions or XQueries after compilation, so they don't need to recompile them on every iteration.
David A. Lee
dlee at calldei.com
From: Adam Retter [mailto:adam.retter at googlemail.com]
Sent: Monday, January 13, 2014 3:26 PM
To: Ihe Onwuka
Cc: David Lee; talk at x-query.com
Subject: Re: [xquery-talk] Multiple output via a stream of filters
I think the XQuery solution is exactly as you described it. A recursive descent, most likely starting with an identity transform, and a sequence of functions that can be combined and applied at each level of the descent.
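That recursive-descent approach might be sketched roughly as follows in XQuery 3.0. This is a minimal illustration, not code from the thread: the `local:transform` name, the `$filters` parameter, and the filter signature (a function from a node to zero or more replacement nodes) are all illustrative assumptions.

```xquery
(: Identity transform with a pluggable sequence of filter functions.
   Each filter takes a node and returns its replacement(s); the
   composed result is then descended into recursively. :)
declare function local:transform(
  $node    as node(),
  $filters as (function(node()) as node()*)*
) as node()* {
  (: apply every filter to the node in turn :)
  let $result := fold-left($filters, $node,
                   function($acc, $f) { $acc ! $f(.) })
  return
    for $n in $result
    return
      typeswitch ($n)
        case element() return
          element { node-name($n) } {
            $n/@*,
            $n/node() ! local:transform(., $filters)
          }
        default return $n
};
```

A filter that leaves a node untouched simply returns it unchanged, so the empty filter list degenerates to a plain identity transform.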
On 13 January 2014 21:12, Ihe Onwuka <ihe.onwuka at gmail.com> wrote:
> The documents are in an eXist database hence I was expecting and think
> I need an XQuery solution but am open to other approaches.
> On Mon, Jan 13, 2014 at 8:40 PM, David Lee <dlee at calldei.com> wrote:
>> This is the type of problem xmlsh and XProc were designed for ...
>> What engine are you using? I personally prefer designing with lots of small programs instead of a monolith. This is practical only if the startup overhead for each is small and preferably if in memory data can be passed between steps. XProc, xmlsh, and most xquery database engines support this model efficiently. I find it so much easier to write and debug if I can work in small transformations and let the framework do the plumbing for me.
>> Sent from my iPad (excuse the terseness) David A Lee dlee at calldei.com
>>> On Jan 13, 2014, at 11:12 AM, "Ihe Onwuka" <ihe.onwuka at gmail.com> wrote:
>>> I am running through about a gigabyte's worth of XML documents.
>>> The ideal processing scenario is to offer each node in the sequence
>>> to a list of filters and augment different XML documents (or
>>> different branches of one encompassing document) based on the
>>> outcome of the filter.
>>> If anyone has seen the example used to illustrate continuation
>>> passing style in Chapter 8 of the Little Schemer that is exactly
>>> what I have in mind (albeit not necessarily in continuation passing style).
>>> What I am doing at the moment is cycling through the nodes n times
>>> where n is the number of filters I am applying. Clearly sub-optimal.
>>> However, performance is not a priority for what I am actually doing
>>> (which is simply to get the result, rather than to get it
>>> performantly), so I am not quite motivated enough to figure out how
>>> to do this.
>>> Hence I am asking instead what others have done in a similar scenario.
>>> I suppose some sort of customised HOF, entailing head/tail recursion
>>> over the sequence and accepting a list of filter functions, would be
>>> the likely form a solution would take.
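A single-pass version of that idea might look like the sketch below, assuming XQuery 3.0 higher-order functions. Every name here (`local:classify`, `$filters`, the `match` wrapper element) is an illustrative assumption: the point is just that the node sequence is traversed once, with every predicate tested against each node, rather than re-traversing the documents once per filter.

```xquery
(: Offer each node to a list of boolean predicates in one traversal.
   Nodes matching a filter are wrapped and tagged with that filter's
   position, so results can later be routed to different documents. :)
declare function local:classify(
  $nodes   as node()*,
  $filters as (function(node()) as xs:boolean)*
) as element(match)* {
  for $node in $nodes
  for $f at $i in $filters
  where $f($node)
  return <match filter="{$i}">{ $node }</match>
};

(: hypothetical usage:
   local:classify(collection('/db/docs')//record,
     ( function($n) { exists($n/@id) },
       function($n) { $n/price > 100 } ))          :)
```

Each node still meets each filter (the N×M tests are inherent to the problem), but the document sequence itself is walked only once.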