[xquery-talk] A Poor Man's XPROC
CMisztur at macleanfogg.com
Thu Jan 23 05:25:55 PST 2014
Thanks for this.
I just noticed that an XProc module is now showing up in eXist's package manager and can work with Calabash, which I want to try.
Also, Stylus Studio has a great pipeline.
> On Jan 22, 2014, at 10:16 AM, "Ihe Onwuka" <ihe.onwuka at gmail.com> wrote:
> Just sharing something I've used to get by with a bash shell
> transformation workflow. I have already been served a large portion
> of "Why don't you use XPROC/xmlsh" and the answer is because I didn't
> think of it at the outset.
> The two "tricks" here are process substitution - a means of piping one
> output into several subsequent steps and piping output direct to the
> shell for execution.
> I lay no claim to originality or shell expertise, it's just something
> that has worked for me and is worth sharing. Some of my terminology
> may also be off for the same reason but this works.
> Suppose we wish to parse some HTML, let's say Amazon. We start out
> with a bog-standard curl request piped into tagsoup to make it
> well-formed:
> curl -s --request GET "www.amazon.com" | java -jar $HOME/tagsoup-1.2.1.jar --nons |
> This is where the fun starts. tagsoup has given us some xhtml and we
> want to a) save it and b) parse it further, so:
> tee amazon.xhtml | java -jar $HOME/saxon9he.jar -s:- -xsl:createMetadata.xsl |
> tee saves the xhtml to amazon.xhtml and passes it on to the next step,
> which is our first transform, creating some metadata from the xhtml.
> Now we want to pass this metadata on to 3 processes.
> 1. a transformation to create a metadata Header
> 2. a transformation to create a metadata Tail record.
> 3. Then we wish to process the xhtml (a step which I will explain later).
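
A minimal sketch of the save-and-continue step described above, for anyone who wants to try it without tagsoup or Saxon installed. The printf and sed commands are stand-ins I made up for the curl | tagsoup output and the first transform; only the tee usage mirrors the original.

```shell
#!/bin/bash
# Assumption: printf stands in for the curl | tagsoup output, and sed
# stands in for the createMetadata transform. Neither is from the post.
workdir=$(mktemp -d)
saved="$workdir/page.xhtml"

# tee writes its input to the file and also copies it to stdout,
# so the next command in the pipe receives the same data.
title=$(printf '<html><title>demo</title></html>\n' |
  tee "$saved" |
  sed 's/.*<title>\(.*\)<\/title>.*/\1/')

echo "saved copy: $(cat "$saved")"
echo "next step produced: $title"
```

The file holds the untouched input while the downstream command works on its own copy, which is the same split the amazon.xhtml step relies on.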
> This is where the first use of process substitution kicks in. A quick
> note. You need to start your script with #!/bin/bash rather than
> #!/bin/sh to get this to work.
> We invoke process substitution by tee >(some process) >(some other
> process) etc. What this will do is take the input that was piped into
> tee and pass it on to other subprocesses in the workflow where the
> thing in brackets is what I call a subprocess. Note you need the >
> sign that precedes the brackets. So let's apply this to the output of
> the createMetadata stage
> tee >(java -jar $HOME/saxon9he.jar -s:- -xsl:metadataHeader.xsl | curl -s --request PUT etc)
> that stores the metadata header in the database, but we want the same
> input to go into the transform that creates the metadata Tail so......
> >(java -jar $HOME/saxon9he.jar -s:- -xsl:metadataTail.xsl | curl -s --request PUT etc)
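
Here is a small sketch of the fan-out itself, assuming bash and using sed in place of the two Saxon transforms. The "header:" and "tail:" prefixes are invented for the demo so that the two branches can be told apart.

```shell
#!/bin/bash
# Requires bash: >(...) is not available in plain POSIX sh.
# Assumption: the sed commands stand in for the header and tail transforms.
# Each >(...) receives its own copy of tee's input. tee's own stdout is
# discarded, and the command substitution reads until both branches
# have closed their output.
out=$(printf 'record\n' |
  tee >(sed 's/^/header: /') \
      >(sed 's/^/tail: /') \
      >/dev/null)

# The two branches run concurrently, so sort to get a stable order.
printf '%s\n' "$out" | sort
```

Because the branches run in parallel, nothing guarantees which one finishes first. That does not matter when each branch ends in its own PUT, but it does if a later step depends on both having completed.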
> and now we have a third use for the output of the createMetadata
> transform. We are going to generate the rest of the shell script to
> complete the workflow. What I did here was to write a transformation
> that produced bash shell code to do the rest of the work. The reason
> for doing so was to take advantage of the power of XSLT (or XQuery)
> to handle the conditional logic that would determine how the job would
> proceed. For example maybe a certain step is only to execute if a
> certain metadata element is present (or a certain XML file exists).
> Instead of testing for this and trying to introduce conditional logic
> into the bash script you could, say, use doc-available to check for the
> file and then depending on the outcome generate the appropriate script
> code. To execute it all you need to do is pipe the output into bash.
> Here is the final process substitution step. Recall we still have
> access to the output of the create metadata step.
> >(java -jar $HOME/saxon9he.jar -s:- -xsl:generateReviews.xsl | bash )
> where generateReviews transforms the metadata to the appropriate bash
> shell script code and that gets executed by piping it into bash.
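
The generate-then-execute idea can be tried on its own with a sketch like the one below. generate_steps is a hypothetical shell function standing in for generateReviews.xsl, and its file-exists test plays the role that doc-available would play in the stylesheet. The file names are made up.

```shell
#!/bin/bash
# Assumption: generate_steps is a hypothetical stand-in for the XSLT step.
# It prints shell code, choosing what to emit based on whether a file exists.
generate_steps() {
  if [ -e "$1" ]; then
    echo 'echo "running reviews step"'
  else
    echo 'echo "skipping reviews step"'
  fi
}

workdir=$(mktemp -d)
touch "$workdir/reviews.xml"

# The generated text is executed by piping it into a new bash process.
present=$(generate_steps "$workdir/reviews.xml" | bash)
absent=$(generate_steps "$workdir/missing.xml" | bash)
echo "$present"
echo "$absent"
```

Whatever text reaches bash gets executed, so this is only reasonable when the generator and its input are trusted.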
> Not claiming it is elegant or efficient - it is just something that
> has worked for me. Somebody may find it or bits of it useful or I may
> get told of a better way (xmlsh/Xproc excepted).