[xquery-talk] File Systems & XQuery
Frans Englich
frans.englich at telia.com
Wed Feb 7 15:15:29 PST 2007
Hello all,
When writing XML applications, one currently needs glue and helper utilities
to compensate for missing parts, or tie steps together. This could be piping
the result of a schema validation to a transformation step, or determining
which files to open in an XQuery query. XProc[1] is one attempt to solve this
kind of problem.
For navigating and inspecting the file system from XQuery, one approach is
Saxon's[2], where a small home-grown query is passed as a URI to
fn:collection().
I see some drawbacks with that approach:
* It invents a new "language" instead of using XPath, and is therefore
comparatively limited.
* The result is not expressed with the XPath Data Model and therefore one
cannot use it as such; for example, transform it with a stylesheet.
I'd like to air the idea of another approach to inspecting the file system:
An absolute URI would be passed to fn:collection(). It would always be the
same regardless of which files are being queried, just like a namespace.
fn:collection() would in turn return a node that represents the root of the
file system. On MS Windows (and other platforms with multiple roots), the
root node would be virtual, containing the drives as children.
The returned node would mirror the file system, where each node representing a
file would have attributes such as mimeType, fileSize, absolutePath, and so
forth. Since it's a plain XDM node, the user has strong expressiveness with
XPath.
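To make the shape of such a tree concrete, here is a rough mock-up in Python
rather than XQuery, since no processor implements the proposed collection. It
mirrors a directory as an element tree using the attribute names mentioned
above. The function name build_fs_tree, the generic directory/file element
names, and the extension-based MIME guess are all assumptions made for
illustration, not part of any spec.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET


def build_fs_tree(root_path):
    """Mirror the directory at root_path as an XML element tree.

    Hypothetical helper: element and attribute names are illustrative only.
    """
    root_path = os.path.abspath(root_path)
    node = ET.Element("directory", {
        "name": os.path.basename(root_path) or root_path,
        "absolutePath": root_path,
    })
    for entry in sorted(os.listdir(root_path)):
        full = os.path.join(root_path, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            # Recurse into real directories; skip symlinked ones to avoid cycles.
            node.append(build_fs_tree(full))
        elif os.path.isfile(full):
            # Guess the MIME type from the file name only; real detection
            # would be implementation defined.
            mime, _ = mimetypes.guess_type(full)
            ET.SubElement(node, "file", {
                "name": entry,
                "fileSize": str(os.path.getsize(full)),
                "mimeType": mime or "application/octet-stream",
                "absolutePath": full,
            })
    return node


if __name__ == "__main__":
    tree = build_fs_tree(".")
    # ElementTree supports only a subset of XPath, but enough to filter files.
    for f in tree.findall(".//file[@mimeType='text/plain']"):
        print(f.get("absolutePath"))
```

The tree is a snapshot taken at call time, which is one simple (if crude)
answer to how nodes relate to later changes in the file system.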
There are certain design issues with this, such as what the XML format would
look like. This, for instance, is very friendly from a query-writing
perspective:
declare variable $fs := fn:collection("http://fs-xquery.fs.net/");
$fs/home/frans/xmlExamples//*[@mimeType eq 'application/xml']
However, since this uses dynamic element names, it's tricky to express with a
schema and is considered by many to be bad design (which I would agree with,
but I do think it makes query writing elegant).
The alternative is rather messy for query writing:
$fs/directory[@name = "home"]
   /directory[@name = "frans"]
   /directory[@name = "xmlExamples"]
   //*[@mimeType eq 'application/xml']
Both alternatives are equally horrible, each in its own way. Is there a third
alternative? Can they be combined? Is either alternative acceptable?
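One conceivable way of combining them, offered only as a sketch: keep the
schema-friendly fixed element names and hide the verbosity behind a helper
that expands a plain path into the predicate form. In XQuery this would
presumably be a user-defined function; the mock-up below is in Python, and
the name path_to_xpath is hypothetical.

```python
def path_to_xpath(path):
    """Expand a slash-separated path into directory[@name='...'] steps.

    Hypothetical helper; it rejects names containing a single quote rather
    than attempting to escape them.
    """
    steps = [s for s in path.split("/") if s]
    for step in steps:
        if "'" in step:
            raise ValueError("unsupported character in path step: %r" % step)
    return "/".join("directory[@name='%s']" % s for s in steps)


if __name__ == "__main__":
    print(path_to_xpath("/home/frans/xmlExamples"))
    # prints directory[@name='home']/directory[@name='frans']/directory[@name='xmlExamples']
```

The query writer then types an ordinary path, while the underlying format
stays expressible with a schema.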
Many parts of this would be implementation defined (such as MIME type
detection and pretty much everything else). One issue is node stability,
especially in relation to changes to the file system.
Such a "mini"-spec could have two levels of conformance: one for statically
typed implementations and one for untyped ones. For typed implementations,
fn:collection() would have a more specific return type instead of
"element()".
The XML format would be in a namespace. Should that namespace equal the
collection URI? That's simple, but maybe there's some issue that I don't see.
Is this idea overkill? Useful? Doable? If it's possible, I think it would be
an elegant use of the XPath Data Model's abstraction over the underlying
representation, and an interoperable mechanism for querying the file system.
I believe it would render many small scripts and Makefiles redundant.
If it's possible, I would consider writing up a draft for this, but some
initial input would be appreciated!
Frans
1. http://www.w3.org/TR/xproc/
2. http://www.saxonica.com/documentation/sourcedocs/collections.html