[xquery-talk] File Systems & XQuery

Frans Englich frans.englich at telia.com
Wed Feb 7 15:15:29 PST 2007


Hello all,

When writing XML applications, one currently needs glue code and helper
utilities to compensate for missing parts or to tie steps together. This could
be piping the result of a schema validation into a transformation step, or
determining which files to open in an XQuery query. XProc[1] is one attempt to
solve this kind of problem.

For navigating and inspecting the file system from XQuery, one approach is
Saxon's[2], where a small home-grown query is passed as a URI to
fn:collection().

I see some drawbacks with that approach:

* It invents a new "language" instead of using XPath, and is therefore
comparatively limited.
* The result is not expressed in the XPath Data Model, so one cannot treat it
as such; for example, one cannot transform it with a stylesheet.

Here I'd like to float the idea of another approach to inspecting the file
system:

An absolute URI would be passed to fn:collection(). Like a namespace URI, it
would always be the same, regardless of which files are being queried.
fn:collection() would in turn return a node that represents the root of the
file system. On MS Windows (and possibly other platforms), the root node would
be virtual, containing the drives as children.

The returned node would mirror the file system: each node representing a file
would have attributes such as mimeType, fileSize, absolutePath, and so forth.
Since it is a plain XDM node, the user gets the full expressiveness of XPath.
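To make the mirroring concrete, here is a minimal sketch in Python rather than XQuery, since the collection URI above is only a proposal. It builds the tree an implementation might return for a single directory. The element names directory and file and the helper mirror() are hypothetical; only the attribute names mimeType, fileSize and absolutePath come from the proposal. MIME type detection is guessed from the file extension with the standard mimetypes module, which is just one possible implementation-defined choice. Symbolic links are not followed, to avoid cycles.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET


def mirror(path: str) -> ET.Element:
    """Return an element that mirrors the file or directory at `path`.

    Hypothetical format: <directory> and <file> elements carrying a name
    attribute; mimeType, fileSize and absolutePath follow the proposal.
    """
    abs_path = os.path.abspath(path)
    name = os.path.basename(abs_path)
    # Symbolic links are treated as files, so directory cycles cannot occur.
    if os.path.isdir(abs_path) and not os.path.islink(abs_path):
        node = ET.Element("directory", name=name, absolutePath=abs_path)
        for entry in sorted(os.listdir(abs_path)):
            node.append(mirror(os.path.join(abs_path, entry)))
        return node
    # MIME type detection is implementation-defined; this guesses from the extension.
    mime, _ = mimetypes.guess_type(abs_path)
    return ET.Element(
        "file",
        name=name,
        absolutePath=abs_path,
        fileSize=str(os.lstat(abs_path).st_size),
        mimeType=mime or "application/octet-stream",
    )
```

For example, ET.tostring(mirror("/home/frans/xmlExamples"), encoding="unicode") serializes the subtree for that directory. Note that the reported type for .xml files depends on the platform's MIME tables (application/xml or text/xml), which illustrates why this part would be implementation-defined.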

There are certain design issues with this, such as what the XML format should
look like. This, for instance, is very friendly from a query-writing
perspective:

declare variable $fs := fn:collection("http://fs-xquery.fs.net/");
$fs/home/frans/xmlExamples//*[@mimeType eq 'application/xml']

However, since this uses dynamic element names, it is tricky to express with a
schema, and many consider it bad design (which I agree with, although I do
think it makes query writing elegant).
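The friendly format has a further catch besides schemas: file names are not necessarily valid XML element names. A quick Python sketch, assuming a hypothetical root element named fs and hypothetical helpers dynamic_tree() and round_trips(), shows that ElementTree will serialize such a tree without complaint, but the output is not well-formed and cannot be parsed back:

```python
import xml.etree.ElementTree as ET


def dynamic_tree(path_parts: list[str]) -> ET.Element:
    """Build the "friendly" format, where each directory name becomes an element name.

    Hypothetical helper; path_parts is e.g. ["home", "frans", "xmlExamples"].
    """
    root = ET.Element("fs")
    node = root
    for part in path_parts:
        # ElementTree does not validate tag names, so any string is accepted here.
        node = ET.SubElement(node, part)
    return root


def round_trips(path_parts: list[str]) -> bool:
    """True if the serialized tree parses back, i.e. every name is a legal XML name."""
    xml_text = ET.tostring(dynamic_tree(path_parts))
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(round_trips(["home", "frans", "xmlExamples"]))   # True
print(round_trips(["home", "frans", "My Documents"]))  # False: spaces are not allowed in names
print(round_trips(["home", "frans", "2007"]))          # False: names cannot start with a digit
```

So a mapping along these lines would need some escaping or fallback for such names, which in turn makes the friendly queries less predictable.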

The alternative is rather messy for query writing:

$fs/directory[@name = "home"]
   /directory[@name = "frans"]
   /directory[@name = "xmlExamples"]
   //*[@mimeType eq 'application/xml']
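For comparison, the alternative format (directory elements with name attributes) can be queried even with the small XPath subset in Python's ElementTree. The document below is a hand-written stand-in for what the collection might return; the fs root element and the sample files are made up. ElementTree only supports = in predicates, not eq:

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for what the hypothetical collection URI might return.
FS_XML = """
<fs>
  <directory name="home">
    <directory name="frans">
      <directory name="xmlExamples">
        <file name="a.xml" mimeType="application/xml"/>
        <directory name="nested">
          <file name="b.xml" mimeType="application/xml"/>
          <file name="notes.txt" mimeType="text/plain"/>
        </directory>
      </directory>
    </directory>
  </directory>
</fs>
"""

fs = ET.fromstring(FS_XML)

# ElementTree supports child steps, '//' and [@attr='value'] predicates.
query = (
    "./directory[@name='home']"
    "/directory[@name='frans']"
    "/directory[@name='xmlExamples']"
    "//*[@mimeType='application/xml']"
)
print([f.get("name") for f in fs.findall(query)])  # ['a.xml', 'b.xml']
```

The query works for any file name, including ones that are not legal XML names, at the cost of verbosity.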

Each alternative is horrible, each in its own way. Is there a third
alternative? Can the two be combined? Is either of them acceptable?

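Many parts of this would be implementation-defined (such as MIME type
detection, and pretty much everything else). One issue is node stability,
especially in relation to changes to the file system: fn:collection() is
required to be stable (repeated calls with the same URI within one query must
return the same nodes), which is hard to guarantee if files change while a
query runs.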

Such a "mini"-spec could have two levels of conformance: one for
implementations that support static typing and one for those that do not. For
statically typed implementations, fn:collection() would have a more specific
return type than the generic node()*.

The XML format would be in a namespace. Should that namespace equal the
collection URI? That would be simple, but maybe there is an issue I don't see.
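If the namespace did equal the collection URI, a single string would serve both purposes. A small Python sketch, assuming the example URI http://fs-xquery.fs.net/ and a hypothetical fs prefix, shows elements of the format placed in that namespace and queried through a prefix mapping:

```python
import xml.etree.ElementTree as ET

# Assumption: the format's namespace equals the example collection URI.
FS_NS = "http://fs-xquery.fs.net/"

# Clark notation "{uri}local" puts the elements in the namespace.
root = ET.Element(f"{{{FS_NS}}}directory", name="home")
ET.SubElement(root, f"{{{FS_NS}}}file", name="a.xml", mimeType="application/xml")

# Serialize with the hypothetical "fs" prefix instead of an auto-generated one.
ET.register_namespace("fs", FS_NS)
print(ET.tostring(root, encoding="unicode"))

ns = {"fs": FS_NS}
print([f.get("name") for f in root.findall("fs:file", ns)])  # ['a.xml']
```

In XQuery the same string would then appear twice in a query, once in the prolog as declare namespace fs = "http://fs-xquery.fs.net/"; and once as the argument to fn:collection().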

Is this idea overkill? Useful? Doable? If it is possible, I think it would be
an elegant use of the XPath Data Model's abstraction over the underlying
representation, and an interoperable mechanism for querying the file system. I
believe it would render many small scripts and Makefiles redundant.

If it is feasible, I am considering writing up a draft for this, but some
initial input would be appreciated!


		Frans

[1] http://www.w3.org/TR/xproc/

[2] http://www.saxonica.com/documentation/sourcedocs/collections.html

