[xquery-talk] Random number generation : requirements

Ghislain Fourny ghislain.fourny at 28msec.com
Wed May 7 04:13:29 PDT 2014


Hi,

I think that Leo' s proposal, on a first read, is the most elegant way
I have seen so far that (pseudo-)randomness can be brought to XQuery
in a deterministic way: by exposing that it is actually
pseudo-randomness (seed) to the user.

In Zorba (not 3.0, but the latest, trunk revision, used by us), we
designed and implemented nondeterminism in a way fully integrated with
side effects. If this is of interest to the list(s), I can say a few
words about what we did. Of course, feedback is always appreciated.

The abstract model is as follows. Basically, a program is executed
against a sequence of snapshots. Each side effect jumps to a new
snapshot. Between snapshots, evaluation order is very important
("sequential"). Between side effects and within a snapshot though, it
is as if the engine were so fast that time did not elapse at all. Of
course no engine is that fast -- even though I could imagine Mike
proving me wrong here :-) -- but, joke aside. it can hide the elapse
of time from the user with, say, caching. This idea is crucial in our
design and was largely inspired by Michael Sperberg-McQueen's vision
in a discussion on functional languages a while ago.

For everything deterministic, this means that identical expressions
within the same snapshot and against the same dynamic context (mostly
variable bindings) always evaluate to the same result (except in the
case of newly created nodes where identity is new every time). Once a
side effect occurs and a new snapshot is reached, the result can again
be different (the cache gets reset).

By default, Zorba automatically caches every function call involving
atomics only, within a snapshot. If you ask many times for the current
time without side effects, it should give you the same answer. You can
also ask Zorba to "simulate" or "force" full determinism with XML or
JSON as well.

If a function, or transitively an expression, is stamped as
nondeterministic, it means in Zorba that it has to be evaluated every
time the specification semantics says it has to be. Within a snapshot
though, there is no evaluation order constraint and the implementation
is free to pick any order it wants.

Mike said:
> For example, if f() and g() are non-deterministic functions, how do we say in the spec that in the expression (f(), g()), f should be executed before g?
In Zorba, we don' t. It' s fine to evaluate g before f.
> $xxx[random() gt 0.5]
With our semantics, this should produce a new number every time,
because the spec says the predicate is evaluated for each item in the
lhs sequence. Evaluating the predicate only once is an optimization
that cannot be done here because of non-determinism.

On an abstract level, nondeterministim, to us (Zorba), is synonym with
the presence of (true/quantum) randomness, nothing less but also
nothing more.

Does it make sense? I hope I got the summary right.

Kind regards,
Ghislain


On Wed, May 7, 2014 at 1:20 AM, Leonard Wörteler <leo at woerteler.de> wrote:
> Hello Michael,
>
> Am 06.05.2014 18:53, schrieb Michael Kay:
>
>> If you have any applications that use or need such a function, please
>> could I have a brief description of the way it uses random numbers, e.g.
>> just wanting a single random number, a random permutation of 52 integers, an
>> arbitrary sequence of random numebrs for test data generation, etc.
>
>
> in contrast to everyone else (it seems) I implemented a *deterministic*
> random-number generator in XQuery [1] in order to generate reproducible
> fuzzy tests for my higher-order xq-modules project [2].
>
> I used a simple Linear Congruential Generator [3] and implemented the
> following functions:
>
>     rng:with-random-ints(
>       $seed as xs:integer,
>       $n as xs:integer,
>       $start as item()*,
>       $f as function(item()*, item()*) as item()*
>     )
>
> realizes a constant-space fold over `$n` randomly generated `xs:int`s with
> initial seed `$seed`. This is used by
>
>     rng:random-ints(
>       $seed as xs:integer,
>       $n as xs:integer
>     ) as xs:int* {
>       rng:with-random-ints($seed, $n, (), function($seq, $i) {
>         ($seq, $i)
>       })
>     };
>
> to generate a sequence of random `xs:int` values. The module is used in some
> tests [4,5].
>
> You could obviously also implement `rng:with-random-ints(...)` in terms of
> `rng:random-ints(...)`:
>
>     rng:with-random-ints($seed, $n, $start, $f) {
>       fn:fold-left(rng:random-ints($seed, $n), $start, $f)
>     };
>
> -- Leo
>
> [1]
> https://github.com/LeoWoerteler/xq-modules/blob/master/src/main/xquery/modules/rng.xqm
> [2] https://github.com/LeoWoerteler/xq-modules
> [3] http://en.wikipedia.org/wiki/Linear_congruential_generator
> [4]
> https://github.com/LeoWoerteler/xq-modules/blob/master/src/test/xquery/int-set-test.xq#L33-51
> [5]
> https://github.com/LeoWoerteler/xq-modules/blob/master/src/test/xquery/heap-test.xq#L10
> _______________________________________________
> talk at x-query.com
> http://x-query.com/mailman/listinfo/talk



More information about the talk mailing list