[xquery-talk] [xsl] Re: Random number generation : requirements
msokolov at safaribooksonline.com
Tue May 6 16:15:56 PDT 2014
I used an XQuery function based on Dmitry's version before; it works
fine although it's a little inconvenient to have to keep passing in the
I would say the most convenient (or at least the most familiar)
signature for a random function is random($n) returning a random number
between 0 inclusive and $n exclusive; ideally it would return integers
if $n is an integer, floating point numbers if $n is a floating point
number, empty if $n is empty, and an error otherwise. And I would like
a seed function. Ideally this should be callable many times: I'm not
sure how that could be done non-deterministically though.
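A rough Java sketch of that signature, for illustration only. The class
RandomSketch and its method names are hypothetical, null stands in for
the empty sequence, and treating a non-positive $n as an error is an
assumption on my part.

```java
import java.util.Random;

// Hypothetical sketch: these names are not part of XQuery, XSLT or any library.
public final class RandomSketch {
    private static Random generator = new Random();

    /** Re-seed the shared generator; the same seed replays the same sequence. */
    public static void seed(long seed) {
        generator = new Random(seed);
    }

    /** Integer argument: returns an integer in [0, n), or null for "empty". */
    public static Integer random(Integer n) {
        if (n == null) {
            return null; // stands in for the empty sequence
        }
        if (n <= 0) {
            // Assumption: a non-positive bound counts as the "error otherwise" case.
            throw new IllegalArgumentException("n must be positive");
        }
        return generator.nextInt(n);
    }

    /** Floating-point argument: returns a double in [0, n), or null for "empty". */
    public static Double random(Double n) {
        if (n == null) {
            return null; // stands in for the empty sequence
        }
        if (!(n > 0) || n.isInfinite()) {
            // Rejects zero, negatives, NaN and infinity.
            throw new IllegalArgumentException("n must be positive and finite");
        }
        return generator.nextDouble() * n;
    }
}
```

With these overloads, RandomSketch.random(10) picks the integer version
and RandomSketch.random(2.5) the floating-point one. Calling seed() again
with the same value restarts the same sequence, which is the "callable
many times" behaviour in its deterministic form.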
I suppose a sequence would be useful, but it isn't the first thing that
leaps to mind. What if I'm not sure how many I'll need?
For example, one use case for me was to load a huge amount of data, and
only include 1% of it, in order to generate a predictable test data
sub-set. I want to write an XSLT template that returns nothing 99% of
the time, and for the other 1% of the time it processes the content
normally. I want this to be based on an identifier in the content so
that for a given seed, the same "random" 1% are selected each time: it
should *not* be order-dependent, rather I would like to seed the random
number generator with a hash of a given seed that is a configuration
parameter, and a node-identifier, and then evaluate the next random
number to see if it is > 0.01 (say). Maybe there are other ways to do
that, but that is what I did using Java.
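Something along these lines, as a minimal sketch rather than the code I
actually used: SampleSketch, keep() and hash64() are hypothetical names,
and the choice of SHA-256 for the hash is an assumption. It keeps a node
when the number falls below the threshold, which is the same test as
dropping it when the number is above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

// Hypothetical sketch of order-independent, seed-dependent sampling.
public final class SampleSketch {

    /**
     * Returns true for roughly `fraction` of node identifiers. The answer
     * depends only on (seed, nodeId), never on how many nodes came before.
     */
    public static boolean keep(String seed, String nodeId, double fraction) {
        // Combine the configured seed and the node identifier into one hash.
        // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        long hash = hash64(seed + "\u0000" + nodeId);
        // A fresh generator per node, seeded with the hash; take its first number.
        double r = new Random(hash).nextDouble();
        return r < fraction;
    }

    /** First 8 bytes of SHA-256 over the UTF-8 bytes of s, as a long. */
    static long hash64(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

A call such as SampleSketch.keep("test-run-1", id, 0.01) would then
decide whether the template processes the node or returns nothing;
changing the seed string selects a different 1%.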
On 5/6/2014 6:58 PM, Michael Kay wrote:
> The big problem with a nondeterministic random() function is not defining the order of execution, but preventing it being optimised out of a loop. For example, how do we ensure that
> $xxx[random() gt 0.5]
> doesn't select either all the values or none?
> Anyway, we're not planning to do non-determinism. This exercise is about designing a deterministic way to meet the requirement.
> Michael Kay
> On 6 May 2014, at 23:48, Michael Sokolov <msokolov at safaribooksonline.com> wrote:
>> On 5/6/2014 6:41 PM, Michael Kay mike at saxonica.com wrote:
>>>> My policy on side effects is: all expressions containing side effects are going to be evaluated in order
>>> I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.
>> You don't think it would be reasonable to say something to the effect that the order in which non-deterministic expressions are evaluated is non-deterministic (i.e. implementation-defined)? Certainly it would be reasonable enough in the case of a random number generator. Although I suppose if you are going to seed it, you would like the seed to affect the random numbers that are generated.