[xquery-talk] [xsl] Re: Random number generation : requirements

Michael Sokolov msokolov at safaribooksonline.com
Tue May 6 16:15:56 PDT 2014


My 2c:

I have used an XQuery function based on Dmitry's version before; it works
fine, although it's a little inconvenient to have to keep passing in the
prior value.
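Dmitry's function isn't reproduced in this thread. As a rough illustration of why the prior value has to be passed back in, here is a minimal Python sketch of a pure linear congruential generator. The names `next_random` and `as_fraction` are hypothetical, and the constants (from the classic ANSI C `rand()` example) are an assumption, not Dmitry's code.

```python
# Hypothetical sketch: Dmitry's actual function is not shown in the thread.
# A linear congruential generator written as a pure function: the caller
# passes in the prior value and gets a new one back, which must be passed
# in again on the next call.

MODULUS = 2**31
MULTIPLIER = 1103515245  # constants from the classic ANSI C rand() example
INCREMENT = 12345


def next_random(prior: int) -> int:
    """Return the next state; the state itself doubles as the random value."""
    return (MULTIPLIER * prior + INCREMENT) % MODULUS


def as_fraction(state: int) -> float:
    """Scale a state into the range [0, 1)."""
    return state / MODULUS


if __name__ == "__main__":
    state = 42  # the seed
    for _ in range(3):
        state = next_random(state)  # the inconvenience: thread the state through
        print(as_fraction(state))
```

Because the function has no hidden state, calling it twice with the same prior value always gives the same result, which is what keeps it deterministic.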

I would say the most convenient (or at least the most familiar) 
signature for a random function is random($n) returning a random number 
between 0 inclusive and $n exclusive; ideally it would return integers 
if $n is an integer, floating point numbers if $n is a floating point 
number, empty if $n is empty, and an error otherwise.  And I would like
a seed function.  Ideally this should be callable many times, though
I'm not sure how that could be done non-deterministically.
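The signature described above can be sketched in Python, assuming a few stand-ins: `None` plays the role of the XQuery empty sequence, the function name `random_below` is hypothetical, and the "seed function" is modelled by passing in a `random.Random` object constructed from a seed, so the same seed reproduces the same values.

```python
import random
from typing import Optional, Union

Number = Union[int, float]


def random_below(n: Optional[Number], rng: random.Random) -> Optional[Number]:
    """Hypothetical random($n): a value in [0, n), typed after n.

    - int n   -> int in 0 .. n-1
    - float n -> float in [0.0, n)
    - None    -> None (standing in for the XQuery empty sequence)
    - other   -> error
    """
    if n is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(n, bool) or not isinstance(n, (int, float)):
        raise TypeError(f"random_below expects a number, got {type(n).__name__}")
    if n <= 0:
        raise ValueError("n must be positive")
    if isinstance(n, int):
        return rng.randrange(n)
    value = rng.random() * n
    # floating-point rounding can, very rarely, produce exactly n; redraw if so
    while value >= n:
        value = rng.random() * n
    return value


if __name__ == "__main__":
    rng = random.Random(2014)        # the "seed function": a seeded generator
    print(random_below(10, rng))     # an int in 0..9
    print(random_below(2.5, rng))    # a float in [0.0, 2.5)
    print(random_below(None, rng))   # None
```

Here "callable many times" works because the generator object carries its own state between calls; in a language without mutable state, that state would have to be returned alongside each value instead.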

I suppose a sequence would be useful, but it isn't the first thing that 
leaps to mind.  What if I'm not sure how many I'll need?
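One answer to "what if I'm not sure how many I'll need?" is to make the sequence lazy, so values are produced only as they are consumed. A minimal Python sketch, assuming a generator-based design; `random_stream` is a hypothetical name.

```python
import itertools
import random
from typing import Iterator


def random_stream(seed: int) -> Iterator[float]:
    """Hypothetical: an unbounded, deterministic stream of floats in [0, 1).

    Values are produced only as they are consumed, so the caller never has
    to say up front how many it needs.
    """
    rng = random.Random(seed)
    while True:
        yield rng.random()


if __name__ == "__main__":
    # Consume values until some condition is met, without a fixed count.
    for i, x in enumerate(random_stream(2014)):
        if x > 0.9:
            print(f"first value above 0.9 at position {i}: {x}")
            break
    # Or take exactly as many as turn out to be needed.
    print(list(itertools.islice(random_stream(2014), 3)))
```

Because each stream starts from its seed, taking 3 values gives the same prefix as taking 5, so the caller does not need to commit to a count in advance.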

For example, one use case for me was to load a huge amount of data and
include only 1% of it, in order to generate a predictable test-data
subset. I want to write an XSLT template that returns nothing 99% of
the time and processes the content normally the other 1% of the time.
I want this to be based on an identifier in the content so that, for a
given seed, the same "random" 1% are selected each time: it should
*not* be order-dependent.  Rather, I would like to seed the random
number generator with a hash of a given seed that is a configuration
parameter and a node identifier, then evaluate the next random number
and skip the node if it is > 0.01 (say).  Maybe there are other ways
to do that, but that is what I did using Java.
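The Java implementation mentioned above isn't shown, but the approach it describes can be sketched in Python: hash the configuration seed together with the node identifier, seed a fresh generator from that hash, draw one number, and keep the node only if it falls below the sampling rate. The function name `keep_node` and the choice of SHA-256 are assumptions. Any stable hash would do, but Python's built-in `hash()` on strings is randomized per process by default, so it would not give repeatable samples.

```python
import hashlib
import random


def keep_node(config_seed: str, node_id: str, rate: float = 0.01) -> bool:
    """Hypothetical: decide whether a node belongs to the sample.

    The generator is seeded from a hash of (configuration seed, node id),
    so the decision depends only on those two values, never on the order
    in which nodes are visited.
    """
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    key = f"{config_seed}\x00{node_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.random() < rate


if __name__ == "__main__":
    ids = [f"record-{i}" for i in range(10_000)]
    sample = [i for i in ids if keep_node("test-run-1", i)]
    # Same seed, reversed traversal: exactly the same records are selected.
    reverse = [i for i in reversed(ids) if keep_node("test-run-1", i)]
    print(sorted(sample) == sorted(reverse))
    print(len(sample))  # roughly 1% of 10,000
```

Each node's decision uses a fresh generator, so only the first draw matters; changing the configuration seed yields a different but equally repeatable subset.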

-Mike


On 5/6/2014 6:58 PM, Michael Kay wrote:
> The big problem with a nondeterministic random() function is not defining the order of execution, but preventing it being optimised out of a loop. For example, how do we ensure that
>
> $xxx[random() gt 0.5]
>
> doesn't select either all the values or none?
>
> Anyway, we're not planning to do non-determinism. This exercise is about designing a deterministic way to meet the requirement.
>
> Michael Kay
> Saxonica
>
> On 6 May 2014, at 23:48, Michael Sokolov <msokolov at safaribooksonline.com> wrote:
>
>> On 5/6/2014 6:41 PM, Michael Kay mike at saxonica.com wrote:
>>>> My policy on side effects is: all expressions containing side effects are going to be evaluated in order
>>>>
>>> I do something like that in Saxon as well. But I don't attempt to define what "in order" means; for example, the order in which different global variables are evaluated. Doing this in the spec would be much more problematic.
>>>
>> You don't think it would be reasonable to say something to the effect that the order in which non-deterministic expressions are evaluated is non-deterministic (i.e. implementation-defined)? Certainly it would be reasonable enough in the case of a random number generator.  Although, I suppose, if you are going to seed it, you would want the seed to affect the random numbers that are generated.
>>
>> -Mike
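The two concerns in this exchange, that a zero-argument random() can be hoisted out of a predicate, and that a seeded generator makes results depend on evaluation order, can be illustrated with a small Python simulation. The three filter functions are hypothetical models of evaluation strategies, not a description of how Saxon or the XQuery specification behaves; `keyed_fraction`, which derives a value from a seed and an item key via SHA-256, is likewise an assumption chosen for illustration.

```python
import hashlib
import random
from typing import Callable, List


def keyed_fraction(seed: int, key: str) -> float:
    """Deterministic value in [0, 1) computed from (seed, key) alone."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    # keep the top 53 bits so the division by 2**53 is exact and stays below 1
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def filter_hoisted(items: List[str], draw: Callable[[], float]) -> List[str]:
    """Model of an optimizer treating random() as pure in $xxx[random() gt 0.5]:
    the call is evaluated once, outside the loop, so the predicate is the
    same for every item."""
    r = draw()
    return [x for x in items if r > 0.5]


def filter_stateful(items: List[str], seed: int) -> List[str]:
    """One draw per item from a seeded generator: which items pass depends
    on the order in which they are visited."""
    rng = random.Random(seed)
    return [x for x in items if rng.random() > 0.5]


def filter_keyed(items: List[str], seed: int) -> List[str]:
    """Deterministic alternative: each item's value depends only on the seed
    and the item, so neither hoisting nor evaluation order changes the result."""
    return [x for x in items if keyed_fraction(seed, x) > 0.5]


if __name__ == "__main__":
    items = [f"item-{i}" for i in range(20)]

    hoisted = filter_hoisted(items, random.Random(1).random)
    print(len(hoisted) in (0, len(items)))  # True: all or nothing

    # The same seeded draws may land on different items when the order changes.
    forward = filter_stateful(items, 7)
    backward = filter_stateful(items[::-1], 7)
    print(sorted(forward) == sorted(backward))

    keyed = filter_keyed(items, 7)
    print(sorted(keyed) == sorted(filter_keyed(items[::-1], 7)))  # True
```

The keyed variant is one way a deterministic design could meet the requirement, provided each item has a stable key, such as a node identifier, to feed into the derivation.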


