May 11, 2015

Random Number Generators and Logfreedom

HomePrevious Lesson / Next Lesson

Related Dyspoissometer functions: dyspoissometer_logfreedom_median_get(), dyspoissometer_mask_list_pseudoramdom_get().

There are 2 basic types of random number generators: (1) pseudorandom number generators (PRNGs), which are deterministic integer systems which produce output that appears (to humans) to be random (apart from the fact that they can be made to recur, given the same seed (input value); and (2) true random number generators (TRNGs) which produce random bits from a physical noise source. Traditionally, the stated goal of PRNG design is to mimic TRNGs so faithfully as to be indistinct from them, apart from the aforementioned seed issue.

Logfreedom shows such a design goal to be fundamentally misguided. Here's why:

Imagine that we have an "ideal" binary TRNG -- one without any bitwise bias whatsoever. Assume that we use it to create a mask list. For maximal statistical significance, Q and Z should be equal, say, (2^16) for binary convenience. Should we expect this ideal TRNG to generate the most random possible mask list (with maximum logfreedom)?

No, because zero bias does not guarantee maximum randomness -- quite the opposite. What it does imply is that the output is just as likely to be any particular mask list as any other -- rarely, one with very low logfreedom (high dyspoissonism). So how can we tell the difference between an unbiased TRNG and a biased one, without sampling an astronomical number of mask lists?

One good approach is to look at the median logfreedom. The median is simply the middle value in a list (not to be confused with the mean, which is the average value). We use the median because computing the mean tends to involve a significant degree of error due to the tiny differences among most logfreedom samples, competing for precision with the enormous differences among a few others. Nevertheless, in theory, analyzing the mean should be useful.

Determining the median logfreedom is conceptually easy:

1. Create an unbiased pseudorandom mask list. (Obviously, we don't want to bias our notion of median logfreedom due to biases in the PRNG used to generate the list. So obviously we need to confident that the PRNG has an intractably long limit cycle, for starters.)

2. Measure its logfreedom and append it to a list.

3. Loop to step 1 until the desired number of samples has been reached.

4. Sort the list of logfreedoms.

5. Read the logfreedom halfway down the list.

Dyspoissometer uses various statistical tricks to emulate the above steps without a sort process, but nevertheless produces the correct median once the list has been populated. See dyspoissometer_logfreedom_median_get() in the source code for details.

An ideal TRNG should have very nearly the same median logfreedom as the purportedly unbiased PRNG employed above. If the medians do not closely agree, then at least one of the generators is weak.

But what if the TRNG consistently produces logfreedoms well above the PRNG's median? Assuming that the PRNG was indistinguishable from random, that would still be a sign of statistical bias, as though the TRNG were artificially contrived to do so. Think of the extreme case: one could create a PRNG masquerading as a TRNG which always outputs the same mask list, which happens to have been architected to have maximum logfreedom.

By the way, precisely computing the median, mean, or maximum logfreedom is computationally hard, and perhaps combinatorially so. So we must rest content with approximation pending further insights into this problem.

Thus an ideal TRNG actually creates mask lists with logfreedoms above and below the median with equal probability. TRNGs which do not do this are probably weak.

But by definition, a PRNG produces only one mask list. It could be intractably large, for example, the set of all (2 ^ 128) masks produced by AES128 encryption using equally many different keys as inputs; in which case, we would need a minimodel thereof in order to extrapolate its randomness quality (for example, a similar AES32 implementation). In any event, if we have only one mask list, with each output (mask) correspdoning to a unique input (mask index), then commonsense security considerations say that we should make it as random as possible within the constraints of time and storage. In other words, a good PRNG should have a minimodel implementation with as close to maximum logfreedom as possible.

In summary, an ideal TRNG exhibits no bias with respect to median logfreedom and certainly makes no attempt to restrict itself to maximum logfreedom; whereas the mask list produced by an ideal PRNG does indeed approximate maximum logfreedom to the degree attainable within its design constraints.

Fortunately, Dyspoissometer does contain a fast algo for computing maximum logfreedom. This is important because it can provide us with an accurate picture of the "information sparsity" of a mask list. This is the topic of the next lesson.


No comments:

Post a Comment