May 13, 2015

Dyspoissometer Demo Output


Even if you cannot build Dyspoissometer on your system, the demo output is a tutorial in its own right. Here is what it printed out on my 64-bit Linux system, after compiling for quad precision ("make demo_quad"):

---------------------------------------------------------------------------
Dyspoissometer Demo
Copyright 2016 Russell Leidich
http://dyspoissonism.blogspot.com
build_id_in_hex=00000020

Follow along with main() in demo.c, so you can learn how to use Dyspoissometer.
The math details are explained at the webpage linked above.

First of all, we have already called dyspoissometer_init() to ensure that (1)
Dyspoissometer is compatible with the way we intend to use it and (2) all
critical fixes which this demo knows about are present.

So let's get on with the demo. Given that:

mask_idx_max=000000000000001D
mask_max=0000000000000008

We then have the following list (mask_list0_base) of (mask_idx_max+1) masks,
all on the interval [0, mask_max]. It looks random -- but how random?

mask_list0_base[000000000000001E]={
  0000000000000004,0000000000000001,0000000000000006,0000000000000007,
  0000000000000006,0000000000000008,0000000000000000,0000000000000004,
  0000000000000007,0000000000000004,0000000000000006,0000000000000001,
  0000000000000000,0000000000000004,0000000000000006,0000000000000003,
  0000000000000000,0000000000000001,0000000000000002,0000000000000008,
  0000000000000008,0000000000000005,0000000000000003,0000000000000001,
  0000000000000007,0000000000000002,0000000000000001,0000000000000004,
  0000000000000007,0000000000000003
}

In order to answer this question, we first need to count how many of each mask
there are in a process called "frequency accrual". Begin by calling
dyspoissometer_uint_list_malloc_zero().

OK, that worked. Now we need to determine the frequency of each mask using the
freq_list0_base which we just allocated and has been set to zero.
freq_list0_base[N] will then contain the frequency of mask N. To do this, call
dyspoissometer_freq_list_update_autoscale() with (mask_count_implied==0) as
required on the first call. Watch the return value for error status, although
it is ignored for the purposes of this demo.

Done. The output is:

freq_list0_base[0000000000000009]={
  0000000000000003,0000000000000005,0000000000000002,0000000000000003,
  0000000000000005,0000000000000001,0000000000000004,0000000000000004,
  0000000000000003
}

So this means that mask zero has frequency 3 (occurs 3 times), mask one has
frequency 5, etc. You can confirm this by looking at mask_list0_base above.
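
The whole frequency-accrual step amounts to one counting pass. Here is a
standalone sketch in Python (not using Dyspoissometer itself), with
mask_list0_base transcribed from above:

```python
from collections import Counter

mask_list0 = [4, 1, 6, 7, 6, 8, 0, 4, 7, 4,
              6, 1, 0, 4, 6, 3, 0, 1, 2, 8,
              8, 5, 3, 1, 7, 2, 1, 4, 7, 3]
mask_max = 8

# freq_list[n] is the number of times mask n occurs in the mask list.
counts = Counter(mask_list0)
freq_list = [counts[mask] for mask in range(mask_max + 1)]
print(freq_list)  # [3, 5, 2, 3, 5, 1, 4, 4, 3]
```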

Now we need to create a population list corresponding to the frequency list
which we just generated. This can be done via
dyspoissometer_pop_list_init().

So then here is the resulting population list:

pop_list0_base[0000000000000005]={
  0000000000000001,0000000000000001,0000000000000003,0000000000000002,
  0000000000000002
}

So this means that frequency 1 has population 1 (one mask occurs exactly
once), frequency 2 has population 1, frequency 3 has population 3, etc. You
can confirm this by looking at freq_list0_base above. (You can think of
population as the frequency of the frequency.)
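
Population accrual is the same counting idea applied one level up; a
standalone Python sketch:

```python
from collections import Counter

freq_list0 = [3, 5, 2, 3, 5, 1, 4, 4, 3]

# Population = the frequency of the frequency. Every mask occurs at least
# once here, so we tabulate populations for frequencies 1 through the max.
counts = Counter(freq_list0)
pop_list = [counts[f] for f in range(1, max(freq_list0) + 1)]
print(pop_list)  # [1, 1, 3, 2, 2]
```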

Now that we have a population list, we can answer the "how random"
question. Start by getting its logfreedom from
dyspoissometer_logfreedom_dense_get():

logfreedom=+6.2282493264059594837404470739206729E+01

While this may not be very informative on its own, if we take (e==2.71828...)
raised to logfreedom, we'll get the number of possible sets of
(mask_idx_max+1) masks, each of which on [0, mask_max], which would result in
exactly the pop_list0_base given above:

way_count=+1.1192913403533332190720000000000000E+27

Trust me, way_count is an integer, which is most easily seen from its trailing
zeroes in DYSPOISSOMETER_NUMBER_QUAD mode (make demo_quad).

But most usefully, ceil(logfreedom/log(2)) will provide us with the number of
bits required to combinatorially encode a mask list adhering to
pop_list0_base -- in other words, to guarantee that we can fully represent an
integer on the interval [0, way_count-1]:

bits_in_mask_set=0000005A

That is, if we ignore the cost of encoding pop_list0_base itself, we would
require 0x5A (90) bits of storage to encode mask_list0_base, given
mask_idx_max and mask_max with their values as given above.
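
All three quantities can be checked without the library. The identity below
(logfreedom as the log of the number of ways to order the masks, times the
number of ways to assign frequencies to masks) is taken from the dyspoissonism
webpage linked above, so treat it as an assumption of this sketch; it does
reproduce the printed values:

```python
from math import lgamma, exp, ceil, log

Q = 30  # mask count (mask_idx_max + 1)
Z = 9   # mask span (mask_max + 1)
freqs = [1, 2, 3, 4, 5]  # frequencies with nonzero population
pops  = [1, 1, 3, 2, 2]  # pop_list0_base (the zero-frequency population is 0)

# logfreedom = ln Q! + ln Z! - sum(H * ln F!) - sum(ln H!)
logfreedom = (lgamma(Q + 1) + lgamma(Z + 1)
              - sum(h * lgamma(f + 1) for f, h in zip(freqs, pops))
              - sum(lgamma(h + 1) for h in pops))

print(logfreedom)                 # ~62.282493264
print(exp(logfreedom))            # ~1.1192913e+27 (way_count)
print(ceil(logfreedom / log(2)))  # 90 bits (0x5A)
```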

Now we can compute the dyspoissonism of pop_list0_base using
dyspoissometer_dyspoissonism_get():

dyspoissonism=+5.5133858315519750531462878265266815E-02
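
Assuming the definition D = 1 - L/(Q ln Z) from the webpage linked above,
where L is the logfreedom, Q the mask count, and Z the mask span, this value
checks out:

```python
from math import log

Q, Z = 30, 9                     # mask count and mask span
logfreedom = 62.282493264059594  # computed above

# Dyspoissonism: one minus logfreedom's share of its theoretical ceiling.
dyspoissonism = 1 - logfreedom / (Q * log(Z))
print(dyspoissonism)  # ~0.055133858316
```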

Now that we know the actual logfreedom, how does it compare to the maximum
possible logfreedom attainable, given the constraints of mask_idx_max and
mask_max? We can answer this question with dyspoissometer_logfreedom_max_get(),
which uses a Monte Carlo method to provide an accurate approximation. (In this
case, we'll use 100 iterations, but in practice, you might need orders of
magnitude more, as convergence happens in somewhat unpredictable leaps.) Note
that we need to check for a negative return value, indicating memory
exhaustion. Also, for the sake of parallel searching, the function requires a
random number seed. Here's the result:

logfreedom_max=+6.2687958372167759219382483854671102E+01

But how can we have any faith in the accuracy of logfreedom_max (which by the
way is much more likely to be understated than overstated)? Well, we know that
the dyspoissonism floor (D0) is positive but approaches zero as the mask count
and mask span approach infinity. The denominator of dyspoissonism is (Q ln Z),
where Q is the mask count and Z is the mask span, and (Q ln Z) is an upper
bound on any logfreedom. So if our estimate of maximum logfreedom is accurate,
then it should be slightly less than this value. Let's see:

logfreedom_upper_bound=+6.5916737320086581483714714215351545E+01

So that worked fine. Now that we have what appears to be an accurate
approximation logfreedom_max, we can find D0, the dyspoissonism floor, from
dyspoissometer_dyspoissonism_get():

d0=+4.8982687541710707816454996837645826E-02
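
Both numbers are easy to verify by hand, assuming (per the linked webpage)
that D0 is just the dyspoissonism of the most random attainable list:

```python
from math import log

Q, Z = 30, 9
logfreedom_max = 62.687958372167759  # the Monte Carlo estimate from above

# No mask list over Q masks and span Z can exceed Q ln Z nats.
logfreedom_upper_bound = Q * log(Z)

# The dyspoissonism floor: D = 1 - L / (Q ln Z) with L at its maximum.
d0 = 1 - logfreedom_max / logfreedom_upper_bound

print(logfreedom_upper_bound)  # ~65.916737320
print(d0)                      # ~0.048982687542
```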

We could assess randomness by computing the ratio of dyspoissonism to D0, but
that approach would only be valid with constant mask count and mask span.
Fortunately, there is a more generic basis of comparison...

Returning to the original question, how random is mask_list0_base? One way to
answer this is to compute (logfreedom/logfreedom_max), which will give the
ratio of the information content in mask_list0_base to what would be expected
in the most random mask list compliant with mask_idx_max and mask_max:

information_density=+9.9353200967718573769967784308194695E-01

But because the information density is usually close to one, a more useful
metric is information _sparsity_, which is just one minus that quantity (but
should be computed using dyspoissometer_sparsity_get() for safety reasons):

information_sparsity=+6.4679903228142623003221569180530485E-03
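
Both metrics are one-liners given the two logfreedoms above:

```python
logfreedom = 62.282493264059594      # actual logfreedom, from above
logfreedom_max = 62.687958372167759  # Monte Carlo estimate, from above

information_density = logfreedom / logfreedom_max
information_sparsity = 1 - information_density

print(information_density)   # ~0.9935320097
print(information_sparsity)  # ~0.0064679903
```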

The great thing about information sparsity is that it can be used as a fair,
normalized, and unambiguous comparator across sets of different mask counts and
mask spans in order to assess randomness quality. Its only downside is that it
requires finding logfreedom_max, which is easily approximated, although the
error in that approximation is somewhat unclear at present. This is precisely why
dyspoissonism is a better randomness comparator, provided that mask counts and
mask spans are held constant, as is often the case in signal analysis.

Note that, while dyspoissonism can only reach zero in the limit of an infinite
set (although it can readily reach one), information sparsity can reach either
extreme with any set.

Back to the original question: information_sparsity suggests that
mask_list0_base is almost -- but not quite -- as random as it could possibly
be. This is unsurprising for such a small mask list. But for large lists, an
elevated dyspoissonism or information sparsity can help to identify unseen
statistical biases -- and perhaps even predictable mathematical structure -- in
the data source.

Now let me show you another way to compute logfreedom. Suppose that we
generate a frequency list from a mask list:

freq_list1_base[0000000000000009]={
  0000000000000008,000000000000000B,000000000000000C,0000000000000009,
  000000000000000A,000000000000000A,0000000000000009,000000000000000B,
  000000000000000A
}

This means that mask zero has frequency 8, mask one has frequency 11, mask 2
has frequency 12, etc. But now, without taking the time to produce a population
list, we can compute the logfreedom directly using
dyspoissometer_logfreedom_sparse_get() (and again, checking for success on
return).

In order to do this, though, we must first allocate one other frequency list
equal in size to freq_list1_base above -- namely, 9 (DYSPOISSOMETER_UINT)s
because (mask_max==8). We use dyspoissometer_uint_list_malloc() for this
purpose. Here we go:

logfreedom=+1.9126308750039845767695139189084438E+02

OK, that worked. Now we can deallocate those 2 frequency lists that we just
allocated, using dyspoissometer_free()...

Done. Now, just to be paranoid, let's compute the population list of
freq_list1_base above:

pop_list1_base[000000000000000D]={
  0000000000000000,0000000000000000,0000000000000000,0000000000000000,
  0000000000000000,0000000000000000,0000000000000000,0000000000000000,
  0000000000000001,0000000000000002,0000000000000003,0000000000000002,
  0000000000000001
}

You can verify that this is consistent with freq_list1_base: it contains one
frequency 8, 2 frequency 9s, etc. So we can now call
dyspoissometer_logfreedom_dense_get():

logfreedom=+1.9126308750039845767695139189084438E+02

As you can see, this matches the value above within the limits of numerical
error.
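
The "sparse" route can be sketched the same way: tally the populations on the
fly from the frequency list instead of materializing pop_list1_base first.
The identity is ln Q! + ln Z! - sum(ln F!) - sum(ln H!), again assumed from
the webpage linked above:

```python
from collections import Counter
from math import lgamma

freq_list1 = [8, 11, 12, 9, 10, 10, 9, 11, 10]
Q = sum(freq_list1)  # 90 masks
Z = len(freq_list1)  # mask span 9

# sum(ln F!) runs straight over the masks; the population terms come from a
# throwaway Counter rather than an explicit population list.
pops = Counter(freq_list1)
logfreedom = (lgamma(Q + 1) + lgamma(Z + 1)
              - sum(lgamma(f + 1) for f in freq_list1)
              - sum(lgamma(h + 1) for h in pops.values()))
print(logfreedom)  # ~191.2630875004
```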

So as mentioned above, the minimum frequency with nonzero population is
frequency 8. Conveniently, we can avoid wasting resources dealing with all
those leading zeroes by calling dyspoissometer_logfreedom_dense_get() with
(freq_min==8) and the same freq_max_minus_1:

pop_list2_base[0000000000000005]={
  0000000000000001,0000000000000002,0000000000000003,0000000000000002,
  0000000000000001
}
logfreedom=+1.9126308750039845767695139189084438E+02

As expected, the logfreedom is unchanged.

From time to time, you may find it useful to analyze integer approximations of
Poisson distributions. dyspoissometer_poisson_term_get() can help you with
this. Unfortunately, there are many ways to map (analog) Poisson distributions
to integer approximations thereof. In theory, as the mask counts and mask spans
grow, the particular approximation method becomes less important, and the
information sparsity of the resulting discrete distribution approaches zero.
Here's a simple example using successively larger integer approximations of the
lambda-one Poisson distribution (LOPD). At each step, the mask count and mask
span are both multiplied by about 256. Let's see what happens. (This might
take a few minutes):

poisson_mask_idx_max=0000000000000100
poisson_mask_max=00000000000000FF
poisson_information_sparsity=+1.1294726924381859360730610406606087E-04
poisson_mask_idx_max=000000000000FF04
poisson_mask_max=000000000000FF00
poisson_information_sparsity=+7.7909880882871078266079775867239554E-07
poisson_mask_idx_max=0000000000FEFFFE
poisson_mask_max=0000000000FEFFFE
poisson_information_sparsity=+6.0988639656266055913442929059338777E-10
poisson_mask_idx_max=00000000FEFFFFF1
poisson_mask_max=00000000FEFFFFFE
poisson_information_sparsity=+6.9691891013055711558023866053140997E-15
poisson_mask_idx_max=000000FF0000000A
poisson_mask_max=000000FF00000000
poisson_information_sparsity=+2.7435640484631022740993965157706537E-15
poisson_mask_idx_max=0000FEFFFFFFFFFA
poisson_mask_max=0000FEFFFFFFFFFF
poisson_information_sparsity=+1.6849273151579334872820991222149401E-18
poisson_mask_idx_max=00FEFFFFFFFFFFF4
poisson_mask_max=00FEFFFFFFFFFFFE
poisson_information_sparsity=+5.2962575165164824927570952982632527E-25
poisson_mask_idx_max=FEFFFFFFFFFFFFC9
poisson_mask_max=FEFFFFFFFFFFFFFC
poisson_information_sparsity=+1.6668441166413227665972389848858803E-21

The trend above is not convincingly monotonic (if you "make demo_quad"). Is
this merely reflective of a poor approximation of logfreedom_max at high mask
counts? Or are Poisson distributions asymptotically but nonmonotonically of zero
information sparsity? This is a hard open question. (I'm even slightly doubtful
that the asymptotic hypothesis is true, because I'm only mostly convinced that
maximum information density is expected behavior in large mask lists.) In any
event, we frequently speak of Poisson distributions as though they are discrete
because their domain is the whole numbers; their range is analog, however,
which leaves a lot to be desired.
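
For concreteness, here is one plausible integer mapping of the LOPD: set the
population at frequency k to round(Z * e^-1 / k!). This is an illustration,
not necessarily the mapping the demo uses, although it happens to reproduce
the first step's mask count (0x101) and mask span (0x100) above:

```python
from math import exp, factorial

Z = 256  # mask span of the first step above (mask_max == 0xFF)

# Population at frequency k under a lambda-one Poisson distribution scaled
# to Z masks, stopping once the rounded population reaches zero.
pops = []
k = 0
while True:
    h = round(Z * exp(-1) / factorial(k))
    if h == 0 and k > 0:
        break
    pops.append(h)
    k += 1

print(pops)                                    # [94, 94, 47, 16, 4, 1]
print(sum(pops))                               # 256 masks in the span
print(sum(k * h for k, h in enumerate(pops)))  # 257: the mask count
```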

Speaking of distributions, Dyspoissometer has a way to generate pseudorandom
mask lists which adhere to a given mask count and mask span, namely
dyspoissometer_mask_list_pseudorandom_get(). This function is particularly
useful for doing statistical surveys of such lists, which for example could
be used to find the variance of dyspoissonism. For instance, here's a random
mask list for (mask_idx_max==37) and (mask_max==14):

mask_list_base[0000000000000026]={
  000000000000000A,0000000000000002,000000000000000E,0000000000000004,
  0000000000000008,000000000000000A,0000000000000009,0000000000000006,
  000000000000000A,0000000000000000,000000000000000C,0000000000000001,
  000000000000000C,000000000000000B,000000000000000A,0000000000000006,
  0000000000000006,000000000000000C,0000000000000004,0000000000000001,
  000000000000000A,0000000000000003,0000000000000009,000000000000000C,
  0000000000000003,0000000000000002,0000000000000007,000000000000000B,
  0000000000000001,000000000000000C,0000000000000004,0000000000000003,
  0000000000000006,000000000000000E,0000000000000009,0000000000000001,
  0000000000000007,0000000000000000
}

As you can see, mask_list_base contains 38 masks on [0, 14]. If you run this
demo using a different numerical precision, you will get a different random
mask list due to variances in the evolutionary pathway of
dyspoissometer_logfreedom_max_get(), as its decision process and therefore the
number of times it iterates the random seed vary with numerical error, despite
having asymptotically constant output.

Let's get on with a more exciting application of this generator, which is the
asymptotic discovery of median logfreedom, courtesy of
dyspoissometer_logfreedom_median_get(). It works like this: generate lots of
random mask lists, all of which adhere to the same mask count and mask span.
Then find the logfreedom of each such mask list, and accumulate them in
a logfreedom list. Finally, find the median value thereof. Median logfreedom is
more important than expected (mean) logfreedom because only the former
essentially guarantees that any random modification of the mask list has equal
probability to decrease or increase the logfreedom; in other words, median (not
mean!) logfreedom is an entropy equilibrium point. So a good true random number
generator should make mask lists whose median logfreedom is the aforementioned
value. Just watch the return value for error status!

logfreedom_median=+9.7136567195470554483466520955731780E+01

So the value above is an approximation of the median logfreedom, obtained from
1000 random mask lists. If you found the logfreedom of _every_ possible mask
list, sorted those logfreedoms, then took the middle value thereof, you should
find it to be close to if not exactly the value above (which is, in fact, an
actual logfreedom as opposed to an interpolated value).

Sanity check: the value above is slightly less than ((37+1) ln (14+1)), which
is about 103. Pass!
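
A Monte Carlo sketch of this procedure in plain Python (independent of the
library; the logfreedom identity is again assumed from the linked webpage):

```python
import random
import statistics
from collections import Counter
from math import lgamma

def logfreedom(freq_list):
    # ln Q! + ln Z! - sum(ln F!) over masks - sum(ln H!) over populations;
    # Counter(freq_list) includes the population of frequency zero.
    q = sum(freq_list)
    z = len(freq_list)
    pops = Counter(freq_list)
    return (lgamma(q + 1) + lgamma(z + 1)
            - sum(lgamma(f + 1) for f in freq_list)
            - sum(lgamma(h + 1) for h in pops.values()))

random.seed(0)
Q, Z = 38, 15  # mask_idx_max == 37, mask_max == 14
samples = []
for _ in range(1001):  # odd count, so the median is an actual sample
    freq_list = [0] * Z
    for _ in range(Q):
        freq_list[random.randrange(Z)] += 1
    samples.append(logfreedom(freq_list))

median = statistics.median(samples)
print(median)  # close to the demo's ~97.1, and below Q * ln(Z) ~ 102.9
```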

Now I have something amazing to show you. Take a look at pop_list1_base
above. Notice how it looks quite like a Poisson distribution with lambda
(expected frequency) equal to 10. By contrast, look at pop_list3_base below,
which has the same mask count and mask span.

pop_list3_base[000000000000000E]={
  0000000000000000,0000000000000000,0000000000000000,0000000000000000,
  0000000000000000,0000000000000000,0000000000000000,0000000000000001,
  0000000000000001,0000000000000002,0000000000000001,0000000000000002,
  0000000000000001,0000000000000001
}

In particular, note that pop_list3_base is bimodal; it has 2 peaks -- very
unlike a Poisson distribution. So logically, it should have lower logfreedom:

logfreedom=+1.9218634690158870608247890704275966E+02

Whoa! What's going on here? How could such a distribution have _greater_
logfreedom, corresponding to lesser dyspoissonism and thus greater entropy?
And that's only the tip of the iceberg...
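
The comparison is easy to reproduce. Assuming (as the 90-mask count implies)
that pop_list3_base assigns populations {1, 1, 2, 1, 2, 1, 1} to frequencies
7 through 13, and using the logfreedom identity from the linked webpage:

```python
from math import lgamma

def logfreedom_from_pops(pairs, q, z):
    # pairs: (frequency, population) for every nonzero population.
    return (lgamma(q + 1) + lgamma(z + 1)
            - sum(h * lgamma(f + 1) for f, h in pairs)
            - sum(lgamma(h + 1) for _, h in pairs))

Q, Z = 90, 9
poissonlike = [(8, 1), (9, 2), (10, 3), (11, 2), (12, 1)]  # pop_list1_base
bimodal = [(7, 1), (8, 1), (9, 2), (10, 1), (11, 2), (12, 1), (13, 1)]

l1 = logfreedom_from_pops(poissonlike, Q, Z)
l3 = logfreedom_from_pops(bimodal, Q, Z)
print(l1, l3, l3 > l1)  # ~191.263 ~192.186 True
```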

Let's look at every single neighbor that pop_list1_base has. A "neighbor" of a
population list is simply another population list which corresponds to a single
mask change. In other words, the population of nonzero frequency A decreases by
one, while the population of frequency (A-1) increases by one; simultaneously,
the population of frequency B decreases by one, while the population of
frequency (B+1) increases by one, where (A!=(B+1)). (In the code, frequency A
is called the "down right" frequency, and B is called the "up left" frequency.)
This is simply what happens when one mask is changed into another.
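
This neighbor enumeration can be sketched in plain Python, independent of
demo.c (the logfreedom identity is once more assumed from the linked webpage):

```python
from math import lgamma

def logfreedom_from_pops(pops, q, z):
    # pops: dict mapping frequency -> population (nonzero frequencies only;
    # the zero-frequency population is zero for this mask list).
    return (lgamma(q + 1) + lgamma(z + 1)
            - sum(h * lgamma(f + 1) for f, h in pops.items())
            - sum(lgamma(h + 1) for h in pops.values()))

Q, Z = 90, 9
pop_list1 = {8: 1, 9: 2, 10: 3, 11: 2, 12: 1}

# A neighbor moves one mask from "down right" frequency A to A - 1 and one
# mask from "up left" frequency B to B + 1, with A != B + 1 (and, when
# A == B, at least two masks at that frequency).
neighbors = set()
for a in pop_list1:
    for b in pop_list1:
        if a == b + 1 or (a == b and pop_list1[a] < 2):
            continue
        new = dict(pop_list1)
        for f, d in ((a, -1), (a - 1, +1), (b, -1), (b + 1, +1)):
            new[f] = new.get(f, 0) + d
        neighbors.add(tuple(sorted((f, h) for f, h in new.items() if h)))

lf_self = logfreedom_from_pops(pop_list1, Q, Z)
lf_neighbors = [logfreedom_from_pops(dict(n), Q, Z) for n in neighbors]
print(lf_self)                                   # ~191.263
print(all(lf < lf_self for lf in lf_neighbors))  # True: a local maximum
```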

In classical physics, the Second Law of Thermodynamics implies that the entropy
of a closed system always increases over time (because if we make ice in the
freezer, then enough heat is dissipated into the kitchen to create a net
positive change in the entropy of the entire house). So logically, at least one
neighbor of pop_list1_base should have greater logfreedom, because we
demonstrated above that its logfreedom is less than that of a "camel humps"
distribution. You'll have to look at the code in demo.c for this exercise, but
here are the logfreedoms of each of its neighbors:

logfreedom_self=+1.9126308750039845767695139189084438E+02
logfreedom_neighbor=+1.9034679676852430261176786467907636E+02
logfreedom_neighbor=+1.9094463376927992306114114467725377E+02
logfreedom_neighbor=+1.9085762239229029329497337877538004E+02
logfreedom_neighbor=+1.9077757968461675686914960081365358E+02
logfreedom_neighbor=+1.8977143262362074075688942666708872E+02
logfreedom_neighbor=+1.9106241680493630651567993878672431E+02
logfreedom_neighbor=+1.9097540542794667674951217288485058E+02
logfreedom_neighbor=+1.9089536272027314032368839492312412E+02
logfreedom_neighbor=+1.8998215365493639335934442862876736E+02
logfreedom_neighbor=+1.9076231221248596843492942665209929E+02
logfreedom_neighbor=+1.9108076594360450305073967386568985E+02
logfreedom_neighbor=+1.9100072323593096662491589590396342E+02
logfreedom_neighbor=+1.8967199872663255383741036763658378E+02
logfreedom_neighbor=+1.8974895976776868216239458468089895E+02
logfreedom_neighbor=+1.8978978176228893729194916174605424E+02
logfreedom_neighbor=+1.9040288623517534617554261590578599E+02
logfreedom_neighbor=+1.8935354499551401922160012042299319E+02
logfreedom_neighbor=+1.8965364958796435730235063255761818E+02
logfreedom_neighbor=+1.8996380451626819682428469354980174E+02
logfreedom_self=+1.9126308750039845767695139189084438E+02

Notice how _every_ neighbor has _lesser_ logfreedom. In other words,
pop_list1_base exists at a _local_ entropy maximum even though we just proved
above that it does _not_ exist at a _global_ maximum; if we change any single
mask, the result will be a population list with lesser (never equal)
logfreedom. This is profound. It implies that some discrete systems are so
heavily constrained that _any_ minimum quantum of change will send them into a
lower entropy state! If the universe is ultimately digital, then perhaps there
are natural examples of this phenomenon. Note that this goes beyond simple
quantum systems which occasionally lose entropy by sheer chance; in this case,
_any_ single step we take will result in lesser entropy!

I should add that it's precisely this lack of entropy gradient which makes the
deterministic discovery of D0 so difficult. Indeed, I only discovered this
problem when I wrote a function intended to do so by riding down the gradient.
dyspoissometer_logfreedom_max_get() is therefore somewhat nondeterministic,
although it rides gradients eagerly when possible.

I hope you learned something from this demo and find Dyspoissometer useful.
I have just one more suggestion before concluding: if you want a direct and
easy way to find logfreedom, try one of the
dyspoissometer_u8/u16/u24/u32_list_logfreedom_get() functions. Yes, we could
have used them above, but then you would not have learned as much about
dyspoissonism.

Scroll up or redirect to a file in order to see what you missed!
---------------------------------------------------------------------------
