May 10, 2015

The Terminology of Mask Lists

HomeNext Lesson

Related Dyspoissometer functions: dyspoissometer_uint_list_malloc_zero(), dyspoissometer_freq_max_minus_1_get(), dyspoissometer_pop_list_get(), dyspoissometer_freq_list_get(), dyspoissometer_freq_list_update_autoscale().

In order to understand dyspoissonism, it is necessary to start with a few basic definitions.

First of all, we have the concept of a "mask list". (Mathematicians would call this a "mask set", but computer programmers think of "set" as a verb with an entirely different meaning, so "list" is preferable. We use G in the math and "mask_list" in the source code.) A mask list is a nonnull list of whole numbers, each of which on some (inclusive) interval, [0, Z-1]. That is, no integer in the list can be less than zero or more than (Z-1), where Z is called the "mask span". For example, suppose that (Z=6). We could then have the following value of G:

G = {1, 5, 2, 2, 0, 3, 5, 1, 2}

Each whole number in G is called a "mask". But there is another set of whole numbers in G, relaying the position information of each mask: the mask indexes. Index zero contains (mask = 1); index one contains (mask = 5); and so on. (The terminology originated for historical reasons prompting the discovery of generalized dyspoissonism, although we now refer to "mask indexes" instead of "block numbers", "mask span" instead of "mask count", and "mask count" instead of "block count".)

In this case, we have 9 mask indexes, numbered zero through 8. So mask indexes are all on [0, 8]. The maximum such index, denoted mask_idx_max in the Dyspoissometer source code, is thus 8. Thus the "mask count", denoted Q in the math, is 9.

By the way, notice that masks all fall on a dense interval (with no skipped integers), so we are implicitly assuming that you have already performed a mapping from "real world symbols" to whole numbers. For example, the masks might correspond to English words, people, fruits, cars, or whatever.

Look up at mask_list. Notice that there is a single zero, two 1s, three 2s, one 3, no 4s, and two 5s. These are what we call the "frequencies" of these masks. We can make a list, denoted J in the math and "freq_list" in the code, which contains 6 items, one for each mask on [0, mask_max], showing its frequency:

J = {1, 2, 3, 1, 0, 2}

As a sanity check, note that the sum of the frequencies equals the mask count:

1 + 2 + 3 + 1 + 0 + 2 = 9

In turn, we can create a "population list", denoted as H in the math and "pop_list" in the code, containing the frequencies of each frequency. ("Population" and "frequency" mean basically the same thing in this case, but we choose to use the former as an unambiguous short form for "frequency of frequency".) Again, we can count the frequencies: we have one 0, two 1s, two 2s, and one 3:

H = {2, 2, 1}

Oops! What happened to that zero? Shouldn't it be:

H = {1, 2, 2, 1}

Well, not by convention in this case. We omit the population of frequency zero -- in other words, the number of possible (allowed) masks which do not occur in mask_list. The reason is that it's usually implicit because mask_max is given and we have all the other nonzero populations in H. The population of frequency zero is referred to as H0 ("H zero"). Similarly, the population of frequency F is denoted H[F] (note the brackets). Also by convention, population lists always terminate with a nonzero value and do not omit any nonzero populations (other than H0).

Once again, as a sanity check, note that the sum of populations equals the mask span:

(1 + 2 + 2 + 1) = 6

For further verification, note that the sum of (F H[F]) equals the mask count:

(1 * 2) + (2 * 2) + (3 * 1) = 9


No comments:

Post a Comment