May 11, 2015

The Logfreedom Formula

HomePrevious Lesson / Next Lesson

Related Dyspoissometer functions: dyspoissometer_logfreedom_dense_get(), dyspoissometer_logfreedom_sparse_get(), dyspoissometer_u16_list_logfreedom_get(), dyspoissometer_u24_list_logfreedom_get(), dyspoissometer_u32_list_logfreedom_get(), dyspoissometer_u8_list_logfreedom_get().

From the previous lesson, we have following formula for way count, W:

Formula 1.

X = ((Z!) / ((H0!) Π(F = (1, K), H[F]!))
Y = (Q!) / Π(F = (1, K), (F!) ^ H[F])
W = XY

Obviously, W gets too large to handle very quickly, even exceeding physical memory size for the sake of storing a single integer in some practical cases. What to do?

Fortunately, W happens to be composed of nothing more than the products and ratios of a bunch of whole numbers. So we can easily create an expression for its natural logarithm. We abbreviate the natural logarithm as "log" or "ln". The logarithm to base 2 is denoted "log2"; the logarithm to base 10 is denoted "log10"; etc. Fortunately, the log of the way count is indeed susceptible to accurate representation using everyday floating-point arithmetic. Recall that:

Identity 1.

log(AB) ≡ log(A) + log(B)

and

Identity 2.

log(A / B) ≡ log(A) - log(B), B nonzero

and

Identity 3.

log(A^B) ≡ B log(A)

Note that in the identities above we have used the "threequals" symbol ("≡") because they are identities as opposed to solvable equations.

Quiz: What is the log of ((7!)/(3!))?

Answer: (log(7) + log(6) + log(5) + log(4)), or about 6.73.

Given H, a population list the sum of whose K items, corresponding to frequencies one through K, is Z (the mask span). Let Q be the "implied mask count":

Q = Σ(F = (1, K, H[F])

Then the way count, W, is given by Formula 1 above. We define log(W) as the "logfreedom", L, of H and Z. Applying identities 1, 2, and 3 4 to Formula 1 gives:

Formula 2.

L = log(X) + log(Y)
L = log(Z!) - log(H0!) - Σ(F = (1, K), log(H[F]!)) + log(Q!) - Σ(F = (1, K), H[F] log(F!))
L = log(Q!) + log(Z!) - log(H0!) - Σ(F = (1, K), log(H[F]!)) - Σ(F = (1, K), H[F] log(F!))

Formula 2 is what we call the "logfreedom formula". It appears in many instances throughout the Dyspoissometer source code. It's quite easy to compute, given H and Z. The only troublesome terms are of the form log(N!). This uncomplicated expression actually gives rise to a fascinating mathematical odyssey, which is the subject of the next lesson.

By the way, the explicit reference to log(H0!) is redundant because 0! is simply one, so we could run the last 2 terms from (F = 0) instead of (F = 1) without introducing a singularity. But we isolate log(H0!) for safety reasons: some builtin functions for computing the log-of-factorial cannot take zero as an input.

No comments:

Post a Comment