In previous posts, I've shown that the Tamam Shud Cipher (TSC)
is almost certainly an initialism, a sequence of initials corresponding to
English text(s), and that a large corpus of prominent literature does not contain
the text from which the TSC was derived. This leads to the next question: Given
the strong possibility that the text was written by someone (presumably
non-famous) who handled the book in which it was found, is it possible for us
to decode it? Are initialisms based on English text in general decodable? In my
discussion here, I interpret the potentially ambiguous handwriting in a certain
common way. Other interpretations may be valid, and we can consider that
statistically, but I focus for now on this interpretation of the handwriting:
WRGOABABD
WTBIMPANETP
MLIABOAIAQC
ITTMTSAMSTGAB
Example TSC Readings
Consider the following:
(1)
We rarely go onto Australia’s beaches and bed down.
Wade the beaches in majestic peace and nervously enter the
Pacific.
My love, I am blessedly opened, and I am quite certain
In the truth. Mercifully, the sleeper awakens me, stirs the
girl, and blossoms.
(2)
Western radar groups operate Australian bases and base
defenses.
Weapons testing base is militarily prepared. American nationals
electrified the perimeter. Military liaisons in Adelaide boarded on an international
aircraft. Queensland considering increases to territorial military trainees. South
American mercenaries sent to Guam and Borneo.
Both of these are lucid (if not sparkling) English
texts that correspond to the TSC (give or take alternate interpretations of the
handwriting). They took me about 15 minutes each to write. One is a love poem
(somewhat like the text of the Rubaiyat
in which the TSC was found) and one looks like something a Soviet spy might
send to his superiors. A group of writers could surely come up with endless TSC
solutions on these or practically any other themes and probably never
duplicate each other's work. The simple fact demonstrated here is: The TSC,
like most initialisms, is undecodable into the original text because it has
virtually limitless solutions. And therefore, if the TSC is not found in some
previously existing text, the source text that generated it will never be
known.
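The claim that both example readings correspond to the TSC can be checked mechanically. The following is a minimal sketch in Python, not the method used in the earlier posts; the helper name initialism and the rule for what counts as a word (a run of letters, optionally with an internal apostrophe) are assumptions made for illustration.

```python
import re

# The TSC under the handwriting interpretation used in this post.
TSC_LINES = ["WRGOABABD", "WTBIMPANETP", "MLIABOAIAQC", "ITTMTSAMSTGAB"]
TSC = "".join(TSC_LINES)


def initialism(text):
    """Return the upper-case first letter of every word in text."""
    # Assumption: a word is a run of letters, optionally with internal
    # apostrophes (straight or curly), so "Australia's" is one word.
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)
    return "".join(word[0].upper() for word in words)


reading_1 = (
    "We rarely go onto Australia's beaches and bed down. "
    "Wade the beaches in majestic peace and nervously enter the Pacific. "
    "My love, I am blessedly opened, and I am quite certain "
    "in the truth. Mercifully, the sleeper awakens me, stirs the girl, "
    "and blossoms."
)

reading_2 = (
    "Western radar groups operate Australian bases and base defenses. "
    "Weapons testing base is militarily prepared. American nationals "
    "electrified the perimeter. Military liaisons in Adelaide boarded on an "
    "international aircraft. Queensland considering increases to territorial "
    "military trainees. South American mercenaries sent to Guam and Borneo."
)

for name, reading in [("reading 1", reading_1), ("reading 2", reading_2)]:
    print(name, "matches TSC:", initialism(reading) == TSC)
```

Two texts with nothing in common thematically reduce to the same 44 letters, which is the whole difficulty: the mapping from text to initialism throws away nearly everything.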
Grammatical Analysis
It is easy to lay out, analytically, why most initialisms
from English text have endless numbers of solutions. Most of what I say here
will apply to a great number of other languages, but I discuss English
specifically.
English vocabulary falls into two rough subsets. There are
function words, generally drawn from closed classes of words, and content
words, generally drawn from open classes of words. Nouns, verbs, adjectives,
and adverbs are content words. They comprise the vast majority of all of the
words in English. Pronouns, conjunctions, prepositions, and articles are
function words. Those parts of speech have only about 3 to 90 words each.
For all but the rarest letters in English, you can come up
with a very long list of any of the content word classes that begin with that
letter. For the function words, this is not possible. For example, there are
only a handful of coordinating conjunctions, chiefly and,
or, and but. There are no coordinating conjunctions in English that begin
with 'z', 't', or 'e'.
Take any typical passage in English text, and write down the
initialism. You may then freely generate almost endless alternative readings of
the initialism by changing the content words to other examples of the same part
of speech.
"The platypus is one of the strangest animals in
Australian fauna."
"The pioneer is one of the strongest archetypes in
American fiction."
"The pastegh is one of the sweetest aliments in
Armenian food."
It's easy to do this with the content words in virtually any
sentence. So even if we kept the function words the same (as in the previous
examples), we can still generate many sentences with the same initialism and
warp the meaning entirely.
It is relatively difficult to play the same trick with
function words, because there are so few options to swap in. However, we could
choose different function words in different locations, in effect moving the
pivot of function words to another location and then manipulating the content
words in their new positions. For instance, this sentence has the same
initialism as the three above:
"Tommy, play in our old toyroom since Annie is acting
funny."
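To make the two tricks concrete, the sketch below groups the four example sentences by initialism and reports where the function words fall in each. The FUNCTION_WORDS set is a small hand-picked list that covers only these sentences (an assumption for illustration, not a complete inventory), and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Hand-picked for these four sentences only; not a full list of function words.
FUNCTION_WORDS = {"the", "is", "of", "in", "our", "since"}


def words_of(text):
    """Split text into words, treating any run of letters as a word."""
    return re.findall(r"[A-Za-z]+", text)


def initialism(text):
    """Return the upper-case first letter of every word in text."""
    return "".join(word[0].upper() for word in words_of(text))


def function_word_positions(text):
    """1-based positions of the words that appear in FUNCTION_WORDS."""
    return [
        position
        for position, word in enumerate(words_of(text), start=1)
        if word.lower() in FUNCTION_WORDS
    ]


sentences = [
    "The platypus is one of the strangest animals in Australian fauna.",
    "The pioneer is one of the strongest archetypes in American fiction.",
    "The pastegh is one of the sweetest aliments in Armenian food.",
    "Tommy, play in our old toyroom since Annie is acting funny.",
]

# Group the sentences by the initialism they produce.
groups = defaultdict(list)
for sentence in sentences:
    groups[initialism(sentence)].append(sentence)

for sentence in sentences:
    print(initialism(sentence), function_word_positions(sentence))
```

All four sentences land in a single group, while the function-word positions differ between the first three sentences and the last, so the initialism alone does not reveal where the function words sit.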
Most function words begin with relatively common letters in
English, which is true almost by definition. These letters can be used in other
words, content or function, and so the location of function words in an
initialism cannot be pinned down.
There are rare letters that could greatly restrict the
freedom of recombination shown above. A sequence like XXQXX in an initialism
might actually have no valid readings in English. However, the TSC has only one
rare letter, Q. Does the Q, or any other pattern in the TSC, significantly
constrain the range of possibilities for TSC readings? If we can't determine
the reading of the TSC with absolute certainty, can we meaningfully narrow it down?
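A quick letter tally shows how little the TSC offers in the way of rare initials. This is a rough sketch: the set RARE_INITIALS (J, Q, X, Z) is an assumption chosen for illustration, and a different cut-off for "rare" would be equally defensible.

```python
import string
from collections import Counter

# The TSC under the handwriting interpretation used in this post.
TSC = "WRGOABABDWTBIMPANETPMLIABOAIAQCITTMTSAMSTGAB"

# Assumption: letters treated as rare word-initials for this illustration.
RARE_INITIALS = set("JQXZ")

counts = Counter(TSC)

# Rare letters that actually occur in the TSC.
rare_present = sorted(letter for letter in counts if letter in RARE_INITIALS)

# Letters of the alphabet that never occur in the TSC.
absent = [letter for letter in string.ascii_uppercase if letter not in counts]

for letter, count in counts.most_common():
    print(letter, count)
print("rare letters present:", rare_present)
print("letters absent:", "".join(absent))
```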
Learning from Examples
Using the same Project Gutenberg corpus which was searched
for matches of TSC substrings, we can search for shorter substrings to see
which phrases might match them. Substrings of length 6 are useful for providing
multiple matches (at least 7 unique readings) for each position in the TSC. If
the particular letters in the TSC constrain the possible readings
significantly, then we should see repeated patterns in the Gutenberg matches.
It should be noted first that the Gutenberg corpus has
multiple copies of some texts within it, which inflates the counts unnaturally.
This observation notwithstanding, there is no substring of length 6 in the TSC
that has any single reading which comprises the majority of its Gutenberg hits.
In other words, whichever reading we guess to be correct, it is wrong in the
majority of cases, over 60% in fact. Therefore, the would-be sleuth who
writes a reading of the TSC and feels that their match is sure to be right is
being seduced by the fallacy that few other readings could fit the letters
as well as theirs. The reading that achieves the best match is "do well
to bear in mind", which still covers only 38% of the matches for DWTBIM,
and is one of 59 different readings found in the Gutenberg corpus. For any
6-gram in the TSC, whichever guess you offer for the correct reading, you will
probably guess wrong.
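The search described above can be sketched as follows. This is not the code that was run against the Gutenberg corpus: the function names are hypothetical, and toy_corpus is a few invented sentences standing in for a real corpus, so the numbers it prints say nothing about the actual results.

```python
import re
from collections import Counter

# The TSC under the handwriting interpretation used in this post.
TSC = "WRGOABABDWTBIMPANETPMLIABOAIAQCITTMTSAMSTGAB"


def tsc_ngrams(n=6):
    """All substrings of length n in the TSC, in order of position."""
    return [TSC[i:i + n] for i in range(len(TSC) - n + 1)]


def find_readings(corpus_text, pattern):
    """Count each word sequence in corpus_text whose initials spell pattern."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", corpus_text)
    initials = "".join(word[0].upper() for word in words)
    readings = Counter()
    start = initials.find(pattern)
    while start != -1:
        reading = " ".join(words[start:start + len(pattern)]).lower()
        readings[reading] += 1
        # Resume one position later so overlapping hits are not missed.
        start = initials.find(pattern, start + 1)
    return readings


# Invented text for illustration only; not drawn from any real corpus.
toy_corpus = (
    "You would do well to bear in mind that dogs will try biting if "
    "mishandled. He said we should do well to bear in mind the cost."
)

hits = find_readings(toy_corpus, "DWTBIM")
best_reading, best_count = hits.most_common(1)[0]
best_share = best_count / sum(hits.values())
print(len(tsc_ngrams()), "substrings of length 6 in the TSC")
print(best_reading, f"{best_share:.0%} of hits")
```

Because hits are counted with repetition here, duplicated texts in a corpus would inflate the share of whichever reading they contain, which is the caveat noted above.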
Can we do better trying to pin down exact words? We can take the
6-gram readings from the Gutenberg corpus, tally (counting each reading just
once, even if it appeared multiple times) how often particular words are used
to fill the specific positions in the TSC, and call the share that each word has
for that position the derived probability. In these values, we see the same
inherent ambiguity as indicated above. Every position allows at least 7
different words to stand for that letter, and in very few cases does the most
common word have a share above 20%, meaning that whatever word we guess in that
position, we will probably guess differently than the original text. There is
just one case where the most common word rises slightly above 50%: the A in
position 24 is filled 53% of the time by the article “a”, a poor, and in any
case ambiguous, starting point for interpreting the text. Almost all of the 33
derived probabilities that exceed 20% are exceedingly common function words: “a”,
“the”, “and”, “in”, “of”, etc. These give not even the slightest indication of
topic, genre, or even tense or person.
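The tally behind the derived probability can be sketched like this. The function name and the input layout (a dictionary from the 1-based starting position of a window to the readings found for it) are assumptions made for illustration, and the readings in toy_readings are invented, so the resulting shares are not the Gutenberg figures quoted above.

```python
from collections import Counter, defaultdict

# The TSC under the handwriting interpretation used in this post.
TSC = "WRGOABABDWTBIMPANETPMLIABOAIAQCITTMTSAMSTGAB"


def derived_probabilities(readings_by_start):
    """Share of each word at each TSC position, counting unique readings once."""
    tallies = defaultdict(Counter)
    for start, readings in readings_by_start.items():
        # Lower-casing before building the set removes repeated readings.
        for reading in {r.lower() for r in readings}:
            for offset, word in enumerate(reading.split()):
                position = start + offset  # 1-based position in the TSC
                if word[0].upper() != TSC[position - 1]:
                    raise ValueError(f"{word!r} does not fit position {position}")
                tallies[position][word] += 1
    return {
        position: {
            word: count / sum(counter.values())
            for word, count in counter.items()
        }
        for position, counter in tallies.items()
    }


# Invented readings for illustration only; keys are 1-based window starts.
toy_readings = {
    9: [
        "do well to bear in mind",
        "Do well to bear in mind",  # repeat, counted once
        "dogs will try biting if mishandled",
    ],
    10: [
        "we try but it may prove",
        "when the boy is more pleased",
    ],
}

probabilities = derived_probabilities(toy_readings)
for position in sorted(probabilities):
    print(position, TSC[position - 1], probabilities[position])
```

Each position pools the words contributed by every window that overlaps it, which is why a position in the middle of the TSC can draw on up to six different windows.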
Four words have a derived probability between 20% and 34%
and provide a slight indication of tense and person: There are two such
occurrences of “my” (positions 14 and 21), and one each of “is” (position 23)
and “am” (position 29). These effectively indicate three votes for the correct
TSC reading being written in the first person and two votes for the present
tense. These are difficult to interpret as probabilities, however, since each
of these votes is, in any case, less than 50% probable, and there’s no clear prior
probability of tense and person for a random unknown text. It should be noted
that a first person text still contains many third person references, and a
text that is primarily in past or present tense still may contain many
instances of the other. Therefore, we have a glimmering of an indication that
TSC may be an initialism of a first person text, but this is far from
conclusive, and in no other way helpful regarding the content. One related
observation: There are no occurrences of Y in the TSC, so there are no second
person pronouns (“you”, “your”, “yours”), although the second person can be
spoken of through circumlocution without those words.
Finally, the derived probability of “quite” for the Q in
position 30 is 43%, high among the derived probabilities, but still short of
50%, and utterly ambiguous regarding genre, topic, or content.
Summary
Cumulatively: we have conclusive evidence that the TSC is an
initialism; no source text has been identified in literature; and unless the
source text turns up in some older source, the TSC cannot be decoded into the
original text.
This is, most importantly, a strongly negative result for
those who have hoped that the TSC could be deciphered, helping to solve the
mystery of the Somerton Man, which is still quite a bizarre story even if the
TSC is left aside. It still leaves an intriguing situation in which the TSC,
which came to light because of the Somerton Man, is effectively a second
mystery, which one might have found and glossed over if it were not associated
with a dead body, but has been elevated in importance because of the body.
There are doubtless countless books in the world’s libraries that have
mysterious scribbles inside, and no one pays them any particular attention. (I
have found some in my older books, written in my own hand, mysterious to me
years after they were written.) However, since the case has gotten so much
attention, I’ll devote one more post to examining what the TSC might represent
– why an initialism might have been written, and what remaining, however
slight, possibilities exist for obtaining a definite reading of its content.