Thursday, July 24, 2014
Murder and NLP: The Taman Shud Case Splash Page
Taman Shud 1: Analysis of n-gram frequencies indicates that the Taman Shud "cipher" (TSC) is almost certainly an initialism (initial letters from the words in a text, in the same order as the original words) drawn from one or more short English texts.
Taman Shud 2: A search of over 20,000 books in the Project Gutenberg collection finds no complete or nearly complete passage of which the TSC is an initialism. In a comment, Barry Traish reports a similar search that doubles the number of books considered and also finds no matches. The books searched include a plurality of prominent literary works dated 1925 or earlier, but the searches do not exclude the possibility of a match with some more obscure text.
Taman Shud 3: A simple existence proof, with discussion, showing that initialisms are not, in general, decodable. By extension, initialisms are not useful as a code for espionage or for interpersonal communication generally.
Taman Shud 4: A discussion of six possible reasons why someone would write down an initialism. While it is impossible to choose among these and other possible reasons why the TSC was written, two or three seem more plausible than others. Three possible follow-up investigations are described, although these do not seem especially likely to provide definitive answers.
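To make the idea of an initialism search concrete, here is a minimal Python sketch. It is not the code used for the Project Gutenberg search described above; the function names `initialism` and `find_initialism` and the sample sentence are assumptions chosen purely for illustration. It reduces a text to the first letter of each word and then looks for a target letter string inside that sequence.

```python
import re

# A "word" here is a run of letters (apostrophes allowed after the first letter).
# This tokenization is an illustrative assumption, not the original method.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z']*")

def initialism(text: str) -> str:
    """Return the upper-cased first letter of every word in text, in order."""
    return "".join(w[0].upper() for w in WORD_RE.findall(text))

def find_initialism(text: str, target: str) -> list[str]:
    """Return every passage in text whose initialism equals target."""
    target = target.upper()
    if not target:
        return []
    words = WORD_RE.findall(text)
    initials = "".join(w[0].upper() for w in words)
    hits = []
    start = initials.find(target)
    while start != -1:
        # Position i in the initials string corresponds to word i.
        hits.append(" ".join(words[start:start + len(target)]))
        start = initials.find(target, start + 1)
    return hits

sample = "It was the best of times, it was the worst of times"
print(initialism(sample))               # IWTBOTIWTWOT
print(find_initialism(sample, "TBOT"))  # ['the best of times']
```

Running the same scan over each book in a large collection is one plausible way to carry out a search like the one summarized in Taman Shud 2. A search for nearly complete matches would need an approximate comparison in place of the exact `find` used here.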
Overall, I believe that the TSC was written by someone quickly jotting down an idea, using initials as a sort of shorthand readable only by the writer, and only while the words were still fresh in memory. The text may have been an original composition, or something by another writer or speaker that the person was trying to remember or memorize. An initialism is not useful for sending a message to another person, so the TSC is not a code sent by a professional spy and is unlikely to be an attempt to communicate at all. It is therefore unlikely that anyone will ever decode the TSC and thereby learn more about the mysterious case of the Somerton Man's death.
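As a toy illustration of why an initialism cannot be read back by anyone who does not already know the words, the Python sketch below shows two unrelated sentences that collapse to the same initials, and counts how many word sequences from a tiny vocabulary fit a four-letter initialism. The sentences, the vocabulary, and the function names are all made up for this example; it is not necessarily the argument given in Taman Shud 3.

```python
import re
from itertools import product

def initialism(text: str) -> str:
    """Return the upper-cased first letter of every word in text, in order."""
    return "".join(w[0].upper() for w in re.findall(r"[A-Za-z][A-Za-z']*", text))

# Two hypothetical messages with nothing in common except their initials.
a = "Meet me at noon"
b = "My mother ate noodles"
print(initialism(a), initialism(b))  # MMAN MMAN

# A deliberately tiny, made-up vocabulary, grouped by first letter.
vocab = ["meet", "my", "me", "mother", "at", "ate", "noon", "noodles"]
by_initial: dict[str, list[str]] = {}
for word in vocab:
    by_initial.setdefault(word[0].upper(), []).append(word)

def candidates(code: str) -> list[str]:
    """List every word sequence from vocab whose initialism is code."""
    choices = [by_initial.get(ch, []) for ch in code.upper()]
    return [" ".join(seq) for seq in product(*choices)]

print(len(candidates("MMAN")))  # 64
```

Even with eight words there are 64 candidate readings of four letters. Most are not grammatical, but with a realistic vocabulary and a longer string, the number of sensible readings is still far too large for a recipient to pick out the intended one.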