Markov analysis and Indus script

Computational analysis is making a big difference in the attempt to decipher the enigmatic Indus script. But the effort to identify patterns among Indus signs began before the advent of computers, with G R Hunter, who hand-enumerated frequently occurring clusters of signs and segmented Indus texts into short ‘words’ of two or more signs.

He managed to infer important sentence characteristics of the Indus script and discovered that there were tendencies of certain symbols and words to occur at specific positions within texts. In fact, Hunter was the first to note that the ‘jar’ sign, a frequently occurring sign, acts as a ‘word-ender’ and that the sign of ‘fish’ frequently occurs in pairs occupying the same relative position within texts.

By the 1960s, Finnish Indologist Asko Parpola and Russian linguist Yuri Knorozov (who was instrumental in deciphering the Mayan hieroglyphs) had independently, with the aid of computers, bolstered Hunter’s thesis that the sign clusters have particular ‘positions’ within the Indus texts. In 2008, researcher N Yadav et al. showed that the frequency of certain 2-, 3-, and 4-sign combinations is higher than chance would dictate, implying an ‘intentional pattern’ of signs within the texts, and that a majority of the texts longer than five signs can be segmented into smaller, frequently occurring sign combinations (“Segmentation of Indus Texts”, International Journal of Dravidian Linguistics).
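One simple way to see what "higher than chance" can mean for a sign pair is to compare how often the pair actually occurs with how often it would be expected if signs were placed independently of their neighbours. The sketch below is only an illustration of that idea, not the test used by Yadav et al.; the function name `bigram_lift`, the toy corpus, and the sign names are all hypothetical placeholders rather than real Indus data.

```python
from collections import Counter

def bigram_lift(texts):
    """Ratio of observed to chance-expected counts for each adjacent sign pair."""
    sign_counts = Counter(sign for text in texts for sign in text)
    pair_counts = Counter(pair for text in texts for pair in zip(text, text[1:]))
    n_signs = sum(sign_counts.values())
    n_pairs = sum(pair_counts.values())
    lift = {}
    for (a, b), observed in pair_counts.items():
        # Expected count if the two positions were filled independently.
        expected = n_pairs * (sign_counts[a] / n_signs) * (sign_counts[b] / n_signs)
        lift[(a, b)] = observed / expected
    return lift

# Hypothetical toy corpus; each text is a list of sign labels.
toy_texts = [
    ["fish", "fish", "jar"],
    ["man", "fish", "jar"],
    ["fish", "arrow", "jar"],
]
lift = bigram_lift(toy_texts)
print(lift[("fish", "jar")])  # a value above 1 means "more often than chance"
```

A real analysis would also need a significance test, since ratios computed from very small counts are unreliable.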

Computer scientist Rajesh P.N. Rao (RPN Rao) of the University of Washington opines that “such regularities point to the existence of distinctive syntactic rules underlying the Indus texts,” and that “one way to capture such sequential order is to learn a Markov model for script from available texts.” In his paper ‘Probabilistic Analysis of an Ancient Undeciphered Script’, RPN Rao explains that “the simplest (first-order) model estimates the transition probabilities P(S-i|S-j) that sign-i follows sign-j. The obvious way of estimating P(S-i|S-j) – transition probabilities – is to count the number of times sign-i follows sign-j – an approach equivalent to maximum likelihood estimation.” Dividing each count by the total number of times sign-j is followed by any sign turns the counts into probabilities, which together describe the sequential pattern of sign occurrence.
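The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Rao's implementation: the function name `transition_probabilities`, the toy corpus, and the sign labels are hypothetical, and each text is assumed to be already ordered in its reading direction.

```python
from collections import Counter, defaultdict

def transition_probabilities(texts):
    """Maximum-likelihood estimate of P(next sign | previous sign)."""
    pair_counts = defaultdict(Counter)
    for text in texts:
        # Count every adjacent (previous, next) pair in the text.
        for prev, nxt in zip(text, text[1:]):
            pair_counts[prev][nxt] += 1
    probs = {}
    for prev, followers in pair_counts.items():
        total = sum(followers.values())
        probs[prev] = {nxt: n / total for nxt, n in followers.items()}
    return probs

# Hypothetical toy corpus; the labels are placeholders, not real Indus data.
toy_texts = [
    ["fish", "fish", "jar"],
    ["man", "fish", "jar"],
    ["fish", "arrow", "jar"],
]
mle = transition_probabilities(toy_texts)
print(mle["fish"])  # distribution over signs that follow "fish"
```

Note that any pair never seen in the corpus is simply absent from the result, which amounts to a probability of zero.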

However, with 400 signs and only a few thousand texts, a large number of sign pairs “will have a frequency of 0 even though their actual probability may not necessarily be 0.” To deal with this, RPN Rao applies a modified technique (after the Kneser-Ney algorithm) to smooth the statistical bumps, training the model on data obtained from Iravatham Mahadevan’s The Indus Script: Texts, Concordance and Tables (ASI/1977). The resulting Markov model “can reveal interesting subunits of grammatical structure and recurring patterns,” points out RPN Rao.
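The zero-frequency problem can be illustrated with add-k (Laplace) smoothing, which is a much simpler stand-in for the modified Kneser-Ney technique mentioned above and is not what Rao uses. The function name, the parameter `k`, and the toy corpus are assumptions made for this sketch.

```python
from collections import Counter, defaultdict

def smoothed_transition_probabilities(texts, k=1.0):
    """Add-k estimate of P(next | prev); every pair gets a non-zero probability."""
    vocab = sorted({sign for text in texts for sign in text})
    pair_counts = defaultdict(Counter)
    for text in texts:
        for prev, nxt in zip(text, text[1:]):
            pair_counts[prev][nxt] += 1
    probs = {}
    for prev in vocab:
        # Pretend each possible follower was seen k extra times.
        total = sum(pair_counts[prev].values()) + k * len(vocab)
        probs[prev] = {nxt: (pair_counts[prev][nxt] + k) / total for nxt in vocab}
    return probs

# Hypothetical toy corpus; the labels are placeholders, not real Indus data.
toy_texts = [
    ["fish", "fish", "jar"],
    ["man", "fish", "jar"],
    ["fish", "arrow", "jar"],
]
smoothed = smoothed_transition_probabilities(toy_texts)
print(smoothed["fish"]["man"])  # unseen pair, but no longer zero
```

Add-k smoothing spreads probability evenly over unseen pairs; Kneser-Ney instead weights an unseen follower by how many different contexts it has been observed in, which tends to suit sparse data better.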

The first-order Markov model predicted signs “deliberately obliterated for testing purposes” with a 75% accuracy level in a five-fold cross-validation, which strengthens the real possibility of restoring missing or damaged signs in the actual texts. In 2010, N Yadav et al. used an N-gram model which was “essentially an (N-1)-th-order Markov chain” (to quote RPN Rao) “where the transition probability depends on the previous N-1 symbol instead of just the previous symbol.” “The results suggest that a bigram model (N=2) captures a significant portion of the syntax, with trigrams and quadrigrams making more modest contributions,” points out RPN Rao.
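To make the idea of restoring an obliterated sign concrete, the sketch below picks the sign that best fits between its two neighbours under a smoothed bigram model. It is an illustration under stated assumptions, not the procedure used in the papers: the function names, the add-k smoothing, and the toy corpus are all hypothetical, and no cross-validation is performed here.

```python
from collections import Counter, defaultdict

def smoothed_bigram_model(texts, k=1.0):
    """Add-k smoothed P(next | prev) over the whole sign vocabulary."""
    vocab = sorted({sign for text in texts for sign in text})
    counts = defaultdict(Counter)
    for text in texts:
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in vocab:
        total = sum(counts[prev].values()) + k * len(vocab)
        probs[prev] = {nxt: (counts[prev][nxt] + k) / total for nxt in vocab}
    return probs, vocab

def restore_sign(probs, vocab, prev_sign=None, next_sign=None):
    """Return the sign x maximizing P(x | prev_sign) * P(next_sign | x)."""
    def score(candidate):
        s = 1.0
        if prev_sign is not None:
            s *= probs[prev_sign][candidate]
        if next_sign is not None:
            s *= probs[candidate][next_sign]
        return s
    return max(vocab, key=score)

# Hypothetical toy corpus; the labels are placeholders, not real Indus data.
toy_texts = [
    ["fish", "fish", "jar"],
    ["man", "fish", "jar"],
    ["fish", "arrow", "jar"],
]
probs, vocab = smoothed_bigram_model(toy_texts)
# Guess the missing middle sign in the damaged text: man, ?, jar
guess = restore_sign(probs, vocab, prev_sign="man", next_sign="jar")
print(guess)
```

Measuring accuracy the way the article describes would mean hiding signs in held-out texts, training on the rest, and counting how often the guess matches the hidden sign.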

RPN Rao explains that the smoothed first-order Markov model can be used to compute ‘conditional entropy’, which measures the ‘average flexibility allowed in choosing the next sign given a preceding sign.’ The conditional entropy of Indus texts falls within the range of natural languages, though “entropic similarity to the natural languages by itself is not sufficient to prove that the Indus script is linguistic,” points out RPN Rao, who discusses conditional entropy in his paper “Entropic Evidence for Linguistic Structure in the Indus Script” in Science (2009).
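Conditional entropy has a standard definition: H(next | prev) is the negative sum, over all sign pairs, of P(prev, next) times log2 P(next | prev). The sketch below computes it directly from raw pair counts rather than from the smoothed model the article describes, so it is only a simplified illustration; the function name and the toy corpus are hypothetical.

```python
from collections import Counter
from math import log2

def conditional_entropy(texts):
    """Empirical H(next | prev) in bits, from adjacent sign pairs."""
    pair_counts = Counter(pair for text in texts for pair in zip(text, text[1:]))
    prev_counts = Counter()
    for (prev, _nxt), n in pair_counts.items():
        prev_counts[prev] += n
    n_pairs = sum(pair_counts.values())
    entropy = 0.0
    for (prev, _nxt), n in pair_counts.items():
        joint = n / n_pairs                 # P(prev, next)
        conditional = n / prev_counts[prev]  # P(next | prev)
        entropy -= joint * log2(conditional)
    return entropy

# Hypothetical toy corpus; the labels are placeholders, not real Indus data.
toy_texts = [
    ["fish", "fish", "jar"],
    ["man", "fish", "jar"],
    ["fish", "arrow", "jar"],
]
print(conditional_entropy(toy_texts))
# A rigidly ordered sequence leaves no freedom in choosing the next sign.
print(conditional_entropy([["a", "b", "a", "b"]]))
```

A rigid sequence gives a value near zero and a random one gives a value near the maximum, log2 of the number of signs; it is the region between these extremes that the comparison with natural languages refers to.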

Yet, as far as the Indus script goes, all this Markov modelling increases the probability, in the Bayesian sense, that the Indus script represents language. “Recently proposed algorithms for probabilistic grammar induction could allow construction of a partial grammar for the Indus texts, facilitating the identification of root words, suffixes, prefixes, and other modifiers,” claims Rajesh P.N. Rao, who concludes thus: “the study of the Indus script has emerged as an exciting area of interdisciplinary research, offering a unique opportunity for probabilistic models to shed new light on one of the world’s oldest civilizations.”

Computational analysis has opened up new possibilities for deciphering the Indus script, in an environment imbued with mathematical logic and statistical correlations which should neutralize the acidic ideologies that gnaw at the bottom of the field.

Text source: Probabilistic Analysis of an Ancient Undeciphered Script by Rajesh P.N. Rao, IEEE Computer Society.


