The Indus Valley Script

Is the Indus Valley script a language, or just a collection of pictograms? One school of thought holds that it is a collection of pictograms, typically of fish, rings, cows’ heads and men, that do not encode a language. A new statistical study suggests that this is not the case.

In the words of Ronojoy Adhikari, one of the authors of the work:

What we have done is to compare the entropy associated with the conditional probability of a token following a given token, in a sequence of tokens. These tokens could be letters in words, words in sentences, base pairs in DNA, keywords in computer programs, and so on. The functional role of tokens is not studied, only the order in which they appear in sequences. In other words, we focus on syntax and not on semantics. The entropy of this conditional probability is, then, a measure of how much order there is in the sequence. If token order is irrelevant, as it would be in a random collection of tokens, the entropy is large. If token order is highly constrained, the entropy is small.

With this in hand, we compare sequences of both linguistic and non-linguistic tokens: English (both words and letters), Sumerian, Old Tamil and Sanskrit (linguistic), and DNA code, Fortran code, Kudurru inscriptions and Vinca symbols (non-linguistic). The entropy of all the linguistic systems falls within a narrow band, while the non-linguistic sequences have either large (DNA, …) or small (Fortran, …) entropy.

Repeating the same analysis for the Indus sequences, we find that they fall right in the middle of the linguistic band. Thus, in the sense of syntax, the Indus script is far more akin to natural language than to non-linguistic systems like DNA, Fortran, Kudurru and Vinca.
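To make the measure Adhikari describes a little more concrete, here is a minimal Python sketch of the entropy of the next token given the current one, H(next | current) = −Σ p(a, b) log₂ p(b | a), estimated simply by counting adjacent pairs (bigrams) of tokens. The function name `conditional_entropy` and the plain counting ("plug-in") estimator are assumptions for illustration only; the published analysis may use a different estimator or smoothing, and this toy code does not reproduce its results.

```python
import math
import random
from collections import Counter


def conditional_entropy(tokens):
    """Plug-in estimate, in bits, of H(next token | current token) from bigram counts.

    Illustrative sketch only; not the estimator used in the published work.
    """
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)                 # counts of (current, next) pairs
    first_counts = Counter(a for a, _ in pairs)  # counts of each current token
    n = len(pairs)
    h = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n                        # joint probability p(a, b)
        p_b_given_a = c / first_counts[a]   # conditional probability p(b | a)
        h -= p_ab * math.log2(p_b_given_a)
    return h


if __name__ == "__main__":
    # Highly constrained order: each token completely determines the next one.
    ordered = list("abcd" * 500)
    # Irrelevant order: tokens drawn uniformly at random from the same four symbols.
    rng = random.Random(0)
    shuffled = [rng.choice("abcd") for _ in range(2000)]
    print(f"ordered: {conditional_entropy(ordered):.3f} bits")
    print(f"random:  {conditional_entropy(shuffled):.3f} bits")
```

The fully ordered sequence prints 0.000 bits, while the random one comes out close to the 2-bit maximum for four equally likely symbols. This matches the two extremes in the quote: constrained order gives small entropy, irrelevant order gives large entropy.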

The authors of this work are Rajesh Rao of the University of Washington, along with Nisha Yadav and Mayank Vahia at the Tata Institute of Fundamental Research in Mumbai, India; Hrishikesh Joglekar, Mumbai; R. Adhikari at the Institute of Mathematical Sciences in Chennai, India; and Iravatham Mahadevan at the Indus Research Center in Chennai. The research was supported by the Packard Foundation and the Sir Jamsetji Tata Trust.

More information about this work is available in Science Daily; the earlier ‘foundation’ paper is on the arXiv, and the Science paper is available here (subscription required). Some information about the Indus Valley Civilisation is here.

Tailpiece: Steve Farmer et al., the original proponents of the ‘Indus Valley script is not a language’ thesis, have put up a refutation of the above work (in somewhat intemperate language, methinks) here.


  2. Shubashree Says:

    Hi Rahul,

    I tried to read this in “Science” but couldn’t, so thanks for the reference and for summing up the physicists’ contribution to this problem. I’m still not clear about the initial problem: by saying a bunch of pictograms, do you mean that each symbol (fish, cow’s head or whatever) would be an isolated drawing, not part of any definite sequence?


  3. Ronojoy Adhikari Says:

    You can read the paper (and other work from our collaboration) at our wikidot page. As mentioned in the summary in Rahul’s post, we do not study the functional role of the symbols in sequences: they could be anything. This is what allows us to study, within the same framework, “languages” as diverse as DNA, natural language, Fortran and so on.

  4. Shubashree Says:

    Thanks, Ronojoy. I will look up the reference. I also read the story in The Hindu today with your explanation.

    That is interesting work… More after reading the wikidot page.

