Stylometry

This is not, as one might assume from the etymology, the measuring of pencils, but a method of text analysis. Lorenzo Valla could be described as the first stylometrist: around 1440 he established through stylistic comparisons that the Donation of Constantine was a forgery. The term itself was introduced in 1890 by the Polish philosopher Wincenty Lutosławski. Three years earlier, the physicist T. C. Mendenhall had theorised that writers use some aspects of their writing style unconsciously. He argued that this part could be recorded with scientific methods, since these involuntary characteristics are as individual to each writer as fingerprints. Attributing anonymous texts to their authors should then become much easier. Mendenhall employed people to count both the number of words and their length in works by English-language writers in order to correlate the results. The method itself is quite simple. You take two different works that could well be by one author. Words of the same length are counted in each. Pearson’s correlation coefficient is then calculated from the two series of counts.1

By the way, the range of values for r is −1.0 ≤ r ≤ 1.0. Accordingly, r = 1 means the highest and r = −1 the lowest agreement.
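
For anyone who wants to retrace the counting without a word processor, the procedure can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions, not Mendenhall’s procedure verbatim: the function names word_length_profile and pearson_r, the cut-off of 15 letters, the rule for what counts as a word and the two sample strings are all invented for illustration.

```python
import re
from collections import Counter
from math import sqrt


def word_length_profile(text, max_len=15):
    """Return the share of words having 1, 2, ..., max_len letters.

    Assumption: a "word" is any unbroken run of letters, so an
    apostrophe splits a word in two. Words longer than max_len are
    counted in the last bucket.
    """
    words = re.findall(r"[^\W\d_]+", text)
    lengths = Counter(min(len(word), max_len) for word in words)
    total = sum(lengths.values())
    if total == 0:
        raise ValueError("text contains no words")
    return [lengths[n] / total for n in range(1, max_len + 1)]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)


# Invented sample sentences, not quotations from any play.
text_a = "The king is dead and all the court laments his sudden fall"
text_b = "A queen may weep but armies never wait upon her sorrow"

r = pearson_r(word_length_profile(text_a), word_length_profile(text_b))
print(f"r = {r:.3f}")
```

Fed with the full text of two plays instead of two sentences, the sketch returns a single r. That number depends on the edition, the spelling and the word-splitting rule that went in, which may be one reason why such calculations can differ from one attempt to the next.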

In 1901, Mendenhall discovered: "[…] that in the characteristic curve of his plays Christopher Marlowe agrees with Shakespeare about as well as Shakespeare agrees with himself, as is shown in Fig. 9."2

Mendenhall (1901), 105

Since then, stylometry has developed enormously and expanded into areas such as plagiarism detection and forensic linguistics. Mendenhall continues to enjoy great popularity among Marlovians, who like to run calculations at home on the œuvre of Marlowe and William Shakespeare. I have tried to follow some of their calculations. When I kept getting different results, I decided to put the research method per se to the test. The result: Shakespeare was more likely to have written the screenplay for The Usual Suspects than Romeo and Juliet. In other words, counting letters and words in play texts downloaded from the internet with a word processor does not produce scientific knowledge. Although that is not quite true. After hours of counting words, I can say with certainty: in its present form, The Massacre at Paris is Marlowe’s shortest drama, while Edward II is his longest. But a look at the concordance3 would have told me that as well.

Forensic linguists and stylometrists such as Thomas Merriam or Hartmut Ilsemann rely on technically demanding tools whose results they know how to handle within a scientific framework. Nevertheless, I have fundamental problems with stylometry. My scepticism relates less to the method than to the actual usability of the results. George Coffin Taylor, for example, noted that "now" is often the first or second word in Marlowe’s verses. He attached so much importance to this peculiarity that he regarded it as a typical stylistic feature of the author and suggested using it to clarify authorship. Based on Taylor’s figures, the word appears in this position an average of 43.29 times per Marlowe drama. In Shakespeare, the average is 29.35. However, none of Marlowe’s plays begins with the word, whereas three of Shakespeare’s do4. Furthermore, in six of Shakespeare’s plays5, which are said to have been written before or around 1593, "now" appears more than 40 times at the beginning of a verse.

Taylor is certainly correct in finding 303 verses of Marlowe that have a "now" at the beginning, but he could not deduce from this whether the use of this word was accidental, intentional or even Marlowe’s own doing.

"Perhaps the most fascinating, if puzzling, aspect of the matter is, whether the frequency of the occurrence of this now is due to Marlowe’s extreme haste in writing, his unconscious carelessness in the use of it, or whether it is due in part to the actor, Alleyn, being responsible for it. He may possibly […] introduced it into lines in which Marlowe never wrote it. He could make it either a monosyllable or a dissyllable, and by so doing have time to remember what followed."6

Blank verse was just beginning to establish itself in the public theatre when Marlowe made use of it. Perhaps he sometimes had problems with the metre and was missing a syllable. In that case, contrary to the stylometric theory of function words7, Marlowe would have used the "now" quite deliberately. This would also explain why there is a downward tendency in Shakespeare’s use: the more practice he had with the metre, the less he needed filler words. Equally, it may mean that Marlowe wrote the Shakespearean dramas, or vice versa, since Shakespeare’s works with the highest number of "nows" are dated earliest. Or again: Marlowe’s heroes waste no thought on tomorrow, they all live in the now. Possibly the accumulation of "nows" is meant to underline exactly that. Ultimately, the only provable inference from Taylor’s analysis is that 303 verses of Marlowe have a "now" at the beginning.
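
A count of this kind is at least easy to reproduce. The following is a minimal sketch under my own assumptions and not Taylor’s actual procedure: the function name count_initial_now, the max_position parameter and the four sample lines are invented for illustration, and the input is assumed to be already split into verses with speaker prefixes and stage directions removed.

```python
import re


def count_initial_now(verses, max_position=2):
    """Count the verses in which "now" is among the first words.

    max_position=2 follows the description above: "now" as the first
    or second word of a verse. Matching is case-insensitive and works
    on whole words only, so "nowhere" does not count.
    """
    count = 0
    for verse in verses:
        words = re.findall(r"[a-z']+", verse.lower())
        if "now" in words[:max_position]:
            count += 1
    return count


# Invented sample lines, not quotations from Marlowe or Shakespeare.
sample = [
    "Now will I speak",
    "And now the night is come",
    "What shall we do now",
    "Nowhere is safe",
]
print(count_initial_now(sample))
```

Whatever figure comes out for a real play will depend on the edition and on how the text was divided into verses. And even a correct figure says nothing about who put the word there.
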
In my opinion, the literary production of an era in which ambivalence was the zeitgeist8, collaboration was commonplace, imitation was virtually demanded, analogy was prized over individualism, and subsequent alteration was not questioned9, and whose textual transmission is highly complex, eludes statistical methods of investigation that regard stylistic accumulations as an unconscious expression of an author’s individuality.


Masten, Jeffrey. 1997. “Playwrighting: Authorship and Collaboration.” In A New History of Early English Drama, edited by John D. Cox and David Scott Kastan, 357–82. New York: Columbia University Press.
Mendenhall, Thomas Corwin. 1887. “The Characteristic Curves of Composition.” Science 9 (214): 237–46. https://doi.org/10.1126/science.ns-9.214S.237.
———. 1901. “A Mechanical Solution of a Literary Problem.” Popular Science Monthly 60 (7): 97–105.
Mosteller, Frederick, and David L. Wallace. 1964. Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley.
Patterson, Annabel M. 1991. Censorship and Interpretation: The Conditions of Writing and Reading in Early Modern England. 2nd Ed. Madison: University of Wisconsin Press.

  1. Mendenhall (1887)↩︎
  2. Mendenhall (1901), 105↩︎
  3. Ule (1979)↩︎
  4. Richard III, King John and A Midsummer Night’s Dream↩︎
  5. 2 Henry VI, 3 Henry VI, 1 Henry VI, Richard III, Titus Andronicus and The Taming of the Shrew↩︎
  6. Taylor (1945), 100↩︎
  7. Mosteller and Wallace (1964)↩︎
  8. Patterson (1991)↩︎
  9. Masten (1997)↩︎

Updated on 24 May 2024
