Can You Tell an Author’s Identity By Looking at Punctuation Alone? A Study Just Found Out.
In 2016, neuroscientist Adam J. Calhoun wondered what his favorite books would look like if he removed the words and left nothing but the punctuation. The result was a stunning (and surprisingly beautiful) visual stream of commas, question marks, semicolons, em-dashes, and periods.
Recently, Calhoun’s inquiry piqued the interest of researchers in the United Kingdom, who wondered if it was possible to identify an author from his or her punctuation alone.
For decades, linguists have been able to use the quirks of written texts to pinpoint the author. The process, called stylometric analysis or stylometry, has dozens of legal and academic applications, helping researchers authenticate anonymous works of literature and even nab criminals like the Unabomber. But it usually focuses on an author's word choices and grammar or the length of his or her sentences. Until now, punctuation has been largely ignored.
But according to a recent paper led by Alexandra N. M. Darmon of the Oxford Centre for Industrial and Applied Mathematics, an author’s use of punctuation can be extremely revealing. Darmon’s team assembled nearly 15,000 documents from 651 different authors and “de-worded” each text. “Is it possible to distinguish literary genres based on their punctuation sequences?” the researchers asked. “Do the punctuation styles of authors evolve over time?”
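To make "de-wording" concrete, here is a minimal Python sketch of the idea: strip everything except punctuation marks, then measure how often each mark appears. It is an illustration only, not the researchers' actual pipeline. The function names `deword` and `mark_frequencies` and the set of marks kept are assumptions made for this example.

```python
import string
from collections import Counter

# Assumption: which characters count as punctuation. The paper's own list
# may differ. Note that keeping "'" also captures apostrophes in contractions.
PUNCTUATION = set(string.punctuation) | {"\u2014", "\u201c", "\u201d", "\u2018", "\u2019", "\u2026"}


def deword(text: str) -> str:
    """Return the text with every non-punctuation character removed."""
    return "".join(ch for ch in text if ch in PUNCTUATION)


def mark_frequencies(text: str) -> dict[str, float]:
    """Return each punctuation mark's share of all marks in the text."""
    sequence = deword(text)
    if not sequence:
        return {}
    counts = Counter(sequence)
    return {mark: n / len(sequence) for mark, n in counts.items()}


sample = 'Is it possible? Yes; well, perhaps. "Who knows," she said.'
print(deword(sample))
print(mark_frequencies(sample))
```

Frequency profiles like these are one simple way to compare two texts; the study's formulas for attributing authorship and genre are not described here and are likely more involved.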
Apparently, yes. The researchers crafted mathematical formulas that could identify individual authors with 72 percent accuracy. Genre detection, across categories ranging from horror to philosophy to detective fiction, was correct 65 percent of the time.
The results, published on the preprint server SocArXiv, also revealed how punctuation style has evolved. The researchers found that “the use of quotation marks and periods has increased over time (at least in our [sample]) but that the use of commas has decreased over time. Less noticeably, the use of semicolons has also decreased over time.”
You probably don’t need to develop a powerful algorithm to figure that last bit out. You just have to crack open something by Dickens.