Beinecke MS 408 · Voynich Manuscript
The world's most mysterious manuscript, measured instead of guessed.
The Voynich is not deciphered, and this research does not claim to have done so. What it offers is a characterization of the data and a tool that tells you, for any hypothesis, whether it beats chance or not.
The parchment is radiocarbon-dated to between 1404 and 1438 and comes from the Italo-Germanic Alpine area. The book's structure (a herbal, an astrological-medical calendar, a balneological section, a pharmacy with albarello jars) reflects the knowledge of an early-fifteenth-century physician-apothecary. We know how the text is built. What is missing, in order to read it, is an anchor to meaning, which no one has ever supplied with proof.
The method
Every hypothesis must be compared with chance
With a Latin dictionary of tens of thousands of short forms, almost any string of letters resembles a real word. A method that combines glyphs until a meaningful word appears finds plausible readings even in noise. We verified it: on page f111r, 38% of the words turn into Latin, against 35% for noise with the same statistics. Indistinguishable.
That is why every result on this site carries its baseline beside it: the score you would get by pure chance. A genuine decipherment is not a heap of separate readings; it is a single coherent rule that, applied blindly, makes many anchors work at once. That cross-constraint is what chance cannot imitate, and it is what the validator measures.
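The comparison with chance can be sketched as follows. This is a minimal illustration, not the site's validator: the function names are hypothetical, and "noise with the same statistics" is assumed here to mean noise that keeps the same glyph frequencies and word lengths, which the text above does not specify.

```python
import random

def hit_rate(words, lexicon):
    """Fraction of words that appear in the lexicon."""
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)

def shuffled_control(words, rng):
    """Build noise with the same glyph counts and word lengths.

    Assumption: this is one possible reading of "same statistics".
    All glyphs are pooled, shuffled, and cut back into the original lengths.
    """
    glyphs = [g for w in words for g in w]
    rng.shuffle(glyphs)
    out, i = [], 0
    for w in words:
        out.append("".join(glyphs[i:i + len(w)]))
        i += len(w)
    return out

def baseline(words, lexicon, trials=200, seed=0):
    """Return (observed score, mean score under noise, empirical p-value)."""
    rng = random.Random(seed)
    observed = hit_rate(words, lexicon)
    null = [hit_rate(shuffled_control(words, rng), lexicon) for _ in range(trials)]
    # One-sided empirical p-value with the usual +1 correction.
    p = (1 + sum(s >= observed for s in null)) / (trials + 1)
    return observed, sum(null) / trials, p
```

A reading whose observed score sits close to the mean score under noise, as with the 38% against 35% above, gives no evidence that the method found anything.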
What we measured on real data
Low conditional entropy
At the glyph level, the conditional entropy h2 is about 2 bits, against the 3–4 bits of natural languages. That rules out a simple or polyalphabetic substitution cipher of a European language.
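For readers who want to reproduce this kind of figure, here is a minimal sketch. It assumes that h2 means the entropy of a glyph given the glyph before it, estimated from adjacent pairs. The choice of transcription and the handling of word spaces are left to the caller and are not taken from the text above.

```python
from collections import Counter
from math import log2

def conditional_entropy(seq):
    """Estimate H(X_n | X_{n-1}) in bits from adjacent pairs in seq.

    seq can be a string or any sequence of hashable glyph symbols.
    """
    pairs = Counter(zip(seq, seq[1:]))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    # Count how often each symbol occurs as the first element of a pair.
    firsts = Counter()
    for (a, _b), c in pairs.items():
        firsts[a] += c
    # Sum over pairs of p(a, b) * log2(1 / p(b | a)).
    return sum(c / total * log2(firsts[a] / c) for (a, _b), c in pairs.items())
```

A perfectly predictable sequence such as "abababab" scores 0 bits; the more freely one glyph can follow another, the higher the value.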
Vocabulary partitioned by topic
Vocabulary overlap between sections (Jaccard index) is 0.10–0.16. The text lacks the glue of function words, shared by every chapter, that a transcribed language has.
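The Jaccard index is the size of the shared vocabulary divided by the size of the combined vocabulary. A minimal sketch, assuming the overlap is computed on distinct word forms per section (the text above does not say whether types or tokens were used); the helper names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two vocabularies: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(vocab_by_section):
    """Jaccard index for every pair of sections, keyed by sorted section names."""
    return {
        (s, t): jaccard(vocab_by_section[s], vocab_by_section[t])
        for s, t in combinations(sorted(vocab_by_section), 2)
    }
```

Two sections that share one word form out of five distinct forms in total score 0.2.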
The line as a functional unit
Word shape depends on the position in the line and in the paragraph. A simply transcribed language does not behave this way.
Slot-and-box structure
Glyphs occupy fixed positions: initial classes, final classes, prefixes. The internal grammar of the words can be reconstructed precisely.
Combinatorial generability
80 prefixes by 80 suffixes cover 87% of the words. This is what a Cardan-grille model predicts.
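A coverage figure of this kind can be checked with a few lines of code. The sketch below takes the prefix and suffix inventories as given, because how the 80 prefixes and 80 suffixes were selected is not described here; the function names are hypothetical. With 80 of each, at most 80 × 80 = 6,400 distinct words can be generated.

```python
def is_generable(word, prefixes, suffixes):
    """True if the word splits into one listed prefix plus one listed suffix.

    An empty prefix or suffix counts only if "" is in the inventory.
    """
    return any(
        word[:i] in prefixes and word[i:] in suffixes
        for i in range(len(word) + 1)
    )

def coverage(words, prefixes, suffixes):
    """Fraction of word tokens that the prefix-by-suffix table can generate."""
    if not words:
        return 0.0
    return sum(is_generable(w, prefixes, suffixes) for w in words) / len(words)
```

For example, with the toy inventories {"qok", "ch"} and {"edy", "aiin"}, the words "qokedy", "qokaiin" and "chedy" are generable and "daiin" is not, giving a coverage of 0.75.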
No fixed key works
Mirroring, reversal, substitution, numbers, fixed-rule anagram: all of these are bijections, so they do not change the entropy. A text at about 2 bits per glyph therefore cannot, for mathematical reasons, turn into Latin under any of them.
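The substitution case can be written out in a few lines. This is a sketch under two assumptions: that the entropy in question is that of a glyph given the preceding glyph, and that the key is a one-to-one relabeling π of the glyph alphabet. Here p is the pair distribution of the original text and p′ that of the relabeled text.

```latex
% Assumption: h_2 is the entropy of a glyph given the preceding glyph,
% and \pi is a one-to-one relabeling of the glyph alphabet.
\begin{align*}
h_2 &= -\sum_{a,b} p(a,b)\,\log_2 p(b \mid a), \\
p'(\pi(a),\pi(b)) &= p(a,b), \qquad p'(\pi(b) \mid \pi(a)) = p(b \mid a), \\
h_2' &= -\sum_{a,b} p'(\pi(a),\pi(b))\,\log_2 p'(\pi(b) \mid \pi(a)) = h_2 .
\end{align*}
```

Relabeling only renames the terms of the sum, so the value is unchanged. Reversal is not a relabeling, but in a long text the first and second glyphs of a pair have the same frequencies up to end effects, so conditioning on the following glyph gives the same figure. The anagram case is not covered by this sketch.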
What remains open
Two serious hypotheses, both possible
An unusual natural language
Heavily abbreviated, syllabic or strongly inflected. The Voynich's positional rigidity (0.76) is close to that of real Latin (0.72). Zipf's law and the semantic networks per section are compatible with a real language.
A constructed system
An artificial language by categories, in the spirit of Hildegard's Lingua Ignota, or a text generated with a table and a mask. The 87% combinatorial coverage with 80 prefixes and 80 suffixes leans toward a generative mechanism.
The internal structure is compatible with both. What is missing to decide between them is an anchor: a confirmed word, a bilingual text, a certain sound value. Hildegard left her glossary; the author of the Voynich did not.
The research
Diego «DMUX» De Maio and Simona «51m0» Fenoglio
Work carried out at ART AG (YurekAI), with reproducible tools and published data.