Codex Mondragonis · Beinecke MS 408 · research

Beinecke MS 408 · Voynich Manuscript

The world's most mysterious manuscript, measured instead of guessed.

The Voynich is not deciphered, and this research does not claim to have done so. What it offers is a characterization of the data and a tool that tells you, for any hypothesis, whether it beats chance or not.


The parchment is radiocarbon-dated to between 1404 and 1438 and comes from the Italo-Germanic Alpine area. The book's structure (a herbal, an astrological-medical calendar, a balneological section, a pharmacy with albarello jars) reflects the knowledge of an early-fifteenth-century physician-apothecary. We know how the text is built. What is missing, in order to read it, is an anchor to meaning, and no one has ever supplied one with proof.

1404–1438: Radiocarbon dating. Parchment, northern Italy.
~2 bits: Conditional entropy h2. European languages sit at 3–4.
184: Navigable pages, of 213 Yale scans.
21: Hypotheses tested, 18 closed against the baseline.

The method

Every hypothesis must be compared with chance

With a Latin dictionary of tens of thousands of short forms, almost any string of letters resembles a real word. A method that combines glyphs until a meaningful word appears finds plausible readings even in noise. We checked this: on page f111r, 38% of the words turn into Latin, against 35% for noise with the same statistics. The two figures are indistinguishable.

That is why every result on this site carries its baseline beside it: the score you would get by pure chance. A genuine decipherment is not a heap of separate readings; it is a single coherent rule that, applied blindly, makes many anchors work at once. That cross-constraint is what chance cannot imitate, and it is what the validator measures.
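The comparison with chance can be sketched as follows. This is a minimal illustration, not the site's validator: the function names are hypothetical, and the noise model (shuffling the letters of the decoded words while keeping word lengths) is an assumption standing in for "noise with the same statistics".

```python
import random

def hit_rate(words, dictionary):
    """Fraction of words that appear in the dictionary."""
    if not words:
        return 0.0
    return sum(w in dictionary for w in words) / len(words)

def noise_like(words, rng):
    """Random words with the same lengths and the same overall letter counts.

    Assumption: this shuffle is only one possible noise model.
    """
    letters = [c for w in words for c in w]
    rng.shuffle(letters)
    out, i = [], 0
    for w in words:
        out.append("".join(letters[i:i + len(w)]))
        i += len(w)
    return out

def compare_with_chance(words, dictionary, trials=1000, seed=0):
    """Return (observed hit rate, mean baseline hit rate, empirical p-value)."""
    rng = random.Random(seed)
    observed = hit_rate(words, dictionary)
    baseline = [hit_rate(noise_like(words, rng), dictionary) for _ in range(trials)]
    mean_baseline = sum(baseline) / trials
    # One-sided p-value with add-one smoothing: how often noise does at least as well.
    p_value = (1 + sum(b >= observed for b in baseline)) / (trials + 1)
    return observed, mean_baseline, p_value
```

A reading would count as beating chance only when the observed rate sits clearly above the baseline distribution; a pair such as 38% against 35% would be expected to yield a large p-value.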


What we measured on real data

Low conditional entropy

At the glyph level, h2 is about 2 bits, against 3–4 bits for natural languages. That rules out a simple or polyalphabetic substitution cipher of a European language.
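For reference, h2 here is the conditional entropy of a glyph given the one before it. A minimal sketch of a plug-in estimate is shown below; the function name is hypothetical, and how the manuscript is split into glyphs (which transcription alphabet is used) is left to the caller as an assumption.

```python
import math
from collections import Counter

def conditional_entropy_h2(symbols):
    """Estimate H(X_n | X_{n-1}) in bits from adjacent symbol pairs."""
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (a, _b), c in pair_counts.items():
        # p(a, b) * log2 p(b | a), with p(b | a) = count(a, b) / count(a)
        h -= (c / n) * math.log2(c / first_counts[a])
    return h
```

A fully predictable sequence such as "abababab" gives 0 bits, while a sequence in which each symbol is followed by either of two symbols equally often gives 1 bit.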

Vocabulary partitioned by topic

Vocabulary overlap between sections (Jaccard index) is 0.10–0.16. The text lacks the glue of function words that a transcribed language shares across every chapter.
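The Jaccard index of two vocabularies is the size of their intersection divided by the size of their union. A minimal sketch, with hypothetical function names and section labels, assuming each section is supplied as a list of transcribed words:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two collections, computed on their distinct items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def section_overlaps(sections):
    """Pairwise Jaccard index between the vocabularies of named sections."""
    vocab = {name: set(words) for name, words in sections.items()}
    return {
        (x, y): jaccard(vocab[x], vocab[y])
        for x, y in combinations(sorted(vocab), 2)
    }
```

Values of 0.10–0.16 mean that, for any two sections, only about a tenth to a sixth of the combined vocabulary is shared.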

The line as a functional unit

Word shape depends on a word's position in the line and in the paragraph. A simply transcribed language does not behave this way.

Slot-and-box structure

Glyphs occupy fixed positions: initial classes, final classes, prefixes. The internal grammar of the words can be reconstructed precisely.

Combinatorial generability

80 prefixes combined with 80 suffixes cover 87% of the words. This is what a Cardan-grille model predicts.
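Coverage of this kind can be checked mechanically once a prefix list and a suffix list are fixed. The sketch below is an illustration under assumptions: the function names are hypothetical, the way the 80 prefixes and 80 suffixes are selected is not shown, and whether the 87% counts word tokens or distinct word types is left to the caller (pass a set to count types).

```python
def is_generated(word, prefixes, suffixes):
    """True if the word splits as prefix + suffix, both taken from the given sets."""
    return any(
        word[:i] in prefixes and word[i:] in suffixes
        for i in range(len(word) + 1)
    )

def coverage(words, prefixes, suffixes):
    """Fraction of the given words that the prefix-by-suffix table can produce."""
    words = list(words)
    if not words:
        return 0.0
    return sum(is_generated(w, prefixes, suffixes) for w in words) / len(words)

# Illustrative toy data only, not the lists used in the research.
toy_prefixes = {"qo", "ch"}
toy_suffixes = {"kedy", "ol"}
toy_words = ["qokedy", "chol", "daiin", "qool"]
toy_coverage = coverage(toy_words, toy_prefixes, toy_suffixes)
```

An empty prefix or suffix is allowed only if the empty string is placed in the corresponding set.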

No fixed key works

Mirroring, reversal, substitution, numbers, fixed-rule anagram: all are bijections, so they do not change the entropy. A text at about 2 bits of h2 therefore cannot turn into Latin, which sits at 3–4, under any of them.
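For the simplest case, a fixed glyph-for-glyph substitution, the invariance can be written out directly. The notation is an assumption introduced here: σ is a bijection on the glyph alphabet, X_n is the n-th glyph, and Y_n = σ(X_n) is its image.

```latex
\begin{align*}
P\bigl(Y_{n-1}=\sigma(a),\, Y_n=\sigma(b)\bigr) &= P\bigl(X_{n-1}=a,\, X_n=b\bigr) = p(a,b), \\
h_2(Y) &= -\sum_{a,b} p(a,b)\,\log_2 \frac{p(a,b)}{\sum_{b'} p(a,b')} = h_2(X).
\end{align*}
```

Relabelling the glyphs only renames the terms of the sum, so the value of h2 stays where it was.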

What remains open

Two serious hypotheses, both possible

An unusual natural language

Heavily abbreviated, syllabic or strongly inflected. The Voynich's positional rigidity (0.76) is close to that of real Latin (0.72). Its fit to Zipf's law and its per-section semantic networks are compatible with a real language.

A constructed system

A category-based artificial language, in the spirit of Hildegard's Lingua Ignota, or a text generated with a table and a mask. The 87% combinatorial coverage with 80 prefixes and 80 suffixes leans toward a generative mechanism.

The internal structure is compatible with both. What is missing to decide is an anchor: a confirmed word, a bilingual text, a certain sound value. Hildegard left her glossary; the author of the Voynich did not.

The research

Diego «DMUX» De Maio and Simona «51m0» Fenoglio

Work carried out at ART AG (YurekAI), with reproducible tools and published data.
