Show HN: I modeled the Voynich Manuscript with SBERT to test for structure

I built this project as a way to learn more about NLP by applying it to something weird and unsolved.

The Voynich Manuscript is a 15th-century book written in an unknown script. No one’s been able to translate it, and many think it’s a hoax, a cipher, or a constructed language. I wasn’t trying to decode it — I just wanted to see: does it behave like a structured language?

I stripped a handful of common suffix-like endings (aiin, dy, etc.) to isolate what looked like root forms. I know that’s a strong assumption — I call it out directly in the repo — but it helped clarify the clustering. From there, I used SBERT embeddings and KMeans to group similar roots, inferred POS-like roles based on position and frequency, and built a Markov transition matrix to visualize cluster-to-cluster flow.
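
For readers who want to see the shape of two of these steps, here is a minimal sketch of suffix stripping and the cluster-to-cluster transition matrix. It is not the repo's code. The suffix list contains only the two endings named above, the tokens are toy input, and the helper names (`strip_suffix`, `transition_matrix`) are hypothetical. The SBERT + KMeans step is replaced by a stand-in that gives each distinct root its own cluster ID, so the example runs with the standard library alone.

```python
# Hypothetical suffix list: the post names only "aiin" and "dy" ("etc." is not filled in).
SUFFIXES = ("aiin", "dy")


def strip_suffix(word, suffixes=SUFFIXES, min_root=1):
    """Remove the longest matching suffix, keeping at least min_root characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return word[: -len(suffix)]
    return word


def transition_matrix(labels, n_clusters):
    """Row-normalized counts of labels[i] -> labels[i + 1] transitions."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all-zero instead of dividing by zero.
        matrix.append([count / total if total else 0.0 for count in row])
    return matrix


# Toy token sequence (illustrative input, not data from the repo).
words = ["qokaiin", "chedy", "daiin", "shedy", "qokaiin", "chedy"]
roots = [strip_suffix(word) for word in words]

# Stand-in for SBERT + KMeans: one cluster ID per distinct root, in order of first appearance.
cluster_of = {root: index for index, root in enumerate(dict.fromkeys(roots))}
labels = [cluster_of[root] for root in roots]

matrix = transition_matrix(labels, len(cluster_of))
```

Each row of `matrix` is the empirical distribution over which cluster follows a given cluster. With real cluster labels from KMeans, the same function would produce the matrix that the flow visualization is built on.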

It’s not translation. It’s not decryption. It’s structural modeling — and it revealed some surprisingly consistent syntax across the manuscript, especially when broken out by section (Botanical, Biological, etc.).

GitHub repo: https://github.com/brianmg/voynich-nlp-analysis

Write-up: https://brig90.substack.com/p/modeling-the-voynich-manuscrip...

I’m new to the NLP space, so I’m sure there are things I got wrong — but I’d love feedback from people who’ve worked with structured language modeling or weird edge cases like this.


Comments URL: https://news.ycombinator.com/item?id=44022353

Points: 127

Comments: 27


Created 2mo | May 18, 2025, 6:10:11 PM


