Show HN: I modeled the Voynich Manuscript with SBERT to test for structure

I built this project as a way to learn more about NLP by applying it to something weird and unsolved.

The Voynich Manuscript is a 15th-century book written in an unknown script. No one’s been able to translate it, and many think it’s a hoax, a cipher, or a constructed language. I wasn’t trying to decode it — I just wanted to see: does it behave like a structured language?

I stripped a handful of common suffix-like endings (aiin, dy, etc.) to isolate what looked like root forms. I know that’s a strong assumption, and I call it out directly in the repo, but it helped clarify the clustering. From there, I used SBERT embeddings and KMeans to group similar roots, inferred part-of-speech-like (POS-like) roles based on position and frequency, and built a Markov transition matrix to visualize cluster-to-cluster flow.
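For anyone who wants to poke at the idea without cloning the repo, here is a minimal sketch of two of those steps: suffix stripping and the cluster-to-cluster transition matrix. It is not the repo's code. The suffix list contains only the two endings named above, the sample words are an illustrative EVA-style line rather than a quoted folio, and the hand-written `cluster_of` map is a hypothetical stand-in for the SBERT + KMeans output.

```python
# Assumed suffix list: the post names only "aiin" and "dy"; the repo's full
# list is not reproduced here.
SUFFIXES = ("aiin", "dy")


def strip_suffix(word, suffixes=SUFFIXES):
    """Remove the longest matching suffix, keeping at least one root character."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def transition_matrix(labels, n_clusters):
    """Row-normalized counts of labels[i] -> labels[i + 1] transitions."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for src, dst in zip(labels, labels[1:]):
        counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all-zero instead of dividing by 0.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Illustrative word sequence (assumption, not taken from the manuscript files).
words = "qokaiin shedy qokaiin chedy daiin shedy".split()
roots = [strip_suffix(w) for w in words]

# Hypothetical stand-in for the embedding + KMeans step: root -> cluster id.
cluster_of = {"qok": 0, "she": 1, "che": 1, "d": 2}
labels = [cluster_of[r] for r in roots]

matrix = transition_matrix(labels, n_clusters=3)
for row in matrix:
    print(row)
```

Each row of `matrix` is the empirical distribution over the next cluster given the current one, so comparing rows across sections is one way to check whether the "flow" is consistent.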

It’s not translation. It’s not decryption. It’s structural modeling, and it revealed some surprisingly consistent syntax-like patterns across the manuscript, especially when broken out by section (Botanical, Biological, etc.).

GitHub repo: https://github.com/brianmg/voynich-nlp-analysis
Write-up: https://brig90.substack.com/p/modeling-the-voynich-manuscrip...

I’m new to the NLP space, so I’m sure there are things I got wrong — but I’d love feedback from people who’ve worked with structured language modeling or weird edge cases like this.

Comments URL: https://news.ycombinator.com/item?id=44022353

Points: 127

# Comments: 27

Created: 18 May 2025, 18:10:11
