Show HN: I modeled the Voynich Manuscript with SBERT to test for structure

I built this project as a way to learn more about NLP by applying it to something weird and unsolved.

The Voynich Manuscript is a 15th-century book written in an unknown script. No one’s been able to translate it, and many think it’s a hoax, a cipher, or a constructed language. I wasn’t trying to decode it — I just wanted to see: does it behave like a structured language?

I stripped a handful of common suffix-like endings (aiin, dy, etc.) to isolate what looked like root forms. I know that’s a strong assumption, and I call it out directly in the repo, but it made the clustering clearer. From there, I embedded the roots with SBERT and grouped similar ones with KMeans, inferred part-of-speech-like (POS) roles from position and frequency, and built a Markov transition matrix to visualize cluster-to-cluster flow.
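To make two of those steps concrete, here is a minimal standard-library sketch of suffix stripping and the cluster-to-cluster transition matrix. It is not the repo's code. The suffix list contains only the two endings named above, the function names and sample tokens are illustrative, and the root-to-cluster mapping is hand-written as a stand-in for what KMeans over SBERT embeddings would produce.

```python
# Assumption: only the two endings named in the post; the real list is longer.
SUFFIXES = ("aiin", "dy")


def strip_suffix(word, suffixes=SUFFIXES, min_root=1):
    """Remove the longest matching suffix, keeping at least min_root characters."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_root:
            return word[: -len(suf)]
    return word


def transition_matrix(labels, n_clusters):
    """Row-normalized counts of how often cluster a is followed by cluster b."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all-zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Illustrative tokens in reading order (not a real transcription excerpt).
tokens = ["daiin", "chedy", "qokaiin", "shedy", "chedy"]
roots = [strip_suffix(t) for t in tokens]

# Hypothetical cluster ids; in the pipeline described above these would
# come from KMeans over SBERT embeddings of the roots.
cluster_of = {"d": 0, "che": 1, "qok": 0, "she": 1}
labels = [cluster_of[r] for r in roots]

for row in transition_matrix(labels, n_clusters=2):
    print(row)
```

Each row of the result is the empirical distribution over the next cluster given the current one, which is the quantity a cluster-flow heatmap would display.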

It’s not translation. It’s not decryption. It’s structural modeling — and it revealed some surprisingly consistent syntax across the manuscript, especially when broken out by section (Botanical, Biological, etc.).

GitHub repo: https://github.com/brianmg/voynich-nlp-analysis

Write-up: https://brig90.substack.com/p/modeling-the-voynich-manuscrip...

I’m new to the NLP space, so I’m sure there are things I got wrong — but I’d love feedback from people who’ve worked with structured language modeling or weird edge cases like this.


Comments URL: https://news.ycombinator.com/item?id=44022353

Points: 127

# Comments: 27


Posted 3h ago | 18.05.2025, 18:10:11

