I built this project as a way to learn more about NLP by applying it to something weird and unsolved.
The Voynich Manuscript is a 15th-century book written in an unknown script. No one’s been able to translate it, and many think it’s a hoax, a cipher, or a constructed language. I wasn’t trying to decode it — I just wanted to see: does it behave like a structured language?
I stripped a handful of common suffix-like endings (aiin, dy, etc.) to isolate what looked like root forms. I know that’s a strong assumption — I call it out directly in the repo — but it helped clarify the clustering. From there, I used SBERT embeddings and KMeans to group similar roots, inferred POS-like roles based on position and frequency, and built a Markov transition matrix to visualize cluster-to-cluster flow.
It’s not translation. It’s not decryption. It’s structural modeling — and it revealed some surprisingly consistent syntax across the manuscript, especially when broken out by section (Botanical, Biological, etc.).
GitHub repo: https://github.com/brianmg/voynich-nlp-analysis Write-up: https://brig90.substack.com/p/modeling-the-voynich-manuscrip...
I’m new to the NLP space, so I’m sure there are things I got wrong — but I’d love feedback from people who’ve worked with structured language modeling or weird edge cases like this.
Comments URL: https://news.ycombinator.com/item?id=44022353
Points: 127
# Comments: 27
Autentifică-te pentru a adăuga comentarii
Alte posturi din acest grup

We’ve been working on Vaev, a minimal web browser engine built from scratch. It supports HTML/XHTML, the CSS cascade, @page rules for pagination, and print-to-PDF rendering. It even handles calc()

Stack Error reduces the up-front cost of designing an error handling solution for your project, so that you focus on writing great libraries and applications.
Stack Error has three goals:
1. P
Article URL: https://telcontar.net/Misc/GUI/RISCOS/
Comments URL: https://news.y
Article URL: https://fabiensanglard.net/2168/index.html
Comments URL: https:

Hey HN! We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependency. This allows you to embed text at (very) high throu

Buckaroo is my open source project. It is a dataframe viewer that has the basic features we expect in a modern table - scroll, search, sort. In addition there are summary stats, and histograms ava